All Forums UDA
iammai85 5 posts Joined 08/07
31 Aug 2007
EXISTS vs. IN

Is EXISTS faster than IN? For example, table B has millions of records.Select A.student_id from Awhere EXISTS (select student_id from B where A.student_id=B.student_id)vs.Select student_id from Awhere A.student_id IN (select B.student_id from B)student_id is NOT NULL

sachinp17 53 posts Joined 11/06
07 Sep 2007

Hi, u r query Select A.student_id from Awhere EXISTS (select student_id from B where A.student_id=B.student_id)is equivalent to Select student_id from Awhere A.student_id IN (select B.student_id from B where B.student_id is not null)also the performance wise first query is better.regds,sachin

iammai85 5 posts Joined 08/07
13 Sep 2007

Thanks Sachin.

member 16 posts Joined 02/08
17 Feb 2008

Hi,Can u pls explain why 'Exists' performance is better than 'in'? Regards,abc

Balamurugan B 81 posts Joined 09/07
18 Feb 2008

Hi,The performance of ‘EXISTS’ and ‘IN’ will be same for queries having fewer members in their expression list. But if the members in the expression list is more, then the performance of ‘EXISTS’ is better than ‘IN’.This is because, while using ‘IN’ if the set of constants consists of 70 or fewer members, the system uses hashing to retrieve the rows. If the set consists of more than 70 members, the system does a full-table scan. Note that full-table scans are much more time consuming than row hashed accesses.Regards,Balamurugan

Regards,
Balamurugan

foxbat 27 posts Joined 06/07
22 Feb 2008

The SQL that uses EXISTS may be processed as a correlated subquery. Experience shows that certain correlated subqueries take longer to process than similar SQL's which use a JOIN to achieve the same result.I'd recommend the IN form of the SQL over the EXISTS form of the SQL.--Foxbat

dnoeth 4628 posts Joined 11/04
22 Feb 2008

In almost any case there will be no difference between an IN-subquery and an EXISTS-correlated subquery, the optimizer will rewrite both to a join. Compared to a JOIN there's also hardly any difference, if the subquery is the unique part of the 1:m relation.If the subquery is the m-part, there's an automatic DISTINCT added to the subquery (in Explain that step will be executed before the join), but you have to add it manually to the JOIN (after the join within Explain).If that subquery column is very non-unique that automatic DISTINCT might be overridden by a GROUP BY to enhance performance.If it's NOT IN vs. NOT EXISTS there might be a huge difference, because NOT IN has to deal with NULLs via three-way-logic, whereas NOT EXISTS simply ignores NULL. If there's any NULLable column (inner or outer) used by the subquery, it is recommended to use NOT EXISTS instead of NOT IN. And NOT EXISTS is always more efficient than an Outer Join solution filtering for NULLs.Dieter

Dieter

rupert160 131 posts Joined 09/10
30 Oct 2011

I also read a good explaination on the following site. Noting this is an Oracle site I believe logically teradata adhears to this method however functionally it will "create joins" like dieter said:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:953229842074

You Asked Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to
the original table -- typically.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index
on T1(x).

So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
    ( select y from T2 )
is "huge" and takes a long time.  But the table T1 is relatively small and executing (
select null from t2 where y = x.x ) is very very fast (nice index on t2(y)).  Then the
exists will be faster as the time to full scan T1 and do the index probe into T2 could be
less then the time to simply full scan T2 to build the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more appropriate.

If both the subquery and the outer table are huge -- either might work as well as the
other -- depends on the indexes and other factors.

Jigar 70 posts Joined 09/11
09 Nov 2011

@dnoeth

Can you please elaborate on

"And NOT EXISTS is always more efficient than an Outer Join solution filtering for NULLs.

"

@Topic

Yes, typically In ,Exists and Join would give similar results. However as dnoeth mentioned in case join is creating dups .Eilimination in join has to be done exclusively ;In which de dup logic will be applicable for more columns and possibly more rows making it more expensive.

(Unless you write a sub query and use in join )

 

 

 

You must sign in to leave a comment.