All Forums Database
prateek.saxena 12 posts Joined 08/13
21 Jan 2014
Which is efficient?

I have written a query as follows :
select
a.x1,
a.x2,
b.x3
from
(select x1,x2,x4 from table1 where 'some conditions') as a
inner join
(select x3,x4 from table2) as b
on
a.x3=b.x3
 
I have another version as :
select
a.x1,
a.x2,
b.x3
from
(select x1,x2,x4 from table1 where 'some conditions') as a
inner join
 table2 as b
on
a.x3=b.x3
 
table1 has 12 columns and table2 has 20 columns.
Can someone tell me which version is better and why?

Regards, Prateek
Santanu84 122 posts Joined 04/13
21 Jan 2014

Hi Prateek
Looking at your query, my first response is, this query will give error. You are joining table b with table a on x3 column but you are not selecting x3 column in your sub-query. Instead you have slected x4.
Assuming it is a type-o, the joining will not depend on the number of columns. It is optimizer's decision.
If x3 is not the indexing column for both tables, in your subquery for table1 you have given some conditions. If this makes selection of very few records compared to table2, then optimizer can decide to duplicate "a" into all-AMPs and go for product join for both the queries.
You can always run Explain to see which one is performing better.
Other experts in this forum may have extra points to add.
Thanking You
Santanu

prateek.saxena 12 posts Joined 08/13
21 Jan 2014

Thanks Sanatanu. Yes it was a typo, i intended to join these tables on x4 only. :)

Regards,
Prateek

sgarlapa 88 posts Joined 03/13
21 Jan 2014

Hi Prateek,
In my opinion the first option is good. As you have restricted the number required columns from table b. The spool and the other resources consumed will be less.
Can you check the spool and number of steps required in both the ways and let us know.
Regards,
Sri

sgarlapa 88 posts Joined 03/13
21 Jan 2014

I mean in the explain plan - the spool and number of steps required in both the ways.

Raja_KT 1246 posts Joined 07/09
21 Jan 2014

My statement is subject to correction :).
 

You can check the explain plan for both.I feel that for small amount of data , they are almost the same. ResTree to ResTree' conversion,in Query Rewrite for the first case, I feel that it does not need to push projections into spooled views, since the columns are specified. For the second case, since the table2 has 20 columns, so Query Rewrite will take time.

 

Again, Query Rewrite may or will go for folding the derived table. So it may or may not take time here . It all boils down to the optimizer.

 

Cheers,
 

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

M.Saeed Khurram 544 posts Joined 09/12
22 Jan 2014

Parteek,
The first version will be better. The reason is to get performance you need to reduce IO as much as you can. While selecting from tables, if you select only columns requied and restrict rows as well, then the IO will be termnedously reduced and less data block will be accessed. 
 
In first query you are selecting only two columns from table2, but in second 20 column will be read. so second one will be more IO intensive.
You can check explain plan to confirm the difference. or you ca paste the explain plan for discussion. 
 

Khurram

dnoeth 4628 posts Joined 11/04
22 Jan 2014

Derived Tables are only materialzed when there's something like DISTINCT/GROUP BY/OLAP. 
Otherwise they are already resolved by the parser. Also any unused columns are removed from the parse tree, simply compare the estimated spool sizes.
When you explain both queries they should be exactly the same as the optimizer should get the same parse tree.

 

Dieter

TD_Raj 50 posts Joined 05/10
22 Jan 2014

If you are just selecting records then it does not matter whether you use Derived table or the table directly.
Teradata PE is enough smart to include only those columns which are required by the query.
You can check the explain plan and run both the queries.
I believe you will see same spool usage too in both cases.
 
Raj

Harpreet Singh 101 posts Joined 10/11
15 Apr 2014

Does it mean that Tables can have any number of columns and this will not impact performance as long as select is retrieving few columns?
select a1,a2,b3 from A join B on A.a1=B.b1
--A has 200 columns and B has 10 columns.  50 mil rows in A and 200 mil in B ---same performance ?
--A has 10 columns and B has 10 columns.  50 mil rows in A and 200 mil in B -----same performace?

Harpreet Singh 101 posts Joined 10/11
15 Apr 2014

and spool will remain same also?  in TD14.

ToddAWalter 316 posts Joined 10/11
15 Apr 2014

Spool will remain the same, execution cost after the initial table reads will be the same. Overall performance of the query will not be the same because the underlying table has to be read in order to extract the columns desired - making the wider tables cost more to read initially. But after the base tables have been read, all the non-participating columns will be eliminated and the rest of the work of the query will be very similar or identical.
Note that if this is a common pattern, Columnar partitioning of the table may be a good choice. Columnar partitioning makes the reading of the small number of columns from a wide table more efficient by only reading the columns required by the query.

You must sign in to leave a comment.