All Forums Database
tmatthews 1 post Joined 02/12
13 Feb 2012
Joining large data on ranged indexies


Hey Guys,


I'm running into speed issue with one of my joins on large datasets.



  table_a_id INTEGER,


  lookup_idx INTEGER

) PRIMARY INDEX(table_a_id)



  table_b_id INTEGER,

  start_lookup_idx INTEGER,

  end_lookup_idx INTEGER,

  some_data VARCHAR(50),


) PRIMARY INDEX(table_b_id)


I am trying to join a large table (10,000,000+ rows) with a lookup table in the most efficient way possible with a BIGINT (lookup_idx).  However the lookup table isn’t indexed by lookup_idx, but by two columns that give a range (start_lookup_idx, end_lookup_idx).  There are no overlaps for any of the ranges.





from table_A a

  left join table_B b on a.lookup_idx >= b.start_lookup_idx 

    and a.lookup_idx <= b.end_lookup_idx


This query runs far too slow.  I know that algorithmically, teradata shouldn’t have to do a full table scan for every lookup if it's using ordered indexies, but that’s what it’s doing.  Does anyone know how to make this query work correctly?


Currently I have a work around where I have built an intermediate table which denormalizes all possible lookup_idx values and has a foreign key into table_B.  This query runs fast enough, but the intermediate table is much to large for me to hold onto indefinitely.


Any thoughts?


ulrich 816 posts Joined 09/09
13 Feb 2012

Can you share the explain of your query?

Which stats are collected?

just for my understanding...


all  lookup_idx between start_lookup_idx and end_lookup_idx will have the same lookup value. Where and how is lookup_idx set in table_a?

If you all lookup_idx between all  between start_lookup_idx and end_lookup_idx  would be set to  start_lookup_idx (or all to  end_lookup_idx ) you would be able to do an equal join with the same result - and maybe benefit from stats...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

VBurmist 96 posts Joined 12/09
14 Feb 2012

This is most probably a product join.    Is there a way to introduce "table_b_id" field into TableA, from the LDM perspective?  

It might be tricky to update the tableA during ETL when the lookup tableB is updated, but it might be better than lots of queries with product joins.

p.s. 10mln rows is not a large table for Teradata in most cases. ;)

ulrich 816 posts Joined 09/09
14 Feb 2012

By the way - how many rows contain your lookup table?

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

You must sign in to leave a comment.