All Forums Database
marius 21 posts Joined 09/05
08 May 2007
Performance of view containing a Union clause

Hi,I have 2 annual tables each with the same columns and the same unique primary index and stats collected.eg. Replace View VIEWDB.X As Locking $Star_Ship.Star_BL_Item for access Locking $Star_Hist_04.Star_BL_Item_04 for access Select col1 (named col1), col2 (named col2) from $Star_Ship.Star_BL_Item Union Select col1, col2 from $Star_Hist_04.Star_BL_Item_04;A select * from ViewDB.X, or select count(*) from ViewDB.X is extremely slow most likely due to the redistribution, my question is should there be redistribution or am I doing something incorrectly in building the view, see expalin output below? I basically want all the rows from both tables eliminating duplicates. Explanation 1) First, we lock $Star_Hist_04.Star_Bl_Item_04 for access, and we lock $STAR_Ship.STAR_BL_Item for access. 2) Next, we do an all-AMPs RETRIEVE step from $STAR_Ship.STAR_BL_Item by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 3,474,353 rows. The estimated time for this step is 56.51 seconds. 3) We do an all-AMPs RETRIEVE step from $STAR_Hist_04.STAR_BL_Item_04 by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 eliminating duplicate rows. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 1,745,720 rows. The estimated time for this step is 17.09 seconds. 4) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 2 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 2 is estimated with high confidence to be 4,778,903 rows. The estimated time for this step is 58.45 seconds. 5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 2 minutes and 12 seconds.

marius 21 posts Joined 09/05
08 May 2007

Well I add the ALL qualifier to the Union clause and performance is significantly improved, I suppose the redistribution was to eliminate the duplicates.

Barry-1604 176 posts Joined 07/05
09 May 2007

Yes, that is the case. If you know that you don't have duplicates between the two queries, it's best to include the ALL.

rluebke 65 posts Joined 11/05
24 May 2007

I am doing something very similar (CURRENT data in one table and HIST data in another). Is there anyway to tell the system that the data is split by a given column's value? The column it is split on is not the primary index. I am looking for something similar to the concept of partitioned tables in MSSQL.R

Barry-1604 176 posts Joined 07/05
25 May 2007

Ron,Do you mean that you are keeping a view that unions the data between your CURRENT table and your HIST table or do you want to combine your CURRENT and HIST table into one table?You can have a partitioned table (PPI) in Teradata whereby you partition on a date or some other field. I can't tell by your question what you are trying to do.Barry

rluebke 65 posts Joined 11/05
25 May 2007

Your first scenario is the correct one. Can you tell me where in the documentation I could find out how to use partitioned tables. Or if possible, show me an example.R

Barry-1604 176 posts Joined 07/05
25 May 2007

This would be a quick example of a PPI if you had a CURR_HIST_IND on your combined table:CREATE TABLE mytable (CURR_HIST_IND CHAR(1) ...other columns)PRIMARY INDEX(?????)PARTITION BY CASE_N ( CURR_HIST_IND = 'C', CURR_HIST_IND = 'H', NO CASE OR UNKNOWN);Then, if you query had a constraint that said "CURR_HIST_IND = 'C'", the optimizer would only look at the currrent partition. Likewise, if your query said "CURR_HIST_IND = 'H'", it would only look at the history partition.I would recommend that you read the Orange Book by Jerry Klindt and Paul Sinclair on PPI.

rluebke 65 posts Joined 11/05
25 May 2007

Thank you. I will read it right away.R

Luckyhansh 30 posts Joined 08/06
31 May 2007

union will eliminate the duplicate rows and redistributionunion all won't,that's the difference

You must sign in to leave a comment.