sp230071 29 posts Joined 10/06
25 Apr 2007
Help required

I have a table of about 1,799,412,009 rows (roughly 1.8 billion), 133 GB in size. I have now created another table in which I want to store the aggregate:

    INSERT INTO dbatemp.O_ACNT_BAL_SP
    SELECT
        OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
        OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J, OAN_CRETN_ODT_J,
        OAN_FILE_OSX_N,
        MIN(OAN_PARTN_KEY_1_X) OVER (PARTITION BY OAN_PARTN_KEY_1_X
                                     ORDER BY OAN_PARTN_KEY_1_X),  -- PPI column
        MAX(OAN_PARTN_KEY_1_X) OVER (PARTITION BY OAN_PARTN_KEY_1_X
                                     ORDER BY OAN_PARTN_KEY_1_X)   -- PPI column
    FROM dbatemp.O_ACNT_BAL_test_sp
    GROUP BY
        OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
        OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J, OAN_CRETN_ODT_J,
        OAN_FILE_OSX_N, OAN_PARTN_KEY_1_X

Attached explain:

1) First, we lock a distinct dbatemp."pseudo table" for read on a RowHash to prevent global deadlock for dbatemp.O_ACNT_BAL_test_sp.
2) Next, we lock dbatemp.O_ACNT_BAL_test_sp for read.
3) We do an all-AMPs SUM step to aggregate from dbatemp.O_ACNT_BAL_test_sp by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 4. The input table will not be cached in memory, but it is eligible for synchronized scanning. The aggregate spool file will not be cached in memory. The size of Spool 4 is estimated with no confidence to be 1,349,044,830 rows. The estimated time for this step is 1 hour and 4 minutes.
4) We do an all-AMPs STAT FUNCTION step from Spool 4 (Last Use) by way of an all-rows scan into Spool 8 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 1 (group_amps), which is built locally on the AMPs.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1.

This INSERT is taking a very long time, more than 2 hours, and never completes. I am using a PPI column. Has anybody faced this situation in the past? Please help me optimize it, as in the future we need to run the same test with larger tables and aggregation.

aravind-3019 39 posts Joined 08/06
25 Apr 2007

Hi Sushant,

The all-AMP STAT FUNCTION steps MIN() OVER (PARTITION BY ...) and MAX() OVER (PARTITION BY ...) on billions of rows are the reason this query runs for such a long time. Avoid these functions; simple MIN and MAX aggregates should serve your purpose. And if you have partitioned this table, you should make use of that PPI in your queries: here a full-table scan (FTS) is happening, and an FTS on a table this big is not recommended. Also collect statistics on the non-aggregate columns.

Thanks,
Aravind Hegde
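P.S. For example, statistics could be collected along these lines (a sketch only; substitute the columns that actually appear in your GROUP BY and your partitioning column):

    COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (OAN_A);
    COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (OAN_PARTN_KEY_1_X);
    -- on newer releases you can also collect on the partitioning expression itself:
    COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (PARTITION);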

sp230071 29 posts Joined 10/06
25 Apr 2007

Thanks Aravind, but that did not work either. Earlier the query had been written without the analytical functions.

Query before using the analytical functions:

    SELECT
        OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
        OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J, OAN_CRETN_ODT_J,
        OAN_FILE_OSX_N,
        MIN(OAN_PARTN_KEY_1_X),
        MAX(OAN_PARTN_KEY_1_X)
    FROM dbatemp.O_ACNT_BAL_test_sp
    GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12

1) First, we lock a distinct dbatemp."pseudo table" for read on a RowHash to prevent global deadlock for dbatemp.O_ACNT_BAL_test_sp.
2) Next, we lock dbatemp.O_ACNT_BAL_test_sp for read.
3) We do an all-AMPs SUM step to aggregate from dbatemp.O_ACNT_BAL_test_sp by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The aggregate spool file will not be cached in memory. The size of Spool 3 is estimated with no confidence to be 1,349,559,007 rows. The estimated time for this step is 1 hour and 3 minutes.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 1,349,559,007 rows. The estimated time for this step is 3 minutes and 9 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 1 hour and 6 minutes.

It was still taking a long time; this had earlier taken 24 hours to insert the rows. One more thing: we are using compression on the columns of these tables.

aravind-3019 39 posts Joined 08/06
25 Apr 2007

Hi,

I don't think you can get this query to finish in less than 30 minutes; the explain plan itself gives 1 hr 3 min for the SUM step alone. We have this kind of scenario in our project too, so we always make use of range partitioning on big tables (more than 10 million records). Also, time the SELECT query on its own in your case before going for the INSERT. Compression does not have any bad effect on retrieving rows, but the INSERT may become a bit slower because of it. Collect statistics on the table and check that you can retrieve its rows with high confidence, or at least low confidence; your explain plan is currently saying "no confidence".

Thanks,
Aravind Hegde
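P.S. As an illustration of the kind of range partitioning I mean (a sketch only: the new table name is made up, it assumes OAN_ODT_J is a DATE column suitable for partitioning, and the exact CREATE TABLE ... AS syntax may need adjusting for your release; pick your own PI column and ranges):

    -- copy the target's structure, but add month-based range partitioning
    CREATE TABLE dbatemp.O_ACNT_BAL_SP_P AS
    ( SELECT * FROM dbatemp.O_ACNT_BAL_SP )
    WITH NO DATA
    PRIMARY INDEX (OAN_A)
    PARTITION BY RANGE_N (OAN_ODT_J BETWEEN DATE '2000-01-01'
                                    AND     DATE '2007-12-31'
                                    EACH INTERVAL '1' MONTH);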

sp230071 29 posts Joined 10/06
26 Apr 2007

After using simple MIN and MAX without the OVER (PARTITION BY ...) clause, it is still taking a very long time.

joedsilva 505 posts Joined 07/05
26 Apr 2007

Since your original post mentioned the insert taking a long time, the places to look will be the Transient Journal (i.e. the destination table is not empty), a skewed PI on the destination table, or a NUPI on the destination table with too many duplicate values for the PI columns. The solution depends on which of these is the case, and on other factors such as the size of the destination table.

A common workaround for the TJ is to create a new table, UNION [ALL] the destination table's contents with the SELECT query, and insert that into the new table. For a skewed PI, you need to weigh your options for making some other columns the PI (a big topic). For NUPI collisions, you can try making the table MULTISET (beware of the catch in the form of duplicate rows).

So the solution to your issue could be a combination of these, as sketched below.
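A sketch of the Transient Journal workaround (the new table name is made up; the second SELECT stands for the aggregate query posted earlier in this thread):

    -- 1. create an empty copy of the target: an INSERT...SELECT into an
    --    empty table largely bypasses per-row Transient Journal overhead
    CREATE TABLE dbatemp.O_ACNT_BAL_SP_NEW AS dbatemp.O_ACNT_BAL_SP WITH NO DATA;

    -- 2. load the existing rows plus the new aggregate rows in one shot
    INSERT INTO dbatemp.O_ACNT_BAL_SP_NEW
    SELECT * FROM dbatemp.O_ACNT_BAL_SP    -- current contents of the destination
    UNION ALL
    SELECT ...;                            -- the aggregate SELECT from above

    -- 3. afterwards, swap the tables
    DROP TABLE dbatemp.O_ACNT_BAL_SP;
    RENAME TABLE dbatemp.O_ACNT_BAL_SP_NEW TO dbatemp.O_ACNT_BAL_SP;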

sp230071 29 posts Joined 10/06
26 Apr 2007

Aravind, the skew factor for my destination table is very small (1 percent), but do you still think it might be one of the reasons? I took the count of rows per AMP and it is almost the same everywhere, yet I am still confused about why it is having an issue. And as you have seen, the explain is also showing an estimated time of 1 hr 30 min for the query. Is there anything related to a DBSControl parameter I need to check, such as the PPI cache threshold, which is 500 by default?

Rows per AMP (AMP number followed by row count):

     0 21446463    1 21443006    2 21446649    3 21437362    4 21446691    5 21446651
     6 21453561    7 21441903    8 21443987    9 21441492   10 21438708   11 21439666
    12 21440235   13 21438873   14 21444468   15 21451069   16 21424977   17 21410793
    18 21413002   19 21421911   20 21420538   21 21413215   22 21407801   23 21417619
    24 21415536   25 21416238   26 21412892   27 21421752   28 21417597   29 21411652
    30 21416799   31 21419783   32 21414542   33 21419582   34 21412584   35 21403173
    36 21415544   37 21411985   38 21420465   39 21414476   40 21416950   41 21419612
    42 21416304   43 21406138   44 21419915   45 21417261   46 21410815   47 21420203
    48 21405137   49 21418547   50 21409454   51 21418300   52 21416681   53 21422146
    54 21417109   55 21425976   56 21412537   57 21417414   58 21418664   59 21415493
    60 21414348   61 21413535   62 21423653   63 21419196   64 21420049   65 21419086
    66 21416358   67 21414865   68 21412593   69 21417350   70 21419253   71 21412657
    72 21414413   73 21410654   74 21415250   75 21420230   76 21415978   77 21405743
    78 21413490   79 21418091   80 21421048   81 21423696   82 21425232   83 21425345
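For reference, per-AMP counts like the ones above can be produced with the standard hash functions (a sketch that assumes OAN_A is the primary index column; list all of your actual PI columns inside HASHROW):

    SELECT HASHAMP(HASHBUCKET(HASHROW(OAN_A))) AS amp_no,
           COUNT(*)                            AS row_count
    FROM dbatemp.O_ACNT_BAL_SP
    GROUP BY 1
    ORDER BY 1;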

Barry-1604 176 posts Joined 07/05
26 Apr 2007

Joe mentioned hash collisions. What is the primary index of your target table, and if it is not a unique primary index, how unique is it (how many rows do you expect to have per primary index value)? A high number of hash collisions caused by a "not very unique" primary index will make a fairly large number of inserts take a very long time.

It looks like you have a lot of columns in your GROUP BY clause. Are most (if not all) of these columns in the primary index of your target table? If you're not able to create a pretty unique primary index, changing the table to MULTISET will improve performance. It would allow dups in your table, but if this is the only insert happening to the table, you wouldn't have duplicates because your GROUP BY will eliminate them.
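To put numbers on that, something along these lines shows the worst offenders (a sketch assuming OAN_A is the primary index column; list all of your PI columns in the select list and GROUP BY):

    SELECT OAN_A,
           COUNT(*) AS rows_per_pi_value
    FROM dbatemp.O_ACNT_BAL_SP
    GROUP BY 1
    QUALIFY RANK() OVER (ORDER BY COUNT(*) DESC) <= 25  -- 25 most-duplicated PI values
    ORDER BY 2 DESC;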

sp230071 29 posts Joined 10/06
26 Apr 2007

Barry, I am using a non-unique primary index (a PI, not a UPI). My explain shows an estimated time of 1 hr and 30 sec for this, but when we execute the INSERT it takes far longer and does not complete at all.

joedsilva 505 posts Joined 07/05
27 Apr 2007

The times displayed by explain are merely the optimizer's best guesses (based on statistics and whatever "insider" information it has). They are not the final authority on how long the SQL is going to run. Besides, the explain you posted is for the SELECT query, and even an explain of the INSERT wouldn't tell you much about hash collisions and all that stuff.

So there are two things to measure here: how long your SELECT query takes to run on its own (without the INSERT), and the difference in time when the INSERT is included. That's the kind of information that will help you figure out where the bottleneck is and what you need to do about it. A divide-and-conquer approach, as sketched below.
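One way to time the SELECT on its own without shipping 1.3 billion result rows back to the client is to wrap it in a count (a sketch; the aliases on the aggregates are only there so the derived table is legal, and the full aggregation still runs even though only one row comes back):

    SELECT COUNT(*)
    FROM (
        SELECT OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
               OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J, OAN_CRETN_ODT_J,
               OAN_FILE_OSX_N,
               MIN(OAN_PARTN_KEY_1_X) AS min_key,
               MAX(OAN_PARTN_KEY_1_X) AS max_key
        FROM dbatemp.O_ACNT_BAL_test_sp
        GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12
    ) AS dt;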

Barry-1604 176 posts Joined 07/05
27 Apr 2007

The optimizer never (to my knowledge) takes into account the number of hash collisions when planning an insert or update. That's simply outside the scope of what it considers, so it won't reflect them in the estimated time. Follow the steps Joe outlined and that will tell you whether this is a hash collision problem. If your selected primary index is not very unique (say, more than 100 rows per PI value), then that is probably your problem. Also keep in mind that even though the average number of hash collisions may be low, you could have a few values with a very high number of collisions. You have to watch out for this as well, although if the number of such values is low enough, you will tend to see skewing while the insert is taking place.

By the way, you can monitor how quickly the inserts are happening by locking the table for access and doing a SELECT COUNT(*) from it while the insert/select is running. This will also tell you when the query has moved from the "select" portion to the "insert" portion. If you have any secondary indexes, you'll want to put a false constraint on a non-indexed column in order to get the correct count. This is because the indexes are built at the end when you do an insert/select into an empty table, and a SELECT COUNT(*) without a WHERE clause will simply scan the index to count the rows; since the index entries haven't been built yet, it will tell you there are 0 rows. By putting a false constraint (something that will qualify all rows) on a non-indexed column, you force the count to scan the base table rather than the index, so you'll get an accurate count while it's running. A sketch is below.

Good luck!
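A sketch of that monitoring query (it assumes OAN_OAA_J is a non-indexed column in the target table; the dummy predicate qualifies every row but forces a base-table scan instead of an index-only count):

    LOCKING TABLE dbatemp.O_ACNT_BAL_SP FOR ACCESS
    SELECT COUNT(*)
    FROM dbatemp.O_ACNT_BAL_SP
    WHERE OAN_OAA_J IS NULL OR OAN_OAA_J IS NOT NULL;  -- true for all rows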
