All Forums UDA
Yuva 5 posts Joined 11/06
10 Nov 2006
Collect Stats n Refresh Stats

What are factors to decide how much time it should take for "collect statistic" and "refresh statistic"?My question is related to time it takes for "collect statistic" and "refresh statistic"?

leo.issac 184 posts Joined 07/06
10 Nov 2006

It is how frequently your data in the tables is added/modified/deleted. If the data changes very frequently, you might need to collect /refresh statistics often to get best performance for your queries.Ofcourse,remember that collect stats is resource intensive !

GogulM 32 posts Joined 08/06
13 Nov 2006

WHEN EVER THERE IS EXTENSIVE INSERTs, DELETEs, UPDATEs, OR ANY OTHER MAINTANENCE TO ANY COLUMNS WHICH IS CONTAINING STATISTICS.WE SHOULD REFRESH THE STATS.THE CHANGE OF DATA SHOULD OF 5 TO 10 %

~Gogul

BBR2 96 posts Joined 12/04
13 Nov 2006

Another parameter to check is the confidence level in the explain plan.Vinay

leo.issac 184 posts Joined 07/06
14 Nov 2006

one more hint. use "diagnostic helpstats" to get recommendations on collect stats on particular columns. This will be displayed at the end of explain text

member 16 posts Joined 02/08
19 Feb 2008

Hi all,Could u pls tell me something more about "diagnostic helpstats" .....Thanks,abc

member 16 posts Joined 02/08
19 Feb 2008

Hi all,Is the same procedure followed to do the collect stats when it is done for the first time and when done for any successive times on the same table (i,e when 5% - 10% of data is being altered)Thanks,abc

joedsilva 505 posts Joined 07/05
20 Feb 2008

diagnostic helpstats is used to understand the optimizer recommendations of what stats it thinks might be useful in making a (probably) better query execution plan.you can do it by typing the following command in sql assistant and doing the explain on your query after wards.diagnostic helpstats on for session;explainselect a, b, cfrom t1, t2where t1.a = t2.b....When you recollect stats on a table, teradata internally does all the same operations it did the first time you collected stats on it. so depending on whether it was a full stats or a sampled stats it would either scan the full table (exceptions are if there are index subtables which would give a faster results) or scan a percentage of the table to collect stats information (this is across the whole table and not just the additional 5% or 10% of the records that changed)

member 16 posts Joined 02/08
20 Feb 2008

Hi all,Thanks for that...Can Diagnostic statistics be activated at a still higher level? so that when an explain is done we get the optimiser recommendations?Also in most of the explain statements we get something like.....We do an all-AMPs RETRIEVE step from "CS".employee1 by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. what is meant by .....no residual conditions into Spool 1and (group_amps)???In another explain statement...... which is partially listed out here.....The size of Spool 2 is estimated with high confidence to be 54,555,839 rows. The estimated time for this step is 8.41 seconds. The size of Spool 3 is estimated with high confidence to be 1,852,824,064 rows. The estimated time for this step is 7 minutes and 56 seconds. We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan. Spool 2 and Spool 3 are joined using a merge join, with a join condition of ((T1.col1 > t2.col1) and ((t1.col2 < tab2.col2) and (tab1.col3 = tab2.col3))) The result goes into Spool (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 3,868,443,675 rows. The estimated time for this step is 3 minutes and 36 seconds. Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???Kindly explainThanks,abc

Someshnr 53 posts Joined 06/07
21 Feb 2008

Q. what is meant by .....no residual conditions into Spool 1Ans: All applicable conditions are applied.Q. and (group_amps)???Ans: A subset of AMPs will be used instead of All AMPs.Q. Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???Ans: Your join has inequality conditions. ( I am not 100% sure of this answer).

gander_ss 74 posts Joined 02/07
20 Apr 2009

what is the syntax to refresh all stats collectd on a particular table?Regards,Subhash

mukeshbela 1 post Joined 01/11
25 Feb 2011

what is the syntax to refresh all stats collectd on a particular table?

Ans: collect stats on databasename.tablename;

drmkd17 54 posts Joined 10/12
26 Jul 2016

I have billions of history data in tables. The jobs do an insert/ append into these tables. Can anybody recommend what is the right time to refresh stats :
After each insert/append job?
At the end of all jobs?
Since there is huge data the Refresh stats takes a lot of time!
Please suggest. 

M.Saeed Khurram 544 posts Joined 09/12
27 Jul 2016

The first thing is collect only the stats required, not all stats. It will save a lot of time and provide you performance.
As stats collection after every insert will take time and prolong the batch time, I would suggest to schedule the stats collection in non business hours after all jobs.

Khurram

drmkd17 54 posts Joined 10/12
27 Jul 2016

Thanks Khurram
ok.
Will the insert append lag if stats are not collected between Runs?

Fred 1096 posts Joined 08/04
27 Jul 2016

Stats are important for source tables in the INSERT/SELECT query. Stats on the target table are not (unless that table is also a source).

You must sign in to leave a comment.