All Forums General
dsnaveen 6 posts Joined 07/10
12 Oct 2013
Collecting Stats

Hi Guys,
Please need your help in understanding when to collect stats.
I know if the no. of count in a table is increased by 10%, we need to collect stats.
How to validate the count has been increased or there's no change in the table.
Thank you.
 
 
 

Sun_shine_jgd 39 posts Joined 07/13
15 Oct 2013

The best practice is to collect the stats once your data load is done.You do not have to calculate the % of data changed.
_aravind

Sun_shine_jgd 39 posts Joined 07/13
15 Oct 2013

in fact YEs.. collect stats shd be taken when a certain % of data changes.
if you know its very few you donot have to take it. But make it a best practice to take stats in a specific interval for effictive plan generation by the parser.

dsnaveen 6 posts Joined 07/10
20 Oct 2013

Thanks for the reply aravind.
How to validate the count has been increased or there's no change in the table.
Thanks,
 
 
 

Sun_shine_jgd 39 posts Joined 07/13
23 Oct 2013

I would say, better have a practice of collecting stats after the table load has done.
Count shd be validated by b4 and after.But if the change is by updating the key columns this wont be suitable.So keep taking stats on a regular interval(weekly or monthly) depending on the table load.
-Aravind

nkardos 12 posts Joined 10/10
24 Oct 2013

It is a complicated problem, and there are different strategies based on data type, frequency and usage.
Some of the stats are need to be collected right after load, some of them is enough daily (in offpeak hours), and some of them can be recollected even less frequently.
You can waste a lot of resource on collecting unnecessary stats, so be careful!
I would recommend PRISE StatMan to automate statistics collection, you can get a full functional free trial here: https://www.prisetools.com/productrequest
 

18 Nov 2013

I have a table of around 5M rows and in that one column is used frequently in where clause.
 
But that one column has only 2 possible values. It can either be null or 'Alpha'. Is it recommended to collect stats on this column or not?

Regards,
Muhammad Fahad.

21 Nov 2013

Generally speaking; Yes you should collect stats on the column, so that optimiser is aware of the low cardinality of the column
You can also compare Explain plan in both cases -- In specific situations, it may not be required, in case there is no difference in join stratergy, redistribution etc

 

M.Saeed Khurram 544 posts Joined 09/12
21 Nov 2013

Fahad,
I have a few recomnedations:

  1. Try to collect stats always, as it helps the optimizer to choose best plans for query execution.
  2. As the column has only two possible values, you can use compression on the column and for 5 million rows it will save space and improve processing.
  3. as the column is being use in where clause frequantly but the values are not unique, you can crate a NUSI to improve access.

 

Khurram

alevai 4 posts Joined 10/11
28 Nov 2013

Statistics is highly recommended if frequently used in where clauses.
The neccessity of re-collecting the statistics is basically determined by 2 factors:
* Does the table size change significantly? (Teradata suggests +-10% record count change as a trigger)
* Does the Null / 'Alpha' ratio change significantly?
 
As this statistics will lay on low cardinality column its cost is not too high, you can collect regularly (eg. weekly). Concerning Full / Sample question I would use sample statistics, however in earlier versions (V12) I experienced strange results in case of low cardinality cases.
If you use management tool hopefull it will contain those heuristics that determines if statistics actually has to be refreshed or not.
Ákos
 

ToddAWalter 316 posts Joined 10/11
28 Nov 2013

Stats on this column are strongly indicated if the column is used in the where clause regularly. If you do not have stats, the optimizer will assume approximately 5% selectivity which will be far off from the 50% (assuming the values have even representation).
A NUSI on this very non-unique column will not be used. A NUSI needs to have values that are represented in fewer rows than there are data blocks in the table.
Sample stats are not optimal for extremely non-unique columns like this. It is hard to get an accurate sample that correctly represents the distribution. But this is the cheapest type of data to collect full stats on. Plus it is very likely that these stats only need to be collected very infrequently since the distribution is unlikely to change quickly.
Compression for this column is strongly indicated.
 

You must sign in to leave a comment.