The accuracy of sample statistics has been greatly improved over the last Teradata releases. So I usually try to use sample stats on most of the big tables and I found them to be reliable on many columns, not only the officially recommended unique or nearly-unique columns, e.g. on DATEs.

But there is a specific scenario when sample stats result in worst case optimizer plans:

The table is partitioned and there’s a dependency between the partitioning and the sampled column.


The process used for sampling stats is not the same as SAMPLE in a SELECT, which is a truly random sample. It’s similar to a TOP, i.e. it simply starts scanning the first percents of the table. This would result in wrong data when the stats are on the partitioning column, so the optimizer is smart enough to recognize that and switches to scan the first percent of each partition.

Now consider following scenario:

CREATE MULTISET TABLE statstest(dt DATE, yearmonth INT)

INSERT INTO statstest 
  ,EXTRACT(YEAR FROM calendar_date) * 100
  +EXTRACT(MONTH FROM calendar_date)
FROM sys_calendar.calendar 
WHERE calendar_date BETWEEN DATE '2010-01-01' AND CURRENT_DATE


SELECT MIN(yearmonth), MAX(yearmonth), COUNT(DISTINCT yearmonth) FROM statstest;
201001    201410    58

SHOW STATS VALUES COLUMN(yearmonth) ON stats test;
 /* MinVal                */ 201001, 
 /* MaxVal                */ 201002, 
 /* ModeVal               */ 201001, 
 /* NumOfDistinctVals     */ 2, 

Ops, there are 58 months but according to the stats there are only two.


monthyear is directly correlated to dt and because all rows within a table are sorted by partition (in this case by date) only the oldes dates are fetched.

Any query with a WHERE-condition based on yearmonth outside of the estimated range might result in a really bad plan. For equality the optimizer will assume the average number of rows (50,000) but a BETWEEN will result in estimated with high confidence to be 1 row. You can imagine the performance of such a plan.


To solve this problem switch to full stats whenever you need to collect stats on a dependent column.

Caution: In TD14.10 the optimizer might automatically switch to sample stats causing this problem to appear a few weeks delayed. In that case better force full stats using the NO SAMPLE option.


Gyanendra 6 comments Joined 03/11
03 Oct 2014

Hi Dieter,
I first collected sample (2% in this case) stat on table and then tried collected full stat using below SQL.

COLLECT STATS  COLUMN(yearmonth)  ON statstest ;

Now when I do help stat on table it still showing 2 distinct value for yearmonth. Do we need to first drop the STAT before switiching from sample stat to full stat?

dnoeth 86 comments Joined 11/04
03 Oct 2014

Hi Gyanendra,
in TD14+ the SAMPLE is automatically applied when you collect the next time. You need to switch it off explicitly using NO SAMPLE.


22 Oct 2014

Hi Dieter,
Thanks for the valuable information. Actually we are planning to enable Auto Stats from lower to higher environment. As part of testing, I did following testing on following cases to ensure whether everything is working as expected or not.
Case 1: Insert, update, delete operation - During collection, once system data change pattern reached, whether it is collecting or skipping for this DML operations.
Case 2: Analyze job - Whether it is giving right recommendations or not for all type of tables especially PPI tables, join indexes, etc
Case 3: Collect Job - Whether it is running the job as per the schedule or not and refreshing daily the PARTITION, Partition columns, NUSIs, sample statistics, DBC tables or not
Case 4: If it is a big table (>1000 rows/AMP) and already collected full statistics on full table, whether it is going for Sample stats or not
Case 5: If it is a small table (<1000 rows/AMP), whether it is going for full stats or not
Case 6: Whether it is giving recommendation on timestamp and collecting stats on Timestamp column or not
As per your experience, do you think, is there any other cases i need to do a test, so that we can have a smoother implementation of Autostats.
Thanks in advance.

23 Oct 2014

Hi Dieter,
Is there any solution to handle the Auto stats (via Auto scheduler) during maintenance window. the question here is, assume the stats are scheudled to run during this maintenance window where the system is down. Is there any solution to address this loss of stats.
Thanks in advance.

dnoeth 86 comments Joined 11/04
24 Oct 2014

Sorry, but I don't have expertise in using AutoStats, yet.


Shock 1 comment Joined 03/13
13 Nov 2014

Hi Harsha,
in the Autostats module in Viewpoint, you can choose which type of Sampling setting you want, by going to Actions -> Edit Collect Settings.

Mooli 1 comment Joined 12/14
01 Dec 2014

Hello, would you be available for 1:1 training in Teradata?

karteeky 1 comment Joined 02/11
07 Mar 2015

Hi Dieter,
Although I did Range partition a Date column (say Trx_Dt) and collected Partition stats, a simple "SELECT MIN(Trx_Dt) or MAX(Trx_Dt)" is doing FULL table scan (all-row scan)? I'm trying to understand why Teradata isn't knowing min/max values while it's a range partition. Through stats it has all the information - definitely it doen't have to scan through every partition for max(trx_dt). We have 7 billion record table with partition on Trx_Dt and each partition holds about 80 million records.
Thanks for your help with this!

eejimkos 8 comments Joined 01/12
11 Mar 2015

Thank you once more for all these new tricky updates on evert TD version.
One question ,  so we have to check all the PPI tables where the PI column is also a level of partition ?
Furthermore , this scenario is even more strange with Volatiles tables , there it works normally  :)
Thank you.

TDThrottle 11 comments Joined 11/11
10 Feb 2016

Hi Dieter,
How to disable stats extrapolation in 15.00. Truning off All-AMP dynamic Amp sampling will help?

dnoeth 86 comments Joined 11/04
10 Feb 2016

Afaik you can't turn of stats extrapolation and imho there's no reason to do so.
It's just a try to adjust the stats to the new count and if it's not working as expected you can simple recollect them.
All-AMP sampling simply increases the accuracy of the Random AMP Sample.


TDThrottle 11 comments Joined 11/11
11 Feb 2016

Thank you Dieter

You must sign in to leave a comment.