An earlier blog post focused on simple steps to get started using the Teradata 14.10 Automated Statistics Management (AutoStats) feature.  One of the new capabilities that AutoStats relies on when it streamlines statistics collection is the new “Threshold” option.  Threshold applies some intelligence about when statistics actually need to be re-collected, allowing the optimizer to skip some recollections.  

Although you will probably want to begin relying on AutoStats when you get to 14.10, you don’t have to be using AutoStats to take advantage of threshold, as the two features are independent from one another.  This post will give you a simple explanation of what the threshold feature is, what default threshold activity you can expect when you get on 14.10, and what the options having to do with threshold do for you.  And you’ll get some suggestions on how you can get acquainted with threshold a step at a time.  

For more thorough information about statistics improvements in 14.10, including the threshold functionality, see the orange book Teradata Database 14.10 Statistics Enhancements by Rama Korlapati.

What Does the Threshold Option Do?

When you submit a COLLECT STATISTICS statement in 14.10, it may or may not execute.  A decision is made whether or not there is a value in recollecting these particular statistics at the time they are submitted.  That decision is only considered if threshold options are being used.

Threshold options can exist at three different levels, each of which will be discussed more fully in their own section below.  This is a very general description of the three levels:

  1. System threshold:   This is the default approach for applying thresholds for all 14.10 platforms.  The system threshold default is not a single threshold value.  Rather this default approach determines the appropriate threshold for each statistic and considers how much the underlying table has changed since the last collection.
  2. DBA-defined global thresholds:  These optional global thresholds override the system default, and rely on DBA-defined fixed percentages as thresholds.   Once set, all statistics collection statements will use these global threshold values, unless overridden by the third level of threshold options at the statement level.
  3. Thresholds on individual statements:  Optional USING clauses that are attached to COLLECT STATISTICS statements can override the system default or any global DBA-defined thresholds when there is a need for customization at the individual statistic level.    

Whichever threshold level is being used, if the optimizer determines that the threshold has not been met, no statistics will be collected, even though they have been requested.  When a collection has been asked for but has not been executed, a StatsSkipCount column in the DBC.StatsTbl row that represents this statistics will be incremented.    

StatsSkipCount appears as an explicit column in the view, but in the base DBC.StatsTbl  StatsSkipCount is carried in the Reserved1 field.  When StatsSkipCount is zero it means that the most recent COLLECT STATISTICS request was executed.

Ways That a Threshold Can Be Expressed

The system setting (level 1) for threshold logic is not one threshold value applied to all statistics collections.  Rather, when enabled, the setting tells the optimizer to hold back the execution of a collection submission based on whatever it deems as an appropriate threshold for this statistics at this point in time.  This high-level setting uses a “percent of change” type of threshold only.

Statistics collection thresholds are explicitly specified when using DBA-defined global settings or individual statement thresholds are used.  These explicit thresholds can be expressed as a percent of change to the rows of the table upon which statistics are being collected, or as time (some number of days) since the last collection.  

The most reliable way to express thresholds is by means of a percent of table change.  That is why the highest level system setting, the one that is on by default, only supports percent of change thresholds.  Time as a threshold must be explicitly specified in order to be used.

Importance of DBQL USECOUNT Logging

The recommended percent of change thresholds rely on having DBQL USECOUNT logging turned on.  See my earlier blog on AutoStats for an explanation of USECOUNT DBQL logging.  USECOUNT logging is a special type of DBQL logging that is enabled at the database level.  Among other things, USECOUNT tracks inserts, deletes and updates made to tables within a database, and as a result, can provide highly accurate information to the optimizer about how the table has changed since the last statistics collection.

The default system threshold functionality is able to be applied to a statistic collection only if USECOUNT logging has been enabled for the database that the statistics collection table belongs to.   In the absence of USECOUNT data, the default threshold behavior will be ignored. However, both DBA-defined global thresholds and statement-based thresholds are able to use percent of change thresholds even without USECOUNT logging, but with the risk of less accuracy.

 In the cases where USECOUNT logging is not enabled, percent of change values are less reliable because the optimizer must rely on random AMP sample comparisons.  Such comparisons consider estimated table row counts (the size of the table) since the last statistics collection.  This can mask some conditions, like deletes and inserts happening in the same timeframe.  Comparisons based strictly on table rows counts are not able to detect row updates, which could change column demographics.  For that reason, it is recommended that USECOUNT logging be turned on for all databases undergoing change once you get to 14.10.

Percent of change is the recommended way to express thresholds when you begin to use the threshold feature in 14.10.   Time-based thresholds are offered as options primarily for sites that have evolved their own in-house statistics management applications at a time when percent of change was unavailable, and wish to continue to use time.

The next three sections discuss the three different levels of threshold settings.

More about the System Threshold Option

All 14.10 systems have the system threshold functionality turned on by default.  But by itself, that is not enough.   USECOUNT logging for the database must also be enabled.   If USECOUNT DBQL logging is turned on, then each COLLECT STATISTICS statement will be evaluated to see if it will run or be skipped.    

During this evaluation, an appropriate change threshold for the statistic is established by the optimizer.  The degree of change to the table since the last collection is compared against the current state of the table, based on USECOUNT tracking of inserts, deletes and updates performed.   If the change threshold has not been reached, and enough history has been collected for this statistics (usually four of five full collections) so that the optimizer can perceive a pattern in the data such that extrapolations can be confidently performed, then this statistics collection will be skipped.  

Even if the percent of change threshold has not been reached (indicating that statistics can be skipped), if there are insufficient history records, the statistics will be recollected.  And even with 10 or 20 history records, if there is no regular pattern of change that the optimizer can rely on to make reasonable extrapolations, statistics will be recollected.

There is a DBS Control record parameter called SysChangeThresholdOption which the behavior of the system threshold functionality.  This parameter is set at zero by default.  Zero means that as long as USECOUNT logging in DBQL is enabled for the database that the table belongs to, then all statistics collection statements will undergo a percent of change threshold evaluation, as described above.

If you want to maintain the legacy behavior, threshold logic can be turned off completely at the system level by disabling the SysChangeThresholdOption setting in DBS Control (set it to 3).  This field, along with parameters to set DBA-defined global parameters, can be found in the new Optimizer Statistics Fields in DBS Control.  

It is important to re-emphasize that the DBQL USECOUNT logging must be enabled for all databases that you want to take advantage of the system threshold functionality.  In addition, all other lower-level threshold settings must remain off (as they are by default) in order for the system threshold to be in effect.

More about DBA-Defined Global Thresholds

While it is recommended that the system threshold setting be embraced as the universal approach, there are some sites that have established their own statistics management processes prior to 14.10.  Some of these involve logic that checks on the number of days that have passed since the last collection as an indicator of when to recollect. 

In order to allow those statistics applications to continue to function as they have in the past within the new set of threshold options in 14.10, global thresholds parameters have been made available.  These options are one step down from the system threshold and will cancel out use of DefaultUserChangeThreshold.    

There are two parameters in the same section of DBS Control Optimizer Statistics Field that allow you to set DBA-defined thresholds:

DefaultUserChangeThreshold:   If this global threshold is modified with a percent of change value (some number > 0), then the system default threshold will be disabled, and the percent defined here will be used to determine whether or not skip statistic collections globally.

Unlike the system default, if DBQL USECOUNT logging has not been enabled, random AMP samples will be used instead if this global setting has been enabled.  The approach of using random AMP sample is somewhat less reliable, particularly in cases where there are updates, or deletes accompanied by inserts, rather than just inserts.      

DefaultTimeThreshold:    This global setting provides backward compatibility with home-grown statistics management applications that rely on the passage of time.  Using a time-based threshold offers a less precise way of determining when a given statistic requires recollection.   Some tables may undergo large changes in a 7-day period, while others may not change at all during that same interval.  This is a one-size-fits-all lacks specificity and may result in unneeded resource usage.

More about Statement-Level Thresholds

USING THRESHOLD syntax can be added manually to any COLLECT STATISTIC statement. 


When you use USING THRESHOLD, it will override any default or global threshold settings that are in place.   See the Teradata Database 14.10 Statistics Enhancements  orange book for detailed information about the variations of statement-level options you can use for this purpose.

For statement-based percent of change thresholds, the optimizer does not require that there be a history of past collections.  If data change is detected over the specified threshold statistics will be collected, otherwise they will be skipped.

Statement-level thresholds are for special cases where a particular statistic needs to be treated differently than the higher level default parameters dictate.  They can also be useful when you are getting starting with threshold, and you want to limit the scope to just a few statistics.

Getting Started Using Threshold

Here are some suggestions for sites that have just moved to 14.10 and want to experience how the threshold logic works on a small scale before relying on the system and/or global options:

  1. Pick a small, non-critical database.
  2. Enable DBQL USECOUNT logging on that database:
  1. Disable the system threshold parameter by setting DBS Control setting: 

SysChangeThresholdOption = 3

  1. Leave the global parameters disabled as they are by default:

DefaultUserChangeThreshold = 0

DefaultTimeThreshold = 0

  1. Add USING THRESHOLD 10 PERCENT to the statistics collection statements just for the tables within your selected database:
  1. Insert a few rows into the table (less than 10% of the table size) and run an Explain of the statistics collection statement itself, and it will tell you whether or not skipping is taking place.  

See page 39 of the Teradata Database 14.10 Statistics Enhancement orange book for some examples.

Summary of Recommendations for Threshold Use

The following recommendations apply when you are ready to use the threshold functionality fully:

  1. If your statistics are not under the control of AutoStats, make no changes and rely on the system threshold functionality to appropriately run or skip your statistics collection statements.
  2. Always turn on USECOUNT logging in DBQL for all databases for which statistics are being collected and where you are relying on system threshold.
  3. If you have your own statistics management routines that rely on executing statistics only after a specific number of days have passed, set the DefaultTimeThreshold to meet your threshold criteria.  You should experience similar behavior as you did prior to 14.10.  Over time, consider switching to a change-based threshold and re-establishing the system threshold, as it will be more accurate for you.
  4. Don’t lead with either of the DBA-defined global parameters DefaultUserChangeThreshold or DefaultTimeThreshold unless there is a specific reason to do so.
  5. Use the statement-level threshold only for special statistics that need to be handled differently from the system or the DBA-defined global defaults.
  6. Favor percent of change over number of days as your first choice for the type of threshold to use.  
  7. But if USECOUNT is not being logged, then rely on time-based thresholds and set DefaultTimeThreshold at your preferred number of days.


Sandeepyadav 27 comments Joined 09/13
17 Mar 2015

Hi Carrie,
There are 980 Database exists in my system. If i enable usecount logging for each database then is there any impact ?
DBQLRULES table will have 1000 rows. so Is it fine ?
Is it good to make different Collect Jobs or one is enough to collect the auto stats ?

Thanks, -Sandeep.

carrie 595 comments Joined 04/08
20 Mar 2015

I have not heard of any issues with UseCount logging overhead being problematic.  UseCount logging uses the same techniques as other DBQL collections.  There is a UseCount DBQL cache that is flushed at pre-set intervals and the cached entries are inserted into the database in a very efficient way.  However, every site is different, so cannot tell you for sure that you will not experience some overhead.  But based on feedback from other customers, it is not likely.
StatsUsage and XMLPlan logging, on the other hand, do come with some overhead.  For that reason, we recommend you only have that type of logging on for limited periods of time to capture representative queries when analysis is going to be performed.  StatsUsage has less impact as it is at the request level, while XMLPlan is at the step level and will have more overhead.
I can not think of any reason why 1000 rows in the DBQLRules table would be an issue.  These new options will cause the number of rows to increase, and I think that is expected.
In terms of AutoStats, a Collect job currently is limited to a single session.  However the collect lists generated by a Collect job group all the stats on a table into a single statement, so there will be parallelism and overlap achieved there.
If you have more available cycles you want to put into stats collection there is no reason you cannot create multiple Collect jobs with non-overlapping object scopes and schedule them to run at the same time.   Try to stay away from too many Collect jobs running at the same time, as that could become difficult to manage and monitor.
The AutoStats orange book (availabe in the orange book repository) has a lot of helpful tips in using AutoStats if you need more information in that area.
Thanks, -Carrie

Sandeepyadav 27 comments Joined 09/13
20 Mar 2015

Thanks Carrie :)

Thanks, -Sandeep.

15 May 2015

Hi Carrie,
A very nice article about thresholds.
Actually, We are in 14.10 DBMS Version and thought to implement AutoStats, but since we don't have a bug (DR's) free version in 14.10, we thought to go with our own custom built stats process (handle both change as well Time but Manual work for DBA) to manage Statistics instead of depending on AutoStats. we would like to take an advantage of USING THRESHOLD option to embed in our stats process but since we can't depend on USECOUNT Logging due to open DR and only option we have is TIME BASED THRESHOLD to collect the stats.
Is my interpretation correct? please correct me if i am wrong.

carrie 595 comments Joined 04/08
20 May 2015

If you believe that you cannot use USECOUNT logging, then you will not have update/delete/insert counts that the percent of change threshold relies on.  If that is the case, you are correct that you can still use the time threshold.  You might want to call the support center and see if you can get any fixes that would allow you to benefit from USECOUNT logging.  Most of initial problems with USECOUNT that I am aware of have been fixed in current software releases.
Thanks, -Carrie

26 May 2015

Thank you, Carrie.
Currently, there are open DR's for USECOUNT logging for 14.10 DBMS version. Our SSM is following up with Teradata Engineering to find the tentative date for bug free DBMS version.

24 Oct 2015

Hi Carrie
Is there a command to ignore the global threshold just once. Im afraid if use the statemnet level theshold I will change the stats collection threshold for that column for all later collections


Needless to say I am quite new to the new way of stats collection in td 14.
Also, just for my understanding, any reason optimizer will recollect stats on individual columns but skip the same columns combination in multicolum form. Usecount is enabled and we are at system default settings. The unique value for collect stats (for this particular combination ) is 13 million in dbc.statsV while actual unique values for these combination of columns is approx 17 million which I calculate it from a query.

carrie 595 comments Joined 04/08
27 Oct 2015

If you wish to ignore the default global threshold capability (or your own established threshold percent or number of days) for just a single statistics collection statement, you can specify USING NO THRESHOLD FOR CURRENT.  
This is documented in the orange book titled:  Teradata Database 14.10 Statistics Enhancements, on page 38:
========== begin text =================
 How to override the existing thresholds?
Specifying a new THRESHOLD option for statistics recollection overrides the current setting, if any. The optimizer applies the new THRESHOLD value to the current request, and remembers the updated THRESHOLD option for future statistics recollection. For example, to switch from a user-specified threshold to a system-determined threshold, use the SYSTEM THRESHOLD option in the new collect statistics statement.
If a threshold needs to be overridden only for the current collect statistics statement, use the FOR CURRENT option. This option uses the specified threshold in the current statement but does not replace the existing threshold option.
========== end text =================
It is also documented on page 188 of the SQL Data Definition Language Detailed Topics manual, in the section titled:  FOR CURRENT Option.
In terms of why the HELP STATS tells you the number of distinct values in your multi-column statistic is only 13 million while a query that calculates the actual number of of distinct values is 17 million, that could be due to the value length of the individual columns in the multi-column statistic. 
Each column in a multi column stat will get up to 32 bytes represented when the histogram aggregation is built.   If one of the multi columns in the stat is GT 32 bytes, that could be a reason for the number of distinct values being under-reported.…some of the distinctness within that one lengthy column may be being  truncated, and therefore is looks like fewer distinct values.   That will result in the multi column stats histogram to report fewer unique values that are actually present if you consider the full length of all the columns in the stat.
If you look at the table definition you can get the data lengths of each of those multi-column stats.  Then you can compare that to what is represented in the histogram.  Increasing MAXVALUELENGTH when you recollect can help here. 
The other question having to do with skipping a multi-column stat but not one of the columns in the multi-column stat:   It's possible that could happen under very specific circumstances.  For example, if any of the columns in multi-column stat is fairly unique or has a consistent predictable pattern of unique values across multiple history records, that multi-column stat inherits these attributes and can become eligible for skipping.  On the other hand, individual columns may not have consistent predictable pattern which leads to non-skipping. 
Thanks, -Carrie

23 Dec 2015

Hi Carrie,
Great informative article.
I am trying to find what are the default system settings [when does the system decides to collect stats ? ]
In DBS control, everything by default is set to 0
DBS Control Record - Optimizer Statistics Fields:
    1.  DefaultTimeThreshold        = 0 (Disabled; Default value)
   2.  DefaultUserChangeThreshold  = 0 (Disabled; Default value)
   3.  SysChangeThresholdOption    = 0 (Enabled with DBQL ObjectUsage; Default value)
   4.  SysSampleOption             = 0 (Enabled; Default value)
Besides above settings are ok for any system ? or need to modify them ?
Laeeq -

carrie 595 comments Joined 04/08
04 Jan 2016

The default settings are all zeros, your information below is correct.   The first two parameters are described above in the blog posting under the heading. 
                More about DBA-Defined Global Thresholds
They are turned off by default, and should stay off for most sites.  They are useful if the site has its own stats management application and wants to replace the global threshold setting (setting #3) with their own customized thresholds.
The other two settings generally should stay on by default, unless you have a specific reason to turn off the global threshold feature, or the automatic downgrade to sampling feature.  
So, basically you do not need to modify these parameters, unless for some reason you do not want the default behavior.  See the orange book Teradata database 14.10 Statistics Enhancements page 40 and 41 for a fuller explanation of these options.  
I'm not sure this is what you intended to say, but those statistics parameters do not give the system the power to determine when to collect stats.  You still have to issue collect statistics statements, or if you are using the Stats Manager portlet, then Viewpoint needs to submit the statements.  The 3rd setting, SysChangeThresholdOption, gives the optimizer the ability to NOT collect statistics (to skip collections) once the collection statements have been submitted to the database, based on internally determined thresholds being reached or not being reached.
Thanks, -Carrie

07 Jan 2016

Thanks Carrie. 
You almost covered which i asked.
Cheers :)
Laeeq -

sk6752 3 comments Joined 05/16
10 May 2016

In 14.10, I understand System Threshold option is enabled when SysChangeThresholdOption = 0 (Enabled with DBQL usage). I understand that optimizer does not skip re-collections when use count logging is not enabled and SYSTEM THRESHOLD keywords are  not specified in STATS statements. Does optimizer skip recollections when usecount logging is enabled and USING SYSTEM SAMPLE or SAMPLE N PERCENT options are specified in STATS statements ? 

carrie 595 comments Joined 04/08
12 May 2016

You are correct that usecount logging must be enabled for the database involved before skipping will be considered.  You don't have to specify USING SYSTEM THRESHOLD on the collect stats statement for skipping to be considered.  If the default SysChangeThresholdOption is on (which it is by default), and usecount logging is happening to that database, then skipping will be considered without specifying it on the collect stats statement. 
In the case of your question, the default system threshold will be applied first, even if SYSTEM THRESHOLD is not specified on the collect statement (for the reasons stated above). If the optimizer determines to skip recollections, then no sampling will be considered.  If the optimizer determines not to skip, then whichever of those two SAMPLE options is on the statement will be applied.   Only a single SAMPLE option is allowed per statement.
You can see whether a particular statement is eligible for either threshold or sampling by doing a SHOW STATISTICS COLUMN (col-name) ON table-name;  Look for any comment lines (lines that begin with “—“) that refer to either threshold or sampling, that is what will tell you what to expect.  For example:
Assume the Global threshold default is set to SYSTEM THRESHOLD
ON TPCD.OrderTbl0 ;
Thanks, -Carrie

geo590 1 comment Joined 07/09
18 May 2016

Hi Carrie,
Great article.  Is there an easy way to refresh ALL existing statistics on a table USING NO THRESHOLD?  It looks like you need to specify an explicit column / index when the USING command is added and can't do this at the table level...
Thanks for your help,
- George

carrie 595 comments Joined 04/08
20 May 2016

Hi George,
Offhand, I don't know of a way to apply USING NO THRESHOLD to all the stats for a table.  You could turn off SysChangeThresholdOption if you don't want threshold to be applied to any stat systemwide, or you could turn DBQL USECOUNT logging for the database the table is part of.  If you turn off USECOUNT logging for the database, that will prevent the threshold feature from being considered for the entire database, not just one table though.  So that may be too large a scope for you.
Maybe there is way of automating the rescripting of the collect statistics statements for the table.  Once you specify USING NO THRESHOLD for one collection for one or all stats of a table, you can recollect at the table level and the optimizer will remember that specification and honor it for each stat where you previously specified it, without your having to repeat the using clause each time.   
If I can think of anything else you can do, I'll post another message.
Thanks, -Carrie

sk6752 3 comments Joined 05/16
31 May 2016

Carrie, Thanks for great info. In 14.10, Will optimizer do system sampling when System Threshold option (SysChangeThresholdOption = 0) and Usecount logging is turned  on ? Should System Threshold option be disabled for optimizer to go with system sampling like how it does in 14.0 ? I ran few tests by enabling System Threshold option (SysChangeThresholdOption = 0) and Usecount logging and did not notice optimizer doing System Sampling.

carrie 595 comments Joined 04/08
03 Jun 2016

If you want the system to consider downgrade to sampling, you have to turn off the system threshold option.  As you say you have found out by trying it, if SysChangeTresholdOption is on, the optimizer will not consider system-determined sampling.
However, even if thresholding is off, the optimizer may not do system-initiated sampling.   There must be adequate history records that indicate that there is a pattern to growth over time.  In addiiton, the statistics "usage type" must be 'S".   Usage type of 'S' means that in the past the optimizer has noted that this statistic's summaryinfo portion of the histogram has primarily been used.  If the detailed portion of the historgram has been used (which it would be for single-table selection, where actual values for the column are needed by the optimizer), then system sampling will not be attempted.
A usage type of 'S' is usual if the statistic is used for join planning or for aggregations, where SummaryInfo detail such as the number of distinct values, number of NULLs, etc. are required to build the plan.
In addition, sampling will not be considered for dictionary tables, small tables, or very skewed tables, or PARTITION column.
Thanks, -Carrie

sk6752 3 comments Joined 05/16
06 Jun 2016

Carrie, Thanks for the explanation. 

You must sign in to leave a comment.