Package icon stats_20121124a.zip108.38 KB

Collected statistics are a valuable source of information in Teradata, not only for the optimizer, but also for developers or DBAs. Spotting outdated statistics (which might lead to bad optimizer plans) and implementing/monitoring a stats recollection process are common tasks, which need the current statistics as input.

The HELP STATS function returns information about stats on a table or column level, but there's no built-in way to extract info about all stats on all tables in a single query. The DBC views ColumnsStatistics, MultiColumnStats, IndexStats expose the stats data, but those statistics are stored in a binary format.

This internal format is documented in the "SQL Request and Transaction Processing" manual, although the exact details and the differences between 32- and 64-bit are somewhat vague. However utilizing that knowledge to decode the binary stats is limited by a restriction in Teradata SQL: BYTEs can't be casted to any other type. You know those 8 bytes represent a float, but you can't convert them to a readable value. Therefore the most important step to implement my stats query was discovering how to cast bytes to numeric values using the HASHBUCKET function.

I started to develop a query to decode Teradata's binary stats way back in 2003 on V2R4, over the following years I enhanced and fixed it several times (with a lot of Trial & Error) to accomplish new internal stats versions and the changes from 32 to 64-bit. This old stats query is available in the attachment area of John Hall's TeradataForum (btw still the best independent source of Teradata knowledge) and is used at many customer sites.

The latest release posted here extracts even more details without much additional overhead and now includes info about an index or a column from the table's partitioning expression without collected statistics. This StatsInfo view provides following columns, an asterisk "*" indicates information similar to HELP STATS output and "+" an enhanced or new column:



"ColumnName" is a comma-separated list of the names of the columns for which statistics are reported.

To remove those double quotes do a global Search & Replace within the source code replacing '"' with '', but this will also remove the double quotes from "CollectStatement" and "HelpStatement"






IndexName Name of the index, if it exists


The number of columns in the stats or index definition. The "FieldCount" of HELP STATS is not exactly the same, it might be less, as this is the number of columns actually stored in the stats.

By default only 16 columns are supported although Teradata allows up to 64. Hopefully there are no indexes or stats with more than 16 columns, but additional columns are already in the source code, just commented out. Search for "uncomment" when you need to modify it.



Indicates an index or a column from the table's partitioning expression without collected statistics: Y/N
+ DateIncluded

"ColumnName" includes a DATE or TIMESTAMP: Y/N



Stats include a column which is used in the table's partitioning expression: Y/N



Number of levels in the table's partitioning expression, zero means not partitioned



The number of partitions defined within that table

Only calculated for single level partitioning and TD12, previous releases will always show 65535 instead of the correct number. Might be 65535 on current releases, too, due to (yet) unknown reasons sometimes the TableCheck column in dbc.IndexConstraints isn't created correctly.



TempTbl --> Global Temporary Table
Tbl     --> Table
JoinIdx --> Join Index
HashIdx --> Hash Index
NoPITbl --> No Primary Index Table



Stats collected on:

UPI  --> Unique Primary Index
NUPI --> Non-Unique Primary Index
USI  --> Unique Secundary Index
NUSI --> Non-Unique Secondary Index
VOSI --> Value Ordered NUSI
Part --> Pseudo column PARTITION
Col  --> Single non-indexed column
MCol --> Multiple non-indexed columns



Date and time when statistics were last collected






How long did the Collect Stats run.

This is not always correct as the calculation is based on the LastAlterTimestamp, which is not only set when stats are collected, but also during a ALTER TABLE modifying the column attributes (this should be a rare case, but can't be fixed).

Additionally on some systems there's a fixed deviation (due to some unknown reason, Time zone settings on TD vs. OS level?), which can easily be spotted when comparing "LastAlterTimestamp" and "CollectTimestamp". Search for "TimeAdjust" to add or subtract that fixed value.

Old statistics might generate another deviation when the system time zone was changed since collecting. Recollecting those stats should fix it.



Indicates single column stats on a numeric or date column: Y/N


Sample size used for collect stats, NULL if not sampled



Internal version of statistics:

1 --> pre-V2R5
2 --> V2R5
3 --> TD12
4 --> TD12.0.3.1/TD13.0.0.21

* NumAMPs

The number of AMPs from which statistics were collected, usually the number of AMPs in the system, 1 for an empty table



The number of intervals in the frequency distribution histogram containing the column statistics.



Overall average of the average rows per value from each AMP, only for NUSIs, otherwise zero



The estimated cardinality of the table based on a single-AMP sample



The estimated cardinality of the table based on an all-AMP sample



The number of rows containing NULLs for one or more columns in "ColumnName"



The number of rows containing nulls for all columns in "ColumnName"



The cardinality of the table, i.e. the number of rows


The number of unique values



The frequency of the most common value


The most common value, only for single column numeric or date stats


The minimum value, only for single column numeric or date stats



The maximum value, only for single column numeric or date stats



The size in bytes needed to store the statistics.

If greater than 16 only partial data is stored.

For single column stats the actual size for any numeric column is 8, as it's always stored as float.



The datayte of a single column stats, NULL for multiple columns


The last time when statistics have been collected (or this column/index was modified by an ALTER TABLE)



The last time and the number of times this column/index was used in queries, the same info is found in dbc.TablesV and dbc.Indices

Of course this is only available when it has been enabled via dbscontrol ObjectUseCountCollectRate





To facilitate additional joins to other system tables



From dbc.Indexes - 1..128: Internal index number, >128: pseudo index = multicolumn stats


From dbc.TVFields - NULL for multi-column or PARTITION stats


COLLECT STATS statement to (re-)collect the stats


I got several request for syntax changes:
- use the legacy syntax "COLLECT STATS ON table COLUMN ..." (deprecated in TD14) instead of the recommended "COLLECT STATS COLUMN ... ON table"
- use an existing "IndexName" instead of "ColumnList"


Finally i included all four possible variations in the source code, now it's your choice, just uncomment the preferred syntax:
- Recommended syntax with column names (default)
- Recommended syntax with index or column names
- Legacy syntax with column names
- Legacy syntax with index or column names



HELP STATS statement to get the stats details on column or index level

TD14 stores statistics in a more readable format in a system table (dbc.StatsTbl already exists in TD13, but wasn't populated). The new dbc.StatsV returns much of the information i extracted here, but i like to get more details, so i wrote a new version for TD14

Please report any issues as this version has not been tested that much, i.e. you're my beta testers :-)

I'm open for enhancement requests, too, but I don't think there's much to add to this query without joining additional tables. You can do that on your own, that's why there's the TableId/FieldId/IndexNumber  column.


  Already fixed the first bug, zero PartitionLevels for PARTITION stats.

  Added PartitionsDefined


  Fixed 'Invalid Date' error for empty tables


  Added IndexName

  Additional syntax variations for "CollectStatement"


  Fixed wrong calculation of ColumnCount/DataSize/DateIncluded for some stats involving PARTITION

  Added IndexNumber and FieldId


  Fixed a wrong calculation of PartitionColumn when a double quoted ColumnName is used in the partition definition


  Fixed: Wrong calculation of PartitionColumn when a ColumnName is used in different upper/lower case in the partitioning expression (dbc.TableConstrains.TableCheck is defined as CASESPECIFIC)

Attached files:


A short description of the binary format


For 32-bit systems: MP-RAS and Teradata Express for Windows


For 64-bit systems: SLES, Windows Server and Teradata Express VMWare


For 64-bit systems running TD13.10: SLES, Windows Server and Teradata Express VMWare

Replaces the ugly HASHBUCKET calculations with SQL UDFs, the source code looks much cleaner now

Needs a Search & Replace to change '<dbname>' with a valid database name:
SQL UDFs must be fully qualified (unless they are created in the "syslib" database) because the database name is NOT resolved when the view is created.


ulrich 51 comments Joined 09/09
28 Dec 2011

Thank for your effort to decode the stats info!
I used the older versions regularly in my daily work and and will test the latest verison as well.
It gives really good information and helps also to indentify stats which might not be usefull but are resoucre intensive in the collection process.

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

VasuKillada 12 comments Joined 10/11
11 Jan 2012

I looked into the scripts and got lost. Can you please help me understand which one should I be running for a TD13?


dnoeth 86 comments Joined 11/04
11 Jan 2012

Assuming a 64-bit TD13: stats_64bit.sql



Kevin Leach 2 comments Joined 11/04
20 Jan 2012

Thanks for the update(s) Dieter. When v13 came out and we moved to 64-bit Linux, I also tailored the old script to recognize bitness and version to decode properly. I did this all in one script, but it did require a function to be created which provided the bitness. Let me know if anyone wants that single code stream.

As you mention, this all goes away in TDv14. I beta tested that and love the changes!

madhugraju 6 comments Joined 05/11
26 Jan 2012

I have used your stats sql extensively over the years. We base our stats collection process on this view. I look forward to using this latest update.

Thank you very much for your continued support over the years in this area. I am glad to see that Teradata has taken this up and is introducing their own version in TD14.

SPOLISETTI 2 comments Joined 06/09
28 Jan 2012



I tweaked your query and used it for quick compression analysis.

I used the ratio of ModeFreq to NumRows, to do compression analysis.

Also, your query is really helpful to clone the stats across the environments.

Thanks a ton!!!!!

WeiZheng 1 comment Joined 03/10
16 May 2012

TD14 introduced a new syntax SHOW STATISTICS, which will present you the detailed stats. Still, appreciate your great efforts!

teradatauser2 29 comments Joined 04/12
29 Jul 2012

Hi Diether,
I just tried one code, i saw in some of your post and tweaked a little to macth my requirement. Its gives the same result as you statsinfo view..just wanted to share..comments welcome

SELECT c.databasenamei AS databasename , b.tvmnamei AS tablename
FROM dbc.tvfields a , dbc.tvm b , dbc.dbase c
WHERE a.tableid = b.tvmid
AND b.databaseid = c.databaseid
AND a.fieldstatistics IS null
AND databasename = 'databasename'
-- and b.tablekind='T'
and (databasename,tablename)
not in (
SELECT c.databasenamei AS databasename , b.tvmnamei AS tablename
FROM dbc.tvfields a , dbc.tvm b , dbc.dbase c
WHERE a.tableid = b.tvmid
AND b.databaseid = c.databaseid
AND a.fieldstatistics IS not null
AND databasename = 'databasename'
-- and b.tablekind='T'


sel tp.databasename,tp.tablename
SELECT d.databasenamei AS databasename , t.tvmnamei AS tablename,
MAX(SUBSTR(i.IndexStatistics, 1, 80)) AS STATS
dbc.dbase d
JOIN dbc.tvm t
ON d.databaseid = t.databaseid
JOIN dbc.tvfields c
ON t.tvmid = c.tableid
JOIN dbc.Indexes i
ON c.tableid = i.tableid
AND c.fieldid = i.fieldid
/*** Add list of DBs to exclude***/
WHERE d.DatabaseName IN ('databasename')

(i.IndexType IN ('M','S','K','U','H','O','V','P','Q'))

-- and t.tablekind='T'
group by 1,2
) tp
group by 1,2

miajones 1 comment Joined 09/12
17 Sep 2012

Thank for your effort to decode the stats info!
I used it once.

dnoeth 86 comments Joined 11/04
23 Oct 2012

I posted a new version of my stats query fixing some bugs.


dnoeth 86 comments Joined 11/04
24 Nov 2012

I posted a new version of my stats query fixing another bug.


ashikmh 2 comments Joined 02/12
07 Mar 2013

Hi Dnoeth,
This is the best information regarding stats, so far I can seen.

ana.sipaque 1 comment Joined 02/11
14 Mar 2013

Thanks, this is really helpful

rr186010 2 comments Joined 01/11
08 May 2013

Just installed on So far, so good! I'll let you know of any issues.

TDMonica 2 comments Joined 01/13
10 May 2013

This is a very helpful article.
I want add the intervals (basically this portion from SHOW STATS  /** Interval: MaxVal, ModeVal, ModeFreq, LowFreq, OtherVals, OtherRows **/ ). What are the tables/views that I should look at?
The goal is to scan the dbc tables/views to get the detailed histogram so it can be used on TD14 as well as an earlier version when necessary.
Thank you!

dnoeth 86 comments Joined 11/04
12 May 2013

The interval details are still stored in a binary format in TD14.
This new format is much more complicated, a BLOB instead of VARBYTE, i wouldn't try to reverse-engineer it (at least not in SQL).
You can get the main info from dbc.StatsV or my StatsInfo and then use HELP/SHOW STATS for the details.


TDMonica 2 comments Joined 01/13
14 May 2013

Got it, thank you for the reply!

sg186048 4 comments Joined 09/12
11 Jun 2013

Hi Dnoeth,
I was trying to select from the view and I was seeing the following error.
2616-Numeric overflow occured during computation. I isolated the problem to the following piece of code
CAST( (-1**(NumNullsw1 / 32768))
* (2**((NumNullsw1/16 MOD 2048) - 1023))
* (1 + ((NumNullsw1 MOD 16) * 2**-4) + (NumNullsw2 * 2**-20)
+ (NumNullsw3 * 2**-36) + (NumNullsw4 * 2**-52)) AS FLOAT),
I have tried casting it to FLOAT and DECIMAL but the error persists.
Please advise.

dnoeth 86 comments Joined 11/04
11 Jun 2013

Which TD release? 32 or 64bit?
I usually encountered that error when the new statistics version 4 was introduced.
Could you check if there's a really old statistic with version 2 or 3?
Are you 100% shure you're running the latest version? (Sorry to ask, but this was the cause for most problems like this)
Could you show some rows (mainly the STATS binary data) from running the nested Derived Table (before the HASHROW...)?


sg186048 4 comments Joined 09/12
12 Jun 2013

Hi Dnoeth,
Apologies , There was a small mistake from my side as I was running the 32bit script on a 64 bit operating system.
I executed the 64 Bit script and its working great now!
Thanks for your quick response!!

teradatauser2 29 comments Joined 04/12
14 Jun 2013

Hi Dnoeth,
I have few questions on the o/p of stats view :
1. What is the meaning of Col in status type column ?
2. If i have a index on col1 and col2 combination and my stats is on col1,col2 combination and i dont have on individual columns col1 and col2, it shows as status type=col. Secondly, do i need to stats take stats on individual columns or a combination would suffice ?
3. If i have a col say col1 , which is PPI in the table. Then do i need to take stats on col1 or a stats on "partition" is enough.
Thanks !

dnoeth 86 comments Joined 11/04
14 Jun 2013

#1: should indicate statistics on a single non-indexed column
#2: If it's a multi-column stats it should be "MCol" instead of "Col", seems to be a buggy calculation, could you show the DDL and the COLLECTs for this table?
#3: You should collect stats on both, column and PARTITION. Check Carrie Ballinger's blogs for details:


JHuk 2 comments Joined 11/13
20 Nov 2013

Hi Dieter
I need to drop the table statistics and then rebuild them.
To find out what statistics to rebuild I am using the view StatsInfo.
1. Selecting rows where MissingStats='N', would I only get the
statistics that already exist?
2. When I create a simple table (create, table tab( col varchar(32)))
and collect statistics (COLLECT, STATS USING SAMPLE COLUMN(col) ON tab),
the view returns (COLLECT, STATS INDEX("col") ON "user"."tab").
How can I get the information about 'using sample'?
Thank you!

dnoeth 86 comments Joined 11/04
22 Nov 2013

#1: yes
#2: This is by purpose.
You don't need to specify SAMPLE anymore in TD14, the collect stats process automatically remembers previous Sample/MaxIntervals/MaxValueLength settings.


mikecyl 3 comments Joined 01/12
30 Jun 2014

I just want to log on for saying thank you. This script help me in an urgence. Thanks a lot!

rt5258 1 comment Joined 07/14
24 Jul 2014

Thank You Dieter Nöth. I hope there comes a day when I can return the favor. Robert Turner


Hibellm 1 comment Joined 11/14
01 Jul 2015

I would love to use this code. However I am not allowed to access the DBC tables, only the DBC.views.
Teradata Managed services seem extremely reluctant to give me any access to the tables.
So is there any guide how I could update this code to use information that is stored in the views only?

dnoeth 86 comments Joined 11/04
01 Jul 2015

You only need this query if you're on a release lower than TD14.
Since TD14 there's a builtin view returning similar information, dbc.StatsV, try this instead, every user (=PUBLIC) should have access rights.


You must sign in to leave a comment.