

How to decode the binary statistics stored in dbc tables
Attachment | Size |
---|---|
![]() | 108.38 KB |
Collected statistics are a valuable source of information in Teradata, not only for the optimizer, but also for developers or DBAs. Spotting outdated statistics (which might lead to bad optimizer plans) and implementing/monitoring a stats recollection process are common tasks, which need the current statistics as input.
The HELP STATS function returns information about stats on a table or column level, but there's no built-in way to extract info about all stats on all tables in a single query. The DBC views ColumnsStatistics, MultiColumnStats, IndexStats expose the stats data, but those statistics are stored in a binary format.
This internal format is documented in the "SQL Request and Transaction Processing" manual, although the exact details and the differences between 32- and 64-bit are somewhat vague. However utilizing that knowledge to decode the binary stats is limited by a restriction in Teradata SQL: BYTEs can't be casted to any other type. You know those 8 bytes represent a float, but you can't convert them to a readable value. Therefore the most important step to implement my stats query was discovering how to cast bytes to numeric values using the HASHBUCKET function.
I started to develop a query to decode Teradata's binary stats way back in 2003 on V2R4, over the following years I enhanced and fixed it several times (with a lot of Trial & Error) to accomplish new internal stats versions and the changes from 32 to 64-bit. This old stats query is available in the attachment area of John Hall's TeradataForum (btw still the best independent source of Teradata knowledge) and is used at many customer sites.
The latest release posted here extracts even more details without much additional overhead and now includes info about an index or a column from the table's partitioning expression without collected statistics. This StatsInfo view provides following columns, an asterisk "*" indicates information similar to HELP STATS output and "+" an enhanced or new column:
DatabaseName |
"ColumnName" is a comma-separated list of the names of the columns for which statistics are reported. To remove those double quotes do a global Search & Replace within the source code replacing '"' with '', but this will also remove the double quotes from "CollectStatement" and "HelpStatement" |
|
TableName |
||
* |
ColumnName |
|
+ |
IndexName | Name of the index, if it exists |
* |
ColumnCount |
The number of columns in the stats or index definition. The "FieldCount" of HELP STATS is not exactly the same, it might be less, as this is the number of columns actually stored in the stats. By default only 16 columns are supported although Teradata allows up to 64. Hopefully there are no indexes or stats with more than 16 columns, but additional columns are already in the source code, just commented out. Search for "uncomment" when you need to modify it. |
+ |
MissingStats |
Indicates an index or a column from the table's partitioning expression without collected statistics: Y/N |
+ | DateIncluded |
"ColumnName" includes a DATE or TIMESTAMP: Y/N |
+ |
PartitionColumn |
Stats include a column which is used in the table's partitioning expression: Y/N |
+ |
PartitionLevels |
Number of levels in the table's partitioning expression, zero means not partitioned |
+ |
PartitionsDefined |
The number of partitions defined within that table Only calculated for single level partitioning and TD12, previous releases will always show 65535 instead of the correct number. Might be 65535 on current releases, too, due to (yet) unknown reasons sometimes the TableCheck column in dbc.IndexConstraints isn't created correctly. |
+ |
TableType |
TempTbl --> Global Temporary Table |
+ |
StatsType |
Stats collected on: UPI --> Unique Primary Index |
* |
CollectDate |
Date and time when statistics were last collected |
* |
CollectTime |
|
CollectTimestamp |
||
CollectDuration |
How long did the Collect Stats run. This is not always correct as the calculation is based on the LastAlterTimestamp, which is not only set when stats are collected, but also during a ALTER TABLE modifying the column attributes (this should be a rare case, but can't be fixed). Additionally on some systems there's a fixed deviation (due to some unknown reason, Time zone settings on TD vs. OS level?), which can easily be spotted when comparing "LastAlterTimestamp" and "CollectTimestamp". Search for "TimeAdjust" to add or subtract that fixed value. Old statistics might generate another deviation when the system time zone was changed since collecting. Recollecting those stats should fix it. |
|
* |
NumericStats |
Indicates single column stats on a numeric or date column: Y/N |
* |
SampleSize |
Sample size used for collect stats, NULL if not sampled |
* |
StatsVersion |
Internal version of statistics: 1 --> pre-V2R5 |
* | NumAMPs |
The number of AMPs from which statistics were collected, usually the number of AMPs in the system, 1 for an empty table |
* |
NumIntervals |
The number of intervals in the frequency distribution histogram containing the column statistics. |
* |
AvgAmpRPV |
Overall average of the average rows per value from each AMP, only for NUSIs, otherwise zero |
* |
OneAMPSampleEst |
The estimated cardinality of the table based on a single-AMP sample |
* |
AllAmpSampleEst |
The estimated cardinality of the table based on an all-AMP sample |
* |
NumNulls |
The number of rows containing NULLs for one or more columns in "ColumnName" |
* |
NumAllNulls |
The number of rows containing nulls for all columns in "ColumnName" |
* |
NumRows |
The cardinality of the table, i.e. the number of rows |
* |
NumValues |
The number of unique values |
* |
ModeFreq |
The frequency of the most common value |
*+ |
ModalValue |
The most common value, only for single column numeric or date stats |
*+ |
MinValue |
The minimum value, only for single column numeric or date stats |
*+ |
MaxValue |
The maximum value, only for single column numeric or date stats |
+ |
DataSize |
The size in bytes needed to store the statistics. If greater than 16 only partial data is stored. For single column stats the actual size for any numeric column is 8, as it's always stored as float. |
+ |
FieldType |
The datayte of a single column stats, NULL for multiple columns |
+ |
LastAlterTimestamp |
The last time when statistics have been collected (or this column/index was modified by an ALTER TABLE) |
+ |
LastAccessTimestamp |
The last time and the number of times this column/index was used in queries, the same info is found in dbc.TablesV and dbc.Indices Of course this is only available when it has been enabled via dbscontrol ObjectUseCountCollectRate |
+ |
AccessCount |
|
+ |
TableId |
To facilitate additional joins to other system tables |
+ |
IndexNumber |
From dbc.Indexes - 1..128: Internal index number, >128: pseudo index = multicolumn stats |
+ |
FieldId |
From dbc.TVFields - NULL for multi-column or PARTITION stats |
+ |
CollectStatement |
COLLECT STATS statement to (re-)collect the stats
I got several request for syntax changes:
Finally i included all four possible variations in the source code, now it's your choice, just uncomment the preferred syntax: |
+ |
HelpStatement |
HELP STATS statement to get the stats details on column or index level |
TD14 stores statistics in a more readable format in a system table (dbc.StatsTbl already exists in TD13, but wasn't populated). The new dbc.StatsV returns much of the information i extracted here, but i like to get more details, so i wrote a new version for TD14
Please report any issues as this version has not been tested that much, i.e. you're my beta testers :-)
I'm open for enhancement requests, too, but I don't think there's much to add to this query without joining additional tables. You can do that on your own, that's why there's the TableId/FieldId/IndexNumber column.
2011-12-28:
Already fixed the first bug, zero PartitionLevels for PARTITION stats.
Added PartitionsDefined
2011-12-31:
Fixed 'Invalid Date' error for empty tables
2012-02-01:
Added IndexName
Additional syntax variations for "CollectStatement"
2012-10-03:
Fixed wrong calculation of ColumnCount/DataSize/DateIncluded for some stats involving PARTITION
Added IndexNumber and FieldId
2012-10-23
Fixed a wrong calculation of PartitionColumn when a double quoted ColumnName is used in the partition definition
2012-11-24
Fixed: Wrong calculation of PartitionColumn when a ColumnName is used in different upper/lower case in the partitioning expression (dbc.TableConstrains.TableCheck is defined as CASESPECIFIC)
Attached files:
teradata_statistics.pdf |
A short description of the binary format |
stats_32bit.sql |
For 32-bit systems: MP-RAS and Teradata Express for Windows |
stats_64bit.sql |
For 64-bit systems: SLES, Windows Server and Teradata Express VMWare |
stats_64bit_udf.sql |
For 64-bit systems running TD13.10: SLES, Windows Server and Teradata Express VMWare Replaces the ugly HASHBUCKET calculations with SQL UDFs, the source code looks much cleaner now Needs a Search & Replace to change '<dbname>' with a valid database name: |
Thank for your effort to decode the stats info!
I used the older versions regularly in my daily work and and will test the latest verison as well.
It gives really good information and helps also to indentify stats which might not be usefull but are resoucre intensive in the collection process.
feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud
I looked into the scripts and got lost. Can you please help me understand which one should I be running for a TD13?
Thanks,
Vasu
Assuming a 64-bit TD13: stats_64bit.sql
Dieter
Dieter
Thanks for the update(s) Dieter. When v13 came out and we moved to 64-bit Linux, I also tailored the old script to recognize bitness and version to decode properly. I did this all in one script, but it did require a function to be created which provided the bitness. Let me know if anyone wants that single code stream.
As you mention, this all goes away in TDv14. I beta tested that and love the changes!
I have used your stats sql extensively over the years. We base our stats collection process on this view. I look forward to using this latest update.
Thank you very much for your continued support over the years in this area. I am glad to see that Teradata has taken this up and is introducing their own version in TD14.
dnoeth,
!!!!YOU ARE THE BEST!!!
I tweaked your query and used it for quick compression analysis.
I used the ratio of ModeFreq to NumRows, to do compression analysis.
Also, your query is really helpful to clone the stats across the environments.
Thanks a ton!!!!!
TD14 introduced a new syntax SHOW STATISTICS, which will present you the detailed stats. Still, appreciate your great efforts!
Hi Diether,
I just tried one code, i saw in some of your post and tweaked a little to macth my requirement. Its gives the same result as you statsinfo view..just wanted to share..comments welcome
SELECT c.databasenamei AS databasename , b.tvmnamei AS tablename
FROM dbc.tvfields a , dbc.tvm b , dbc.dbase c
WHERE a.tableid = b.tvmid
AND b.databaseid = c.databaseid
AND a.fieldstatistics IS null
AND databasename = 'databasename'
-- and b.tablekind='T'
and (databasename,tablename)
not in (
SELECT c.databasenamei AS databasename , b.tvmnamei AS tablename
FROM dbc.tvfields a , dbc.tvm b , dbc.dbase c
WHERE a.tableid = b.tvmid
AND b.databaseid = c.databaseid
AND a.fieldstatistics IS not null
AND databasename = 'databasename'
-- and b.tablekind='T'
union
sel tp.databasename,tp.tablename
from
(
SELECT d.databasenamei AS databasename , t.tvmnamei AS tablename,
MAX(SUBSTR(i.IndexStatistics, 1, 80)) AS STATS
FROM
dbc.dbase d
JOIN dbc.tvm t
ON d.databaseid = t.databaseid
JOIN dbc.tvfields c
ON t.tvmid = c.tableid
JOIN dbc.Indexes i
ON c.tableid = i.tableid
AND c.fieldid = i.fieldid
/*** Add list of DBs to exclude***/
WHERE d.DatabaseName IN ('databasename')
AND
(i.IndexType IN ('M','S','K','U','H','O','V','P','Q'))
-- and t.tablekind='T'
group by 1,2
HAVING STATS IS not NULL
) tp
)
group by 1,2
Thank for your effort to decode the stats info!
I used it once.
I posted a new version of my stats query fixing some bugs.
Dieter
I posted a new version of my stats query fixing another bug.
Dieter
Hi Dnoeth,
This is the best information regarding stats, so far I can seen.
-Thanks
Thanks, this is really helpful
Just installed on 14.10.00.00. So far, so good! I'll let you know of any issues.
This is a very helpful article.
I want add the intervals (basically this portion from SHOW STATS /** Interval: MaxVal, ModeVal, ModeFreq, LowFreq, OtherVals, OtherRows **/ ). What are the tables/views that I should look at?
The goal is to scan the dbc tables/views to get the detailed histogram so it can be used on TD14 as well as an earlier version when necessary.
Thank you!
The interval details are still stored in a binary format in TD14.
This new format is much more complicated, a BLOB instead of VARBYTE, i wouldn't try to reverse-engineer it (at least not in SQL).
You can get the main info from dbc.StatsV or my StatsInfo and then use HELP/SHOW STATS for the details.
Dieter
Dieter
Got it, thank you for the reply!
Hi Dnoeth,
I was trying to select from the view and I was seeing the following error.
2616-Numeric overflow occured during computation. I isolated the problem to the following piece of code
CAST( (-1**(NumNullsw1 / 32768))
* (2**((NumNullsw1/16 MOD 2048) - 1023))
* (1 + ((NumNullsw1 MOD 16) * 2**-4) + (NumNullsw2 * 2**-20)
+ (NumNullsw3 * 2**-36) + (NumNullsw4 * 2**-52)) AS FLOAT),
I have tried casting it to FLOAT and DECIMAL but the error persists.
Please advise.
Which TD release? 32 or 64bit?
I usually encountered that error when the new statistics version 4 was introduced.
Could you check if there's a really old statistic with version 2 or 3?
Are you 100% shure you're running the latest version? (Sorry to ask, but this was the cause for most problems like this)
Could you show some rows (mainly the STATS binary data) from running the nested Derived Table (before the HASHROW...)?
Dieter
Dieter
Hi Dnoeth,
Apologies , There was a small mistake from my side as I was running the 32bit script on a 64 bit operating system.
I executed the 64 Bit script and its working great now!
Thanks for your quick response!!
Hi Dnoeth,
I have few questions on the o/p of stats view :
1. What is the meaning of Col in status type column ?
2. If i have a index on col1 and col2 combination and my stats is on col1,col2 combination and i dont have on individual columns col1 and col2, it shows as status type=col. Secondly, do i need to stats take stats on individual columns or a combination would suffice ?
3. If i have a col say col1 , which is PPI in the table. Then do i need to take stats on col1 or a stats on "partition" is enough.
Thanks !
#1: should indicate statistics on a single non-indexed column
#2: If it's a multi-column stats it should be "MCol" instead of "Col", seems to be a buggy calculation, could you show the DDL and the COLLECTs for this table?
#3: You should collect stats on both, column and PARTITION. Check Carrie Ballinger's blogs for details:
http://developer.teradata.com/blog/carrie/2012/04/teradata-13-10-statistics-collection-recommendations
Dieter
Dieter
Hi Dieter
I need to drop the table statistics and then rebuild them.
To find out what statistics to rebuild I am using the view StatsInfo.
1. Selecting rows where MissingStats='N', would I only get the
statistics that already exist?
2. When I create a simple table (create, table tab( col varchar(32)))
and collect statistics (COLLECT, STATS USING SAMPLE COLUMN(col) ON tab),
the view returns (COLLECT, STATS INDEX("col") ON "user"."tab").
How can I get the information about 'using sample'?
Thank you!
#1: yes
#2: This is by purpose.
You don't need to specify SAMPLE anymore in TD14, the collect stats process automatically remembers previous Sample/MaxIntervals/MaxValueLength settings.
Dieter
I just want to log on for saying thank you. This script help me in an urgence. Thanks a lot!
Thank You Dieter Nöth. I hope there comes a day when I can return the favor. Robert Turner
Robert
I would love to use this code. However I am not allowed to access the DBC tables, only the DBC.views.
Teradata Managed services seem extremely reluctant to give me any access to the tables.
So is there any guide how I could update this code to use information that is stored in the views only?
@Hibellm:
You only need this query if you're on a release lower than TD14.
Since TD14 there's a builtin view returning similar information, dbc.StatsV, try this instead, every user (=PUBLIC) should have access rights.
Dieter