
New StatsInfo query for TD14
Attachment | Size |
---|---|
![]() | 212.2 KB |
![]() | 50.26 KB |
Beginning with TD14 statistics are no longer stored in dbc.TVFields and dbc.Indexes, they have been moved into dbc.StatsTbl to facilitate several enhancements. A new view dbc.StatsV returns much of the information previously extracted in my StatsInfo query.
But of course this is still not enough information, at least not for me ;-)
dbc.StatsV vs. StatsInfo TD13/14
I tried to get the best of both worlds and so i wrote a new version for TD14 to extract as much additional data as possible. I had to remove a few columns (they're no longer needed or it's no longer possible to get that info) and i renamed some to match the new names in dbc.StatsV. Most of the new columns were simply not available before TD14.
Following table describes the new StatsInfo view and details the differences to the previous version and dbc.StatsV:
dbc.StatsV TD14 |
New StatsInfo TD14 |
StatsInfo TD13.10 |
Remarks |
Description |
---|---|---|---|---|
|
(Column added/removed/changed) |
|
|
|
DatabaseName |
DatabaseName |
DatabaseName |
|
|
TableName |
TableName |
TableName |
|
|
ColumnName |
ColumnName |
ColumnName |
StatsId <> 0 |
List of comma-separated column names |
FieldIdList |
FieldIdList |
FieldId |
StatsId <> 0 |
List of semicolon-separated field ids |
StatsName |
StatsName |
StatsName |
StatsId <> 0 |
Alias name of the statistics (if assigned) |
|
IndexName |
IndexName |
StatsId <> 0 |
Name of the index (if assigned) |
|
DateIncluded |
DateIncluded |
StatsId <> 0 |
DATE or TIMESTAMP column included, Y/N |
|
PartitionColumn |
PartitionColumn |
StatsId <> 0 |
Column included which is used in the table's partitioning expression: Y/N |
|
PartitionLevels |
PartitionLevels |
|
Number of levels in the table's partitioning expression, zero means not partitioned |
|
ColumnPartitioningLevel |
|
|
Level number for the column partitioning level, > 0 indicates columnar table |
|
PartitionsDefined |
PartitionsDefined |
|
The number of partitions defined |
ExpressionCount |
ExpressionCount |
ColumnCount |
StatsId <> 0 |
The number of columns in the statistics |
StatsId |
StatsId |
|
|
StatsId = 0 → Summary Stats |
StatsType |
StatsType |
|
|
Statistics collected on: |
|
StatsTypeOld |
StatsType |
|
Statistics collected on: |
|
TableType |
TableType |
|
TempTbl → Global Temporary Table |
StatsSource |
StatsSource |
|
|
The method this statistic is acquired: |
ValidStats |
ValidStats |
|
|
TD14.10: Indicates whether the statistics are valid or not: Y/N |
DBSVersion |
DBSVersion |
|
|
Database version statistics collected on |
SampleSizePct |
SampleSizePct |
SampleSize |
StatsId <> 0 |
Sample size used for collect stats, NULL if not sampled |
SampleSignature |
SampleSignature |
|
StatsId <> 0 |
Sample option encoded as a 10 character signature |
ThresholdSignature |
ThresholdSignature |
|
StatsId <> 0 |
THRESHOLD options encoded as a 17 character signature (not used before TD14.10) |
MaxIntervals |
MaxIntervals |
|
StatsId <> 0 |
User-specified maximum number of intervals |
StatsSkipCount |
StatsSkipCount |
|
StatsId <> 0 |
TD14.10 only: How many times the statistis collection has been skipped based on the THRESHOLD |
MaxValueLength |
MaxValueLength |
|
StatsId <> 0 |
User-specified maximum value length |
LastCollectTimestamp |
LastCollectTimestamp |
CollectTimestamp |
|
Date and time when statistics were last collected |
|
LastCollectDate |
CollectDate |
|
|
|
LastCollectTime |
CollectTime |
|
|
RowCount |
RowCount |
NumRows |
|
The cardinality of the table, i.e. the number of rows |
UniqueValueCount |
UniqueValueCount |
NumValues |
StatsId <> 0 |
Distinct Values. Estimated when sampled |
PNullUniqueValueCount |
PNULLUniqueValueCount |
|
StatsId <> 0 |
Number of unique values from rows with partial NULLs (multicolumn stats) |
NullCount |
NULLCount |
NumNULLs |
StatsId <> 0 |
Number of partly NULL and all NULL rows, estimated when sampled |
AllNullCount |
AllNULLCount |
NumAllNULLs |
StatsId <> 0 |
Number of all NULL rows (multicolumn stats), estimated when sampled |
HighModeFreq |
HighModeFreq |
ModeFreq |
StatsId <> 0 |
Frequency of the most common value, estimated when sampled |
PNullHighModeFreq |
PNULLHighModeFreq |
|
StatsId <> 0 |
Highest frequency of values having partial NULLs (for multicolumn stats), stimated when sampled |
AvgAmpRPV |
AvgAmpRPV |
AvgAmpRPV |
StatsId <> 0 |
Overall average of the average rows per value from each AMP, only for NUSIs, otherwise zero |
|
MinValue |
MinValue |
StatsId <> 0 |
Minimum data value (only for single column numeric or datetime stats) |
|
ModalValue |
ModalValue |
StatsId <> 0 |
Most common data value (only for single column numeric or datetime stats) |
|
MaxValue |
MaxValue |
StatsId <> 0 |
Maximum data value (only for single column numeric or datetime stats) |
|
OneAMPSampleEst |
OneAMPSampleEst |
StatsId = 0 |
Estimated cardinality based on a single-AMP sample |
|
AllAmpSampleEst |
AllAmpSampleEst |
StatsId = 0 |
Estimated cardinality based on an all-AMP sample |
DelRowCount |
DelRowCount |
|
StatsId = 0 |
Deleted rows count??? used in 14.10 ??? |
PhyRowCount |
PhyRowCount |
|
StatsId = 0 |
Seems to be the same as AllAMPSampleEst – used in 14.10 ??? |
AvgRowsPerBlock |
AvgRowsPerBlock |
|
StatsId = 0 |
Average number of rows per datablock??? |
AvgBlockSize |
AvgBlockSize |
|
StatsId = 0 |
Average datablock size??? |
BLCPctCompressed |
BLCPctCompressed |
|
StatsId = 0 |
Blockcompression in percent??? used in 14.10 ??? |
BLCBlkUcpuCost |
BLCBlkUcpuCost |
|
StatsId = 0 |
CPU cost for Blockcompression??? used in 14.10 ??? |
BLCBlkURatio |
BLCBlkURatio |
|
StatsId = 0 |
??? used in 14.10 ??? |
AvgRowSize |
AvgRowSize |
|
StatsId = 0 |
Average record size??? |
Temperature |
Temperature |
|
StatsId = 0 |
populated in 14.10??? |
NumOfAMPs |
NumOfAMPs |
NumAMPs |
|
The number of AMPs from which statistics were collected, usually the number of AMPs in the system, 1 for an empty table |
CreateTimeStamp |
CreateTimeStamp |
|
|
Statistics creation timestamp |
LastAlterTimeStamp |
LastAlterTimeStamp |
LastAlterTimeStamp |
|
Different meaning: Last user updated timestamp, i.e. Collect stats was submitted but skipped by optimizer due to threshold not reached |
|
LastAccessTimestamp |
LastAccessTimestamp |
|
The last time this column/index was used in queries, the same info is found in dbc.TablesV and dbc.IndicesV |
|
AccessCount |
AccessCount |
|
How often this column/index was used in queries, the same info is found in dbc.TablesV and dbc.IndicesV |
|
TableId |
TableId |
|
To facilitate additional joins to other system tables |
|
IndexNumber |
IndexNumber |
StatsId <> 0 |
Index number of the index on which statistics are collected |
|
FieldType |
FieldType |
|
Single column stats: dbc.TVFields.FieldType, NULL for multi-column |
|
Version |
StatsVersion |
|
Internal version of statistics: |
OriginalVersion |
OriginalVersion |
|
StatsId <> 0 |
Probably version when stats were migrated from older releases, but not yet recollected |
NumOfBiasedValues |
NumOfBiasedValues |
|
StatsId <> 0 |
Number of biased values in the histogram |
NumOfEHIntervals |
NumOfEHIntervals |
|
StatsId <> 0 |
Number of equal height intervals in the histogram |
NumOfRecords |
NumOfRecords |
|
|
Number of history records in the histogram |
|
CollectStatement |
CollectStatement |
|
COLLECT STATS statement to collect the stats. |
|
ShowStatement |
HelpStatement |
|
SHOW STATS VALUES statement to get the stats details. |
|
|
MissingStats |
|
Was a side-product of the old query, too much overhead to add |
|
|
NumIntervals |
|
Replaced by NumOfBiasedValues & NumOfEHIntervals |
|
|
CollectDuration |
|
Not (yet) possible, i don't know if this is stored somewhere |
|
|
NumericStats |
|
No longer neccessary |
|
|
DataSize |
|
Too much overhead to calculate, not really needed as the limitation of 16 bytes is removed in TD14 |
Please report any issues or obviously wrong output to dnoeth@gmx.de.
Attached files:
StatsInfo_vs_StatsV.pdf |
Describes the new StatsInfo view and details the differences to the previous version and dbc.StatsV - added, modified and removed columns (same as above table) |
Teradata Statistics TD14.pdf |
Partial description of the new internal stats format based on some reverse engeneering of the binary data stored in a BLOB in dbc.StatsTbl.Histogram. Luckily the internal storage maps almost 1:1 to the output of a SHOW STATISTICS VALUES :-) |
stats_td14_yyyymmdd.sql |
StatsInfo source code. To keep the code clean it's based on SQL-UDFs |
ReverseBytes.sql ReverseBytes.c |
Can be used to replace the ReverseBytes SQL-UDF with a C-UDF which uses way less CPU (but most DBA's don't like C). |
Uploaded stats_td14_20140605.txt, fixing two bugs:
2014-02-03 dn fixed a bug resulting in a Syntax error for a „SELECT * FROM StatsInfo14;" in ANSI mode
We're still at 13.10 and use the column: "missing stats" to find all those tables with no stats on them and add pi and parition stats on them. Is there a simple way to find this info?
How about finding all those tables that are not in the below list:
(
SELECT DATABASENAME,TABLENAME FROM DBC.COLUMNSTATS WHERE FIELDSTATISTICS IS NOT NULL
UNION
SELECT DATABASENAME,TABLENAME FROM DBC.INDEXSTATS WHERE INDEXSTATISTICS IS NOT NULL
UNION
SELECT DATABASENAME,TABLENAME FROM DBC.MULTICOLUMNSTATS WHERE COLUMNSSTATISTICS IS NOT NULL
)
;
I want to ensure I do not break my script once we go to TD 14
Hi Dnoeth,
I was able to compile the Functions and the views without any problems , but after that when I executed
sel * from km84.StatsInfo14 , I get the following error
8500 : Object not found
Please advise
@suhailmemon84:
The old Field/Index/MultiColumnStats views don't work anymore in TD14, they're always empty. In fact it's much easier, you just have to check for those names not in dbc.StatsV
@sg186048:
What's your release? I never encountered this before, it's indicating a problem with a LOB and the only LOB is the stats Histogram.
Can you run the select directly instead of querying the view?
Sorry for the late answer, but i was on vacation :-)
Dieter
Dieter
Alright thanks Dieter. Thats relieving.
Regards,
Suhail
Hello Dieter,
Another quick question. The previous pdcrinfo.statsinfo view(of 13.10) showed details of all tables: regardless of whether they had stats in them or not. Is the new view (of 14) doing the same too? Or it just reports those tables that have stats in them already?
Regards,
Suhail
Hi Suhail,
the StatsInfo view is not part of PDCR, it was just created within the PDCRInfo database at your site.
And the new view doesn't show indexes without stats anymore, it was just a nice a side-product of the old query, but adding it to the StatsInfo14 would result in a much more complicated query.
Dieter
Dieter
I'm sorry for confirming this again on more time.
If i create a table and add no statistics to it, the statsinfo14 view will not show it. Correct?
-Suhail
Hi Suhail,
yes :-)
Dieter
Dieter
Hello Dieter,
I am getting an error while executing view " REPLACE VIEW Failed. 3706: Syntax error: Data Type "ColumnName" does not match a Defined Type name. "
We are in 14.00.04.02.
-Rao
Hi Rao,
this error is afaik returned when the optimizer couldn't match the datatypes.
Strange, is this at OREPLACE(ColumnName, ....)?
Do you have a pre-TD14 version of OREPLACE in your default database which doesn't support unicode?
Dieter
Dieter
Hi Rao,
i just checked it, this error is returned when there an unknown function, seems like the optimizer didn't find OREPLACE.
SELECT * FROM dbc.functionsV WHERE functionname = 'OREPLACE'
should return 2 rows:
DatabaseName FunctionName SpecificName
TD_SYSFNLIB OREPLACE oreplace2
TD_SYSFNLIB OREPLACE oreplace
Dieter
Dieter
Thx Dieter.
Yes, this is at OREPLACE(ColumnName, ....).
Here is the 2 rows returned when I submit "SELECT * FROM dbc.functionsV WHERE functionname = 'OREPLACE' "
DatabaseName FunctionName SpecificName FunctionId NumParameters ParameterDataTypes
TD_SYSFNLIB OREPLACE oreplace_cu 00-00-43-07-00-00 3 COCOCO
TD_SYSFNLIB OREPLACE oreplace_cu2 00-00-44-07-00-00 3 CVCOCO
- Rao
On 14.00.04.05 it's:
What's your exact release?
'CO' means CLOB, '++' is TD14's TD_ANYTYPE.
Dieter
Dieter
We are in: 14.00.04.02
InfoKey InfoData
VERSION 14.00.04.02
RELEASE 14.00.04.02
- Rao
Hi Rao,
strange, very strange. If this were functions in SYSLIB there would be an explanation, you simply created a non-standard version on your own. But only TD R&D can add functions to TD_SYYFNLIB
You should ask your Teradata support for assistance.
Btw, does a simple oReplace('blablablablablablablablablablablabla', 'bl', '') work?
Dieter
Dieter
Thx Dieter. I will try.
-Rao
Hi Dieter
hi .
i need to convert orcale package which have many storedprocedure into to teradata , but there is no package concept in teradata so how can i use something which can hold more than one StoredProcedure ??
please reply..
how can i use %rowtype is teradata?
is there any similar type in teradata like oracle %rowtype?
@tas_58: Could you please post totally unrelated questions to the forum.
Dieter
Hi Dieter,
I am trying to find unused stats, I tried to use your view by keeping filter on Lastaccesstimestamp. But it is showing as Null for all the multi column stats?
Please let me know if there is any other way to find unused stats.
we are on TD 13.10.
Hi Sivakunar,
as you're on 13.10,did you use my old version?: StatsInfo pre-TD14
Is the access count feature enabled on your system, i.e. do other dbc views like dbc.TablesV return values?
Dieter
Yes Dieter, I am using the old version.
And yes, access feature is enabled. I can use the counts for the views you mentioned.
In fact, I can see the count for the single column stats through view. It is just multi column stats which is showing null for lastaccesstimestamp and Accesscount for all multi coulmn stats.
Hi Sivakunar,
it was quite late yesterday, so I didn't read your post thoroughly.
LastAccessTimestamp does not carry info about when stats where used, it's just the last time the object was accessed in a query. Thus for multicolumn stats it's always NULL.
Dieter
Thanks Dieter for the clarification.
Is there any other way to identify the unused stats.
Hi Sivakunar,
afaik in TD13.10 there's no way to tell if stats are not used by the optimizer except reading explains and checking the confidence level.
In TD14.10 there's a STATSUSAGE option in BEGIN QUERY LOGGING as a part of the AutoStats feature.
Dieter
k, thanks Dieter
Hi Dieter,
Just curious if I can do help stats in a sp on TD 13.10?
Thanks,
@TusharGade:
No, you can't use any kind of HELP or SHOW in a SP...
BUT:
There was a nice article on how to bypass that restriction recently:
Running Unsupported Queries from a Stored Procedure
I didn't test it, yet, but this sounds great and you should be able to change the source code to support HELP, too.
Dieter
Hi Dieter,
We are using 14.10 with Sles11 and when i tried executing view " REPLACE VIEW Failed. 3706: Syntax error: Data Type "ColumnName" does not match a Defined Type name. "
Below is output of
"SELECT * FROM dbc.functionsV WHERE functionname = 'OREPLACE' "
DatabaseName FunctionName SpecificName FunctionId NumParameters ParameterDataTypes FunctionType ExternalName SrcFileLanguage NoSQLDataAccess ParameterStyle DeterministicOpt NullCall PrepareCount ExecProtectionMode ExtFileReference CharacterType Platform InterimFldSize RoutineKind ParameterUDTIds AuthIdUsed MaxOutParameters GLOPSetDatabaseName GLOPSetMemberName
1 TD_SYSFNLIB OREPLACE oreplace 00-00-C3-07-00-00 3 ++++++ F oreplace P Y I N Y N S ? 1 UDTFP 0 R ? ? 0 ? ?
2 TD_SYSFNLIB OREPLACE oreplace2 00-00-C4-07-00-00 2 ++++ F oreplace2 P Y I N Y N S ? 1 UDTFP 0 R ? ? 0 ? ?
Please suggest
Hi Dieter,
When i do Show Stats on one column it gives following values
/* MinVal */ ‘AAAA’, 'Truncated',
/* MaxVal */ ‘AAAA, 'Truncated',
/* ModeVal */ ‘AAAA, 'Truncated',
I want to identify all the columns where the Histogram data is truncated and then may be increase the maxvaluelength for those columns to see if i get a better plan.
I queried Statsinfo14 but it did not report any.
Statsinfo14 returns NULL (?) for MinVal, MaxVal, and ModeVal whereever we have the above pattern (‘AAAA’, 'Truncated’,)
Can you please let me know why does statsinfo14 returns NULL instead of the (‘AAAA’, 'Truncated’,)
- Abhishek Jadhav
Thanks
Abhishek Jadhav
Hi Abhishek,
yep, it's returning NULL because it's a VarChar, read the description of Min/Max/ModeVal:
(only for single column numeric or datetime stats)
The reason is simple, I don't know how to extract strings from the stats-BLOB :-)
You might calculate the SUM(ColumnLength) of all columns in a statistic, if it's less greater than MaxValueLength (the default is afaik 26) you can flag it as possible truncated.
I've never tried that and since ColumnLength is not always the physical size it might need some additional tweaks...
Dieter
Hello Dieter,
We recently upgraded to 14.10 from 13.10,
One of our procedure contains below code,
SELECT
"DatabaseName"
,"TableName"
,"IndexNumber"
,"IndexName"
,"UniqueFlag"
,"ColumnPosition"
,"ColumnName"
,"IndexStatistics"
FROM
"DBC"."IndexStats"
WHERE
"DatabaseName" = :inDatabaseName
AND "IndexType" IN ('K', 'P', 'Q', 'S')
There are 4 columns which do not exist in DBC.StatsV from above code.
What can be the replacement?
I could see DBC.IndexStatsV but again source is same.
ColumnPosition is not needed anymore as you get all ColumnNames in a comma-delimited list.
IndexName and UniqueFlag would need a join to dbc.IndicesV using DatabaseName/TableName/IndexNumber plus a condition WHERE ColumnPosition = 1.
Dieter
Thanks Dieter, Could you please elaborate on "IndexStatistics"?
What can be the replacement?
There's no other replacement than IndexStatsV.
What else do you need from the old view?
Dieter
I am looking for replacement of column "IndexStatistics" from old view.
There's no replacement.
Dbc.StatsTbl.Histogram holds the binary stats, similar to dbc.Indexes.IndexStatistics, but the internal structure is totally different.
Dieter
Hi Diether,
We have migrated to TD 14.10 few days back. Previously we had TD 13.10. In our collect stats architecure, we have the below query that fails now saying that DBC.MULTICOLUMNSTATS doesn't exist. Could you please tell us what is the new parallel view/table to use in place of this.
SELECT DATABASENAME,TABLENAME
FROM DBC.INDEXSTATS
WHERE INDEXSTATISTICS IS NOT NULL
UNION
SELECT DATABASENAME,TABLENAME
FROM DBC.MULTICOLUMNSTATS
WHERE COLUMNSSTATISTICS IS NOT NULL
--Samir
Basically i want to find tables having stats on them - single column, index or multicolumn. below query gives me the o/p , but i want to know if there is any view like in 13.10 (as we used in above query )that has this info :
SELECT db.DATABASENAME,t.TVMName , s.expressionlist
FROM DBC.StatsTbl AS s
JOIN DBC.Dbase AS db
ON s.DatabaseId = db.DatabaseId
JOIN DBC.TVM AS t
ON s.ObjectId = t.TVMId
WHERE ExpressionCount > 1
and db.databasename = 'DBname'
and t.TVMName ='tablename'
Hi Diether,
I think i got it. I saw your post above regarding this. I just need to check the presence of that databasename and tablename in view dbc.StatsV. This view contains all the columns on which stats has been collected. Please let me know if my understanding is correct.
--Samir
Hi Samir,
yep, this is correct :)
Dieter
Hi Diether,
in our production environment I found out a strange situation that may have revealed a bug of the view.
Maybe you want this behavior, if so, can you explain the reason?
We have a table, let's call it TABLE_A, that has some constraints defined on it in this way:
The three constraints plus the partition constraint are all stored in the dictionary table DBC.TableConstraints.
In the view the table DBC.TableConstraints is in LEFT JOIN to take the fields DefinedCombinedPartitions and ColumnPartitioningLevel.
Let's imagine that TABLE_A has 4 stats defined.
When doing the left join with DBC.TableConstraints the result is that these 4 stats are joined with the 4 rows inside TableConstraints causing a duplication of the result of the view.
In fact, we have 16 rows in the answerset!
From what I understand you are interested only in the partitioning constraints.
Why not put a filter in the LEFT JOIN with TableConstraints to only make participate in the join the rows that has ConstraintType = 'Q' (that means partitioning constraints)?
Thanks in advance and regards
Giovanni
Hi Giovanni,
that's a bug, it was fixed half a year ago, but I forgot to upload it.
Now it's attached, thanks
Dieter
Dieter
Hello .. We are on 14.10. The SHOW STATISTICS ON <table> gives all the stats on the table, in a COLLECT STATS syntax. Is there a way to capture this result in a stored procedure ??
We are attempting to compress the tables, and TD doesn't allow compression if there is a stats on the column. So I am attempting to capture the statement, drop all the stats, compress the table and recollect stats using the results saved earlier. If the SHOW STATS results cannot be saved, I guess my option is to select columns from DBC.STATSV and generate COLLECT STATS script.
Thanks .. John Abraham
Regards
John Abraham
Hi John,
to run a SHOW STATS in a SP you might try Glenn McCall's approach:
Running Unsupported Queries from a Stored Procedure
But I woud prefer creating an empty copy of the table.
Then use this to store the stats temporarily using COLLECT STATS FROM.
After compression you simply copy it back.
Dieter
Thanks Dieter. I will explore the option of calling XSPs. For now, I wrote the code to capture all columns which have stats, then generate a script to collect stats, and executing at the end.
SHOW STATISTICS shows stats for all columns and indices. Even for the Index column, it shows as COLUMN (col name). I know we cannot collect stats on the same column twice as INDEX and also as COLUMN. Assumig the PI is ACCT_NBR, shouldn't there be a difference between collecting the stats as INDEX (ACCT_NBR) vs COLUMN (ACCT_NBR) ?
Regards
John Abraham
Hi Dieter,
Great Work..!!! many thanksss..!!
I faced the same issue that the OREPLACE was not working on select view. However simple OREPLACE is function is working. Anyways, I used the non OREPLACE version and its working great.
Id like to know how we would incorporate the missing stats for tables?
thanks.
Hi Dieter, first of all many thanks for your work, we have used a lot your StatInfo view on TD12.
And now, on TD14.10, I cannot create the function Byte2Float, the query ends with this error: "Function BITAND called with an invalid number or type of parameters".
Can you help me? The exect version of TD is 14.10.05.02.
Thanks, bye
Simone
Hi Dieter,
I have just created all the functions and view (StatsInfo14).
How can I execute the "C" code which you have provided.
"ReverseBytes.c"
Is below error related to this "C" code?
Regards, Arpit
Pages