0 - 45 of 45 tags for statistics

This is a new one for me.  When I run the command below, I get a spool error.  The table has about 620 million rows in it.  Can someone please suggest what I need to do in order to resolve this?
COLLECT STATISTICS
  USING THRESHOLD 5 PERCENT
    AND THRESHOLD 7 DAYS  

Hi, let me start by saying I love the stats managment facility in Viewpoint.  The only problem I have is in scheduling. My users are running mini batches 24X7 from Informatica and managed by Tivoli Scheduler and I've had to integrate my backups into the schedule to avoid locking/performance issues.

Hi,
I would like to understand the performance improvement we might get when using secondary index vs collecting Stats. 
If Stats are collected on a singe/multiple columns used often in queries, isn't that sufficient? When should we go for secondary indices, apart from achieving uniqueness on a combination of columns.
Thanks

Scenario:
Stats were collected in Teradata 12
Teradata upgraded to Teradata 14
What happens to the stats from Teradata 12?

Here's a clever way to force the query optimizer to follow the same *stable* plan for a query regardless of the data set it runs against.  Useful when the data set cardinalities and row count profiles are well understood.  Also useful when ID numbers increase and the query optimizer would have to extrapolate past the right-most interval.

New DBQL logging options USECOUNT and STATSUSAGE, introduced in Teradata Database 14.10, enable the logging of used and missing statistics.  The output of this logging can be utilized to find used, unused, and missing statistics globally (for all queries) or for just a subset of queries.

This session focuses on statistics enhancements in Teradata Database 14.10, particularly the new threshold functionality, skipping statistics, and downgrade to sampling.

We have a Customer asking this question:
-----Original Message-----

Statistical information is vital for the optimizer when it builds query plans. But collecting statistics can involve time and resources. By understanding and combining several different statistics gathering techniques, users of Teradata can find the correct balance between good query plans and the time required to ensure adequate statistical information is always available.

This recently-updated compilation of statistics collection recommendations are intended for sites that are on any of the Teradata Database 14.10, 15.0, or 15.10 software release levels. Some of these recommendations apply to releases earlier than Teradata Database 14.10 and some rely on new features available starting in Teradata Database 14.10.  Statistics collection functionality in the Teradata Database works the same in 14.10, 15.0 and 15.10 releases.

For greater detail on collecting statistics for Teradata Database 14.10, see the orange book titled:  Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati.

Contributors: Carrie Ballinger, Rama Krishna Korlapati, Paul Sinclair, September 11, 2014

Managing Teradata Optimizer statistics can be labor intensive and error prone. Many users struggle to know what columns and indexes to collect statistics on, how to detect and prevent statistics from becoming stale, and how to know if collected statistics are being used.

Hello fellow Teradata-ers,

An earlier blog post focused on simple steps to get started using the Teradata 14.10 Automated Statistics Management (AutoStats) feature.  One of the new capabilities that AutoStats relies on when it streamlines statistics collection is the new “Threshold” option.  Threshold applies some intelligence about when statistics actually need to be re-collected, allowing the optimizer to skip some recollections.  

Although you will probably want to begin relying on AutoStats when you get to 14.10, you don’t have to be using AutoStats to take advantage of threshold, as the two features are independent from one another.  This post will give you a simple explanation of what the threshold feature is, what default threshold activity you can expect when you get on 14.10, and what the options having to do with threshold do for you.  And you’ll get some suggestions on how you can get acquainted with threshold a step at a time.  

Hello, 
Consider the following situation: 
Table Alpha has the following columns:
Col1
Col2
Col3
Col4
Col5
Primary Index (Col1, Col2, Col3, Col4)
 
Table Beta has the following columns:
Col1
Col2
Col3
Col6
Col7
Primary Index (Col1, Col2, Col3)
 

Besides collecting statistics on your columns on your Teradata database, the compressing of the data to save disk space is a very important maintaining task. So why not connect these two tasks? The idea is to extract the values for the multi value compression of the columns out of the collected statistics.
 

The "SHOW STATISTICS VALUES COLUMN col ON db.tab; " shows the Biased values of the statistics. In my opinion this is exactly a very good list for multi value compression.

Everyone with a Teradata Database collects statistics.  No way around it.  Good query plans rely on it, and we’ve all gotten used to it.

But starting in Teradata Database 14.10 there’s some big changes.  Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet all combine together to produce the Automated Statistics (AutoStats) feature.  Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect on, prioritize and execute the stats collection statements, and organize/run your statistics collection jobs in the most efficient way possible.

There’s a lot going on and a lot of choices involved.  So what’s the easiest way to start getting familiar with AutoStats?   And what’s required with the new DBQL logging options?  If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to key features and benefits AutoStats.  

Statistics provides valuable information to the optimizer to make decisions on generating optimal explain plan.  The information about the statistics can be obtained by using the “Help Statistics” command however the displayed data cannot be used to join with other tables.  The article provides a java stored procedure and DDL which extracts the output of

Database Version: Teradata 13.10.0609  13.10.06.09A
Provider Version: ODBC 14.00.00.04
 
CREATE TABLE MY_TABLE AS (
SELECT COL1
 , COL2
 , COL3
FROM DB_TABLE
)WITH DATA
PRIMARY INDEX (COL1);
COLLECT STATISTICS MY_TABLE INDEX(COL1);
 

Hi Experts,
I am bit unclear about the OCES-3 (Optimizer Cost Estimation Subsystem version 3) in the stats area  and BMSMS (bit map set manipulation set) in the Query execution.
Can any one please suggest some pointers for the above topics to deep dive in it..
 
Cheers!
Nishant
 
 

Collecting statistics on data dictionary tables is an excellent way to tune long-running queries that access multi-table dictionary views.  Third party tools often access the data dictionary several times, primarily using the X views.  SAS, for example, accesses DBC views including IndicesX and TablesX for metadata discovery.  Without statistics, the optimizer may do a poor job in building plans for these complex views, some of which are composed of over 200 lines of code.

In an earlier blog posting I discussed the value of collecting statistics against data dictionary tables, and provided some suggestions about how you can use DBQL to determine which tables and which columns to include.  Go back and review that posting.  This posting is a more comprehensive list of DBC statistics that is updated to include those recommended with JDBC.

I have few doubts regarding statistics collection.
 
1. Does the order of columns used in collect stats command important? 
2. Consider the following scenario 
 
Table A joins with Table B based on Col1 and Col2 and Table A joins with Table C based on Col1, Col2 and Col3.

Statistics about tables, columns and indexes are a critical component in producing good query plans.

Good Day Folks...
3 Q's bunched up on stats ...
<> When should you collect Index stats  and column stats for the component columns , both as against just Index stats. How does index stats help and how does stats on component columns that make up the Ix help 

When you start designing a statistics collection process topics include:

Teradata 14.0 offers some very helpful enhancements to the statistics collection process.   This posting discusses a few of the key ones, with an explanation of how these enhancements can be used to streamline your statistics collection process and help your statistics be more effective.

Hi all,

        I would like to know to what level statistics on a column(s) is useful when the column(s) is used in a non-equality constraint (using <> operator)?

I believe the optimizer can derieve the record count that will qualify by using the following expression

Statistical information is vital for the optimizer when it builds query plans.  But collecting statistics can involve time and resources.  By understanding and combining several different statistics gathering techniques, users of Teradata can find the correct balance between good query plans and the time required to ensure adequate statistical information is always available.

The following statistics collection recommendations are intended for sites that are on any of the Teradata 13.10 software release levels.  Most of these recommendations apply to releases earlier than Teradata 13.10, however some may be specific to Teradata 13.10 only.

A new R package for Red Hat Linux has been added to the teradataR 1.0.1 release. This new package provides the same functionality as in the previously released Windows and Mac OS X packages, but is built for Red Hat Linux. This version was built and tested on Red Hat Linux 6.2 32-bit. (The R version for Red Hat Linux is 2.14.1)

Installing this package is the same as any normal R package; just extract it into your R library area, or use the install.packages command with the file path.

Accurately identifying how much a table has grown since statistics on a column were last collected is an important prerequisite for the statistics extrapolation process.  Once table growth has been detected, then stale statistics can be adjusted accordingly.  Extrapolations based on table growth will be be more accurate if statistics are being regularly collected on the system-derived column PARTITON, whether or not a table is partitioned.

Collected statistics are a valuable source of information in Teradata, not only for the optimizer, but also for developers or DBAs. Spotting outdated statistics (which might lead to bad optimizer plans) and implementing/monitoring a stats recollection process are common tasks, which need the current statistics as input.

What is the best way to extract the output of the 'Help statistics DbName.TblName column Partition;", for the purpose of extracting the value of the column 'Number of Rows'. If you have attempted this through SQL, please share your thoughts. What I've already have in place is the following methodology...
Run 'Collect Statistics on DbName.TblName Column (Partition);
Run 'Help Statistics DbName.TblName Column (Partition);' through BTEQ with SideTitles on

Is there a table or view that contains the NumberOfRows, NumberOfColumns, DataBaseName, TableName?

In Oracle its called all_tables and in Netezza its called _v_table_only_storage_stat.

Here is where I looked:
DBC.ColumnStats
DBC.MultiColumnStats
DBC.IndexStats
DBC.TABLESV

Is it a good practice to define a Primary Index on VARCHAR column ? Although I dont see any strange (data type conversions) in explain plans between Join operations on VARCHAR (Primary Index) to CHAR column (Primary Index) since they are from same domain, I just want to understand if this is as efficient as Joins between CHAR to CHAR columns.

Statistics about tables, columns and indexes are a critical component in producing good query plans. This session will examine the differences between random AMP sampling, full statistics collection, and USING SAMPLE collection, and will share details about how each of these 3 approaches works internally.  Statistics histograms, their uses and limitations, will be explored.  You'll get some insights into collecting statistics on multiple columns, on indexes, on and on the system-derived column PARTITION.   Statistics extrapolation will be touched on briefly, and the session will wrap up with some general recommendations for collection statistics.

Hi ,
I am using the following table to collect performance statistics on a query, but getting no rows.

SELECT CollectTimeStamp,
SessionID,
UserName,
DefaultDatabase as DatabaseName,
StartTime,
FirstRespTime,
(FirstRespTime - StartTime) hour to second(4) as FirstRespElapsedTime,
TotalIOCount,
AMPCPUTime+ParserCPUTime TotalCPUTime,
SpoolUsage/(1024*1024*1024) as Spool_GB,
100-(nullifzero(AMPCPUTime/HASHAMP())/(MaxAMPCPUTime)*100) "Skew Factor",
delaytime,
errorcode,
ErrorText,
QueryText
FROM dbc.dbqlogtbl
where
sessionid=123567

Hi,

How can I query for all the Primary Indexes that doesn't have statistics collected? (for all tables of all databases).

Thanks in advance,

Regards

I've found this package while researching a connection between R and teradata. Has anyone seen documentation/reference manual for the package? Since it's a 3rd party package (vs. residing in CRAN), I haven't found a lot of people talking about/using it.

Any experience with or knowledge about the TeradataR would be appreciated!

Thanks!

Please note, we are no longer supporting teradataR since the decision was made for Teradata to focus on our partnership with Revolution for R integration with Teradata.

R  is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. This free package is designed to allow users of R to interact with a Teradata database.  Users can use many statistical functions directly against the Teradata system without having to extract the data into memory.

Collecting full statistics involves scanning the base table and performing a sort to compute the number of occurrences for each distinct value. For some users, the time and resources required to adequately collect statistics and keep them up-to-date is a challenge, particularly on large tables.

This article delves into the potential excessive use of Multi-Column (MC) Statistics and a process for identifying those that are not providing benefit and should be removed. This will reduce complexity and save resources on the re-collection of these redundant statistics.

I gave a presentation at the Teradata Partners Conference last week on the fundamentals of collecting statistics, where I touched briefly on the rules behind confidence levels. I’m using this article to go into further depth, and offer you more examples of how these confidence levels come about.

Our Operations and Development teams finished the TD 12 upgrades of our two production systems several weeks back.  We have been wanting TD 12 to get to the extrapolated stats available so we could stop "munging" stats.

Have you ever had what looks like a great query plan turn into life in the slow lane at run time?

I’ve been telling you for years to transform your short all-AMP queries into single-AMP queries, whenever you can. I’ve even given you pointers on using stored procedures, join indexes and smart application design to achieve that goal.