All Forums Database
prakhar 101 posts Joined 05/08
06 Jun 2009
Stats on non index column

Hi ,Having some confusions about how collect stats really worksSuppose i am running this queryselect * from table where column=300Basically stats collected on columns used for selection will never result in 1 AMP or 2 AMP access…that can only be done through index access. My question is what really is use of stats collected on set selection non-index column like abovea)is it solely used for estimation of spool file size used in full table scansI don't think we have stats based on per AMP , so that we can have a 1 AMP operation sort of thing if optimizer knows that this particular value exists on which AMP......Please clarify this thing as how stats on non-index column are useful in query access.....apart from spool size estimationRegards,Prakhar Agarwal

prakhar 101 posts Joined 05/08
08 Jun 2009

Hi ,Can anybody throw some light on this....pls

anadi.bhalla 5 posts Joined 04/09
08 Jun 2009

Stats will help in pointing out the data demographics of the tables involved. This will ensure that the least data is brought in for processingfor example : if there is a join on a column what and there is a qualifier existing on it as well ie where clause to filter data based on the same column, the stats will determine the table on which the stats are applied based on where the maximum data will be eliminated.similarly if a copy of a table is to be created across all amps for joining, stats will help in choosing the table to be replicated based on the data demographics.

prakhar 101 posts Joined 05/08
08 Jun 2009

Thankx for the reply Anadi...But i am not exactly looking for that My question was what really is use of stats collected on set selection non-index column ,even in case of JOIN it will be used to eliminate rows that will participate in JOIN but that is also related to estimation of spool file size.I don't think we have stats based on per AMP , so that we can have a 1 AMP operation sort of thing if optimizer knows that this particular value exists on which AMP......So why do we really need stats on non-index column having predicate....Please revert back.......

robpaller 159 posts Joined 05/09
11 Jun 2009

Prakhar, may I refer you to an article by Carrie Ballinger on the Teradata Developer Network: The reason for collecting single column and multi-column stats where appropriate is to provide the optimizer with the most accurate histogram of the data being used to satisfy a particular query. (This includes join columns, WHERE conditions, etc.) Statistics alone can take a poorly performing query plan and drastically improve the query plan without the need and overhead associated with a NUSI.If you haven't done so already, consider using the following DIAGNOSTIC hint:DIAGNOSTIC HELPSTATS ON FOR SESSION;Then take a few queries that are performing below your expectations and run an EXPLAIN. At the end of the EXPLAIN will be suggestions on which statistics may help the optimizer improve the query plan. Focus your attention on those that should yield a high confidence. Capture the query plan before and after your collect the stats to see what impact was made. It should be noted there is a bug in Teradata 12 (depending on your current patch release) where the recommended stats to be collected are shown even if the stats exist. It should also be noted that you may not need to collect all the recommended stats in order to yield the best possible query plan. Once you collect stats, you also need to maintain them. Stale statistics can be dangerous if the underlying data changes in volume or the demographics of the data shifts drastically.Hope this helps.

prakhar 101 posts Joined 05/08
11 Jun 2009

Hi Rob,Thankx for your reply...As you have said "The reason for collecting single column and multi-column stats where appropriate is to provide the optimizer with the most accurate histogram of the data being used to satisfy a particular query"But ultimately even if correct stats are presented for a query likeselect * from table where column=300Full table scan will happen......Using stats optimizer can find out how many rows are expected to have value as 300 and build estimate spool size...Apart from that is there anything else that helps optimizer in this case...Regards,Prakhar

Fred 1096 posts Joined 08/04
12 Jun 2009

For a simple query, you are correct that stats on non-index columns merely give you a better cardinality estimate (and don't change the query plan). But it may be important to have that more accurate cardinality if you are doing some initial workload management classification based on the estimated query cost.For more complex queries, an improved estimate for a subquery may be important in coming up with the optimal plan for the outer query. Or if the non-indexed column is used as a join predicate, it may be important for the optimizer to know the distribution & skew of values (i.e. the histogram or demographics) to pick a reasonable "join geography" / hash distribution for spool.

You must sign in to leave a comment.