All Forums Database
StevenSchmid 33 posts Joined 07/11
14 Aug 2014
Determine new table skew for a different PI

Hi
I was wondering if there was a way to calculate the skew of an existing table with a different choice of PI.  I am aware of the hashing functions, such that the following query will show the distribution of rows across the amps based on the new PI, however with a a system with hundreds of AMPs, it would be nice to determine the skew factor value:
SELECT 
  hashamp(hashbucket(hashrow( new_PI_column_list ))) as ampnum
  ,count(*)
from <database>.<tablename>
group by 1
order by 2
Cheers
Steven

Steven Schmid Teradata DBA Canberra, Australia
Raja_KT 1246 posts Joined 07/09
15 Aug 2014

I am not aware of any dictionary table. However, if it is me, I will write a script which will loop through a table, reading fields and (also composite if need be) and then redirect the output to a file preferably and not table. In this way, an automation script can read for all databases and tables required and provide outputs for all fields or composite  fields if required.

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

dnoeth 4628 posts Joined 11/04
15 Aug 2014

Hi Steven,
how do you define the skew factor?
I use this for calculating the percent deviation from average:

SELECT
   HASHAMP(HASHBUCKET(HASHROW(col))) AS vproc,
   COUNT(*) AS cnt,
   100 * (cnt - AVG(cnt) OVER ()) / AVG(cnt) OVER () (DEC(8,2)) AS deviation
FROM tab
GROUP BY 1

And based on the count per AMP you can do the skew calclation:

SELECT
   SUM(cnt) AS RowCount 
   ,MAX(SkewedAMP) AS SkewedAMP
   -- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP
   ,MAX(cnt) / NULLIF(AVG(cnt),0) (DEC(5,2)) AS SkewFactor
   -- skew factor, between 0 and 99.  Same calculation as WinDDI/ TD Administrator
   ,(100 - (AVG(cnt) / NULLIF(MAX(cnt),0) * 100)) (DEC(3,0)) AS SkewFactor_WINDDI
FROM
 (
   SELECT
      HASHAMP(HASHBUCKET(HASHROW(col))) AS vproc,
      COUNT(*) AS cnt,
      100 * (cnt - AVG(cnt) OVER ()) / AVG(cnt) OVER () (DEC(8,2)) AS deviation,
      CASE WHEN cnt = MAX(cnt) OVER () THEN vproc END AS SkewedAMP 
   FROM tab
   GROUP BY 1
 ) AS dt

And for big tables you might better use a SAMPLE of a few percent instead of aggregating all rows

Dieter

VandeBergB 182 posts Joined 09/06
15 Aug 2014

and this works too...:)
select
sum(tallyset.rowtally)
,min(tallyset.rowtally)
,max(tallyset.rowtally)
,avg(tallyset.rowtally)
,100 - (avg(tallyset.rowtally)/max(tallyset.rowtally) *100) as skewfactor
from
(select 
            hashamp(hashbucket(hashrow(<candidate columns>))) as hashedamp
            ,count(*) as rowtally
from 
            <dbname.tablename>
group by 1) as tallyset

Some drink from the fountain of knowledge, others just gargle.

krishaneesh 140 posts Joined 04/13
15 Aug 2014

Here is a different approach in case this helps you. 
For a given PI of a table the skew factor is based on the count of the values of the records across the amps. Just determine it based on the unique value.  SEL PI COLUMNS, count(*) from <databasename>.<tablename> group by PI columns. usually if the count(*) is not distributed evenly then the table is skewed i.e., if the count of a particular value is in the order of some thousands and the minimum of another PI value is in the order of few, then the table is heavily skewed.
Take the new PI columns you wanted to check for, and check the distribution as explained above. if it is even distribution of the counts across the various values, then you can consider it for changing it to  the new PI.

StevenSchmid 33 posts Joined 07/11
17 Aug 2014

Thanks everybody for your replies.
Steven

Steven Schmid
Teradata DBA
Canberra, Australia

teradatauser2 236 posts Joined 04/12
10 Jun 2015

Hi Dieter,
using your 2nd query above, how much of skewfactor and SkewFactor_WINDDI is accpetable ? i have it for a table as 1.50 and 33.
Thanks !
Samir

You must sign in to leave a comment.