All Forums Database
27 Aug 2008
skew factor

hi what is skew factor????? what is the use of that?

Adeel Chaudhry 773 posts Joined 04/08
27 Aug 2008

Hello,Skewness is the statistical term, which refers to the row distribution on AMPs. If the data is highly skewed, it means some AMPs are having more rows and some very less i.e. data is not properly/evenly distributed. This affects the performance/Teradata's parallelism. The data distribution or skewness can be controlled by choosing indexes.HTH.Regards,Adeel

-- If you are stuck at something .... consider it an opportunity to think anew.

28 Aug 2008

thx how 2 find out skew value?

Roddy 3 posts Joined 07/07
28 Aug 2008

Try something like:SELECT TSIZE.DatabaseName ,TSIZE.TableName ,TDEF.CreateTimeStamp AS Created ,TDEF.LastAlterTimeStamp AS LastAltered ,TDEF.AccessCount ,TDEF.LastAccessTimeStamp AS LastAccess ,SUM(TSIZE.CurrentPerm) AS CurrentPerm ,SUM(TSIZE.PeakPerm) AS PeakPerm, (100 - (AVG(TSIZE.CurrentPerm)/MAX(TSIZE.CurrentPerm)*100)) AS SkewFactorFROM DBC.TableSize TSIZEJOIN DBC.Tables TDEF ON TSIZE.DatabaseName = TDEF.DatabaseName AND TSIZE.TableName = TDEF.TableNameGROUP BY 1,2,3,4,5,6;Lower the skew factor the better - although if you have only a very few rows on a table you are going to have a skewed table no matter what!

Adeel Chaudhry 773 posts Joined 04/08
28 Aug 2008

In Teradata Administrator you can simply right-click the table and select "Space Summary".Regards,Adeel

-- If you are stuck at something .... consider it an opportunity to think anew.

prahladk 4 posts Joined 08/09
26 Aug 2009

What is a acceptable Skew Factor ? Say if one of my table is having skew factor of 16. Is it acceptable?--Thanks in advance.

Bommireddy 4 posts Joined 11/10
23 Nov 2010

how to check the skew factor?
suppose i use Primary index that situation how to distributes the row?
suppose i use unique Primary index that situation how to distributes the row?
normally if we u PI then data will distribute evenly thats not an issue
if we use UPI then how we have to distributes the records evenly

Bommireddy 4 posts Joined 11/10
23 Nov 2010

Hi,
how to distributes the row in mload if we use UPI and how in fload if we use UPI and how if we use NUPI
and finally i have to distributes the row evenly to all AMPs for best skew factor,
any one which best way to distribute the row evenly in mload and fload when we use UPI and NUPI

sivaprasad 1 post Joined 02/11
27 Feb 2011

Skew facter is tell the distribution of rows
if uniformally distribution i.e skew is the zero
if the skew factor is reverse of parllel efficence

Surendhar Reddy 7 posts Joined 11/11
04 Nov 2014

Hi,
Below is the query which is usefull to find out skewfactor.

SELECT
TABLENAME,
SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM,
(100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR
FROM
DBC.TABLESIZE
WHERE DATABASENAME= <DATABASENAME>
AND
TABLENAME =<TABLENAME>
GROUP BY 1;

 

Surendhar Reddy 7 posts Joined 11/11
04 Nov 2014

Hi,
The data distribution of table among AMPs is called Skew Factor.....

Generally For Non-Unique PI we get duplicate values so the
more duplicate vales we get more the data have same rowhash
so all the same data will come to same amp, it makes data
distribution inequality,

One amp will store more data and other amp stores less
amount of data, when we are accessing full table, The amp
which is having more data will take longer time and makes
other amps waiting which leads processing wastage
A skew factor of 0 indicates that the data is perfectly
distributed across all the AMPS
In this situation (unequal distribution of data)we get Skew
Factor High........
For this type of tables we should avoid full table scans
Skew factor is distribution of rows of a table among the
available no.of AMP's.
If your table has a chance of using unique primary index,it
is always better to use UPI which ensures the skew factor
around 0%.
If there is no chance of having unique values column in a
table choose a column as PI(primary index) which has less
duplicate values which inturn results in less skew factor.
That is the data will be distributed almost(not exactly
equal percentage) equally to all AMP's.....

ex:
AMP0 AMP1
10000(10%) 9000000(90%)

in this situation skew factor is very high 90%

tejasopalli 3 posts Joined 06/15
08 Oct 2015

could any one please tell how we choose our primary indexes
 

teja sopalli

ODESSA 3 posts Joined 05/16
17 May 2016

Hello,
if the table and views contains the same values in all record in 3 out of 7 columns;  does it impact the performance?
I was told that having all values the same does tend to lead to skewing. Is that correct statment? Does it make sense to remove those 3 columns with the same values for performance and space reazon?
 
thank you.  

dnoeth 4628 posts Joined 11/04
18 May 2016

The table will only be skewed if the Primary Index is based on those columns.
Of course it's strange when there's only a single value in a column, what's the business meaning?
If you drop the column you can add it as a literal to the view-definition. But if there might be other values in future, better apply COMPRESS 'thisvalue' and it's using almost no space.

Dieter

ODESSA 3 posts Joined 05/16
18 May 2016

Thank you Dieter for your reply!
the columns are - effective_dt, expiration-dt, load_ts and the values in effective_dt and load_ts are identical in all rows. The expiration_dt has the same value in all rows as well. And there is not much of a business value at all. It was design that way before. The question is will it improve performance if those columns are dropped. The entire table is only 1.8 gb. So maybe just leave it alone? 
efct_dt       Expr_dt         load_ts
5/16/2016  12/21/9999   5/16/2016

dnoeth 4628 posts Joined 11/04
18 May 2016

The table will be smaller when you drop those columns (or COMPRESS them), at least a Full Table Scan will be faster.

Dieter

ODESSA 3 posts Joined 05/16
18 May 2016

Thank you Dieter 

kamlesh114 7 posts Joined 05/15
04 Aug 2016

A NUPI column with high number of NULL result in high skew factor.
if yes then  why ? as NULL cannot be compared with each other.

LUCAS 56 posts Joined 06/09
04 Aug 2016

This is because of the location of all NULLs on a same (and single) AMP, according to the hash process
Pierre

LUCAS 56 posts Joined 06/09
04 Aug 2016

BTW,
i just notice a case where a small table is displayed with a high skew by TD Administartor (15.00):
Space by AMP gives 8 lines with almost the whole of 1.3 M lines on one AMP, although a query shows a good distribution on all AMPs (60).
after copy on a new temp table TD Administrator display is correct, i mean identical to the query result: how to explain that ?
It looks like TD Admin had lost information on the first table.
Pierre

kamlesh114 7 posts Joined 05/15
04 Aug 2016

Thenk you Pierre..
It happens some time when our database behave unexpectedly..
 
Even i checked practically and found that null values create same hash code ..so more nulls can cause high percentage of skew  on table.
 
 

You must sign in to leave a comment.