All Forums Database
sunny.j 90 posts Joined 10/10
09 Nov 2011
High Skewness

Hi

 

Im trying to execute this query ,but it is being punished by the workload every time with high skewness

 

INSERT INTO sample_tbl

SEL

* FROM product_region_1

WHERE (prod_cd,ord_date,prod_name)

NOT IN (SELECT prod_cd,ord_date,prod_name FROM product_region_2)

 

collected stats on  product_region_1 for columns  prod_cd,ord_date,prod_name .

 

but is not executing ,it having high skew .

i had verified the skew on both the tables are 1.63%

any one can suggest me on the same how to avoid skew . Its quite urgent any one please help me

Woody 16 posts Joined 10/09
09 Nov 2011

Can you attach DDL for the two tables?

sunny.j 90 posts Joined 10/10
09 Nov 2011

any one please comment its urgent please help me

sunny.j 90 posts Joined 10/10
09 Nov 2011

CREATE TABLE sample_tbl

(

ord_date DATE FORMAT 'yyyy-mm-dd',

ord_TIME TIME(0),

cust_num INTEGER,

contact VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,

prod_code VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,

quantity INTEGER

)PRIMARY INDEX(ord_date,ord_TIME,cust_num,contact);

 

CREATE TABLE product_region_1

(

ord_date DATE FORMAT 'yyyy-mm-dd',

ord_TIME TIME(0),

cust_num INTEGER,

contact VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,

prod_code VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,

quantity INTEGER

)PRIMARY INDEX(ord_date,ord_TIME,cust_num,contact,quantity);

 

 

CREATE TABLE product_region_2

(

ord_date DATE FORMAT 'yyyy-mm-dd',

ord_TIME TIME(0),

cust_num INTEGER,

contact VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,

prod_code VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,

quantity INTEGER

)PRIMARY INDEX(ord_date,ord_TIME,cust_num,contact);

sunny.j 90 posts Joined 10/10
09 Nov 2011

changing the query

 

INSERT INTO sample_tbl

SEL

* FROM product_region_1

WHERE (prod_code,ord_date,contact)

NOT IN (SELECT prod_code,ord_date,contact FROM product_region_2)

sunny.j 90 posts Joined 10/10
09 Nov 2011

Those are the tables and the query .

sunny.j 90 posts Joined 10/10
09 Nov 2011

@Woddy : Would you please comment

Jigar 70 posts Joined 09/11
09 Nov 2011

DDLs do not seem to have Prod Name. Please post DDLs again.

ALso if you have Contact ,I don't think you require Cust_name in primary index. Try and make primary index same as join columns and change query to below .

Collect stats on product_region_1 A column  (prod_cd,ord_date,prod_name) ;

Collect stats on product_region_2 A column  (prod_cd,ord_date,prod_name) ;

INSERT INTO sample_tbl

SEL A.* FROM product_region_1 A

LEFT OUTER JOIN product_region_2

ON

A.PROD_CD=B.PROD_CD

AND A.ORD_DATE=B.ORD_DATE

AND A.PROD_NAME=B.PROD_NAME

where

B.prod_cd is null

;

 

Also try and just xecute the select clause.If that works fine ,Issue might be with index of Sample_tnl.

 

Jigar 70 posts Joined 09/11
09 Nov 2011

^ you will have to add coalesce if you are expecting nulls for any of the join columns and may be change where clause accordingly.

dilipnair 3 posts Joined 07/05
09 Nov 2011

Sunny,

Try adding the column 'quantity' on the PI of the table u r inserting into - 'sample_tbl'.

The only reason I can think for the high skew is that the rows that you are selecting from 'product_region_1' that are not on 'product-region_2' have varying number of duplicates on (prod_code,ord_date,contact).

The 'product_region_1' table has the column 'quantity' in its PI and thus may not be skewed. But the 'sample_tbl' is missing that in its PI.

Woody 16 posts Joined 10/09
09 Nov 2011

Run something like these to see where the skew is coming from.

 

sel hashamp(hashbucket(hashrow(prod_cd,ord_date,prod_name))), count(*) FROM product_region_2

group by 1 order by 2 desc;

 

sel hashamp(hashbucket(hashrow(ord_date,ord_TIME,cust_num,contact))), count(*) FROM product_region_1

WHERE (prod_cd,ord_date,prod_name)

NOT IN (SELECT prod_cd,ord_date,prod_name FROM product_region_2)

group by 1 order by 2 desc;

 

 

gotuchintu 32 posts Joined 12/05
09 Nov 2011

Can you please attach the explain plan for the same.

It seems that the problem here is happening due to the NOT IN. The skewness seems to be in the spool. In that case I would suggest you to convert the inner query into a temporary table to have all the unique values & then try to join it with the main table. If the temporary table is small enough, then it will go for a product join & your problem will get resolved.

Somnath Roy

kishore8421 1 post Joined 11/11
10 Nov 2011

"INSERT INTO sample_tbl

SEL

* FROM product_region_1

WHERE (prod_cd,ord_date,prod_name)

NOT IN (SELECT prod_cd,ord_date,prod_name FROM product_region_2)"

In your query you are accessing two tables product_region_1
product_region_2

and you are using NOT IN which is not suggestable as it hampers the performance of the query. Try using a Left outer Join with null in where clause as written by JIgar.

If still you have problem with skew. then you have to check with the primary indexes in your tables and also whether data is distributing evenly or not and accrdingly change the PI columns by creating new tables.

itsmeabhi99 6 posts Joined 08/11
10 Nov 2011

Did you get the issue resolved? In my view, the skewness primarily happens due to "bad" data distribution over amps. Here the PI seems to be fine but apparently you might be getting a lot of null values in PI columns. Can you try filtering out these nulls?

Paul Johnson 12 posts Joined 01/10
15 Dec 2011

Nulls in PI columns = very bad!

You must sign in to leave a comment.