All Forums General
nimish_123 15 posts Joined 08/14
06 May 2016
HASHAMP function and column values

Below query will give me data distribution of the table based on the column ProdtCd

Sel HASHAMP(HASHBUCKET(HASHROW(ProdtCd))) AS primary_index,COUNT(*)
from PLN_EXPLORE.SRI_DSI_Plnt_Snpsht_wk_old
GROUP BY 1 ORDER BY 2 DESC;

I would like to check the data distribution based on all the column one by one, to get this manually I have to execute the sql by changing value for HASROW() function.
I have created one temp table which has all the columns name now I am trying to use this temp table to pass the column name.
What is happening here is that instead of prodtcd the hashamp function is reading it as ‘prodtcd’ (like hardcoded value with single quote, check below sql).
Is it possible to read column value without single quote:

Sel HASHAMP(HASHBUCKET(HASHROW('ProdtCd'))) AS primary_index,COUNT(*)
from PLN_EXPLORE.SRI_DSI_Plnt_Snpsht_wk_old
GROUP BY 1 ORDER BY 2 DESC;

ToddAWalter 316 posts Joined 10/11
06 May 2016

SQL does not allow for the substitution of column names in an expression. Given the table with column names, a SQL can be written with string concatenation a which results in one SQL statement per column name. That result can be written to a file and then executed to provide the desired result. 
 
Trying every column and and combination of columns is probably overkill. It is probably better to focus on the few that would also provide value as join columns or direct access columns. If there are no columns that have access value and all that is desired is even distribution, the a NOPI is a more direct solution. 

nimish_123 15 posts Joined 08/14
11 May 2016

ToddAWalter can you please provide some sample script to use NOPI.
I understand from you comment that NOPI will distributed data better but how will i come to know that based on which column data was distributed for NOPI?

ToddAWalter 316 posts Joined 10/11
11 May 2016

See the documentation under create table, it will tell you all about the syntax for NO PRIMARY INDEX (NOPI).
 
With NOPI, column data is not used for distribution, it is a round robin/random algorithm instead.
 
If you are using Create Table As or Insert Select to populate this table, you may also want to read about the HASH BY RANDOM clause if your source data is skewed.

You must sign in to leave a comment.