All Forums Database
01 Apr 2013
Best way to generate the Sequential numbers : CSUM or IDENTITY columns?

Hi Experts,
My Requirement is to generate the Sequential numbers in Column ..
As It can be done in two ways :
A) define as identity column 
b) CSUM(1,1) 
 
but i got to know from one of my peer that CSUM has some drawbacks like it locks the table. and Identity column has some constraint of upper limit ..  
As i am not sure ,Can any one suggest which would be more good in terms of performance?
Thanks!
Cheers,
Nishant Bhardwaj
 
 
 

dnoeth 4628 posts Joined 11/04
02 Apr 2013

Hi Nishant,
you have to define "sequential" first, do those value have to be:
- really sequential (based on the position in the input file or some other order)
- without any gaps?
- or just unique?
IDENTITY will be be "just unique".
CSUM(1,1) should never be used, because it skewes to a single AMP (replace it with a ROW_NUMBER)
Dieter
 

Dieter

Adeel Chaudhry 773 posts Joined 04/08
03 Apr 2013

Depends on the questions asked by Dieter.
 
Based on that, you can use:
 
IDENTITY
SUM(1) OVER(ROWS UNBOUNDED PRECEDING)
ROW_NUMBER
RANK
....

-- If you are stuck at something .... consider it an opportunity to think anew.

03 Apr 2013

Hi Dieter and Adeel,
My REquirement is to get the Sequential with out any gaps..
does Identity column gives any gaps in between?
which one is prefered for with out gaps?
Cheers
Nishant

dnoeth 4628 posts Joined 11/04
04 Apr 2013

Hi Nishant,
any IDENTITY/system-maintained sequence may result in gaps (due to restarts or rollbacks), but in Teradata it's not one sequence it's multiple parallel sequences, thus there are always gaps.
Without gaps is only possible with ROW_NUMBER plus
- a table maintaining the highest used value used in a transaction, in best case in a SP
- or ROW_NUMBER + (SELECT COALESCE(MAX(value),0) FROM tab)
Dieter

Dieter

d_2013k 1 post Joined 08/11
11 Apr 2013

CSUM(1,1) should never be used, because it skewes to a single AMP
Hi Dieter,
Can you please explain more about this??
Thanks,
Dheem
 

mikesteeves 46 posts Joined 10/11
12 Apr 2013

Hi Dieter,
I've always been told to steer clear of the IDENTITY in the past because it has some limitations in the past.  Such as the gaps in numbers and the limit to the number of values that could be incremented.  Meaning, it would run out of numbers and begin to wrap around and start over, using numbers that may have already been used in the past.  Do you know if these limitations have been corrected by 13.10?  We are considering using this IDENTITY feautre as we just need a unique value for a surragote key and do not care if there are gaps in the sequence.  I also thought there were some ETL limitations possibly when using FASTLOAD or MLOAD to populate a table that has an IDENTITY column.
Thanks,
         Mike

dnoeth 4628 posts Joined 11/04
13 Apr 2013

Hi Dheem,
when you check the QueryLog after running a CSUM(1,1) you'll notice that a single AMP (usually vproc 0) processed all the data, high cpu/io and spool.
For OLAP functions there's a distribution f the data based on PARTTITION and ORDER and for CSUM(1,1) there's only 1 partition and no order.
 
Hi Mike,
when you are running out of numbers it's not a limitation of Teradata's IDENTITY, you just used the wrong datatype.
BIGINT should be enough, otherwise there's DEC(38,0).
And the limitations on load utilities have been removed since a few releases.
Dieter

Dieter

NewAmigo 27 posts Joined 02/14
19 Feb 2014

Hello Nishant,
 
Can i just check what did u end up with? Can you also please pass on the code as i am having difficulties getting it to work in TD12.
 
Cheers
Reddy

Adeel Chaudhry 773 posts Joined 04/08
19 Feb 2014

Reddy, just replied to your post .... please have a look.

-- If you are stuck at something .... consider it an opportunity to think anew.

Somdebroy 6 posts Joined 06/14
26 Jun 2014

Hello,
Based on what Adeel and Dieter mentioned, here is a script I came up with.

/* CREATING SOURCE/STAGE TABLE */
CREATE VOLATILE TABLE tab1
(
Name VARCHAR(10)
)
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
 
/* INSERT DATA INTO SOURCE/STAGE TABLE */
INSERT tab1 VALUES ('a');
INSERT tab1 VALUES ('b');
INSERT tab1 VALUES ('c');
 
/* CREATING TARGET TABLE */
CREATE VOLATILE TABLE tab2
(
ID INTEGER,
Name VARCHAR(10)
)
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
 
/* INSERT DATA INTO TARGET TABLE -- HISTORY LOAD */
INSERT tab2
SELECT
ROW_NUMBER() OVER (PARTITION   BY 0 ORDER BY NAME),
Name
FROM Tab1;
 
SELECT * FROM tab2 ORDER BY 1;
SELECT * FROM tab1 ORDER BY 1;
 
/* NEW RECORD IN SOURCE/STAGE TALBE -- INCREMENTAL LOAD */
INSERT tab1 VALUES ('d');
 
/* INSERT INTO TARGET TABLE -- INCREMENAL LOAD */
INSERT INTO tab2
SELECT ROW_NUMBER() OVER (PARTITION   BY 0 ORDER BY tab1.name) + ID_MX
, tab1.name
FROM tab1
LEFT OUTER JOIN tab2
ON tab1.name = tab2.name
CROSS JOIN (SELECT MAX(tab2.id) as ID_MX from tab2) as MX
WHERE tab2.name IS NULL;
 
SELECT * FROM tab2 ORDER BY 1;
SELECT * FROM tab1 ORDER BY 1;
 
/* A TEST WITH A NEW VALUE */
INSERT tab1 VALUES ('e');
 
SELECT ROW_NUMBER() OVER (PARTITION   BY 0 ORDER BY tab1.name) + ID_MX
, tab1.name
FROM tab1
LEFT OUTER JOIN tab2
ON tab1.name = tab2.name
CROSS JOIN (SELECT MAX(tab2.id) AS ID_MX FROM tab2) AS MX
WHERE tab2.name IS NULL;

 

dnoeth 4628 posts Joined 11/04
26 Jun 2014

There's no need to "PARTITION BY 0", simply remove it.
And you get a more efficient plan when you rewrite the cross join to a Scalar Subquery and the Left Join/IS NULL with a NOT EXISTS:

INSERT INTO tab2
SELECT ROW_NUMBER() OVER (ORDER BY tab1.name) 
   + (SELECT COALESCE(MAX(tab2.id), 0) FROM tab2)
, tab1.name
FROM tab1
WHERE NOT EXISTS
 ( SELECT * FROM tab2 WHERE tab1.name = tab2.name);
 

 

Dieter

sriteradata88 3 posts Joined 02/16
18 Feb 2016

HI
i am trying to find the outstanding amount for column Amount like
date      amount
1                  10
2                  20
3                  30
end date     some_amt
for date 1 i need the total amount till end_date including date 1
and for date 2 , i need total amount till end_date including date 2 and so on.
can you plz help me on that

siddarthy 3 posts Joined 06/15
24 Aug 2016

sriteradata88,
 
Use the below query.
 

SELECT 

    start_date

  , SUM(amt) OVER (ORDER BY start_date

                         ROWS BETWEEN CURRENT ROW  

                          AND UNBOUNDED FOLLOWING 

                  )

FROM databasename.test1

siddarthy 3 posts Joined 06/15
25 Aug 2016

Also do use ASC or DESC with the ORDER BY ;)

You must sign in to leave a comment.