Naina 2 posts Joined 05/16
24 May 2016
Copying trillions of rows to a new table

Hello
 
I have recently started working on Teradata and I am stuck at a point. I want to move trillions of rows from one table to another. What is the best way to do this? Creating the structure and then inserting the data, or something like SELECT * INTO in SQL Server, which is minimally logged?
Can someone please suggest an approach?

dnoeth 4628 posts Joined 11/04
25 May 2016

Really trillions? There are not many customers with that much data in a single table.
 
The fastest way utilizes a FastPath INSERT/SELECT: the target table is empty and has exactly the same definition as the source table.
 
If the DDLs are different but the PI is the same, you should prefer a MERGE over INSERT/SELECT (it avoids spooling).
If the PIs are different, you will need spool the size of the target table.
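
A minimal sketch of the FastPath case (database and table names here are placeholders):

-- Empty copy with the same definition as the source;
-- CREATE TABLE ... AS carries over the columns, PI and partitioning.
CREATE TABLE mydb.tgt AS mydb.src WITH NO DATA;

-- With an empty, identically defined target, this INSERT/SELECT
-- qualifies for the FastPath: data blocks are appended directly,
-- with minimal transient journaling.
INSERT INTO mydb.tgt SELECT * FROM mydb.src;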

Dieter

Naina 2 posts Joined 05/16
25 May 2016

Thanks, yes, this is what my seniors here are saying too; the size can go up to that.
So I should get the DDL of the source table, create a new table, and then use INSERT/SELECT.
 
Thanks a lot

dnoeth 4628 posts Joined 11/04
25 May 2016

Of course there's an obvious question: WHY do you need to copy trillions of rows?

Dieter

kspongix 4 posts Joined 05/16
26 May 2016

Hi Dieter, 
We have a mirror database concept where data from table t1 is copied into tables t2 and t3.
All of t1, t2, and t3 have the same DDL definition, and every time it is a complete delete and reload into t2 or t3 (based on a view swap).
One of the main source tables (t1) has around 93 million records.
The problem is that this reload takes around 6 minutes to complete.
Is there any way we can improve the copy time?
We follow this same approach for 350 tables, which are deleted and reloaded in sequence; it takes close to 30 minutes to perform the mirror copy.
Is there any better approach for this?
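
For context, a rough sketch of one round of such a mirror reload with a view swap (all object names here are invented; assume the reporting view currently points at t2):

DELETE FROM mart.t3;                        -- empty the offline copy
INSERT INTO mart.t3 SELECT * FROM oltp.t1;  -- FastPath reload into the empty table
REPLACE VIEW mart.v_t AS                    -- swap readers over to the fresh copy
SELECT * FROM mart.t3;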
 
 
 

kspongix 4 posts Joined 05/16
26 May 2016

Adding to the above: all the tables have 5+ secondary indexes (NUSIs).
 
 

Fred 1096 posts Joined 08/04
27 May 2016

For huge volumes, you generally want the target table to be empty and of the same structure as the source, but with no SIs/JIs:
Drop SIs, Delete, Insert/Select, re-create SIs.
The Delete should be done in Teradata mode with no BT/ET, or else the Delete and the subsequent ET/Commit should be issued together as a multi-statement request.
You may also want to copy statistics values from the original table rather than re-collecting them.
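
A sketch of that sequence (index, column, and table names are placeholders; the real NUSI definitions come from your SHOW TABLE output):

DROP INDEX nusi_1 ON mart.t2;               -- repeat for every NUSI
DELETE FROM mart.t2;                        -- in Teradata mode, no BT/ET
INSERT INTO mart.t2 SELECT * FROM oltp.t1;  -- FastPath into the empty table
CREATE INDEX nusi_1 (col_a) ON mart.t2;     -- re-create the NUSIs
-- Copy statistics from the source instead of re-collecting
-- (on releases that support COLLECT STATISTICS ... FROM):
COLLECT STATISTICS ON mart.t2 FROM oltp.t1;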

dnoeth 4628 posts Joined 11/04
27 May 2016

The best way to decrease the runtime would be to remove some NUSIs.
Why do you have so many NUSIs? Did you check whether they're actually used?

Dieter

kspongix 4 posts Joined 05/16
30 May 2016

Yes, most of them are used in different reports.
Some are multi-column NUSIs, and most are single-column NUSIs.
Our purpose is to do a mirror copy from OLTP to the MART.
Instead of loading all records, I'm thinking of loading only the changed records to the MART to reduce the copy time.
But even MINUS is of no use, as it takes a lot of time and spool.
Would temporal be of any use?
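
If the OLTP tables carry a reliable change timestamp, a delta load could be sketched with MERGE. The column names (pk_col, col_a, last_upd_ts) and the cutoff are hypothetical, and note that MERGE alone will not propagate deletes:

MERGE INTO mart.t2 AS tgt
USING (
  SELECT pk_col, col_a, last_upd_ts
  FROM oltp.t1
  WHERE last_upd_ts > TIMESTAMP '2016-05-29 00:00:00'  -- time of the last mirror run
) AS src
ON tgt.pk_col = src.pk_col          -- must match the target's PI with equality
WHEN MATCHED THEN UPDATE
  SET col_a = src.col_a, last_upd_ts = src.last_upd_ts
WHEN NOT MATCHED THEN INSERT
  (pk_col, col_a, last_upd_ts)
  VALUES (src.pk_col, src.col_a, src.last_upd_ts);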
 

Rajudeepak 1 post Joined 11/13
10 Jun 2016

Hello,
Please help me with a query to purge data from the table below. The table has 670,235,825,680 rows, and I'm trying to delete around 164,154,526,390 rows using the cal_dt column:
del from pr_us_inv.str_inv_dly where cal_dt < '2010-01-01';
del from pr_us_inv.str_inv_dly where cal_dt between '2012-01-01' and '2013-12-31';
Both of the above queries are consuming a lot of time due to the huge table size (33 TB).

Below is the table structure:

show table pr_us_inv.str_inv_dly;

CREATE MULTISET TABLE pr_us_inv.str_inv_dly ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      STR_LOC_ID INTEGER NOT NULL,
      ITEM_ID INTEGER NOT NULL,
      CAL_DT DATE FORMAT 'YYYY-MM-DD' NOT NULL,
      LAST_UPD_TS TIMESTAMP(6) NOT NULL,
      CURNC_TYP_CD SMALLINT NOT NULL COMPRESS 7 ,
      CHG_OH_QTY DECIMAL(17,4) COMPRESS 0.0000 ,
      ORD_OH_USAGE_FLG CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC COMPRESS ('N','Y'),
      FRST_RCVD_DT DATE FORMAT 'YYYY-MM-DD' COMPRESS (DATE '1997-05-04',DATE '0002-11-30'),
      ITEM_VLCTY_CD SMALLINT COMPRESS (1 ,2 ,3 ,4 ,5 ,-1 ),
      ITEM_STAT_CD SMALLINT NOT NULL COMPRESS (300 ,600 ,100 ,400 ,200 ,500 ,-1 ),
      RAW_OH_QTY DECIMAL(9,2) COMPRESS (0.00 ,4.00 ),
      CORD_ALLOC_QTY DECIMAL(9,2) COMPRESS 0.00 ,
      CORD_RSVD_QTY DECIMAL(9,2) COMPRESS 0.00 )
PRIMARY INDEX ( STR_LOC_ID ,ITEM_ID ,CAL_DT )
PARTITION BY RANGE_N(CAL_DT BETWEEN DATE '2007-01-01' AND DATE '2020-12-31' EACH INTERVAL '1' DAY ,
 NO RANGE, UNKNOWN);
 
Thanks,
Raju

dnoeth 4628 posts Joined 11/04
11 Jun 2016

This should run fast, as it's based on the partitioning column.
Is there a Join Index defined on the table? Can you post the Explain?

Dieter

Pinderpalsingh 1 post Joined 05/09
13 Jun 2016

Yes, the Explain plan will help you check whether it goes for a whole-partition (FastPath) delete or not.
Look for text like "We do an all-AMPs DELETE OF a single partition", which indicates a whole-partition delete,
versus "We do an all-AMPs DELETE FROM a single partition", which is actually a row-by-row delete.

-- PinderpalSingh

kspongix 4 posts Joined 05/16
15 Jun 2016

Raju, do you really need to partition by day?
If possible, try increasing the partition granularity to a month or a quarter, then try running the delete SQLs.
Also look at the range of dates, i.e. check whether you have data prior to 2007 and, if so, what its volume is. If you have considerable volume before 2007, the delete will be impacted, as all those rows reside in the single NO RANGE partition.
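
For illustration only (see Todd's caveat below: converting the granularity of an existing 33 TB table would itself be very costly), a monthly version of the same partitioning clause in the DDL above would read:

PARTITION BY RANGE_N(CAL_DT BETWEEN DATE '2007-01-01' AND DATE '2020-12-31'
                     EACH INTERVAL '1' MONTH,
                     NO RANGE, UNKNOWN)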

ToddAWalter 316 posts Joined 10/11
16 Jun 2016

It would be good to see the explain.
 
The number/granularity of partitions does not affect the cost of executing this process (and converting the granularity of the partitioning would be very costly).
