anandc 11 posts Joined 02/13
27 Feb 2013
Difference between CREATE TABLE statements

Hi All,
I want to understand whether there will be a difference in performance, or on any other front, between the two approaches below.
Any inputs will be of great help. I am dealing with huge fact tables, and many volatile tables have to be created in a stored procedure.
 
1. Creating a Volatile table / permanent table first and then inserting into it
Ex:
STEP1:
CREATE VOLATILE TABLE #TABLE1,
NO FALLBACK, NO JOURNAL, NO LOG
(COLUMN1 DEF)
PRIMARY INDEX (COLUMN1)
ON COMMIT PRESERVE ROWS;
 
STEP2:
INSERT INTO #TABLE1 (
SELECT COLUMN1 FROM TABLE2
);
 
2. Creating a table on the fly using a SELECT statement results
CREATE VOLATILE TABLE #TABLE1,
NO FALLBACK, NO JOURNAL, NO LOG
AS (
SELECT COLUMN1 FROM TABLE2
) WITH DATA
PRIMARY INDEX (COLUMN1)
ON COMMIT PRESERVE ROWS;
 
Thanks!
AnandC
 

dnoeth 4628 posts Joined 11/04
01 Mar 2013

Hi AnandC,
there should be no difference in speed.
Dieter


KS42982 137 posts Joined 12/12
01 Mar 2013

Hi Dieter,
I have experienced, many times, that if I follow method 1 as AnandC described, the data gets inserted much faster than if I follow method 2. I have always tried it with permanent tables, though. I tried it again with a permanent table just before writing this, and method 1 still worked much faster than method 2. I observe this especially with huge data volumes.
Do you know how that could be possible, or is there something I am missing?
Thanks.

dnoeth 4628 posts Joined 11/04
01 Mar 2013

Did you check the actual resource usage in DBQL?
Whenever I did, it was the same CPU/IO for both versions.
Of course there might be differences, but in most cases they can be explained by SET vs. MULTISET tables or by different PIs, simply because the PI definition was forgotten.
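A minimal sketch of such a DBQL check, assuming query logging is enabled on the system ('MYUSER' is a placeholder):

SELECT QueryText, AMPCPUTime, TotalIOCount
FROM dbc.QryLogV
WHERE UserName = 'MYUSER'
ORDER BY CollectTimeStamp DESC;

Comparing AMPCPUTime and TotalIOCount of the two statements is more reliable than wall-clock time.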
 
If you look at Explain, both are quite similar, but for permanent tables there's one significant difference:
CREATE TABLE AS SELECT first merges the data and then creates the table header, so the table is only visible when it's finished. That's why I prefer #1: you can watch the table grow in dbc.TableSizeV (or do a COUNT(*) with an ACCESS lock) and you get better control over the DDL.
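A quick sketch of watching the load that way (MyDB.Target is a placeholder):

LOCKING TABLE MyDB.Target FOR ACCESS
SELECT COUNT(*) FROM MyDB.Target;

SELECT DatabaseName, TableName, SUM(CurrentPerm)
FROM dbc.TableSizeV
WHERE DatabaseName = 'MyDB' AND TableName = 'Target'
GROUP BY 1,2;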
I usually prefer to do a CREATE TABLE AS SELECT WITH NO DATA, followed by a SHOW TABLE. Then I modify the returned DDL to my needs and use that, as sketched below.
I mainly do #2 for SPs with dynamic SQL, when I really don't know the definition in advance.
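A sketch of that workflow, with hypothetical names:

CREATE TABLE MyDB.Target AS (
  SELECT col1, col2 FROM MyDB.Source
) WITH NO DATA;

SHOW TABLE MyDB.Target;
-- copy the returned DDL, adjust PI/SET/NOT NULL as needed,
-- then run the edited CREATE TABLE followed by an INSERT/SELECT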
 
 
And if there actually were a more efficient way to do the insert in one of the cases, why would a slower implementation be used in the other case?
Dieter 

 


anandc 11 posts Joined 02/13
01 Mar 2013

Thanks very much for the information, Dieter.
As you mentioned, I am going to use these CREATE TABLE statements in huge stored procedures; at least 50 volatile tables will be created and then dropped when the session ends. Moreover, they are going to hold and process billions of rows daily.
I am going to choose option #2, which I was already inclining towards.
Thanks!
Anand

KS42982 137 posts Joined 12/12
01 Mar 2013

I am afraid to say that I do not get similar explains in the two cases.
Here is an example. I am trying to insert data from table 1 into table 2. Table 1 is a MULTISET table with around 90 million records.
First, please take a look at the explain of the CREATE TABLE ... AS (SELECT ..) version; check step 4 in particular.
 1) First, we lock a distinct DATABASE1."pseudo table" for read on a
     RowHash to prevent global deadlock for DATABASE1.TABLE1.
  2) Next, we lock DATABASE1.TABLE2 for exclusive use, and we
     lock DATABASE1.TABLE1 for read.
  3) We create the table header.
  4) We do an all-AMPs RETRIEVE step from DATABASE1.TABLE1 by way
     of an all-rows scan with no residual conditions into Spool 1
     (all_amps), which is redistributed by the hash code of (
     DATABASE1.TABLE1.COLUMN1) to all AMPs.  Then we do a SORT to
     order Spool 1 by row hash.  The input table will not be cached in
     memory, but it is eligible for synchronized scanning.  The result
     spool file will not be cached in memory.  The size of Spool 1 is
     estimated with high confidence to be 49,607,857 rows (
     4,563,922,844 bytes).  The estimated time for this step is 2
     minutes and 6 seconds.
  5) We do an all-AMPs MERGE into DATABASE1.TABLE2 from Spool
     1 (Last Use).  The size is estimated with high confidence to be
     49,607,857 rows.  The estimated time for this step is 1 second.
  6) We lock a distinct DBC."pseudo table" for read on a RowHash for
     deadlock prevention, we lock a distinct DBC."pseudo table" for
     write on a RowHash for deadlock prevention, we lock a distinct
     DBC."pseudo table" for write on a RowHash for deadlock prevention,
     and we lock a distinct DBC."pseudo table" for write on a RowHash
     for deadlock prevention.
  7) We lock DBC.Indexes for write on a RowHash, we lock DBC.DBase for
     read on a RowHash, we lock DBC.TVFields for write on a RowHash, we
     lock DBC.TVM for write on a RowHash, and we lock DBC.AccessRights
     for write on a RowHash.
  8) We execute the following steps in parallel.
       1) We do a single-AMP ABORT test from DBC.DBase by way of the
          unique primary index.
       2) We do a single-AMP ABORT test from DBC.TVM by way of the
          unique primary index.
       3) We do an INSERT into DBC.Indexes (no lock required).
       4) We do an INSERT into DBC.TVFields (no lock required).
       5) We do an INSERT into DBC.TVFields (no lock required).
       6) We do an INSERT into DBC.TVFields (no lock required).
       7) We do an INSERT into DBC.TVFields (no lock required).
       8) We do an INSERT into DBC.TVFields (no lock required).
       9) We do an INSERT into DBC.TVFields (no lock required).
      10) We do an INSERT into DBC.TVFields (no lock required).
      11) We do an INSERT into DBC.TVFields (no lock required).
      12) We do an INSERT into DBC.TVFields (no lock required).
      13) We do an INSERT into DBC.TVFields (no lock required).
      14) We do an INSERT into DBC.TVFields (no lock required).
      15) We do an INSERT into DBC.TVFields (no lock required).
      16) We do an INSERT into DBC.TVFields (no lock required).
      17) We do an INSERT into DBC.TVFields (no lock required).
      18) We do an INSERT into DBC.TVFields (no lock required).
      19) We do an INSERT into DBC.TVFields (no lock required).
      20) We do an INSERT into DBC.TVFields (no lock required).
      21) We do an INSERT into DBC.TVFields (no lock required).
      22) We do an INSERT into DBC.TVFields (no lock required).
      23) We do an INSERT into DBC.TVFields (no lock required).
      24) We do an INSERT into DBC.TVFields (no lock required).
      25) We do an INSERT into DBC.TVFields (no lock required).
      26) We do an INSERT into DBC.TVFields (no lock required).
      27) We do an INSERT into DBC.TVFields (no lock required).
      28) We do an INSERT into DBC.TVFields (no lock required).
      29) We do an INSERT into DBC.TVFields (no lock required).
      30) We do an INSERT into DBC.TVFields (no lock required).
      31) We do an INSERT into DBC.TVFields (no lock required).
      32) We do an INSERT into DBC.TVFields (no lock required).
      33) We do an INSERT into DBC.TVM (no lock required).
      34) We INSERT default rights to DBC.AccessRights for
          DATABASE1.TABLE2.
  9) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

Now take a look at the explain of the CREATE TABLE ... INSERT INTO ... version:
  1) First, we lock a distinct DATABASE1."pseudo table" for write on a
     RowHash to prevent global deadlock for DATABASE1.TABLE2.
  2) Next, we lock a distinct DATABASE1."pseudo table" for read on a
     RowHash to prevent global deadlock for DATABASE1.TABLE1.
  3) We lock DATABASE1.TABLE2 for write, and we lock
     DATABASE1.TABLE1 for read.
  4) We do an all-AMPs MERGE into DATABASE1.TABLE2 from
     DATABASE1.TABLE1.  The size is estimated with no confidence
     to be 9,361,440 rows.  The estimated time for this step is 1
     second.
  5) We spoil the parser's dictionary cache for the table.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

Adeel Chaudhry 773 posts Joined 04/08
02 Mar 2013

The only difference is that in option 1 you have the leverage to tweak/select the (correct) index.
 
I have seen scenarios where option 2 takes hours to process the data, while option 1, with the correct selection of index, takes a matter of seconds.
 
The latest occurrence of this happened at one of my clients just yesterday. :)
 
HTH!

-- If you are stuck at something .... consider it an opportunity to think anew.

dnoeth 4628 posts Joined 11/04
02 Mar 2013

Adeel is right:
check the PIs of both tables, they are different. In the second case it is exactly the same as in the source table :-)
A wrong PI, or a default SET table instead of MULTISET, are the most common reasons for bad performance. One way to avoid both in the AS SELECT form is to spell them out, as sketched below.
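A minimal sketch with hypothetical names, explicitly requesting the table kind and the PI so neither falls back to a default:

CREATE MULTISET TABLE MyDB.Target AS (
  SELECT col1, col2 FROM MyDB.Source
) WITH DATA
PRIMARY INDEX (col2);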
Dieter


KS42982 137 posts Joined 12/12
04 Mar 2013

That is true; the PI in the second case is the same as the source table's, since I took the definition of table1 to create table2.

However, I thought that with CREATE TABLE AS SELECT the new table would get exactly the same PI as the table you are creating it from, so the newly created table should have the same PI with both options. But I guess that is not the case, and now it makes sense why the AS SELECT version takes longer in some cases.

Thank you both for the clarification.

dnoeth 4628 posts Joined 11/04
04 Mar 2013

Which PI should the optimizer choose when you write a multi-table SELECT?

When you do a CREATE TABLE AS SELECT you lose all index information (probably resulting in a NUPI on the first column) and the columns' NOT NULL definitions, plus SET/MULTISET is reset to the session default (SET for Teradata sessions, MULTISET for ANSI sessions).
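A sketch of verifying this (hypothetical names); per the above, expect SHOW TABLE to report the session-default SET/MULTISET, nullable columns, and a NUPI on the first column rather than the source's definitions:

CREATE TABLE MyDB.Copy AS (
  SELECT * FROM MyDB.Source
) WITH NO DATA;

SHOW TABLE MyDB.Copy;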

Dieter

