All Forums Tools
aarsh.dave 24 posts Joined 11/12
26 Dec 2013
When are secondary indexes required?

Hi All,
I understand that a Secondary Index is a 2-AMP or an all-AMP operation, and also the subtable that is created and its contents.
However, I am not sure in what scenarios should a secondary index (unique or non-unique) be defined.
I would really appreciate some help here.
 
Thanks,
Aarsh

VandeBergB 182 posts Joined 09/06
26 Dec 2013

greetings,
Secondary indexes are typically defined to improve performance with regard to residual conditions in the where clause.  There are multiple ways to improve performance of these conditions, including but not limited to row level paritioning, secondary indexes etc...
Secondary indexes will ONLY be used by the opitmizer if they are very selective, nearly (>95%) selective.  You can find out if the SI is being used by examining the explain plan of the query (ies) that you are tuning.  The first secondary index created on a table will be identified as "index#4", with each subsequent index as a higher mutliple of 4.
One of the more common uses for a Unique Secondary Index arises with the need to maintain uniqueness on a column or columns that are NOT the primary index.  The primary index is NOT a primary key.  The key function of the PI is to define the most common access path while maintaining acceptable data skew, NOT maintaining uniqueness.
cheers!
 

Some drink from the fountain of knowledge, others just gargle.

Raja_KT 1246 posts Joined 07/09
26 Dec 2013

There may be cases where queries may not use PI. Then SI comes into the picture to enhance performance and chance of avoiding FTS.Value ordered NUSI is recommended for range queries.

They can be created and dropped anytime.

 

Understanding the business requirement and design, drives us to create SIs.

 

Cheers,

Raja

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

M.Saeed Khurram 544 posts Joined 09/12
27 Dec 2013

Hi Arash,
 
Secondary indexes are mostly used to improve acces on the queries that use a non PI column in search conditions.
Following are some common uses of SI:

  1. If a non PI column is being used in where clause often, define SI on it
  2. you can use USI to enforce uniqueness in a PPI table where partition columns are not part of PI
  3. You can define NUSI in a PPI table to make a PI access single amp.

 

Khurram

Santanu84 122 posts Joined 04/13
01 Jan 2014

Hi Saeed
I am just curious to know more about your last comment (may be what I have interpreted is wrong).
"NUSI in a PPI table to make a PI access single amp" - would you kindly elaborate more.
As per my understanding, if it is a NUPI table with PPI and we run SQL only with PI column equality check in WHERE predicate it will end up in 1 AMP operation but will scan all partitions. Can NUSI improve this?
 
Thanks
Santanu

M.Saeed Khurram 544 posts Joined 09/12
01 Jan 2014

Hi San,
Yes, PI will lead you to single AMP but will probe all the partitions. Thats why it is recomended to use NUSI consisting of same solumns as NUPI on top of PPI. and when this NUSI is access via where clause it will avoid scanning all the partitions and will lead you directly to the desired row.
Some considerations for NUSI over PPI are:

  1. Eliminate partition probing with PI access
  2. uses row hash locks
  3. 1 amp operation
  4. Can be used with NUPI or UPI
  5. Must be in Where condition
  6. NUSI single amp operation is only supported on PPI
  7. You can use multiload to load table

Please let me know if you need further guidance :)
 

Khurram

Santanu84 122 posts Joined 04/13
01 Jan 2014

Hi Saeed
Thanks for your quick reply. But when I am trying the same in my TD DB I am getting a different result. I am not able to understand why is it different than the desired result.
 

CREATE MULTISET TABLE SCPLN_W.SAMPLE1, NO FALLBACK

(

TRANS_ID INTEGER,

REGION_CD INTEGER,

TRANSACTION_DT DATE

)

PRIMARY INDEX(TRANS_ID)

PARTITION BY RANGE_N(TRANSACTION_DT BETWEEN  DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY, NO RANGE)

;

 

CREATE INDEX(TRANS_ID) ON SCPLN_W.SAMPLE1;

 

COLLECT STATS ON SCPLN_W.SAMPLE1 INDEX(TRANS_ID);

 

SELECT *

FROM SCPLN_W.SAMPLE1

WHERE TRANS_ID = 1001

;

 

Explain SELECT *

FROM SCPLN_W.SAMPLE1

WHERE TRANS_ID = 1001

;

 

  1) First, we do a single-AMP RETRIEVE step from all partitions of

     SCPLN_W.SAMPLE1 by way of the primary index

     "SCPLN_W.SAMPLE1.TRANS_ID = 1001" with a residual condition of (

     "SCPLN_W.SAMPLE1.TRANS_ID = 1001") into Spool 1 (one-amp), which

     is built locally on that AMP.  The size of Spool 1 is estimated

     with high confidence to be 1 row (33 bytes).  The estimated time

     for this step is 0.00 seconds. 

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.00 seconds. 

 

Please let me konw your response.

 

Thanks

Santanu

 

M.Saeed Khurram 544 posts Joined 09/12
01 Jan 2014

San,
Can you please confirm how much data do you have in this table?
The optimizer will always go for the most optimal plan, if it finds that scanning all partitions is more efficient than using NUSI, it will go for PI access scanning all partitions.
The final decision is made by the optimizer :)
 
Can you please share the demographics of this table?
 
 

Khurram

Santanu84 122 posts Joined 04/13
01 Jan 2014

Hi Saeed
Here is the detail.
1. There are 12499 data in the table.
2. The typical rows / value for TRANSACTION_DT is 3000
3. Even though it is defined as MULTISET NUPI but the TRANS_ID field contains all unique value
Do you need any other demographic information?
I think you are right, optimizer is deciding to go for 1-AMP all partition scan for this data volume, if not any other reason behind this.
Thanks
Santanu

VandeBergB 182 posts Joined 09/06
01 Jan 2014

Saeed,
You've partitioned the table on Transaction_Dt, but the query is asking for Trans_id.  Regardless of the demographics, it will never give you DPE with that configuration.
As a test, repartition the table by Trans_id.  You've only got 12499 rows according to your most recent post, so you could partition the table for each '1' Trans_id.  If you run the same query you will see the explain plan pull from one partition.
If you want to pull the transactions by date, you'll need add the Transaction_Dt column in the partitioning.
 

Some drink from the fountain of knowledge, others just gargle.

Santanu84 122 posts Joined 04/13
01 Jan 2014

Hi VandeBergB
Yes you are absolutely right. But my query was actually initiated from 2 specific comments in the above posts.
1. You can define NUSI in a PPI table to make a PI access single amp.
2. NUSI single amp operation is only supported on PPI
Now, when I am creating a PPI table with NUPI and defining a NUSI on the PI column and doing COLLECT STATS, still my SQL Explain is showing 1-AMP all Partition scan instead of single AMP single Partition PI access.
I was wondering why am I not getting the expected result. My logical understading is because of low data volume may be the optimizer is finding 1-AMP all partition scan as the best suitable way. But I am not sure.
Thanks
Santanu

M.Saeed Khurram 544 posts Joined 09/12
01 Jan 2014

Hi VandeBergB,
The case you are refering is the best scenario of partition and this is the most optimal case in PPI. Secondary indexes are mostly recomended when the partition by column is not part of PI, and you need to get performance. 
 

Khurram

Santanu84 122 posts Joined 04/13
01 Jan 2014

Hi Guys,
I found something. I guess this NUSI is worthy for 1-AMP PI selection in a PPI table in some specific scenarios where optimizer decides to choose the NUSI index.
This time I created the below table.

CREATE MULTISET TABLE SCPLN_W.SAMPLE1 ,NO FALLBACK ,

     NO BEFORE JOURNAL,

     NO AFTER JOURNAL,

     CHECKSUM = DEFAULT,

     DEFAULT MERGEBLOCKRATIO

     (

      CompanyID INTEGER,

      RecordID INTEGER,

      TagName VARCHAR(255) CHARACTER SET UNICODE NOT CASESPECIFIC,

      OldValue VARCHAR(8000) CHARACTER SET UNICODE NOT CASESPECIFIC,

      NewValue VARCHAR(8000) CHARACTER SET UNICODE NOT CASESPECIFIC,

      LastModBy INTEGER,

      LastModDate TIMESTAMP(6),

      "SOURCE" VARCHAR(50) CHARACTER SET UNICODE NOT CASESPECIFIC,

      ETL_ACTION VARCHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,

      ETL_RUN_ID DECIMAL(38,0),

      ETL_MAPPING_NAME VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,

      ETL_INSERTDATE TIMESTAMP(0),

      ETL_UPDATEDATE TIMESTAMP(0)

     )

PRIMARY INDEX ( RecordID )

PARTITION BY 

RANGE_N(LastModDate BETWEEN TIMESTAMP '2005-01-01 00:00:00.000000' AND TIMESTAMP '2013-12-31 23:59:59.000000' EACH INTERVAL '1' MONTH, NO RANGE)

;

 

Now I loaded 35,00,000 data in this table.

 

The PI RecordID = 562227 had 1450 records. So when I ran the below SQL I got the explain with 1-AMP all partition scan.

 

Explain SELECT *

FROM SCPLN_W.SAMPLE1

WHERE RECORDID = 562227

;

 

  1) First, we do a single-AMP RETRIEVE step from all partitions of

     SCPLN_W.SAMPLE1 by way of the primary index

     "SCPLN_W.SAMPLE1.RecordID = 562227" with a residual condition of (

     "SCPLN_W.SAMPLE1.RecordID = 562227") into Spool 1 (one-amp), which

     is built locally on that AMP.  The size of Spool 1 is estimated

     with low confidence to be 2 rows (11,182 bytes).  The estimated

     time for this step is 0.01 seconds. 

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.01 seconds. 

 

In my next SQL RecordID = 569567 had only 1 record. Still I got 1-AMP all partition scan.

 

Explain SELECT *

FROM SCPLN_W.SAMPLE1

WHERE RECORDID = 569567

;

 

  1) First, we do a single-AMP RETRIEVE step from all partitions of

     SCPLN_W.SAMPLE1 by way of the primary index

     "SCPLN_W.SAMPLE1.RecordID = 569567" with a residual condition of (

     "SCPLN_W.SAMPLE1.RecordID = 569567") into Spool 1 (one-amp), which

     is built locally on that AMP.  The size of Spool 1 is estimated

     with low confidence to be 2 rows (11,182 bytes).  The estimated

     time for this step is 0.01 seconds. 

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.01 seconds. 

 

Next I created NUSI on NUPI column.

 

CREATE INDEX(RECORDID) ON SCPLN_W.SAMPLE1;

 

COLLECT STATS INDEX(RECORDID) ON SCPLN_W.SAMPLE1;

 

Now my 2nd SQL gave a different explain

 

Explain SELECT *

FROM SCPLN_W.SAMPLE1

WHERE RECORDID = 569567

;

 

  1) First, we do a single-AMP RETRIEVE step from SCPLN_W.SAMPLE1 by

     way of index # 4 "SCPLN_W.SAMPLE1.RecordID = 569567" with no

     residual conditions into Spool 1 (all_amps), which is built

     locally on that AMP.  The size of Spool 1 is estimated with high

     confidence to be 4 rows (22,364 bytes).  The estimated time for

     this step is 0.00 seconds. 

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 0.00 seconds. 

 

So I guess 2nd time optimizer thought it is good to have NUSI access.

 

Well, the bottom line is we can create NUSI on NUPI column in a PPI table to make it 1-AMP retrieve. But it may be very specific scenario depending on optimizer's decision.

 

Thanks

Santanu

 

M.Saeed Khurram 544 posts Joined 09/12
01 Jan 2014

Hi San,
I must appreciate the effort you have put to simulate the scenario.
Once again I would say, the final decision is of optmizer to use which index at which scenario. for example even if optmizer feels the FTS is better it can even go for FTS instead of using an index.
 

Khurram

VandeBergB 182 posts Joined 09/06
02 Jan 2014

Santanu,
In your last chunk of DDL, you partitioned by LastModDate.  You aren't partitioned by RecordId.  The PI is hashed to distribute the data across the amps, so asking for a query with a RecordID of 569567, without any residual conditions will, by definition,  get you a one amp operation.
The partitioning doesn't even come into play with your query because you haven't asked for a LastModDate in your intial query. 
All of your queries are single amp queries, you've just shifted the amp that is doing the work.  The last query that scans index#4 is using the SI subtable rather than the base table to generate your result.
One thing that stands out across all of your explain plans is that it doesn't look like you've collected statistics.  Your explain plans all show low confidence, which can be remediated by collecting stats on your PI and any other columns commonly used in joins or search arguments.
 
Thanks,
Blaine

Some drink from the fountain of knowledge, others just gargle.

M.Saeed Khurram 544 posts Joined 09/12
02 Jan 2014

Bliane,
 
I didn't get what you want to conclude from this, but the discussion whther to use SI on a PPI or not, and when it is accessed by the optimizer.
 
You can see in all the explain plans there are lines:
in first explain: First, we do a single-AMP RETRIEVE step from all partitions
in second explain: First, we do a single-AMP RETRIEVE step from all partitions
In third explain: First, we do a single-AMP RETRIEVE step from SCPLN_W.SAMPLE1 by
     way of index # 4 
 
In fact we are discussing the usage of SI on a PPI table.  
can you please elaborate your line:
The partitioning doesn't even come into play with your query because you haven't asked for a LastModDate in your intial query. 
even when there is reference to partition in each explain.
 
 
 

Khurram

MaheshJessy 26 posts Joined 12/10
04 Jun 2015

Hi team, I find one DDL from my database which makes me to think about on that. Would you please help me to understand for Secondary index on PARTITION columns? Is it really worthful to create SI on PARTITION COLUMNS? And Table is very expensive and BIG which has 103, 99, 727, 50(record count) records.
 
UNIQUE PRIMARY INDEX ( c1 ,c2 )
PARTITION BY RANGE_N(c2  BETWEEN DATE '2001-01-01' AND DATE '2016-12-31' EACH INTERVAL '1' DAY )
INDEX ( c1 )
INDEX ( c2 )---Is it really required to create SI index on this column?
 
Regards,
Mahesh

dnoeth 4628 posts Joined 11/04
04 Jun 2015

Hi Mahesh,
you're correct, this index is totally useless as 'c2' is already partitioned by day.

Dieter

MaheshJessy 26 posts Joined 12/10
04 Jun 2015

Thank you Dieter for your confirmation:)

22 Feb 2016

HI Santhanu,
Thanks for the example.

Thanks Saeed.  

You must sign in to leave a comment.