Some of you may be familiar with relational databases other than Teradata and how those other RDBMs utilize indexes.

In Teradata, an “Index” is a physical mechanism that is used to distribute, store, and access data rows. Indexes provide a physical access path to the data and their use can avoid unnecessary full-table scans to locate rows.

To level set, let’s first consider the difference between a Key and an Index. A “Key” is a relational term. The Primary Key of a table is a column set that uniquely identifies a row in a logical table. A key is an identifier and not a physical mechanism.

In Teradata, there are four main types of indexes: Primary Indexes, Secondary Indexes, Join Indexes, and Hash Indexes. Each of these types of indexes have their own distinctive flavors and uses.

All Teradata Database tables require a Primary Index because the system distributes table rows to the AMPs based on primary index values. (* Teradata 13.0 has a NoPI feature for loading staging tables.) To accommodate Teradata’s massively parallel architecture, indexes use a hashing algorithm based on data row hash values as the most efficient means of distributing and retrieving data.

Primary indexes can be either unique or non-unique and partitioned or non-partitioned:

  • Unique primary index (UPI)
  • Non-unique primary index (NUPI)
  • Non-partitioned primary index (NPPI)
  • Partitioned primary index (PPI)

Partitioned primary indexes may have a single partitioning expression or multiple partitioning expressions:

  • Single-level PPI (SLPPI)
  • Multilevel PPI (MLPPI)

The main purposes for Teradata’s Primary Indexes:

  • Provide access to data rows, obviating the need to do full-table scans. By providing the values for all the primary index columns in your SQL WHERE clause, direct access to the AMP with the row(s) can be made using that primary index value. (Note: Teradata does not do ‘partial’ indexing, you must provide values for all the columns defined in an index.)
  • Determine which AMP a row will distribute to. Rows are distributed across the AMPs based on the hash of their Primary Index. The uniqueness of the data values in the primary index will affect how evenly the data distribution is across the AMPs.
  • Efficiency of join processing. Depending upon the choice of index, rows of the table may or may not have to be redistributed, spooled, and sorted prior to the join.

A trade-off that must often be considered is between the needs of full table scan queries and queries that can be range constrained in some way. For example, between strategic queries that need to scan a multi-year history of sales, and tactical queries that only need transactions from, say, the last 60 days. The compromise that’s often made is to break the large table into a number of range partitioned tables which work better for tactical queries than scanning one big table, and to UNION the partitioned tables together for strategic queries that need all the data. But breaking up what should be one table into multiple tables makes them harder to manage and maintain, and although it improves performance for tactical queries, it is not the optimum solution. Up to 64 columns can be specified for a primary index definition.

Partitioned Primary Index (PPI) is a table organization scheme that very elegantly enhances the existing Teradata structures and helps remove this trade-off dilemma. It’s an optional extension to Teradata’s hashed primary indexing that adapts it to more efficiently handle range queries.
What if, instead of having to fully scan a large table to satisfy a query we only had to scan 50% of it? What if we only had to scan 10%….or perhaps even less? This would have a huge, very positive impact on performance, and this is what PPI has been designed to address, to push the envelope for Teradata users who want to see better and better query performance whether it’s for tactical or strategic requests.
PPI has also been designed to be very easily set up and managed and to put a minimum additional burden on the DBA in keeping with our philosophy of low cost database management.

Secondary Indexes provide alternate access paths to the data and may be Hash-ordered or Value-ordered. Secondary indexes may be added or dropped as needed with the caveat that building them requires some amount of system resources and you’ll want to check with your DBA appropriate.

  • Unique secondary index (USI)
  • Nonunique secondary index (NUSI)

Join Indexes offer a variety of six subtypes that include:

  • Single-table join index (STJI)
  • Single-table aggregate join index (STAJI)
  • Single-table sparse join index
  • Multitable simple join index
  • Multitable aggregate join index
  • Multitable sparse join index

Join Indexes are highly recommended for decision support applications because they often provide superior performance to large table joins and aggregate computations. Any Join Index, whether simple or aggregate, multi-table or single-table, can be sparse. The create statement of the index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. A caution when choosing Join Indexes is that neither MultiLoad nor FastLoad utilities support tables with join indexes. If Join Indexes are needed, a possible workaround could be to FastLoad data into an empty table and then either MERGE or do a INSERT/SELECT into the table with the Join Index.

Hash Indexes are a similar to Join Indexes and have a narrower usage. Hash Indexes are limited to one table and the table’s Primary Index cannot be partitioned.
Hash indexes are useful for queries where the index contains the columns referenced by a query, which will allow the Optimizer to use the index to satisfy the query rather than the underlying base table. Hash indexes can also be defined on a table in place of traditional secondary indexes. Like the Join Index, FastLoad and MulitLoad do not support Hash Indexes.

Criteria for choosing Join Indexes vs NUSIs
The similarities between STJI and NUSI are that STJIs can be defined with the same columns as NUSI; the concept of index covering (the query can be satisfied by columns in the index without accessing the base table) applies to both STJI and NUSI; value ordering is available on both STJI and NUSI; and joins to either STJIs or NUSIs are possible. Teradata does not do partial index retrievals, however, it can use either STJIs or NUSIs to do Full Table Scans of their subtables instead of a FTS of the base table.

The basic differences between STJI and NUSI are that a STJI is similar to a table with a primary index with additional columns defined; a STJI row can be stored on the same AMP or a different AMP as table data row whereas NUSIs are stored on same AMP as table data row; and NUSIs are supported by MultiLoad, but not STJIs.
 

Advantages of Indexes
Indexes are a retrieval mechanism and the intent of indexes is to lessen the time it takes to retrieve rows from a database and eliminate full table scans.

Disadvantages of Indexes
This is where too much of a good thing can be a disadvantage. Whenever a base table row is updated, deleted, or inserted, Index subtables must be also be updated. The more secondary, join or hash indexes you have, the more maintenance that will have to be performed. All Teradata secondary, hash, and join indexes are stored in subtables and will require extra storage space. Some of the indexes are “allergic” to some Load Utilities so you’ll want to check on what may be impacted as a result of index choices.
 

A good practice is to use the EXPLAIN before you execute a query will help you determine which indexes are being used for your query.

One of the most important tasks of a DBA is to choose Indexes, CHOOSE WISELY!

 

Discussion
Celia 8 comments Joined 03/09
20 May 2009

How is it possible that a table could be created without Primary Index (Teradata 13)? How is data distributed among the AMPs?

Alison 5 comments Joined 04/09
20 May 2009

Celia, That's a great question. In TD 13 there is an option to create tables with no primary index.

On this one, we broke all the rules, and the developers had a lot of fun doing it! As you know, on TD a table always has a PI, always has data distribution driven by the hash vlaue of the PI. Once rows arrive at their AMP, they are stored in order by hash value...and that is true of every table on TD.
In the case of No Primary Index Tables (NoPI), they are not hash distributed, and they are not hash ordered.

The goal of NoPI tables is to have a table that is very efficient to acquire data into. Like – ELT for batch or work tables. These table are not going to be positioned properly for joins or primary index access, they're intended to just be places to temporarily land and store data -- until you do the next processing step that generally would have to scan the table and read every row anyway. This will significantly improve the overall process.

For example, you define a NoPI table and load the data with a FastLoad. Instead of doing all the row redistribution, sorting, etc, we are going to take each block that FastLoads into the table and randomly pick an AMP that is going to get that block of data. Then, we'll takd that data block and insert it at the end of the table, that's it. Getting the data into the work tables is going to be a lot more efficient. 30+% improvement in acquiring data in FDL in phase one.

Insert/select into that table will be a little different because the data is not coming from outside so data it will flow into the table based on the AMP where it was created so final result of the query won’t be redistributed, we’ll store the data locally. At that point, inserting data into the table by row hash goes away and the rows just go to the end of the table (on that AMP). This table is only meant to be scanned and used for the next operation, not a specific select operation.
NoPI tables will reduce skew in the batch load operation and you won't have to fret or think about which column(s) would make the best index.

Secondary indexes are allowed on NoPI tables and secondary index access paths for SQL statements will work the same on a NoPI table as they do for a PI table.

That being the case, we all really know that every row in TD has a unique identifier, the rowid. For the NoPI tables, the system will generate a rowid based on the AMP where the row is being stored. No, you can't choose what you want it to be, and no, you can't use it as though it were a real Primary Index.

Alison

kn185024 1 comment Joined 05/09
24 May 2009

Hi Alison,

i would like to know when we define indexes in particular for NUSI's is it recommeded to define NUSI's individually on column or in other way when do you think its appropriate to define composite NUSI's?

Celia 8 comments Joined 03/09
26 May 2009

Thanks, Alison, for your answer.

Kaeri 1 comment Joined 05/09
26 May 2009

Alison, did you know that if a single table JI is created to create a different access path to the main table (it has a NUPI) using a different field as its PI can cause the dreaded "Product Join" to occur when you are updating rows on the main table via a simple update statement (two table). We found that the optimizer for 6.2 has been very silly and tries to update the main table via the JI and as the JI has its PI as a NUPI it has many duplicate rows causing the "Product Join" to occur. The only solution we found was to implement the PK via a USI. Then the optimizer choose that path and did a cool join (Merge). Is there any other way to influence the optimizer ? (we tried adding the recommended indexes)

sunny 1 comment Joined 05/09
27 May 2009

Alison,

Related to the NoPI tables you said they are mostly useful for ELT operations. When you are doing ELT for the most part you join multiple tables and then load to a target. Typically you land data into work tables and then join the work tables to load a target table. If the work tables are not distributed by PI what happens when you join 2 or more tables? Don't the tables have to be redistributed which takes away any benefit you get by not having a PI in the first place? Also if the target table has a PI and the work table does not the work table will have to be redistributed before inserting into the target table. Is that correct or am I missinbg something?

Thanks
Sunny.

Celia 8 comments Joined 03/09
10 Jun 2009

Sunny, I think that "Say Yes to No Primary Index (No PI) Tables" article (http://developer.teradata.com/database/articles/say-yes-to-no-primary-index-no-pi-tables) can reply your question.

humayunmd413 1 comment Joined 12/10
09 Dec 2010

hi alison,
ca you please tell me the diffence btwn the tables
created in user and a databse

JimmyLee 5 comments Joined 06/10
01 Feb 2011

I'm trying to understand if I'm better of using a NUSI or STJI as a covering index. One thing I can't determine is whether or not a NUSI can take advantage of the partitioning in the underlying table or not.
Also, Alison mentions that there is no partial index retrieval with either type of index, but if an STJI is partitioned, won't we get partition elimination if the query has a predicate that filters on the partition column(s)?

"To not give your best is to sacrifice the gift" -- Steve Prefontaine

harshainocu 2 comments Joined 04/11
07 Apr 2011

hello can i get explanation about multitable join indexs, single table join indexs and the how the rows are distributed by using these indes

Alison 5 comments Joined 04/09
22 Aug 2011

Multi-Table Join Indexes and Single Table Join Indexes are database objects. They are different than secondary indexes which are actually subtables that automatically refer back to the base table they are created on.
When JI's are created, you choose their Primary Index. The rows are stored based on the PI of the JI.

Alison

Alison 5 comments Joined 04/09
22 Aug 2011

The Optimizer will do a Full Table Scan if it is not supplied with the 'complete' index value. That is to say, it is looking for a value it can hash.
If you are using a Seconday Index, the Optimizer will choose a FTS on the base table. It can, however, do a FTS on a Join Index.

You can get partition elimination if the query has a predicate that filters on the partition column...but remember, the partition is not the index. [Caveat: Partitioned tables with Unique Primary Indexes MUST include the partitioning column(s) in the Index definintion.

Alison

Alison 5 comments Joined 04/09
22 Aug 2011

There is no difference in a table that is created in a database vs. a user.

Alison

Alison 5 comments Joined 04/09
16 Nov 2011

If you are looking for a great reference on Teradata Indexes, I have just the book for you!
"Teradata® Database Index Essentials" is available at:
http://TeradataIndexes.com

Alison

Prahladpatidar 1 comment Joined 11/11
24 Nov 2011

Hello Alison,
Nice article on Indexes. I have a perplexing question, and not able to find anwwer to it. Could you please answer it...
What if I create 100 partition on a table (based on year ranging from 2000 to 2100), and data is stored in only 11 partitions until now and rest of all 89 are blank.
Does it create a performace problem?

regards
Prahlad

hvganipineni 2 comments Joined 09/09
13 Dec 2011

We have a table from which we are doing a fast export, is there any advantage of creating that table as the no pi table? I think it would definitely have advantage. I would appreciate any inputs.

Harsha Ganipineni

harshainocu 2 comments Joined 04/11
31 Aug 2012

How the PE process when we join the SI column with non Indexed column?
please explain

vasudev 35 comments Joined 12/12
31 Jul 2013

Hi, 
I am using TD V12. By reducing the number of SI in target can the performance be improved. I am using TPUMP to load the data to target table?
Please advise.

koshimae 1 comment Joined 08/13
07 Apr 2014

Hi,
If I remove a certain column as a primary index. Should I drop the statistic for that column?

You must sign in to leave a comment.