# Missing Functions: DENSE_RANK

Some of the SQL:2008 analytical functions are not implemented in Teradata, but most of them can be re-written using existing features.

In this post I will cover the ranking algorithms including DENSE_RANK.

## Ranking definitions

The Wikipedia-article on Ranking lists five algorithms.

Of course all of them return the same output for unique data, but non-unique values will produce different results with duplicate numbers and gaps in the ranking.

Standard SQL proposes syntax for three out of those five variations, while Teradata only implements two of them. In this post I will focus on the missing syntax assuming you're familiar with the existing functions RANK and ROW_NUMBER.

## Wikipedia's #4: Ordinal ranking "1234"

A sequential number.

No duplicate values & no gaps, but non-deterministic, i.e. same row might get different ranking number when query runs a second time .

Implemented in SQL:2008 and Teradata as ROW_NUMBER.

## Wikipedia's #1: Standard competition ranking "1224"

One plus the number of rows with a value *less than the current value*.

Duplicate values & gaps.

Implemented in SQL:2008 and Teradata as RANK.

## Wikipedia's #2: Modified competition ranking "1334"

The number of rows with a value *less than or equal to the current value*.

Duplicate values & gaps.

There's no special syntax for this one in SQL:2008, but it's straightforward:

COUNT(*) OVER (PARTITION BY part_col ORDER BY data_col RANGE UNBOUNDED PRECEDING)

Unfortunately the RANGE syntax in the OLAP window definition is not implemented in Teradata. But *"number of rows with a value less than or equal to the current value"* can also be expressed using a RANK/COUNT combination:

RANK() OVER (PARTITION BY part_col ORDER BY data_col) - 1 + COUNT(*) OVER (PARTITION BY part_col, data_col)

The previous calculation returns the expected result, but explain shows two "STAT FUNCTION" steps.

The most efficient formula with just a single step is based on a *reversed* order in the RANK:

COUNT(*) OVER (PARTITION BY part_col) - RANK() OVER (PARTITION BY part_col ORDER BY data_col DESC) + 1

## Wikipedia's #5: Fractional ranking "1 2.5 2.5 4"

One plus the number of rows with a value* less than the current value* *plus half the number of items equal* to it.

Duplicate values & gaps.

This is also easy to translate into SQL, but the optimizer has to add another STATS step again, which will double runtime:

RANK() OVER (PARTITION BY part_col ORDER BY data_col) + (0.5 * (COUNT(*) OVER (PARTITION BY part_col, data_col) - 1))

## Wikipedia's #3: Dense ranking "1223"

The number of distinct values less than or equal to the current value.

Duplicate values, but no gaps.

It's the SQL:2008 DENSE_RANK function, which can be expressed in different ways in Teradata, but all of them are much slower than the built-in RANK:

SELECT t1.part_col ,t1.data_col ,dt.rnk AS "DENSE_RANK" FROM tab AS t1 JOIN ( SELECT part_col ,data_col ,RANK() OVER (PARTITION BY part_col ORDER BY data_col) AS rnk FROM tab GROUP BY 1,2 ) AS dt ON t1.part_col = dt.part_col AND t1.data_col = dt.data_col;

This query has to access the same table or view twice, which is a big disadvantage if it there are joins or complex conditions included.

Following solutions avoid double access, but are based on nested OLAP steps:

SELECT part_col ,data_col --,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented ,SUM(CASE WHEN data_col = prev_data_col THEN 0 ELSE 1 END) OVER (PARTITION BY part_col ORDER BY data_col, prev_data_col ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK" FROM ( SELECT part_col ,data_col ,MIN(data_col) OVER (PARTITION BY part_col ORDER BY data_col ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_data_col FROM tab ) AS dt; SELECT part_col ,data_col --,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented ,SUM(CASE WHEN rowno = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY part_col ORDER BY data_col, rowno ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK" FROM ( SELECT part_col ,data_col ,ROW_NUMBER() OVER (PARTITION BY part_col, data_col ORDER BY data_col) AS rowno FROM tab ) AS dt;

If another analytical function must be added to the query you should check if it shares the same PARTITION and ORDER definition and choose the matching version.

Btw, in TD13.10 there's a new "Window Aggregate UDF" feature which allows to create your own OLAP function and the example source code in the manuals is a DENSE_RANK ;-)

The next post in this small series on missing analytical functions will discuss additional SQL:2008 functions based on relative rankings:

CUME_DIST and NTILE.

Are you sure about the first sql example under Wikipedia's #3: Dense ranking "1223"?

Of course, just try it :-)

Dieter

Dieter

Great timing! I was just looking for dense_rank for a porting!

Thanks

Nice work Dieter. Very informative.

Handy Info, thanks Dieter

Thanks Dieter, It helps me in my work

Bhanu Prasad Chintalapalli

Great work.It helped me a lot in my work area.

Hi Dieter,

It's a great effort towards simulating common functions available in other databases but not in Teradata.

However, I had a question regarding the two approaches mentioned above. Is the "partition by part_col" needed in the SUM function in the outer query as this will only give the ranking within the group identified by the column given in the partition clause?