Some of the SQL:2008 analytical functions are not implemented in Teradata, but most of them can be re-written using existing features.
In this post I will cover the ranking algorithms including DENSE_RANK.

Ranking definitions

The Wikipedia-article on Ranking lists five algorithms.
Of course all of them return the same output for unique data, but non-unique values will produce different results with duplicate numbers and gaps in the ranking.
Standard SQL proposes syntax for three out of those five variations, while Teradata only implements two of them. In this post I will focus on the missing syntax assuming you're familiar with the existing functions RANK and ROW_NUMBER.

Wikipedia's #4: Ordinal ranking "1234"

A sequential number.
No duplicate values & no gaps, but non-deterministic, i.e. same row might get different ranking number when query runs a second time .

Implemented in SQL:2008 and Teradata as ROW_NUMBER.

Wikipedia's #1: Standard competition ranking "1224"

One plus the number of rows with a value less than the current value.
Duplicate values & gaps.

Implemented in SQL:2008 and Teradata as RANK.

Wikipedia's #2: Modified competition ranking "1334"

The number of rows with a value less than or equal to the current value.
Duplicate values & gaps.

There's no special syntax for this one in SQL:2008, but it's straightforward:

COUNT(*) OVER (PARTITION BY part_col ORDER BY data_col RANGE UNBOUNDED PRECEDING)

Unfortunately the RANGE syntax in the OLAP window definition is not implemented in Teradata. But "number of rows with a value less than or equal to the current value" can also be expressed using a RANK/COUNT combination:

RANK() OVER (PARTITION BY part_col ORDER BY data_col) - 1
+ COUNT(*) OVER (PARTITION BY part_col, data_col)

The previous calculation returns the expected result, but explain shows two "STAT FUNCTION" steps.
The most efficient formula with just a single step is based on a reversed order in the RANK:

COUNT(*) OVER (PARTITION BY part_col)
- RANK() OVER (PARTITION BY part_col ORDER BY data_col DESC) + 1

Wikipedia's #5: Fractional ranking "1 2.5 2.5 4"

One plus the number of rows with a value less than the current value plus half the number of items equal to it.
Duplicate values & gaps.

This is also easy to translate into SQL, but the optimizer has to add another STATS step again, which will double runtime:

RANK() OVER (PARTITION BY part_col ORDER BY data_col)
+ (0.5 * (COUNT(*) OVER (PARTITION BY part_col, data_col) - 1))

Wikipedia's #3: Dense ranking "1223"

The number of distinct values less than or equal to the current value.
Duplicate values, but no gaps.

It's the SQL:2008 DENSE_RANK function, which can be expressed in different ways in Teradata, but all of them are much slower than the built-in RANK:

SELECT
  t1.part_col
  ,t1.data_col
  ,dt.rnk AS "DENSE_RANK"
FROM tab AS t1
JOIN
 (
  SELECT
    part_col
    ,data_col
    ,RANK() OVER (PARTITION BY part_col ORDER BY data_col) AS rnk
  FROM tab
  GROUP BY 1,2
 ) AS dt
ON t1.part_col = dt.part_col
AND t1.data_col = dt.data_col;

This query has to access the same table or view twice, which is a big disadvantage if it there are joins or complex conditions included.

Following solutions avoid double access, but are based on nested OLAP steps:

SELECT
  part_col
  ,data_col
--,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented
  ,SUM(CASE WHEN data_col = prev_data_col THEN 0 ELSE 1 END)
   OVER (PARTITION BY part_col
         ORDER BY data_col, prev_data_col
         ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK"
FROM
 (
  SELECT
    part_col
    ,data_col
    ,MIN(data_col)
     OVER (PARTITION BY part_col 
           ORDER BY data_col
           ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_data_col
  FROM tab
 ) AS dt;
 

SELECT
  part_col
  ,data_col
--,DENSE_RANK() OVER (PARTITION BY part_col ORDER BY data_col) -- not implemented
  ,SUM(CASE WHEN rowno = 1 THEN 1 ELSE 0 END)
   OVER (PARTITION BY part_col
         ORDER BY data_col, rowno 
         ROWS UNBOUNDED PRECEDING) AS "DENSE_RANK"
FROM
 (
  SELECT
    part_col
    ,data_col
    ,ROW_NUMBER()
     OVER (PARTITION BY part_col, data_col
           ORDER BY data_col) AS rowno
  FROM tab
 ) AS dt;

If another analytical function must be added to the query you should check if it shares the same PARTITION and ORDER definition and choose the matching version.

Btw, in TD13.10 there's a new "Window Aggregate UDF" feature which allows to create your own OLAP function and the example source code in the manuals is a DENSE_RANK ;-)


The next post in this small series on missing analytical functions will discuss additional SQL:2008 functions based on relative rankings:
CUME_DIST and NTILE.

Discussion
strattok 5 comments Joined 10/09
16 May 2011

Are you sure about the first sql example under Wikipedia's #3: Dense ranking "1223"?

dnoeth 86 comments Joined 11/04
16 May 2011

Of course, just try it :-)

Dieter

Dieter

mullasci 2 comments Joined 06/09
18 May 2011

Great timing! I was just looking for dense_rank for a porting!
Thanks

RobG 5 comments Joined 09/10
23 May 2011

Nice work Dieter. Very informative.

Mandar.Nayak 1 comment Joined 05/11
03 Jun 2011

Handy Info, thanks Dieter

cbp7298 1 comment Joined 02/11
05 Mar 2013

Thanks Dieter, It helps me in my work

Bhanu Prasad Chintalapalli

KVB 2 comments Joined 09/12
07 Feb 2014

Great work.It helped me a lot in my work area.

sk85 1 comment Joined 03/14
26 Mar 2014

Hi Dieter,
It's a great effort towards simulating common functions available in other databases but not in Teradata.
However, I had a question regarding the two approaches mentioned above. Is the "partition by part_col" needed in the SUM function in the outer query as this will only give the ranking within the group identified by the column given in the partition clause?

27 Nov 2014
-- Another Example to Get nth highest salaried employee
SELECT empNo
	,salary
	,prev_value
	,SUM(CASE WHEN prev_value = salary THEN 0 ELSE 1 END) OVER (
		ORDER BY salary DESC
			,prev_value DESC ROWS UNBOUNDED Preceding
		) AS "DENSE_RANK"
FROM (
	SELECT empNo
		,salary
		,min(salary) OVER (
			ORDER BY salary DESC ROWS BETWEEN 1 PRECEDING
					AND 1 PRECEDING
			) prev_value
	FROM employee2
	) AS dt;

 

You must sign in to leave a comment.