All Forums Database
brim 10 posts Joined 09/14
22 Sep 2014
Selecting all rows that have a unique column

I have raw data coming into a table for an ETL job, so I'd like to keep all the records.
I'm trying to select all the rows that basically have a unique ID. This can obviously be resolved by a primary key in a set table, but I tried doing so in a volatile table to no avail.
So I've been trying to do something along the lines of a

SELECT Distinct id, name, email
FROM source_table

but due to Teradata SQL, the statement will return all rows unique based on the combination of id, name and email. I instead am trying to find all rows unique based on just ID and select all the columns.
 
How would I accomplish this? I've tried a GROUP BY, but that seems more difficult to do what I'm trying to achieve here.

Raja_KT 1246 posts Joined 07/09
22 Sep 2014

Can you show sample records, what you have and what you want? That will be easier. From my standpoint, it looks like group by is what you resort to using it, since you need all records. Why VT does not work? Do you use on commit preserve row?

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

VBurmist 96 posts Joined 12/09
22 Sep 2014

Hi Brim,
if you want to find rows with unique ID, the following SQL applies.  The window analytic functions are great for such purposes.
 
select id, name, email
from source_table
qualify count(*) over (partition by id) = 1
;
 
Alternatively, you can use the unfiltered query to evaluate the data:
select id, name, email, count(*) over (partition by id)  as number_of_rows_with_this_ID
from source_table
;

 
Regards,
Vlad.
 

brim 10 posts Joined 09/14
23 Sep 2014

Vlad,
The partition function is almost what I need. Because of the `= 1`, the statement is filtering out the partitions or IDs that have a count of greater than 1. I still want those rows, but just 1 of them.
 
So in an example table like the following:
ID |  NAME
1  Aaron
1  Aaron
2  Jason
2  Jason
2  Jason
3 Brian
 
I want the following returned:
1  Aaron
2  Jason
3  Brian
 
Vlad's partition statement is giving me just:
3 Brian
 
 

dnoeth 4628 posts Joined 11/04
23 Sep 2014

You need to use a kind of ROW_NUMBER instead of a group COUNT:

select id, name, email
from source_table
qualify row_number() over (partition by id order by rows unbounded preceding) = 1

But for your example you might simply utilize DISTINCT

select DISTINCT id, name, email
from source_table

 

Dieter

brim 10 posts Joined 09/14
23 Sep 2014

Hi Dieter,
From what I've learned and verified, DISTINCT will filter based on all the columns in the SELECT clause. So in the little code snippet you wrote out, the query would return all rows that have a distinct combination of id, name and email. Teradata won't return based on just distinct id.
 
For the first part where you used row_number, is there something I'm supposed to add in? I can't find the right syntax for a QUALIFY clause

dnoeth 4628 posts Joined 11/04
23 Sep 2014

Ups, sorry, I wrote nonsense, mixing ROW_NUMBER and COUNT:

select id, name, email
from source_table
qualify count(*) over (partition by id rows unbounded preceding) = 1

 

Dieter

brim 10 posts Joined 09/14
23 Sep 2014

Awesome, that's exactly what I need. Thanks!

You must sign in to leave a comment.