All Forums Database
Aress 3 posts Joined 04/12
12 Apr 2012
SQL assistance--find mismatches in daily data

I'm looking for a way to quantify a problem for my developers. 

I have 3 columns:


The action date is when an event happens while the date is snapshot of when in time the data was valid.


THe problem I am having is my ACTION_DATE may not go from NULL to a date until a DATE much later than the actual DATE.


What it should look like:


1              NULL              4-1

1             4-2                 4-2

1             4-2                 4-3

1             4-2                 4-4

1             4-2                  4-5



What it really looks like (aka "bad behavior")


1              NULL                4-1

1             NULL                 4-2

1             NULL                 4-3

1             4-2                    4-4

1             4-2                    4-5


The question is how do I select CUST_ID that display the "bad" behavior--its not always 2 days--it can be 5-45 days--it is not systematic.

I have greatly simplified this table, and as you can imagine each of our millions of records has an entry for each day--not exactly 3NF.  Normally I could play around with some self-joins but these tables cause trouble with our Teradata loading and jobs will get killed by our admins as well as nasty emails sent about system overuse.  ANy ideas on selecting CUST_IDs where the ACTION_DATE at a later point in time was NULL on the DATE where the ACTION_DATE occured.

I do have a similar tables:  A 'current' table that is just all records with DATE=Today.  My original attempts were to join the  historical onto current table--but i did not have much success, and again, because of table size, i had to severely restrict what i was joining--i could not make a general case--I can only make it work for a specific day.

ulrich 816 posts Joined 09/09
12 Apr 2012

I am not sure that I fully understand your model and question. From what I got I would try

select *
from table
qualify min(ACTION_DATE) over (partition by cust_id) <= DATE_COL
          and ACTION_DATE is null



P.S. Date is a reserved word and usully is the system date - so do not use date as columname...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

Aress 3 posts Joined 04/12
13 Apr 2012

Good point, I had changed my column names as this is a work project, and I don't want to use the "real" names of the columns.  DATE should be HIST_DT.

I have not used qualify and partition and had difficulty understanding exactly what they do. Also we are tyring to cool off our system right now as it is under tremendous load--I better figure out what the query does before I hit submit!

  Any recommendations on resrouces to learn about those keywords/functions?


In the meantime, here the query that works for a single day (Feb 1, 2012):

select count(1) from ACT_CURRENT AC



AH.HIST_DT=date '2012-02-01'

where AC.ACTION_DT = date '2012-02-01' and AH.ACTION_DT is NULL

 My difficulty is generaling so that I can run for all dates.  The other portion that may be confusing is not every CUST_ID gets an ACTION_DT--only about 10% do--but out of those 10% that do, some are not posting an ACTION_DT on the appropriate HIST_DT.  The trick is if it does get one, it should be updated on the HIST_DT that the ACTION_DT occurs, instead I have have multiple HIST_DT records that continue to be null until a random time in the future.

We could think of ACTION_DT as a flag--i think of it as almost a FOR EACH statement:

For every CUST_ID ROW that contains an ACTION_DT in the ACT_CURRENT table,

Go to the ACT_HISTORY table record for that HIST_DT and lookup the ACTION_DT.

If ACTION_DT is null, select that row and return the CUST_ID (or count it etc...)

dnoeth 4628 posts Joined 11/04
13 Apr 2012

Still hard to understand :-)

Your narration seems to translate to adding another join condition instead of the hard-coded date:

Instead of a join it might be better with an EXISTS (if there are multiple rows with the same HIST_DT)

select *
where exists
  (select * from ACT_HIST AH
   and AH.ACTION_DT is NULL)

Btw, why is in your "What it should look like" example still a NULL for 4-1?

Because it's the "first" row?



Aress 3 posts Joined 04/12
13 Apr 2012

The action (or event) happened on 4-2 (April 2), thus for the HIST_DT of 4-2 onwards the event ACTION_DT should be '4-2'.  What I've found is that usually the ACTION_DT is NULL for a much longer time.

praveen_reddy 22 posts Joined 10/11
15 Apr 2012

Hi dnoeth,

Can you pls give the query for finding the space occupied by Indexes only.



You must sign in to leave a comment.