All Forums Database
jnbalaji 7 posts Joined 05/11
07 Dec 2014
Complex analytical query help

Dear Folks,
I have a table with information like subscriber added/removed particular feature along with the date it added or removed. Also if he is active or canceled his subscription.

CREATE TABLE subscriber_feature
(SUB_ID INTEGER,  -- subscriber id
feature_status VARCHAR(20) -- feature status change ex - 'preminum added' / 'preminum removed'
feature_status_dttm TIMESTAMP(6), --feature added/removed dttm
sub_status_change_dttm TIMESTAMP(6),  -sub status change dttm 
sub_type_cd CHAR(5),  -- ex value 'prep'
Sub_stat_cd CHAR(1))  - ex value 'active', 'cancel' etc
PRIMARY INDEX(sub_id);

 
I have to write SQL to give report in following format.

        Jan-12     Feb-12    Mar-12   .....Dec-14
Jan-12     100         80        60   .....      
Feb-12       0        100        80   ....        
Mar-12       0          0       100               
.
.
Dec-14                                                
total
premium-   100        180       240  ............                
Feature

 
First row is all Jan-12 data. for Jan-12, how many subscribed for the feature, then how many stays in it by every month end.
(Ex) 100 subscribed Jan-12 and  15 unsubscribed for premium feature and 5 canceled there subscription by feb-12. So first row  should have 100 for Jan-12 and 80(100 - 15 - 5) for Feb-12 and so on for all months.
Second row is for Feb-12. Same like jan. Last row should have the total count of subscribers subscribed for preminum feature by all month end. 
We need to calculate feature added and feature removed using both feature_status and feature_status_dttm. subscription cancel by Sub_stat_cd & sub_status_change_dttm.
Looks very very complex to me. Could some one help me with this? If we can store the result in volatile table and derive the report using another query also fine. Please help.
Thanks.

dnoeth 4628 posts Joined 11/04
07 Dec 2014

Can you elaborate on the rules how feature_status/feature_status_dttm and Sub_stat_cd/sub_status_change_dttm are related and add some sample inserts?
What's you TD release?
I would assume some combination of EXPAND ON/TD_SEQUENCED_COUNT plus GROUPING SETS will be able to return this result.

Dieter

jnbalaji 7 posts Joined 05/11
07 Dec 2014

Hi Dieter,
 
If feature added or removed both will be stored in feature_status. For both the action the event time is stored in  Feature_status_dttm. Using the feature_status and Feature_status_dttm only i can determine when feature is added/removed.
If sub_stat_cd is 'cancel' then  subscription cancel time will be stored in sub_status_change_dttm. Using this i can determine this subscriber is still with us or not.
Based on both i can determine how many subscribed for particular feature in month 1 and how many still using this feature for 3 years month by month.
Sample inserts below
insert into subscriber_feature values(1000, 'premium added', current_timestamp - interval '2' month, current_timestamp - interval '90' day,'prep', 'active');
--Subscriber 1000 added feature in Oct month.
insert into subscriber_feature values(1004, 'premium added', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');
--Subscriber 1004 added feature Nov month.
insert into subscriber_feature values(1000, 'premium removed', current_timestamp - interval '1' month, current_timestamp - interval '30' day,'prep', 'active');
--Subscriber 1000 removed feature nov month.
insert into subscriber_feature values(1004, 'premium added', current_timestamp - interval '2' month, current_timestamp ,'prep', 'cancel');
--Subscriber 1004 canceled his subscription today (Dec).
insert into subscriber_feature values(1005, 'premium added', current_timestamp, current_timestamp ,'prep', 'Active');
--New Subscriber 1005 added feature today (Dec).
 
Report by end of december
                   Oct-14    Nov-14   Dec-14
Oct -14        1             0             0
Nov-14        0              1             0
Dec-14        0              0             1
Total            1              1             1
 
Thanks

jnbalaji 7 posts Joined 05/11
08 Dec 2014

I am using TD14

dnoeth 4628 posts Joined 11/04
08 Dec 2014

First you need the logic to determine the period when a subscription was active, something like this:

SELECT
  SUB_ID
  ,feature_status                
  ,sub_type_cd
  ,Sub_stat_cd             
  ,TRUNC(feature_status_dttm, 'mon') AS start_dt          -- 1st day of month 
  ,TRUNC(MIN(CASE WHEN feature_status = 'premium removed' -- last day of the previous month before subscription ended
                    OR Sub_stat_cd = 'cancel'             
                  THEN sub_status_change_dttm
             END) 
         OVER (PARTITION BY SUB_ID
               ORDER BY sub_status_change_dttm
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
        , 'mon') -1 AS end_dt                             
FROM subscriber_feature
QUALIFY feature_status = 'premium added'                  -- only start of subscription
    AND Sub_stat_cd = 'Active'

Of course you might add some conditions to get only rows from the last 36 months...
 
Then you count the number of rows for each start/end combination:

SELECT
  start_dt
  ,COALESCE(end_dt, DATE '9999-12-01') AS end_dt
  ,COUNT(*) AS CNT
FROM (previous query returning subscription start/end)
GROUP BY start_dt, end_dt

 
Finally you need to cross join it to a table with one row for each month:

SELECT
  CASE WHEN GROUPING (mth) = 0 THEN (mth (FORMAT 'mmm-yy') (CHAR(6))) ELSE 'Total' END
  ,...
  ,SUM(CASE WHEN mth = start_dt AND DATE '2014-09-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Sep-14"
  ,SUM(CASE WHEN mth = start_dt AND DATE '2014-10-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Oct-14"
  ,SUM(CASE WHEN mth = start_dt AND DATE '2014-11-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Nov-14"
  ,SUM(CASE WHEN mth = start_dt AND DATE '2014-12-01' BETWEEN start_dt AND end_dt THEN CNT ELSE 0 END) AS "Dec-14"
FROM
 ( -- any query returning one row per 1st of month, e.g.
   SELECT DISTINCT calendar_date AS mth
   FROM sys_calendar.CALENDAR
   WHERE calendar_DATE BETWEEN ADD_MONTHS(CURRENT_DATE,-36) AND CURRENT_DATE-1
   AND day_of_month = 1
 ) AS cal
CROSS JOIN 
   (previous query returning counts per period)
GROUP BY GROUPING SETS ((mth), ())
ORDER BY mth NULLS LAST

 Hope I got your logic correct :-)
 
But there's no way to dynamically name the 36 month columns, you need to add/remove them manually as needed.

Dieter

jnbalaji 7 posts Joined 05/11
09 Dec 2014

Hi Dieter,
Thanks a lot, This is exactly what i was looking for and also learnt some new functions that i never came across in teradata. 
Thanks a lot,
Naga

jnbalaji 7 posts Joined 05/11
10 Dec 2014

Hi dieter,
Some change in requirement.
Table structure has changed. reporting columns are static.
They are from Jan -15 to Dec - 17.

CREATE TABLE subscriber_feature

(SUB_ID INTEGER,  -- subscriber id

feature_code VARCHAR(20) -- feature code

feature_add_dttm DATE, --add dttm

feature_remove_dttm DATE,  -remove dttm

feature_cancel_dttm DATE,  -- cancel dttm

Sub_stat_cd CHAR(1))  - ex value 'a' - ACTIVE, 'c' - CANCEL  etc

PRIMARY INDEX(sub_id);

INSERT INTO  subscriber_feature VALUES(123,'Code','2015-01-15','2015-03-20', NULL,'A' );
INSERT INTO subscriber_feature VALUES(234,'Code','2015-01-20','2015-04-05', '2015-05-02','C');
INSERT INTO subscriber_feature VALUES(345,'Code','2015-02-05','2015-04-15',NULL,'A');
INSERT INTO  subscriber_feature VALUES(456,'Code','2015-01-25',NULL,2015-03-21,'C');
INSERT INTO  subscriber_feature VALUES(456, 'Code','2015-04-20','2015/05/20',NULL,'A' );
INSERT INTO subscriber_feature VALUES(567, 'Code','2015-02-10','2015-02-25',NULL,'A');
INSERT INTO  subscriber_feature VALUES(678, 'Code','2015-03-10',NULL,NULL,'A');
One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.
So count may go wrong for this scenario.

We are considering 6 million records. Can we introduce some table to store intermediate result in between to make the query performance better. If so, what format it should be?
Looking forward for your help.

Thanks

jnbalaji 7 posts Joined 05/11
10 Dec 2014

Hi,
"One problem in that. If the customer removed feature or cancel subscription in same month he added, then end_dt will give previous month end date.
So count may go wrong for this scenario."
Above is not a problem as we gre grouping on start date everything works fine. My bad. But as i said huge no of input is the problem. Experts want  to break this process in to two with intermediate table.
 
thanks.

dnoeth 4628 posts Joined 11/04
10 Dec 2014

What's the performance of the current query? If it's good enough there's no need to use additional tables (besides maybe one permanent table with 36 rows for each 1st of month instead of a query on the calendar)
Can you show the explain?

Dieter

You must sign in to leave a comment.