All Forums Database
saraider99 8 posts Joined 05/12
04 Sep 2013
Creating Group ID from SUM Limit

I have a list of products and a count corresponding to the quantity sold in a single table. The data is laid out as such:
Product Name     QTY_SOLD
Mouse                       23
Keyboard                  25
Monitor                     56
TV                             10
Laptop                      45
...
I want to create a group ID where groups are broken up if the sum of the quantity sold is greater than 50.  We can order by Product Name to get an output similar to the following.
Product Name     QTY_SOLD     GROUP_NBR
Keyboard                 25                1
Laptop                     45                1
Monitor                    56                2
Mouse                     23                 3
TV                           10                3
 
I created a case statement to create the output I need but if I want to change the group id cutoff from 50 to say 100 or if i get more products and quantities I have to keep changing the case statement. Is there an easy way to use either recursion or some other method to accomodate this?


UPDATE main
FROM prod_list AS main,
 (
 SEL PROD_NAME
 , QTY_SOLD
 , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) RUNNING
 FROM prod_list 
 ) inr
SET GROUP_NBR = 
 CASE
  WHEN RUNNING < 50 THEN 1
  WHEN RUNNING > 50 AND RUNNING < 100 THEN 2
  WHEN RUNNING > 100 AND RUNNING < 150 THEN 3
  WHEN RUNNING > 150 AND RUNNING < 200 THEN 4
  WHEN RUNNING > 200 AND RUNNING < 250 THEN 5
  ELSE 6
 END
WHERE main.PROD_NAME = inr.PROD_NAME
;

 

saraider99 8 posts Joined 05/12
04 Sep 2013

I meant to say, I want to create a group ID where groups are created if the ROLLING sum of the quantity sold is greater than 50.
 

saraider99 8 posts Joined 05/12
05 Sep 2013

I was thinking I may be able to get away with simply distributing the records evenly based on an average or other distribution key. If I had data such as below I would want to distribute them into buckets somewhat evenly.
PROD A - 50
PROD B - 55
PROD C - 1000
PROD D - 25
PROD E - 32
PROD F - 800
One possible result
GRP 1 - (PROD A & D)
GRP 2 - (PROD B & E)
GRP 3 - (PROD C)
GRP 4 - (PROD F)
Is that possible?

ToddAWalter 316 posts Joined 10/11
05 Sep 2013

Change your running sum to order on QtySold. 
Join that result to SELECT SUM(QtySold) AS GrandTotal FROM Prod_List. 
Replace your CASE expression with (running/GrandTotal) DIV <#of buckets you want>. 
This distributes the products into a best approximation equal height histogram by their percentage contribution to the total quantity. If you have very large outliers then you might have missing buckets because the big one will cover multiple percentage buckets. 

saraider99 8 posts Joined 05/12
05 Sep 2013

I tried to construct the query you recommended but when i ran it with over a hundred products it created almost as many buckets and didnt stick to the number of buckets i had wanted. I created another query using MOD and it gave me almost what i was looking for but didnt distribute as evenly as I had hoped for.

UPDATE main FROM prod_list AS main,  
(  
 SEL 
  PROD_NAME  
 ,  QTY_SOLD  
 ,  SUM(QTY_SOLD) OVER (ORDER BY NUMOF ROWS UNBOUNDED PRECEDING) RUNNING  
 FROM prod_list   
) inr 
SET GROUP_NBR = RUNNING MOD <# of buckets>
WHERE main.PROD_NAME = inr.PROD_NAME ;

 

ToddAWalter 316 posts Joined 10/11
06 Sep 2013

This is missing the join to and division by the grand total. That is necessary in order to make the running sum a percentage contribution.  And I left off the multiplication of that result by 100 before the division by the number of buckets.
 
In your sample you order by NUMOF? rather than QTY_SOLD.

saraider99 8 posts Joined 05/12
09 Sep 2013

Is this the query you mean? I did the whole thing over so that I wouldnt have to do an insert and then an update. This still gives me some weird results what am I missing?

WITH PROD_CTE (PROD_NAME,QTY_SOLD) AS 
(
 SELECT 
  PROD_NAME
 , COUNT(DISTINCT PROD_ID) AS QTY_SOLD
 FROM all_prods
 GROUP BY 1
)
SELECT 
 PROD_NAME
, QTY_SOLD
, CUMLTV_SUM
, TTL
, (((RUNNING_SUM / TTL ) * 100  ) / 10) AS BUCKET
FROM (
SELECT
  PROD_NAME
, QTY_SOLD
, SUM(QTY_SOLD) OVER (ORDER BY QTY_SOLD ROWS UNBOUNDED PRECEDING) AS RUNNING_SUM
, TTL
FROM PROD_CTE 
JOIN (
 SELECT SUM(QTY_SOLD) TTL 
 FROM PROD_CTE 
 )sm ON 1=1
)v
;

 

saraider99 8 posts Joined 05/12
24 Sep 2013

I ended up using a suggestion from the guys over at stack overflow.
First create a table that has the group id and the upper and lower limits of each bucket.

-- create the first entry for the recursive query
INSERT  TMP_WORK_DB.GRP_NBRS VALUES (0,1,0,2000000);

INSERT TMP_WORK_DB.GRP_NBRS (GRP_NBR,LOWER_LIMIT, UPPER_LIMIT)
WITH RECURSIVE GRP_RECRSV (GRP_NBR, LOWER_LIMIT, UPPER_LIMIT) 
AS (
SELECT 
    1 AS GRP_NBR
,   LOWER_LIMIT
,   UPPER_LIMIT
FROM TMP_WORK_DB.GRP_NBRS
UNION ALL
SELECT
    GRP_NBR + 1 
,   LOWER_LIMIT + 2000000 -- set the interval to 2 million
,   UPPER_LIMIT + 2000000 -- can be adjusted as needed
FROM GRP_RECRSV
WHERE GRP_NBR < 120 -- needed a limit so that it would not be endless
)
SELECT * FROM GRP_RECRSV
;

Then use a simply WHERE BETWEEN on the data to determine which GROUP ID to assign.

-- delete the first entry because it was duplicated
DELETE FROM TMP_WORK_DB.GRP_NBRS WHERE GRP_NBR = 0;

-- set grp nbr using the limits table
INSERT TMP_WORK_DB.PROD_LIST_GRP
WITH NUMOFPRODS (PROD_NAME,QTY,RUNNING) AS
(
    SELECT 
        PROD_NAME
    ,   COUNT(DISTINCT PROD_ID) AS QTY
    ,   SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) RUNNING
    FROM TMP_WORK_DB.PROD_LIST
    GROUP BY 1
)
SELECT 
    PROD_NAME
,   QTY
,   RUNNING
,   GRP_NBR
FROM NUMOFPRODS a
JOIN TMP_WORK_DB.GRP_NBRS b ON RUNNING BETWEEN LOWER_LIMIT AND UPPER_LIMIT
;

 

You must sign in to leave a comment.