All Forums Database
Valeria 11 posts Joined 03/15
24 Mar 2015
Question on Randomized allocation

Hello,
I have a question concerning Randimized allocation:
I would like to split a customers' selection in two parts  (50% and 50%) with the condition that the two parts has to contain the same number of customers which belongs to the same Abglkey 480, 481 and 482.
 
Example 600 customers --->100 have 480 as Abglkey, 200 have 481 as Abglkey, 300 have 482 as Abglkey. The results must be:
            Table 1     Table 2
480      50             50
481      100           100
482      150           150
I wrote the following SQL
ins into BP_ATV.einsatz_115_S3
sel  REFNR, case when REFNR is not null then ' ' end Gutscheinnummer  , case when REFNR is not null then 'AKTIV3' end Kennzeichen
from BP_ATV.AKQ_CLEANS_TEST_115
where ABGLKEY in (480, 481, 482)
randomized allocation.5
order by Abglkey
 
The problem is that order by is not allowed in subqueries. I also found the statement randomized allocation when... but it's not perfect and when I have 100 Abglkey it's not easy to do.
Do you have an idea how can I do that?
Thank you,
 
Valeroa
 

dnoeth 4628 posts Joined 11/04
25 Mar 2015

Hi Valeria,
RANDOMIZED ALLOCATION is an option of the SAMPLE clause, but sampling only works on all rows. Plus it's random, not repeatable, so you can't do two Inserts and still get distinct sets.
 
if the number of ABGLKESs is small and known (like in your example) you can do

select ..., SAMPLEID as sid
from BP_ATV.AKQ_CLEANS_TEST_115
where ABGLKEY in (480, 481, 482)
SAMPLE RANDOMIZED ALLOCATION
  WHEN ABGLKEY = 480 THEN 0.5,0.5
  WHEN ABGLKEY = 481 THEN 0.5,0.5
  WHEN ABGLKEY = 482 THEN 0.5,0.5
END 

 
This will result in two 50% samples for each ABGLKEY, numbered from 1 to 6.
You must materialize this result in a table and when you need to select a single sample filter using 

WHERE sid MOD 1 = 0 (or 1)

 
For a larger number of ABGLKEYs (or unknown values) this will work:

SELECT ... ,
   CASE WHEN PERCENT_RANK()
             OVER (PARTITION BY ABGLKEY
                   ORDER BY rnd) < 0.5
        THEN 0 
        ELSE 1 
   END AS sid
FROM  
 (
   SELECT ... ,
      RANDOM(-2000000000,2000000000) AS 
   from BP_ATV.AKQ_CLEANS_TEST_115
   where ABGLKEY in (480, 481, 482)
 ) AS dt

 
This also creates two random groups each 50%.
 
 

Dieter

Valeria 11 posts Joined 03/15
02 Apr 2015

Hello Dieter,
 
thank you very much it works very well. The only thing that I noticed is that Teradata that in the end I have some customers more in the group 1 than in the group 0 because Teradata. This is due to Abglkey that contains odd numbers of customers. for eg:
Abglkey  sid 1         sid   0
1            10                 9
2            7                   6
3            5                   4
 
Is there a way that allow this 1 more in group 1 is assigned sometimes to 0 instead of to 1?
I hope it is clear. This is only for being more precise :) But it works very well you helped me a lot!
Thank you,
Valeria

dnoeth 4628 posts Joined 11/04
02 Apr 2015

Hi Valeria,
you could switch 0 and 1 like this (not random, but based on odd/even ABGLKEYs):

SELECT ... ,
   (CASE WHEN PERCENT_RANK()
             OVER (PARTITION BY ABGLKEY
                   ORDER BY rnd) < 0.5
        THEN 0 
        ELSE 1 
   END + (ABGLKEY MOD 2)) MOD 2 AS sid
FROM 
 (
   SELECT ... ,
      RANDOM(-2000000000,2000000000) AS
   from BP_ATV.AKQ_CLEANS_TEST_115
   where ABGLKEY in (480, 481, 482)
 ) AS dt

 

Dieter

You must sign in to leave a comment.