All Forums Database
karthikcsekar 17 posts Joined 02/14
27 Feb 2014
Join Issues - Need help in most efficient way to solve the problem

Hi,
I need some help to solve the following problem in the most efficient way. 
Consider Table1 - 

ShopNbr	Item
1	100
1	101
1	102
2	101
2	104
3	100
3	102
3	103
3	104
3	105

I have provided a snapshot of the data above. The actual data will have close to 1000 shops and 300 items. Each shop need not have all 300 items.We have Shop-Item level data here.
In the above case, I need to choose 2 shops which will capture the maximum number of distinct items. i.e. If i choose Shops 1 and 2, I will get items 100, 101, 102, 104 ( 4 items) and similarly i need to check for combinations 2&3 and 1&3 and choose the one which captures most items. 
I can do this through a cross join but my problem is in my actual data I have close to 1000 shops. We need to know the most efficient way to do this.
In simple words I need the best combination (lets say I need to choose 5 out of 1000 shops), it would be 1000C5 iterations which is nearly 1000^5 which is not practical.
Also note we do not have access to stores procedures in our servers. 
Thanks in advance. Any help will be extremely useful :)
Karthik

CK
karthikcsekar 17 posts Joined 02/14
02 Mar 2014

Any suggestions ?

CK

karthikcsekar 17 posts Joined 02/14
04 Mar 2014

Dieter - Any suggestions ?

CK

dnoeth 4628 posts Joined 11/04
05 Mar 2014

Hi Karthik,
i don't think there's a simple solution, especially without Stored Procedures.
As it's only 300 items you might use bit-functions to create a bit-string and then BITOR and COUNTSET to combine and count.
For two shops it's only 1000 * 1000 rows and maybe you can apply some rigid filter before you add the next shop to reduce the number of comparisons.

Dieter

karthikcsekar 17 posts Joined 02/14
05 Mar 2014

Dieter - Thanks a lot.
If I give a rule saying that always choose a shop which has maximum items, can I write a normal SQL code to achieve that ? 
In other words, among all 1000 shops I will identify 1 shop which has maximum items, select that shop, delete all the items chosen in that shop, and again identify 2nd shop with maximum items and so on till 5 shops or even 10 shops if needed. ( I will know that number of shops)
How easy is this ? I am not able to come up with a coding logic for this though in Teradata.
Thanks again.

CK

emilwu 72 posts Joined 12/07
05 Mar 2014
CREATE VOLATILE TABLE T 
(
SHOP INT,
ITEM INT
)
UNIQUE PRIMARY INDEX(SHOP,ITEM)
ON COMMIT PRESERVE ROWS;
INSERT INTO T 
VALUES (1,100);
INSERT INTO T 
VALUES (1,101);
INSERT INTO T 
VALUES (1,102);
INSERT INTO T 
VALUES (2,101);
INSERT INTO T 
VALUES (2,104);
INSERT INTO T 
VALUES (3,100);
INSERT INTO T 
VALUES (3,102);
INSERT INTO T 
VALUES (3,103);
INSERT INTO T 
VALUES (3,104);
INSERT INTO T 
VALUES (3,105);


SELECT SHOP1, SHOP2 , SUM(CNT) AS TOTAL_CNT
FROM 
(
SELECT T1.SHOP AS SHOP1, T2.SHOP AS SHOP2, COUNT(DISTINCT T1.ITEM) AS CNT
FROM 
T T1 
INNER JOIN
T T2
ON
T1.ITEM = T2.ITEM AND
T1.SHOP LT T2.SHOP
GROUP BY T1.SHOP, T2.SHOP
)A 
GROUP BY 1,2
QUALIFY ROW_NUMBER() OVER (ORDER BY TOTAL_CNT DESC) =1

 

dnoeth 4628 posts Joined 11/04
05 Mar 2014

Hi Karthik,
imho you can't choose the shop with the maximum number of items, e.g.
shop 1: 2,3,4,5,6,7
shop 2:1,2,3,4
shop 3: 5,6,7,8
shop 1 + 2 or shop 1 + 3 has less items than shop 2+ 3

Dieter

karthikcsekar 17 posts Joined 02/14
05 Mar 2014

Agreed totally with you Dieter. But in the actual data we have we feel that the case you have explained would be pretty rare and we are okay with ignorning that though I understand we are compromising the accuracy of the model. 
But if we are fine with it, I still am not able to come up with the macro which can perform the same for me.

CK

karthikcsekar 17 posts Joined 02/14
09 Mar 2014

Dieter - Sorry again but any new ideas ?

CK

padhia 35 posts Joined 06/10
10 Mar 2014

Although this is an interesting problem, but I would agree with Dieter in that, this is more of an algorithmic efficiency problem than data access efficiency problem. At 1000x300, your entire problem can be represented in 37.5K bytes of storage (or 300K if you use byte instead of bit). It will be more efficient to use SQL to retrieve entire data into memory and then solve the problem using procedural language rather than solve the problem using SQL. Formally, this class of problems belongs to combinatorial optimization problems (http://en.wikipedia.org/wiki/Combinatorial_optimization).
 
Good luck.

krishaneesh 140 posts Joined 04/13
10 Mar 2014
create table shopping 
(ShopNbr integer,
 Item integer)
 primary index (shopnbr);

insert into shopping values (1, 100);
insert into shopping values (1  ,101);
insert into shopping values (1 , 102);
insert into shopping values (2,  101);
insert into shopping values (2  ,104);
insert into shopping values (3 , 100);
insert into shopping values (3,  102);
insert into shopping values (3  ,103);
insert into shopping values (3 , 104);
insert into shopping values (3,  105);
insert into shopping values (4  ,103);
insert into shopping values (4 , 104);
insert into shopping values (4,  105);
--create a permanent table with the distinct combination of shops

create table shop_list as(
sel a1.shopnbr shop1,a2.shopnbr shop2 from shopping a1,shopping a2
where a1.shopnbr<>a2.shopnbr group by 1,2) with data primary index(shop1,shop2);

--The below will give 2 of the rows for the same combination

sel shop1,shop2,count(*) from 
(sel b.shop1,b.shop2,item from shopping a join shop_list b on a.shopnbr=b.shop1
union
sel b.shop1,b.shop2,item from shopping a join shop_list b on  a.shopnbr=b.shop2) c group by 1,2 order by 3 desc

 

krishaneesh 140 posts Joined 04/13
10 Mar 2014
shop1 shop2 Count(*)
3 2 6
2 3 6
1 4 6
3 1 6
4 1 6
1 3 6
4 3 5
3 4 5
1 2 4
2 1 4
4 2 4
2 4 4

The above mentioned is a bit cumbersome process but can be extrapolated for 5 shop combinations. This can be done with a volatile table. but if the report is required then instead of building volatile table multiple times, a permanent list with all the combinations of shops would be an idea. Here i considered only a combination of 2 shops in whcih we will get 4P2 conmbinations. the first 2 rows of the output points to the same combination.
HTH

krishaneesh 140 posts Joined 04/13
10 Mar 2014
create table shop_list2 as(
sel a1.shopnbr shop1,a2.shopnbr shop2,a3.shopnbr shop3 from shopping a1,shopping a2,shopping a3
where a1.shopnbr<>a2.shopnbr and a2.shopnbr<>a3.shopnbr and a1.shopnbr<>a3.shopnbr group by 1,2,3) with data primary index(shop1,shop2,shop3);

sel shop1,shop2,shop3,count(*) from 
(sel b.shop1,b.shop2,b.shop3, item from shopping a join shop_list2 b on a.shopnbr=b.shop1
union
sel b.shop1,b.shop2,b.shop3, item from shopping a join shop_list2 b on a.shopnbr=b.shop2
union
sel b.shop1,b.shop2,b.shop3, item from shopping a join shop_list2 b on a.shopnbr=b.shop3) c group by 1,2,3

 

krishaneesh 140 posts Joined 04/13
10 Mar 2014

This would be for combination of 3 shops. Notice that there would be 3 join conditions whn joining the 3 tables instead of 2 joins for 3 tables. With the limited data that i have, the result of this is only 6 for all the combinations but i am sure this will work for 1000 with 5 combination. This would definitely involve heavy cpu particulary during the first create table step. Also please be aware that there would be repeating combinations like with 3 shops 124,142,421,412,214,241 correspond to the same combination but all of them will have the same result. Similarly when dealing with combination of 5, it will have the 5! vlaues having the same combination(i.e., if we have only a single combination maximum, then that will have 120 combinations corresponding to the max value but any one of them can be taken).

karthikcsekar 17 posts Joined 02/14
24 Mar 2014

Thanks all. We managed to do this through a different medium

CK

You must sign in to leave a comment.