All Forums Database
Glass 225 posts Joined 04/10
06 Feb 2012
Performace of Count(Distinct)

I have many users issuing queries that include multiple count(distinct) clauses in the select list. when .5 to 2 B rows are being accessed the queries sometimes run 10 hours or so.

Has anyone found a way to optimize such queries?

robpaller 159 posts Joined 05/09
06 Feb 2012

Are these simple SELECT statements or more complex queries with multiple joins? 

Glass 225 posts Joined 04/10
09 Feb 2012

More complex with two or 3 joins usually.

Glass 225 posts Joined 04/10
09 Feb 2012

Example:

SELECT    upc_num,
    wd_cd,
    COUNT ( DISTINCT household_num ),
    SUM ( Ext_Paid_Price_Amt ),
    COUNT ( DISTINCT Unique_Trans ),
    SUM ( Ext_Item_Qty )
    FROM crm_view.Customer_Transaction_Item,
        user_view.INVRY_PROD_UPC_XREF_CRNT
    WHERE crm_view.Customer_Transaction_Item.upc_num =
    user_view.INVRY_PROD_UPC_XREF_CRNT.scn_id
        AND  ( trans_dt BETWEEN '2010-06-30' AND '2010-09-21'
    OR    trans_dt BETWEEN '2011-06-29' AND '2011-09-20' )
    GROUP BY upc_num,
        wd_cd
    ORDER BY upc_num,
        wd_cd;

gotuchintu 32 posts Joined 12/05
13 Feb 2012

I would suggest to create a sparse aggregate join index for the query, if possible. That will help a lot

Somnath Roy

robpaller 159 posts Joined 05/09
13 Feb 2012

Couple of thoughts come to mind:

  1. AJI would be one option, as mentioned, depending on your load strategy for the tables. Some load utilities require that AJIs be dropped and recreated after the load has completed.
  2. Have you considered a denormalized, static fact table that is maintained as part of the daily ETL process that would satisfy this type of user access? The query appears to be quarterly in nature. Your denormalized fact table(s) could be produced at different calendar levels to facilitate easy access. (e.g. Weekly, Monthly, Quarterly, Annual, etc.) Then maximize the use of compression to reduce the storage requirements.
  3. What partitioning has been implemented on the underlying tables, if any?
  4. Do the users always run this report for entire corporation and all products sold? How much of the data is ignored by the users because they are only interested in how well fishing poles and tackle vs. swimming pools and goggles sold during the Summer season?
Glass 225 posts Joined 04/10
17 Feb 2012
  1. Thank you all for the responses:

The main fact Table cannot easily have an AJI due to near constant usage and size(1 TB).

I have created a denormalize table to test but the rusults where not a lot better, considering the overhead that would be required (the fact table is continous load and the denormalized version would need to be close to that)

The fact table is partitioned on trans_dt as is the denormalized table.

The users submitting this type of query are mainly the Marketing folks and I'm suspect they are pulling more data than is actually used but theres little hope of convincing anyone they don't need it all.

 

I think I will test an AJI on the denormalized version.

 

Thanks,

 

 

 

 

 

You must sign in to leave a comment.