All Forums Viewpoint
mikesteeves 46 posts Joined 10/11
23 Jan 2014
TASM Workloads and frequent demotions

We have TASM set up to demote user queries from USERS_SHORT to USERS_MEDIUM and then USERS_LONG after a threshold of CPU seconds consumed.  We are seeing a number of queries get sent to our USERS_SHORT workload, only to be demoted to USERS_MEDIUM and then USERS_LONG.  By a number I mean around 150 queries an hour.  Since these are end users and we can't control what they write, I don't know whether it makes sense to raise the thresholds that dictate demotion and let them live in the higher-priority workloads longer.  Some of my concerns are:
- Is this demotion of so many queries taxing on the system?
- Does it make more sense to leave the threshold of USERS_SHORT at 2 CPU seconds and therefore preserve that slice for the truly short queries?
- Does our version of the database (V2R13.10.01.04b) have an issue with accurately determining what a query is going to use, and therefore start queries in the wrong workload to begin with?
I remember reading that if you have a lot of demotion going on you need to go back and tweak your WDs, but I'm not sure what that would accomplish if it's the optimizer's estimates that decide which WD a query is given.
Thanks for any insight you can offer.
-Mike
 

andydoorey 35 posts Joined 05/09
24 Jan 2014

One of the biggest issues with queries being demoted in this way is that they jump the queue by starting in short.  If lots of queries do this, you may end up with more queries running on the system than you want, and queries assigned to the long workload in the first place may never get to run, because the workload's active-query slots can all be taken up by demoted queries.
It sounds like the estimates for these queries are very inaccurate if TASM thinks they are under 2 CPU seconds and they end up in the long queue.  Check the TDWM estimate columns in DBQL to compare the estimated resource usage with the actual resource usage.  If these are vastly different, run an explain on the query and see if it looks sensible.  You can also use Rewind in Viewpoint to compare estimated rows with actual rows in each step.
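As a sketch of that DBQL comparison, something like the query below can surface queries whose actual CPU dwarfed the estimate.  Column names such as EstProcTime, AMPCPUTime, and WDID should be verified against your DBQL release, and the date filter and 10x ratio are just illustrative:

```sql
-- Sketch: find yesterday's queries whose actual CPU far exceeded the
-- optimizer's estimate, i.e. candidates for misclassification.
-- Verify column names against your release of DBC.DBQLogTbl.
SELECT QueryID,
       UserName,
       WDID,                              -- workload the query ended up in
       EstProcTime,                       -- estimated processing time (seconds)
       AMPCPUTime,                        -- actual AMP CPU seconds consumed
       AMPCPUTime / NULLIFZERO(EstProcTime) AS EstErrorRatio
FROM   DBC.DBQLogTbl
WHERE  CAST(StartTime AS DATE) = CURRENT_DATE - 1
  AND  AMPCPUTime > 10 * EstProcTime     -- actuals an order of magnitude high
ORDER  BY EstErrorRatio DESC;
```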
I suspect that statistics may be out of date, not collected, or worst of all collected on empty tables, making the estimates inaccurate.
I've seen a very busy system where collecting stats made a query take longer to run.  Not because it made the query less efficient, but because TASM then had a better estimate for the query, so it was allocated to the correct workload, where it queued.  Before the stats were collected the query was less efficient, but because it ran immediately it completed in less elapsed time, and to the user it therefore seemed better.
Andy

mikesteeves 46 posts Joined 10/11
24 Jan 2014

Thanks for the quick feedback.  We have regular stats collection jobs in place and we validate the stats once a week to make sure they are all current.  I'll check these specific queries to see if they are hitting tables that may not be picked up by our regular stats collection jobs.  
Thanks again,
Mike

crewdy 8 posts Joined 03/10
06 Feb 2014

We have a, let's say, "challenging" workload.  Andy knows, as we both worked on the same data warehouse.  As Andy says, statistics are imperative, and the collection schedule is too.  Collect only if 10% of the rows have changed: collect too often and it will impact performance; collect too infrequently, or not at all, and you will see a drift in how many queries are reclassified, and eventually, if the statistics are far enough out, the Optimiser could pick a wholly inappropriate execution plan and the query be classified incorrectly as a consequence.  As you say, the exception criteria have to be carefully considered so as not to interfere with correctly classified work in a lower-priority workload.
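For the 10% rule, newer Teradata releases (14.0 and up, so not the 13.10 release mentioned earlier) can enforce it directly in the collect statement.  The table and column names below are hypothetical, and the syntax should be checked against your release's documentation:

```sql
-- Teradata 14.0+ only: skip the recollection unless roughly 10% of
-- rows have changed since the last collect.  On 13.10 the threshold
-- has to be enforced by your own stats-management scripts.
COLLECT STATISTICS
       USING THRESHOLD 10 PERCENT
       COLUMN (TransactionDate)      -- hypothetical column
ON Sales.DailyTransactions;          -- hypothetical table
```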
I have done extensive analysis on TDWM estimates, classification, and reclassification, and have been able to reset the exception rules to demote based upon this data.  However, it is a constant challenge to get users to recollect stats at appropriate times.
Happy to share my findings.
Nick
