With increased workload complexity, it is common for a system to run at full capacity. There is always a need to find resource-consuming sessions and poorly running queries. The intention of this article is to discuss a few ways to identify resource-consuming sessions or poorly running queries using Teradata Viewpoint.

The quickest way to identify resource-consuming sessions is to look at the Top Session By menu. In the Preferences section of the Query Monitor portlet, click the Display tab and select Top Session Graph. The Top Session By menu will then be displayed on the Query Monitor portlet: select the appropriate metric in the drop-down box, and the bars next to it show the sessions with the largest values. The wider the bar, the more resources the session is consuming. Mouse over the top of a bar to see the session's value, and click the bar to see the session details. (See Figure 1.)

Figure 1

Another way to monitor resource-consuming sessions is to select the desired metric in the Configure Column option of Query Monitor and then sort that column in descending order. For example, to identify the top CPU-consuming session, select CPU USE in the Configure Column option and sort the CPU USE column in descending order. (See Figure 2.) The Set button in Configure Column can be used to set a threshold value; queries consuming resources above the threshold will then be shown in red.

Figure 2

If there are too many queries in Query Monitor and you only want to look at the queries that exceed a certain value, specify the threshold value in the Criteria tab of the Preferences option and then select By Session - My Criteria to see only those queries that exceed the threshold.

To identify poorly running queries for tuning, in the Preferences section of the Query Spotlight portlet select the appropriate criteria and define the threshold for bad queries. All the queries that exceed the threshold will be displayed in Query Spotlight. Select Duration in Configure Column, sort by it, then click on a session and start analyzing it.

Below are some tips to analyze queries in the Query Spotlight portlet.

  1. The width of the time bar indicates the total elapsed time for a query. To zoom in to a step, adjust the width of the time bar.
  2. The snapshot tab shows a snapshot of the metric value for a point in time indicated by the slider bar.
  3. Moving the slider bar will refresh the stats in the snapshot; when a metric value is shown in red, it indicates that at that point in time the metric value exceeded the threshold. (See Figure 3.) Enabling Rewind at that point in time and using other monitoring portlets will provide a holistic view of the system at that moment.

Figure 3

In summary, there are many ways to identify poorly running queries or top resource-consuming queries using Viewpoint. Top Session By, My Criteria, and the Configure Column option in Query Monitor can be used to identify resource-consuming sessions currently running on the system, while the Query Spotlight portlet along with the Rewind feature can be used to identify and troubleshoot resource-consuming sessions or long-running queries in the past.

Discussion
Subbu99 2 comments Joined 05/05
12 Oct 2012

Neat article and nice quick tips to identify bad queries on the fly while monitoring the box.

Thanks,
Subbu

Shrity 61 comments Joined 05/12
12 Oct 2012

Thanks Subbu !!

sandipan23 1 comment Joined 05/10
14 Oct 2012

Good one. Thanks for sharing the information.

teradatauser2 29 comments Joined 04/12
14 Oct 2012

"Queries consuming resources above the threshold value will then be shown in red" - what is the threshold value that we need to put here? Generally we use Request CPU for monitoring in Viewpoint.

teradatauser2 29 comments Joined 04/12
14 Oct 2012

CPU USE shows the value as a percentage. Could you please elaborate on this value?

Shrity 61 comments Joined 05/12
15 Oct 2012

REQ CPU shows the number of CPU seconds used by the current request so far. This is a good metric when we want to see which query has consumed the most CPU so far.

CPU USE is the CPU used during the last sample period divided by the total available CPU (system-wide) during that sample period. In other words, it is the percentage of the total available CPU used by the query. This is a good metric for identifying a query that is currently consuming a lot of CPU.

It is possible for a query to have a high REQ CPU but a low CPU USE. Which one to use depends on what we are trying to achieve. I usually use REQ CPU when I am trying to find poorly running queries for tuning purposes, but CPU USE when my system is running high on CPU usage and I need to identify the session that is currently consuming the most CPU.
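The distinction can be sketched numerically. Below is a minimal Python illustration of the two definitions quoted above; the session numbers, CPU counts, and sample interval are all made up for the example, not real Viewpoint output:

```python
# Hypothetical illustration of REQ CPU vs. CPU USE (all numbers invented).
# REQ CPU: cumulative CPU seconds used by the request so far.
# CPU USE: CPU seconds used during the last sample divided by the total
#          CPU seconds available system-wide in that sample, as a percent.

def cpu_use_percent(delta_cpu_secs, num_cpus, sample_secs):
    """Percent of total available CPU consumed during the last sample."""
    available = num_cpus * sample_secs
    return 100.0 * delta_cpu_secs / available

# A long-running query: huge cumulative CPU, but nearly idle right now.
req_cpu = 5400.0  # CPU seconds accumulated so far (high REQ CPU)
recent = cpu_use_percent(delta_cpu_secs=2.0, num_cpus=40, sample_secs=60)

# A query in a CPU-heavy step: small cumulative CPU, hot right now.
burst = cpu_use_percent(delta_cpu_secs=1200.0, num_cpus=40, sample_secs=60)

print(round(recent, 2))  # low CPU USE despite a high REQ CPU -> 0.08
print(round(burst, 2))   # high CPU USE despite a low REQ CPU -> 50.0
```

The first query would rank high when sorting by REQ CPU (a tuning candidate), while the second would rank high when sorting by CPU USE (a candidate when the system is currently CPU-bound).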

Let me know if you have any question. :-)

teradatauser2 29 comments Joined 04/12
15 Oct 2012

Thanks for the answer !!

16 Oct 2012

Good tips on using viewpoint to track the queries.

Shrity 61 comments Joined 05/12
18 Oct 2012

Thanks Surya !!

vipenkoul 5 comments Joined 02/10
19 Oct 2012

I do not have access to viewpoint and in order to tune our queries I have the following questions for you:

1. Is there a way to find this info with the help of queries manually?
2. Is there a way to find the exact threshold ratio set with the help of queries manually?

Thanks,
Vipen.

Shrity 61 comments Joined 05/12
19 Oct 2012

Vipen,

I am not aware of any way to do live monitoring of Teradata systems via queries. Viewpoint does live monitoring of the Teradata system, so you can know, at that point in time, the step being executed by a query, how much resource is being used by the query, etc. It can also simulate live monitoring by taking you back in time using the Rewind feature.

Another thing Viewpoint does is trend analysis using the Metric Analysis/Metric Graph portlet. This can be done by querying DBQL and ResUsage data, because it is looking at historical data.

If you are looking at completed queries to identify poorly running ones, you can query the DBQL tables.

However, the amount of information you can get from the DBQL/ResUsage tables depends on the logging that is enabled. For example, if you do not have explain logging enabled, you will not see the explain.
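The DBQL-based approach amounts to filtering and ranking completed-query log rows. A minimal Python sketch of that idea follows; the field names are modeled on DBQL columns (AMPCPUTime, TotalIOCount), but the rows and the threshold are invented sample data, not a live DBQL query:

```python
# Hypothetical sketch: scanning completed-query log rows for high CPU use.
# Field names mirror DBQL column names, but these rows are made-up samples.

rows = [
    {"SessionID": 1001, "AMPCPUTime": 5400.0, "TotalIOCount": 9_000_000},
    {"SessionID": 1002, "AMPCPUTime": 12.0,   "TotalIOCount": 40_000},
    {"SessionID": 1003, "AMPCPUTime": 800.0,  "TotalIOCount": 150_000},
]

CPU_THRESHOLD = 500.0  # CPU seconds; site-specific, chosen for illustration

# Keep queries above the threshold, worst offenders first.
suspects = [r for r in rows if r["AMPCPUTime"] > CPU_THRESHOLD]
suspects.sort(key=lambda r: r["AMPCPUTime"], reverse=True)

for r in suspects:
    print(r["SessionID"], r["AMPCPUTime"])  # 1001 and 1003 qualify
```

In practice the same filter-and-sort would be a WHERE/ORDER BY against the DBQL tables, subject to the logging options enabled on the system.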

Can you elaborate more on #2? What threshold ratio are you looking for?

Thanks,
Shrity

vipenkoul 5 comments Joined 02/10
19 Oct 2012

For #2, I am looking for the CPU-to-disk-I/O ratio because of which our queries are being killed by TDWM.

SmarakDas 25 comments Joined 02/12
22 Oct 2012

Hello Shrity... Thanks for the article. Very nice elaboration.
I would like to clarify a few things:
(a) Amongst CPU Use and Impact CPU, which one is the more dominant factor for identifying bad queries to be tuned?
(b) It is a difficult/misguided task zeroing in on CPU Use alone for identifying queries to be tuned. So, amongst other metrics like Spool Usage, Impact CPU, CPU Skew, CPU Use, ReqCPU, Total IO Count, and PJI, which 3-4 metrics are the most critical for identifying bad queries?

My Opinion is (In Decreasing Preference) : ReqCPU (As mentioned by you) | CPU Skew/CPU Use | Total IO Count/Impact CPU | Spool Usage/PJI

I understand that such questions are best answered via experience, but I am looking for a near-ideal picture for identifying Bad Queries.

Shrity 61 comments Joined 05/12
22 Oct 2012

Vipen,

You are talking about TASM rules. You may have to work with your DBA to identify what those are.

Thanks
Shrity

Shrity 61 comments Joined 05/12
22 Oct 2012

Samark,
You are welcome and thanks for the comment,

In my opinion, there is no single definition of a bad query, and the definition may change based on the workload/jobs running. If my system is I/O bound, I would not care much about CPU usage; this is the reason we don't have a bad-query indicator in Viewpoint. For live monitoring, if I encounter a situation where the system is running slow, I usually check System Health to see which metric is going beyond its threshold, identify the query with the maximum resource consumption for those metrics in the Query Monitor portlet, and label those as the bad queries at that point in time.

(a) Amongst CPU Use and Impact CPU, which one is the more dominant factor for identifying bad queries to be tuned?
>> CPU USE and Impact CPU have different usages. Impact CPU shows the "CPU time in seconds of the highest CPU utilized AMP," so it is better used along with CPU Skew. CPU USE is only available while a query is running, as it shows the CPU utilization of a query at that point in time. If my system has high CPU utilization, I would use CPU USE to identify which query is using more CPU at that point in time. If it has high AMP CPU skew, I would use CPU Skew and Impact CPU.

(b) It is a difficult/misguided task zeroing in on CPU Use alone for identifying queries to be tuned. So, amongst other metrics like Spool Usage, Impact CPU, CPU Skew, CPU Use, ReqCPU, Total IO Count, and PJI, which 3-4 metrics are the most critical for identifying bad queries?
>> To identify bad queries, I usually use the Query Spotlight portlet (if I don't know which metric I am targeting) and go by Duration, as long-running queries have more tuning opportunities than short tactical queries. Then I look at the steps that took long to complete. Then I start dissecting those steps to identify the resource that is consumed most in each step, using the Snapshot. If I don't find anything, I enable Rewind to see what was going on with the system at that point in time, especially in the System Health portlet, to see resource consumption and correlate that with the query's resource consumption.

Yes, you are right that we cannot use only one metric to say a query is good or bad. For completed queries, I think we need to first identify which metric is making the query run slow. For running queries, we need to look at the metrics that are impacting system performance at that point in time.

This is purely how I look at things... :-)

Thanks, Shrity

vipenkoul 5 comments Joined 02/10
22 Oct 2012

So it means that there is no way to find the TASM rules from the database views/tables related to TDWM?

Shrity 61 comments Joined 05/12
22 Oct 2012

You can find out by querying the TDWM tables, but not everyone has access to them.

vipenkoul 5 comments Joined 02/10
23 Oct 2012

Thanks for the information.

Robins Lazer 1 comment Joined 11/12
08 Nov 2012

Nice article Shrity.

Shrity 61 comments Joined 05/12
09 Nov 2012

Hey Robins.. Thanks !!

vijaychris 3 comments Joined 05/09
27 Nov 2012

Hey Shrity, Good Article ! A question on the CPU % Use. I see many sessions with more than 100% CPU usage. I would expect it to be below 100%. Is my assumption incorrect?

br230015 7 comments Joined 09/11
14 Dec 2012

Hi Shrity,
Can you explain Delta CPU and relate it to CPU Tunning? What is the best way to control high IO utilization? Any tip for tunning IObound system?

Thanks in advance.
Badshah Rehman

Teradata

Shrity 61 comments Joined 05/12
02 Jan 2013

Thanks Vijay. Can you let me know which CPU-related metric you are referring to? That will help me in answering the question. I assume it is in the Query Monitor portlet.

Shrity 61 comments Joined 05/12
02 Jan 2013

Hi Badshah Rehman,

Delta CPU is the total CPU time consumed, in seconds, since the last sample. This is a good metric to identify CPU-consuming sessions at that point in time; a very high value "may" indicate the query is executing an expensive product join.

Tuning any system requires analyzing its historical data to build a strategy. However, I think having a good mix of CPU-bound queries and I/O-bound queries can improve the resource utilization of a system.

vijaychris 3 comments Joined 05/09
02 Jan 2013

Yes, it's in the Query Monitor Portlet. I'm referring to the CPU Use metric. It's showing numbers that are greater than 100.

Shrity 61 comments Joined 05/12
04 Jan 2013

Vijay, it sounds like a bug; please open an incident.

TDAdmin 1 comment Joined 06/10
24 Jan 2013

Great discussion so far. I have a few questions...

1. Is there a way to calculate Request CPU and AMP CPU Skew from any of the DBQL tables after the queries have finished running?
2. How do you determine (via a process or a formula) whether a Teradata system is CPU or I/O bound?

skneeli 2 comments Joined 10/09
24 Jan 2013

Nice article Shrity

Shrity 61 comments Joined 05/12
25 Jan 2013

Ram,

Your field engineer may have an answer to your questions. I am not an expert in the DBQL area, so I cannot comment.

Thanks,
Shrity

Shrity 61 comments Joined 05/12
25 Jan 2013

Thanks Satheesh !!

vijaychris 3 comments Joined 05/09
05 Feb 2013

Hi Shrity, GSC pointed me to a knowledge article, KAP2B5BF6, which had some explanation of why the CPU usage is > 100%. I thought I'd let you know about this. Thanks.

ceslag 2 comments Joined 04/11
04 Jul 2013

Hi Shrity,
Very nice article!
Anyway, what about a system with high Node CPU Skew? Which metrics do we need to check to identify suspect queries?
I appreciate your response.
Thanks,
Cesar

Shrity 61 comments Joined 05/12
08 Jul 2013

Thanks Cesar, that's a good question. One way I can think of is to first confirm there is Node CPU Skew by looking at the System Health portlet. If you see high Node CPU Skew, you can go to the Node Resource portlet to find the node that is skewed and then drill down into it to see the vprocs that are skewed. In Viewpoint 14.10 we have the Hot AMP report in Query Monitor; the AMP that you see skewed in the Node Resource portlet should be listed in the Hot AMP report. You can drill down to the AMP, see the sessions that are logged on to it, and find the session that is skewed.

Proactive 2 comments Joined 07/14
30 Jul 2014

Hi Shrity

What should I do about a system slowness issue? I have also set up system throttles, but sometimes the delays go very high. What should a DBA do about this? I consider the Impact CPU, CPU Use, and Req CPU parameters for the responsible queries, but I want to know how to use them step by step.

Here is one instance of the system slowness issue:

All   Not idle   Active   Blocked   Delayed   Abort   Resp   Idle   Parse
358   339        26       0         308       0       3      19     2

How do I investigate this slowness issue? Would you please throw some light on how it can be resolved?

Thanks in advance

Shrity 61 comments Joined 05/12
01 Aug 2014

I don't have a straightforward answer for resolving slowness issues, and I don't think there is any rule of thumb. System slowness can have multiple causes; a few that I can think of:
1. One query or a set of queries consuming a lot of resources
2. Several queries getting blocked (this is not true in your case)
3. Node parallel efficiency being impacted (the Node Resource portlet can help in understanding this)
4. Bad TASM rules
5. The system being in flow control, or out of AWTs
6. etc.
What I see is that you have only 26 active sessions but 308 sessions in the delay queue. You may start by checking whether you are completely utilizing your system resources, using the System Health portlet. If not, revisit your TASM rules to see why you have 308 queries in the delay queue. If yes, then look at what the active queries are doing: are there any long-running queries hogging all the resources? (If System Health shows high CPU utilization, check for high Impact CPU and high CPU USE.) Other than that, I can't say much without looking at your system. Sorry I was not of much help here.
Thanks
Shrity

Proactive 2 comments Joined 07/14
29 Aug 2014

Thanks Shrity so much.
Would you please tell me what Requests (count) in Session Info on the Query Monitor overview tab is? Is it also related to the slowness issue? I mean, can culprit queries have a high request count?

Shrity 61 comments Joined 05/12
02 Sep 2014

Requests (count) in Session Info in Query Monitor is the number of requests (queries) submitted by the session. If a session is consuming a lot of resources and its request count is high, you know why the resource consumption by that session is high; but if a session is consuming a lot of resources and its request count is low (2 or 3), you may want to look at those queries.

gbindra 2 comments Joined 06/12
20 Nov 2014

Is there a way to display queries based on SQL text? There are some ad hoc queries run by users that we want to kill as soon as they come into the system. We can't track them via DBQL, because a query is only registered in DBQL after its execution completes. Is there a way in Viewpoint to highlight them based on an SQL text pattern match instead of going through each SQL?

Shrity 61 comments Joined 05/12
20 Nov 2014

There is no way to highlight a query based on SQL text. If you could attach a query band to the query, then you could either filter the query using a workload management rule or sort by it in the data grid of the Query Monitor portlet.

kvz 15 comments Joined 08/13
08 Sep 2015

Hi Shrity,
I need help with an alert setup.
How can I set an alert email for PJI > 5 in Viewpoint? If not, what is another way to get an alert for high PJI?

Shrity 61 comments Joined 05/12
14 Sep 2015

You cannot set an alert on PJI, but PJI is calculated as "when AMPCPUTime < HashAmp()+1 or TotalIOCount = 0 then 0 else (AMPCPUTime * 1000) / TotalIOCount", and you can set alerts on AMP CPU.
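The quoted PJI calculation can be sketched in Python. Note that HashAmp() returns the highest AMP number, so HashAmp()+1 is the number of AMPs in the system; the sample inputs below are invented for illustration:

```python
def pji(amp_cpu_time, total_io_count, num_amps):
    """PJI per the formula quoted above: zero when AMPCPUTime is below
    the number of AMPs (HashAmp()+1) or there is no I/O at all, else
    (AMPCPUTime * 1000) / TotalIOCount."""
    if amp_cpu_time < num_amps or total_io_count == 0:
        return 0.0
    return (amp_cpu_time * 1000.0) / total_io_count

# Hypothetical queries on a 100-AMP system (all numbers made up):
print(pji(600.0, 50_000, 100))  # 12.0 -- would trip a PJI > 5 rule
print(pji(2.0, 50_000, 100))    # 0.0  -- CPU too small to be meaningful
print(pji(600.0, 0, 100))       # 0.0  -- guard against division by zero
```

The guard clauses exist so that trivially cheap queries, or queries with no I/O recorded, do not produce misleadingly large (or undefined) ratios.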

kvz 15 comments Joined 08/13
26 Oct 2015

Thanks Shrity!

kvz 15 comments Joined 08/13
29 Oct 2015

Is there any way to schedule a script that will send an alert email for sessions that have been in a delayed state for more than 30 minutes?
