NOTE:  This content is only relevant to Teradata 12.0 and earlier releases.

If you’ve been monitoring AWT usage by tracking the sum of WorkNew and WorkOne counts over time, you’re going to need a new approach in Teradata 12, and I’ve got one for you.  This builds on one of my postings earlier this year (Simplifying AMP Worker Tasks Monitoring and Getting it Right) where I explained how to interpret the in-use count columns in ResUsageSAWT when the collection and logging rates are different. Since then, I’ve been looking over more real-world SAWT data (most with logging rates of 600 and collect rates of 60) and I’ve got some additional advice I’d like to share with you.

Why Discontinue the Old Approach?

It’s been common for DBAs who care about AMP worker tasks to track usage over time by collecting either puma -c or awtmon output every minute or 10 minutes or so, then extracting the in-use counts for WorkNew and WorkOne at the same point in time, and seeing how close their sum is to 62. If you’ve got the default of 80 AWTs per AMP, 62 is the maximum number of AWTs available to support user-initiated work (usually WorkNew and WorkOne). That’s the old approach.

So here's why you might want to consider a new approach and what that new approach might be.

After following the recommendations in the earlier posting, if your logging and collection rates are different, you’re dividing the reported in-use counts by the CollectIntervals or the Active column value. As a result, you will have a calculated average in-use count for each work type. That average is independent of the averages of the other in-use counts. This means you can’t just add WorkTypeInuse00 and WorkTypeInuse01 together anymore and expect a sum that never exceeds 62, because the two in-use counts are no longer correlated point-in-time snapshots. Because AMP worker task in-use counts tend to go up and down very quickly, the in-use counts must be viewed at the same point in time to make sense of them when considered together. Under these circumstances, if you continue tracking AWTs by adding up WorkNew and WorkOne counts, you’ll get softened numbers, not the “how high did they go today” numbers that you really need to see to do your job.
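To make the softening concrete, here is a minimal Python sketch. The snapshot values are invented for illustration (they do not come from a real system), and it assumes ten collect intervals per logging interval, as you would get with a logging rate of 600 and a collect rate of 60. Each logged in-use column is modeled as the sum of the per-collect-interval snapshots, which is why it gets divided by CollectIntervals.

```python
# Hypothetical point-in-time snapshots for one AMP, one per collect interval.
# These values are made up for illustration only.
work_new_snapshots = [10, 50, 5, 20, 8, 30, 12, 6, 40, 9]
work_one_snapshots = [5, 12, 3, 30, 2, 10, 4, 1, 15, 8]

collect_intervals = len(work_new_snapshots)  # 10, e.g. logging 600 / collect 60

# What gets logged in this model: the sum of the snapshots per work type.
logged_new = sum(work_new_snapshots)
logged_one = sum(work_one_snapshots)

# Dividing by CollectIntervals gives an independent average per work type.
avg_new = logged_new / collect_intervals
avg_one = logged_one / collect_intervals
sum_of_averages = avg_new + avg_one

# The number the old approach was after: the highest WorkNew + WorkOne total
# observed at a single point in time.
true_peak = max(n + o for n, o in zip(work_new_snapshots, work_one_snapshots))

print(f"sum of averages: {sum_of_averages}")
print(f"true peak:       {true_peak}")
```

With these made-up snapshots the sum of the two averages is 28, while the two work types together actually reached 62 at one point in the interval. The averaged numbers hide exactly the peak you were trying to track.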

What’s the New Approach?

Here’s some real-world output where the division by the CollectIntervals column has already been done on the in-use columns. I’m showing you 5 different work types because those are the 5 that had some activity. Only 5 AMPs are reported, all for the same logging interval:

Let’s consider the several ways this data could be used to monitor AWT in-use counts. But first, notice that each “averaged” work type in-use count is significantly below its respective max in-use count (compare Inuse00 against Max00, for example). This is the softening impact of the averaging that the in-use counts have undergone.

What about summing the in-use counts? If we sum the Inuse counts for AMP 0 (21.4 + 12.9 + 12 = 46.3), we get a sum that is nowhere near the InuseMax number of 71 for that same AMP. InuseMax represents the highest count across all work types (it sums all work types at the same point in time) during the logging interval. Summing the in-use counts is summing averages, so the worst-case combined AMP worker task usage will be missed.

What about summing the Max counts? If we sum the Max column values for AMP 0 (40 + 31 + 13 + 14 + 5 = 103), we end up with a number (103) that is unrealistically high compared to the number of AWTs defined per AMP (80). So this is not a useful monitoring approach.

What about InuseMax? The piece of data carried in the ResUsageSAWT table that is the most self-consistent is InuseMax, in that it truly reflects the total usage of AWTs at a particular point in time. It can never exceed the number of AWTs defined on the system per AMP. It is also a reasonable metric because it reflects the worst case for the logging interval. When you’re monitoring AMP worker tasks, you want to know the worst case. It does include all work types, rather than the more familiar WorkNew and WorkOne, but to my way of thinking, that broader inclusion makes it even more relevant.
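The three options can be put side by side in a short Python sketch. It uses only the AMP 0 values quoted in the text above; the variable names are illustrative and are not ResUsageSAWT column names.

```python
# Values for AMP 0 as quoted in the text above.
AWTS_PER_AMP = 80  # default number of AWTs per AMP

avg_inuse = [21.4, 12.9, 12]           # averaged in-use counts that were summed
max_by_worktype = [40, 31, 13, 14, 5]  # per-work-type Max column values
inuse_max = 71                         # InuseMax for the logging interval

sum_of_averages = round(sum(avg_inuse), 1)
sum_of_maxes = sum(max_by_worktype)

# Summing averages understates the peak; summing maxes overstates it because
# the individual maxes did not all occur at the same moment.
understates_peak = sum_of_averages < inuse_max
overstates_capacity = sum_of_maxes > AWTS_PER_AMP
within_capacity = inuse_max <= AWTS_PER_AMP

print(f"sum of averaged in-use counts: {sum_of_averages}")
print(f"sum of per-work-type maxes:    {sum_of_maxes}")
print(f"InuseMax:                      {inuse_max}")
```

Only InuseMax lands between the too-low sum of averages (46.3) and the impossible sum of maxes (103), and it stays within the 80 AWTs defined per AMP.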

My vote goes to the InuseMax column for the best choice for tracking AMP worker tasks usage levels when you’re using the ResUsageSAWT table.



monisiqbal 17 comments Joined 07/09
13 May 2010

Carrie, do the ResUsageSpma counterparts of these metrics behave in the same fashion?
I'm talking about AwtInUse and AwtInUseMax which give data on a per node basis.
AwtInuse --------- Number of AWTs currently in use for this node.
AwtInuseMax --- Peak number of AWTs (Max) on this node. This reported Max value is the maximum reached during each log period.

You talked about the softening impact of the in-use counts because of averaging. I presume that in ResUsageSpma the average in-use counts are summed up on the node. I'm not sure about it, as it isn't evident from the documentation. But I suspect summation because in our data we see that the in-use counts are much greater than the Max counts. Can you please confirm?

carrie 595 comments Joined 04/08
14 May 2010

In the article, when I talked about softening due to averaging, I was talking about the situation where the logging and collect intervals were different. In that case you would have to divide the numbers reported for the various work type inuse counts by the number of collect intervals in the logging interval. When you do that you lose sense of any peaks and valleys across different collect intervals.

If you have the same logging and collect interval, there will be no averaging required.

With the SPMA table, what is reported in the AwtInuse column is the average number of in-use AWTs across all the AMPs on that node. You are correct in that regard that there is some averaging going on. What you will not see in this number is any skew in the in-use counts across different AMPs. But SPMA is designed to provide node-level data, so information about skew across AMPs should not be expected from the SPMA table.

By way of example, I have some SPMA and matching SAWT data I just now compared, and here's what I found looking at one logging interval on one node:

- SPMA: AwtInuse = 52 and AwtInuseMax = 65

- SAWT: My total of all worktypeinuse counts for all AMPs on the node was 1879; I divided that by the number of AMPs (36) and got 52.19, which matches the 52 reported in SPMA AwtInuse. The largest (max) number in the InuseMax columns for all the AMPs in that logging interval was 65, the same max as reported in SPMA for this node.

My conclusion is the numbers do correlate across the tables, but you are getting different types of averages depending on the table.
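The comparison above can be sketched in a few lines of Python. The function name and the 4-AMP sample data are hypothetical; only the 1879 and 36 figures come from the example above. It assumes each entry in the first list is one AMP's in-use count already summed across all work types.

```python
def node_level_view(inuse_totals_per_amp, inuse_max_per_amp):
    """Return (average in-use AWTs per AMP, highest InuseMax) for one node."""
    amp_count = len(inuse_totals_per_amp)
    average_inuse = sum(inuse_totals_per_amp) / amp_count
    peak_inuse = max(inuse_max_per_amp)
    return average_inuse, peak_inuse


# The arithmetic from the example above: a total of 1879 over 36 AMPs.
spma_style_average = round(1879 / 36, 2)
print(spma_style_average)

# Hypothetical 4-AMP node, values made up for illustration.
avg, peak = node_level_view([50, 54, 49, 55], [60, 65, 58, 62])
print(avg, peak)
```

The average smooths over per-AMP differences, while the peak is simply the single highest per-AMP InuseMax on the node.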

Thanks, -Carrie

monisiqbal 17 comments Joined 07/09
14 May 2010

Good to know that the averages aren't skewed.

If the logging rate and collect interval are the same then the AwtInUse count shouldn't be greater than the AwtInUseMax, right?

And from your previous article we learnt that we should divide the in-use count by CollectIntervals. However, I'm confused about whether we should divide the max in-use count as well.

carrie 595 comments Joined 04/08
18 May 2010

You are correct: if the logging and collect intervals are the same, AwtInuse should never be greater than AwtInuseMax.

You never want to divide a max column in any ResUsage table by CollectIntervals. The max represents the maximum value within the reported logging interval.
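As a rough sketch of that rule in Python, for releases that still have a CollectIntervals column: the function, the dict layout, and the sample values are all hypothetical, and `WorkTypeMax00` is assumed here to be the full name of the per-work-type max column.

```python
def normalize_sawt_row(row):
    """Average the in-use columns over CollectIntervals; leave max columns alone.

    `row` is a hypothetical dict keyed by column name.
    """
    intervals = row["CollectIntervals"]
    normalized = {}
    for column, value in row.items():
        if column.startswith("WorkTypeInuse"):
            # In-use columns hold a sum of snapshots, so average them.
            normalized[column] = value / intervals
        else:
            # Max columns (and everything else) are passed through unchanged.
            normalized[column] = value
    return normalized


# Made-up sample row for illustration.
sample = {
    "CollectIntervals": 10,
    "WorkTypeInuse00": 214,
    "WorkTypeInuse01": 129,
    "WorkTypeMax00": 40,   # assumed column name
    "InuseMax": 71,
}
result = normalize_sawt_row(sample)
print(result)
```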

monisiqbal 17 comments Joined 07/09
18 May 2010

Thank you very much Carrie. After using the CollectIntervals our data indeed looks good.
Apart from the ResUsageSawt table we have some doubts around data aggregation for AWT data in ResUsageSpma. Can you please throw some light on the matter:

(PS: I know I've been bugging you a lot :). Thanks again for the knowledgeable articles and the clarifications)

carrie 595 comments Joined 04/08
21 May 2010

AwtInuseMax gives you the maximum AWT count from all the AMPs in the node. Please refer to my example above.

monisiqbal 17 comments Joined 07/09
23 May 2010

Appreciate your help. Thanks

LUCAS 17 comments Joined 06/09
23 Jun 2014

What about AWT analysis in V14.10?
The CollectIntervals column no longer exists, so I wonder whether another column is superseding the old one,
or whether the difference between log time and collect time has disappeared?
Maybe an updated document about AWT for V14.10 is available? I couldn't find it.
I'm now trying to produce graphical views of InUseMax across days, with a bar per node: does an average InUseMax make sense?

carrie 595 comments Joined 04/08
24 Jun 2014

Hi Pierre,
Starting in 14.0, there is no longer a CollectIntervals column in any ResUsage table. In the ResUsageSAWT table, MailBoxDepth and WorkTypeInUse00-15 are now track fields and no longer need to be divided by the CollectIntervals column. The contents of those fields represent a snapshot taken at the end of the logging interval. In prior releases the contents of those fields represented a sum of snapshots taken at the end of each collect interval, so you had to divide by CollectIntervals in order not to get an inflated number. You are no longer required to do that.
The 14.0  AMP Worker Task and ResUsage Monitoring orange book (available at Teradata at your service) documents this in Chapter 8.
When it comes to AMP worker task monitoring, I usually prefer to look at the max of InuseMax rather than the average, so I can see what the worst-case AWT usage is on any one AMP (either on the node or systemwide). The worst case is usually more important than the average because if one AMP runs out of AWTs, it will impact all queries doing all-AMP operations. In that regard the max of InuseMax is more actionable than the average. It will also allow you to more easily identify skewed processing on a node (where one or more AMPs on the node are holding on to AWTs longer than AMPs on other nodes). But it really depends on what you want to get out of the monitoring. There is no one right way to use this information.
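A minimal Python sketch of the max-versus-average comparison, one bar per node. The rows are hypothetical (node id, AMP id, InuseMax) tuples for a single logging interval, made up for illustration.

```python
from collections import defaultdict

# Hypothetical (node_id, amp_id, InuseMax) rows for one logging interval.
rows = [
    (1, 0, 71), (1, 1, 52), (1, 2, 49),
    (2, 3, 50), (2, 4, 51), (2, 5, 48),
]

inuse_max_by_node = defaultdict(list)
for node_id, _amp_id, inuse_max in rows:
    inuse_max_by_node[node_id].append(inuse_max)

# Worst case and average per node.
node_worst = {node: max(values) for node, values in inuse_max_by_node.items()}
node_average = {
    node: sum(values) / len(values) for node, values in inuse_max_by_node.items()
}

# Worst case on any one AMP systemwide.
system_worst = max(node_worst.values())

for node in sorted(node_worst):
    print(f"node {node}: max InuseMax = {node_worst[node]}, "
          f"average InuseMax = {node_average[node]:.1f}")
print(f"systemwide max InuseMax = {system_worst}")
```

In this made-up data, node 1 averages about 57 but has one AMP at 71. Charting the max shows that AMP; charting the average hides it.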
Thanks, -Carrie
