Last month I talked about things that don’t honor a CPU limit, and explained what a CPU limit is. This month I’d like to look at CPU limits from a slightly different perspective: What happens when you define CPU limits at multiple levels? For example, you may already have a system-level CPU limit on your platform, but now you’d like to use CPU limits on one or two of your resource partitions (RPs) as well. Yes, you can do this. Read along while I explain what you can expect.


When there are two levels of CPU limits present, each CPU limit will enforce its specified percentage of CPU as though it were the only CPU limit active. There is no internal adjustment made to interpret a resource partition-level CPU limit as a lower percentage when a system-level CPU limit is introduced.
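To make that concrete, here is a minimal sketch in Python (my own illustration, not Priority Scheduler internals; the function name and numbers are hypothetical) of how the two limits combine without any rescaling:

# Minimal sketch: both limits are percentages of TOTAL platform CPU.
# An RP limit of 50% under a 75% system limit still means 50% of the
# platform, not 50% of 75% (= 37.5%).
def rp_ceiling(rp_limit_pct, system_limit_pct, other_usage_pct):
    """Upper bound on the RP's CPU %, given both limits and the CPU
    already consumed by everything outside this RP."""
    room_under_system = system_limit_pct - other_usage_pct
    return min(rp_limit_pct, room_under_system)

print(rp_ceiling(50, 75, other_usage_pct=20))  # -> 50 (RP limit binds)
print(rp_ceiling(50, 75, other_usage_pct=30))  # -> 45 (system limit binds)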


Priority Scheduler considers several factors in apportioning CPU in the presence of multiple levels of CPU limits. It determines whether any other resource partition (other than the one with the limit) is active and using CPU. It also considers whether the CPU usage of the components under the control of a limit is reaching the defined CPU limit or not. During a typical age interval (usually 60 seconds), Priority Scheduler may apply different formulas to resource partitions and allocation groups at different times, sometimes restricting CPU, sometimes not, but always with these goals in mind:

  • The system-level CPU limit will not be exceeded.
  • Any RP-level CPU limits will not be exceeded by the combined CPU usage of active allocation groups running under their control.
  • If an RP-level CPU limit and a system-level CPU limit are both present, then the CPU consumed by the RP may be restricted to less than its defined CPU limit due to fair enforcement of the system-level CPU limit in the presence of multiple active RPs.

What this final bullet is saying is this: Dual levels of CPU limits work more predictably if the system-level limit is higher than the RP-level limit. If, for whatever reason, you have a system-level CPU limit that is equal to or lower than an RP-level CPU limit, then the RP limit may not be enforced because it is never reached. This is because the system limit will restrict the CPU that RP is allowed to consume to a lower level than the defined RP limit, based on relative weight differences and consumption patterns among the active RPs.

The graphic above illustrates this situation. The colored/shaded bars represent CPU consumed, both in total on the platform and by each of the two active resource partitions. RP1 has the same 60% CPU limit as exists at the system level. In the process of enforcing the system-level limit of 60%, RP1 ends up consuming less than 60% of the CPU, so the RP1 limit of 60% is never reached.
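To see how fair enforcement can leave the RP limit unreached, here is a rough Python sketch of weight-based apportionment under a system-level cap. The weights, demands, and redistribution loop are all my own assumptions for illustration; this is not the actual Priority Scheduler algorithm.

# Hypothetical fair-share apportionment of a capped CPU pool among
# active RPs, redistributing what low-demand RPs leave unused.
def apportion(system_limit_pct, weights, demands):
    usage = {rp: 0.0 for rp in weights}
    remaining = system_limit_pct
    active = set(weights)
    while remaining > 1e-9 and active:
        total_w = sum(weights[rp] for rp in active)
        spent = 0.0
        still_hungry = set()
        for rp in active:
            share = remaining * weights[rp] / total_w
            take = min(share, demands[rp] - usage[rp])
            usage[rp] += take
            spent += take
            if demands[rp] - usage[rp] > 1e-9:
                still_hungry.add(rp)
        if spent < 1e-9:
            break
        remaining -= spent
        active = still_hungry
    return usage

# System limit 60%; RP1 also has its own 60% limit and heavy demand.
print(apportion(60, weights={"RP1": 40, "RP2": 40, "RP0": 20},
                    demands={"RP1": 76.0, "RP2": 20.0, "RP0": 5.0}))
# -> RP1 settles around 35%, never reaching its own 60% limit.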


It is recommended that RP CPU limits always be set lower than a system level CPU limit. If that recommendation is not followed, then the practical RP-level CPU limit enforced by Priority Scheduler may be less than the defined RP level CPU limit, due to the need to enforce the system-level CPU limit equitably. Even under those conditions, you may still want to keep the RP-level CPU limit in place in order to provide a benefit in the future when you begin to relax, or even remove, the system-level CPU limit.


Examples


Following are 4 snapshots of Priority Scheduler monitor output (with only the relative weight and the CPU usage fields showing) that were taken during the experiment that tested multi-level CPU limits. Each snapshot represents the same workload, but with a different CPU limit setting. The settings illustrated in these 4 snapshots include:

  1. No CPU limits defined
  2. A single CPU limit at the Resource Partition-level (RP1) set at 60%
  3. A single system-level CPU limit defined at 75%, no RP-level limit
  4. An RP-level limit at 50% and the system-level limit at 75%

Only the resource partition and allocation group resource consumption detail is shown. The “Avg CPU %” column represents the CPU utilization for that component at that time.

Snapshot 1: No CPU Limits

[Priority Scheduler monitor output graphic]

Notice that the combined CPU usage of all 3 active resource partitions, including the System performance group (AG 200), is 95% (5+76+12+2 = 95). 95% is well above the point where the future system-level CPU limit of 75% is going to be placed.
 

Also notice that RP1 alone is consuming 76% of the CPU, well above what the 60% RP-level CPU limit will allow. The 60% RP-level CPU limit will be applied in the next snapshot. There are two allocation groups under the control of RP1: AG 13 and AG 23. Both will be impacted by the CPU limit on RP1, as shown in the next snapshot.

Snapshot 2: Limit RP1 to 60%

[Priority Scheduler monitor output graphic]

In the second snapshot, RP1 CPU usage has fallen from 76%, when no CPU limit was in place, down to 60% with the limit. The CPU limit at the resource partition level is working as expected. Notice that with no system-level CPU limit in place, RP2 (the non-capped resource partition) has increased its CPU consumption from 12% to 20%. This increase happens because there is spare CPU available, CPU that had previously been used by RP1 but that can no longer be used due to the CPU limit.

Snapshot 3: Limit System to 75%, No CPU Limit on RP1
[Priority Scheduler monitor output graphic]

In the above snapshot, there is no RP limit, but there is a system-level CPU limit. Note that the combined resource partition CPU percentages, including AG 200, total 73% (4+58+10+1 = 73), which is within the normal range for a 75% limit. All 3 RPs are consuming lower percentages of CPU compared to the “no limit” snapshot.

Snapshot 4: Limit System to 75% and Limit RP1 to 50%

[Priority Scheduler monitor output graphic]

This final snapshot shows the impact of two levels of CPU limits. Notice that the combined RP and AG 200 usage, when considered from the “Avg CPU” perspective, is 72% (4+50+16+2 = 72). RP1 CPU usage is at 50%, exactly where the 50% RP limit was set. This illustrates adherence to both CPU limits.


It is likely that total CPU usage was 3 points below the system-level CPU limit of 75%, not because the CPU limit functionality is over-limiting, but rather because of the presence of the 50% CPU limit on RP1. The full 75% of the CPU could not be consumed by the active RPs. This is evident if you look at the “No Limit” snapshot. In Snapshot 1, RP1 is the biggest CPU consumer, using 76% of the CPU. The other 2 RPs combined used only 17% of the CPU. With RP1 held down to 50%, demand for CPU from the other RPs is stronger, but at the time this snapshot was taken it was not strong enough to use up the additional 25% of CPU that represents the delta between the system-level limit and the RP1 limit.
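As a back-of-envelope check using only the numbers quoted above (a sketch, not actual monitor output):

# RP1 demanded 76% with no limits (Snapshot 1) but is capped at 50%.
rp1_usage = min(76, 50)                          # -> 50
# The other RPs plus AG 200 used 4 + 16 + 2 = 22% in Snapshot 4,
# comfortably inside the room the system limit leaves them.
other_usage = min(4 + 16 + 2, 75 - rp1_usage)    # -> 22
print(rp1_usage + other_usage)                   # -> 72, three points under 75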

Tip for Managing Multiple Levels of CPU limits


If an RP-level CPU limit pre-exists a system-level CPU limit, consider reducing the RP-level CPU limit proportionally with the reduction that the system-level limit introduces. For example, assume you have a 60% CPU limit on RP1. Then, at a later time, you introduce a 75% CPU limit at the system level (a 75% system-level CPU limit is a 25% reduction from the no-limit case). In this situation, you could reduce the RP-level CPU limit on RP1 by 25% as well, from 60% to 45%. This will limit work running in RP1 to about the same percentage of the available CPU as when there was no system-level CPU limit.
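A minimal sketch of that arithmetic (Python; the function name is my own):

# Scale a pre-existing RP limit by the fraction of the platform the
# new system-level limit leaves available.
def adjusted_rp_limit(rp_limit_pct, system_limit_pct):
    return rp_limit_pct * system_limit_pct / 100.0

print(adjusted_rp_limit(60, 75))  # -> 45.0, matching the example above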
 

Discussion
nycle 1 comment Joined 03/10
30 Mar 2010

Hi, carrie, I'm nycle. I come from China. Welcome to Xiamen, China. Could I make friends with you? I am very happy that you are an expert in Teradata. I have done Oracle work for several years, but I am a learner of Teradata. So could you send me some resources on Teradata? My email is: chenhuayun@sina.com. Thank you very much.

carrie 595 comments Joined 04/08
30 Mar 2010

To learn about Teradata, see the courses offered via the Teradata Education Network at www.teradata.com/t/TEN.

v 1 comment Joined 04/10
15 Apr 2010

Hi carrie,
I am a big fan of yours. I am basically a Teradata master and passionate about learning more and more about Teradata. But the concepts you have been highlighting, such as "macro" and "peek or not to peek", I liked a lot and got a clear picture. Thanks a lot for the time you are spending for the readers.

monisiqbal 17 comments Joined 07/09
28 Apr 2010

Excellent post. From the content it looks like the CPU limit is imposed at all times, at all costs, by Teradata. I have a question related to TASM: if a query starts running in state 1, whose CPU limit is 30%, and continues to run in state 2, whose CPU limit is 50%, will the new CPU limit enforced by TASM be applied to this already-running query?

AbeK 37 comments Joined 08/09
29 Jul 2010

Carrie, thanks for this explanation, it helped clarify a doubt that I had. I'd still like your thoughts on it.
I have CPU limits at both the RP level and the AGs within, and not at the system level. With this setup, will the cumulative use of CPU within an RP obey the RP CPU limit, or would each of the AGs bubble up to their respective CPU limits (my belief is that this is impossible physics)?
From your explanation it is apparent that the effect is cumulative, the sum of all AG CPU use combined, but I would like to have a confirmation.

carrie 595 comments Joined 04/08
30 Jul 2010

Sorry, I missed that April question, so let me respond to that first.

Anytime a CPU limit is changed, whether manually by the DBA or automatically by TASM, all active work that is under the control of that CPU limit will be impacted by the new CPU limit, and will respond accordingly. The CPU limit will be enforced immediately by priority scheduler, but it may take a minute or two for the change to be apparent. Usually one or two age intervals' worth of time (the age interval is 60 seconds by default) have to pass for a changed CPU limit to fully affect running queries. This assumes the running query is able to consume the greater level of CPU, which may not be the case. CPU limits set higher than demand do not make any difference at the times when demand is low.

In terms of the CPU limits at the AG and the RP level (one within the other), you are correct. The cumulative use of CPU across all the allocation groups within that RP will not be allowed to exceed the RP level CPU limit. The RP level CPU limit works the same whether or not the allocation groups within the RP have CPU limits as well.

A CPU limit does not reserve CPU ahead of time, or guarantee a specific level of CPU for an allocation group or resource partition. Its only job is to keep the level of CPU usage to the limit level specified, should usage reach that level, no matter what level it is set at.
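A tiny sketch of that distinction, with hypothetical numbers:

# A CPU limit is a cap, not a reservation: usage is whatever demand
# produces, clipped at the limit; nothing is set aside in advance.
def observed_usage(demand_pct, limit_pct):
    return min(demand_pct, limit_pct)

print(observed_usage(10, 40))  # low demand: the 40% limit changes nothing
print(observed_usage(55, 40))  # high demand: usage is held to 40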

AbeK 37 comments Joined 08/09
30 Jul 2010

Carrie,Thank you for the explanation and confirmation. I can breathe better now.

ramubindu 6 comments Joined 06/08
19 Sep 2010

Hi carrie/monisiqbal,

I want to find out CPU & memory utilization in Teradata, but I don't know how. Can you please send me a SQL query to find CPU & memory utilization?

Thanks
Ram

carrie 595 comments Joined 04/08
22 Sep 2010

To get CPU utilization, you can execute the resnode macro and look at the CPU Bsy % column. Or use SQL like this against the ResUsageSPMA table:

sel (sum(CPUUServ + CPUUExec) / sum(spma.Secs * spma.NCPUs)) (format 'Z99.9%') (named "PctCPUBusy")
,(sum(CPUUServ) / sum(spma.Secs * spma.NCPUs)) (format 'Z99.9%') (named "PctOSCPU")
,(sum(CPUIoWait) / sum(spma.Secs * spma.NCPUs)) (format 'Z99.9%') (named "PctIOWait")
from dbc.ResUsageSPMA spma
where TheDate = ???????
and TheTime between ???? and ????
;

A good place to go for information on memory is Woody's blog on this exchange.

bpmurray 1 comment Joined 02/11
15 Feb 2011

Carrie,

Do you have any documentation on how to calculate available CPU per workload? I can't seem to find any documentation on this. I have always seen it expressed in terms of relative weighting between active groups, but not as CPU.

vasudev 35 comments Joined 12/12
12 Jan 2013

Hi Carrie,

Are there any critical resource levels for tactical queries?
While going through one of the articles I came across tactical queries. So I want to know whether there are any resource levels which can impact the performance of tactical queries.

Thanks in advance.

carrie 595 comments Joined 04/08
16 Jan 2013

A belated response to bpmurray:

I'm so sorry, but I somehow missed this comment from quite a while back.

Relative weight is an indication of what priority scheduler will allocate to an allocation group. The allocation group may service many different workloads. So while you can get a feel for how much CPU might be made available to a workload, I don't know of any way to scientifically determine that. If you have a one-to-one relationship between workload and allocation group, you could guess that the relative weight at run time (not at definition time) will indicate approximately the level of CPU available to that workload. However, that relative weight may get adjusted up or down by priority scheduler depending on past usage of queries within the workload, so it is also not particularly accurate.

Thanks, -Carrie

carrie 595 comments Joined 04/08
16 Jan 2013

Trustngs,

In regards to CPU limits, you would not want to apply a CPU limit on an allocation group or a resource partition where a tactical query application runs, if that CPU limit is set lower than the expected CPU usage of the tactical queries. CPU limits are usually only placed on allocation groups that support low priority work. In any priority scheduler setup you implement, it is always important to protect the tactical work by making sure more resources are made available to it than you expect it to consume.

Thanks, -Carrie

vasudev 35 comments Joined 12/12
17 Jan 2013

Thank you very much Carrie,

Learned new information from your reply. I want to know whether there are any specifications, such as average system CPU use of X%, parser usage at X%, number of AWTs, or other numbers related to tactical query performance.

carrie 595 comments Joined 04/08
21 Jan 2013

What resource usage levels can be supported on the platform while good tactical performance is also exhibited is very specific to the hardware power and balance, the nature of the tactical application, and the concurrency and characteristics of the non-tactical work running at the same time. So, unfortunately, I cannot make any general recommendations in response to your question, other than it is never good for tactical query consistency if you exhaust any resource on the box and then the tactical queries have to wait for resources and are delayed.

Thanks, - Carrie

vasudev 35 comments Joined 12/12
22 Jan 2013

Fine Carrie, thank you very much.

-Ganapathy

ashikmh 2 comments Joined 02/12
21 Feb 2013

Hi Carrie,

Please correct me, If I am wrong.

I am calculating the total available cpu as below from DBC.ResUsagespma table, is it correct..?

sum (CPUIoWait+CPUUExec+CPUUServ+CPUIdle) as Total Available CPU

Thanks

carrie 595 comments Joined 04/08
25 Feb 2013

SPMA does not report on available CPU, but rather it reports on consumed CPU. For CPU consumed, you don't want to add in CPUIdle, as that is time CPU is not being consumed. The same is true for CPUIoWait.

See the Resusage manual chapter on the ResUsageSPMA table for more detail on these columns.

Thanks, -Carrie

Roopalini 31 comments Joined 05/08
05 Jul 2013

Hi Carrie,
Thanks for the fine article. Can you explain what "% of system" means? This is our setup: I have 4 RPs, each with a CPU limit of 20%. Within each RP, I have allocation groups where I have set up only % of system and not a CPU limit. For instance, I have an RP ZZ for which the CPU limit is set to 20%, and within that RP I have ZZLow, ZZHigh, ZZMedium, and ZZETL, where I have not given a CPU limit but have just allocated % of system (e.g., ZZETL is allocated 8% of system, but no CPU limit). Does that mean that the AG would have access to a minimum of 8% of the RP at any given time?
Thanks
Roopalini
 

carrie 595 comments Joined 04/08
08 Jul 2013

Sorry, but I don't see any reference to "% of system" in the above blog posting I wrote, so I am not exactly sure I am understanding your question.
 
Do you mean priority scheduler allocation "relative weight"?  If you do not have a CPU limit defined, then the allocation group relative weight will determine how frequently work within an allocation group will receive access to CPU, and this will influence consumption.
 
Relative weights should not be thought of as representing the percent of CPU that will be consumed.  Only under controlled test conditions will relative weights be likely to appear similar in percentage to the amount of CPU consumed by active Allocation Groups. Mostly, consumption and relative weights are very different.
 
More realistically, relative weights are an indication of the contrast in priority of the running work. The exact percentage represented by the relative weight is less important than the ratio between different allocation group relative weights.  If one allocation group has double the relative weight of a second active allocation group, the first group can be expected to be offered CPU twice as often, with the potential for consuming twice as much of the resource as the group with the lower relative weight, if demand between the two categories of work were equivalent.
 
In your case, if ZZETL has an assigned weight of 8%, then its relative weight is a result of the following calculation:  8% (the assigned weight of the allocation group) * the assigned weight of the RP.  The relative weight that results is probably quite a bit less than 8%.  But only if all RPs are active and all allocation groups within the RP are active.   Only active components are part of the relative weight calculation done by priority scheduler.
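Here is one plausible reading of that arithmetic as a Python sketch (the RP names and assigned weights are made up, and only active components participate, as noted above):

# Relative weight of an AG = (its share of active AG weight in its RP)
# * (the RP's share of active RP weight), expressed here as a percent.
def relative_weights(rp_assigned, ag_assigned_by_rp):
    total_rp = sum(rp_assigned.values())
    rel = {}
    for rp, ags in ag_assigned_by_rp.items():
        rp_share = rp_assigned[rp] / total_rp
        total_ag = sum(ags.values())
        for ag, w in ags.items():
            rel[ag] = round(rp_share * (w / total_ag) * 100, 1)
    return rel

print(relative_weights({"ZZ": 20, "RP_B": 60, "RP_C": 20},
                       {"ZZ": {"ZZETL": 8, "ZZHigh": 40,
                               "ZZMedium": 20, "ZZLow": 10}}))
# -> ZZETL comes out near 2%, quite a bit less than its assigned 8%.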
 
See the priority scheduler orange book for SLES 10 (titled:  Using Priority Scheduler Teradata Database V2R6) for more information on the difference between assigned weights and relative weights in SLES 10 priority scheduler.
 
Thanks, -Carrie

Roopalini 31 comments Joined 05/08
09 Jul 2013

Thanks for the response Carrie. I understood. Will go over the Orange book and get a better understanding.

SANJI 9 comments Joined 08/10
20 Dec 2013

Hi Carrie !!!
Excuse me if this question is outside the scope of this discussion...
From the TASM orange book...
While defining an exception action for "Change WD across all operating environments": "Workload B is mapped to AG 'X'. During the daytime operating period, Allocation Group 'X' has a very low weight including an absolute limit of 2%. During the night time operating period, AG 'X' has a higher (still relatively low) weight but no absolute limit. The workload is automatically managed differently during day vs. night, but without compromising the consistency in accounting and management."
How does an allocation group's weight change? I understand the milestone concept where, based on a performance period (threshold), the allocation group can change, but the snippet from the TASM orange book mentions that the allocation group's weight changes during different operating periods.
Any insight is much appreciated.
Thanks
Sanjeev

SANJI 9 comments Joined 08/10
21 Dec 2013

Another question: does a Resource Partition's CPU limit affect the RP's relative weight calculation? Similarly, would a CPU limit on an AG impact the relative weight calculation within the RP and overall, respectively?

For instance

 

RP Name     RP Wt   RP Relative Wt   RP CPU Limit
-------------------------------------------------
Default       20        20% (?)            60
Tactical      60        60% (?)           100
Standard      20        20% (?)            80

carrie 595 comments Joined 04/08
30 Dec 2013

Sanjeev,
 
To answer your most recent question: 
 
The calculation of the relative weight of either a resource partition or an allocation group has nothing to do with where a CPU limit is set.  The CPU limit percent does not participate in the calculation of relative weight.    Up until the defined CPU limit is reached, relative weight will determine priority and frequency of access to CPU for an allocation group.
 
 For the earlier question:
 
In order for an allocation group weight to change automatically (day vs. night, for example), the administrator must have defined a different priority scheme for each of the two operating environments.
 
This is done in Viewpoint Workload Designer.  The screen where relative weights are defined is specific to a single planned environment (operating environment).  If there are multiple planned environments established, then weights need to be specified for all allocation groups within each planned environment. When the planned environment changes, a new priority setup is then in effect.
 
When a planned environment changes, Viewpoint sends the new settings to the PDE layer in the Teradata Database which controls priority scheduler, making the change in weights happen without any intervention from the DBA.
 
Thanks, -Carrie

SANJI 9 comments Joined 08/10
30 Dec 2013

Thank you Carrie..Appreciate you responding...
Quick one though: in the context of "a different priority scheme", how different is it from a workload change (based on a state change) from, say, "A" to "B" with different allocation groups assigned to them, or is it the same?
Thank you once again.
Regards
Sanjeev

carrie 595 comments Joined 04/08
02 Jan 2014

Sanjeev,
 
I'm not sure I am understanding your question correctly.
 
When you set up "a different priority scheme" for a given planned environment, the change in priority is implemented by means of a TASM state change.  The same allocation groups are used in both State A and State B, it is just that the priority given to an allocation group could be different from one state to another.
 
You can read more about state changes in another blog posting of mine on State Change Optimizations.   There is additional information on state changes in the TASM orange book.
 
Thanks, -Carrie

SANJI 9 comments Joined 08/10
04 Jan 2014

I apologize for not being clear. I'll repost my understanding under the appropriate blog. Thank you so much for taking time to answer.
Sincere Regards
Sanjeev

31 Mar 2016

Hi Carrie,
we recently migrated to a 6800 system and implemented PM COD at 75%.
I am using the following formula for available CPU cycles on the 6800 system:
estimated total available CPU seconds/day = #Nodes * #CPUs * Seconds/day = 4 * 32 * 86,400, i.e. 11,059,200 CPU seconds/day.

As per my understanding, 75% PM COD means recorded CPU utilization will theoretically be 33% higher compared to no COD. Access to the CPU is reduced by internal mechanisms that stop the CPU from doing work for a percent of the time on each core of the node, so the CPU utilization recorded in DBQL will be higher by an amount that represents the inverse of the PM COD level.
So the bottom line is, the total available CPU cycles/day won't change, but with COD, CPU utilization by the user will appear higher as compared to no COD.

Is my understanding correct, or is it a little more complicated?
 

carrie 595 comments Joined 04/08
04 Apr 2016

 The 6800 does not support PM COD.  WM COD is the only supported option for COD on the 6800. 
 
If you were actually using platform metering (PM) COD on the 6800 because of some special situation, then what you say above sounds reasonable to me.
 
Thanks, -Carrie

05 Apr 2016

Hi Carrie,
Thank you for the response.
Yes, we are using platform metering on the 6800 box, and here is the formula we are using to calculate estimated total available CPU seconds/hr:
Number of Nodes (4) * Number of cores per node (16) * PM COD (0.75) * 3600 (60 secs * 60 mins) * 0.80 (capacity available after OS) = 138,240 seconds available per hour.
Node Model: 6800
# Nodes: 4
CPUs/Node: 16
AMPs/Node: 42
CPU Seconds/Minute: 3,840
Time Slot (minutes): 60
Capacity/Slot: 230,400
Total System Capacity/Slot: 230,400
Capacity Available after OS: 80%
Capacity Multiplier: 0.75
Estimated total available CPU seconds/hr: 138,240
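Checking that arithmetic with a quick Python snippet (figures exactly as given in the comment above):

# Estimated CPU seconds available per hour under 75% PM COD,
# after reserving 20% of capacity for the OS.
nodes, cores_per_node = 4, 16
pm_cod, after_os = 0.75, 0.80
secs_per_hour = 3600
print(nodes * cores_per_node * pm_cod * secs_per_hour * after_os)
# -> 138240.0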
 
