All Forums Database
Random_Thought 87 posts Joined 06/09
29 Jul 2009
Thoughts on CPUBusy caclulations.

A few people in our organization are slightly disbelieving of the CPU utilisation charts I produce, they always focus on the CPU utilization charts, as if it is the only metric to be interested in.(But thats a different post)

We use Linux based TD nodes, and I use the DBC.ResUsagespma table to extract the CPU utilisations.

((CPUUExec+CPUUServ) / (CPUIoWait+CPUUExec+CPUUServ+CPUIdle) ) *100

Does this seem right? Its the sort of calculation that I would use in the Unix world, but I thought I would seek opinion.

dnoeth 4628 posts Joined 11/04
29 Jul 2009

Of course you're correct :-)

CPUIdle + CPUIoWait = cpu idle
CPUUServ + CPUUExec = cpu busy

Dieter

Dieter

Random_Thought 87 posts Joined 06/09
29 Jul 2009

Thanks for the confirmation Dieter.

For others I have attached the SQL I use to extract the days CPU Utilisation.

I hope its useful to others.

Random.

select thedate,
cast(cast(cast(TheTime as format '99:99:99.99')
as char(11)) as time(6)) as time_of_day,
((sum(CPUUExec)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_USR,
((sum(CPUUServ)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_SYS,
((sum(CPUIoWait)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_WIO,
((sum(CPUIdle)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_IDLE
from dbc.resusagespma
where TheDate > (CURRENT_DATE - interval '1' DAY)
group by thedate,thetime
order by thedate,thetime;

standalone 17 posts Joined 08/08
04 Aug 2009

thanks @Random_Thought that query help me a lot

26 Aug 2009

Hi Random,
Could you please let us know what CPU_USR and CPU_SYS are?

Plus
-how can I identify which user is using most of the resources (CPU) and which user(s) is/are waiting for the resources for a given time range?
-Querylog table has totalCPUTime what is this value?
Thanks
Feroz

26 Aug 2009

I am trying to understand the CPUUExec and CPUUServ

Random_Thought 87 posts Joined 06/09
07 Sep 2009

Shaik,
Hi, I'll post some more information in next couple of days. I've just got back into office after being away for a while.
In brief though, CPU_SYS is the same as CPUSERV, and is the percentage of time the underlying linux/Windows server was executing System type code, such as reading from disk managing the run queues, handling interupts.

CPU_USR is same as CPUEXEC, and is the amount of time spent executing user code.

add both to gether and you will see the total CPU utilisation on the Teradata system.

The IDLE time is divided into CPUidle, which is really idle, and IOWAIT which is where the system is effectively idle, but at least one process is waiting for IO.

If you read some Unix performance books, they may give you a better understanding.

I'll post a better explanation later if it helps.

random.

monisiqbal 119 posts Joined 07/09
28 Sep 2009

Note that these times are multiplied by the number of CPUs on each node (corresponding to their respective rows).
In a heterogeneous system where the NodeTypes of your nodes are different this calculation will not hold true.

Random_Thought 87 posts Joined 06/09
05 Oct 2009

@monisiqbal. Not sure about this. My calculation works out a percentage based on the total. it includes all the CPU seconds available. Thus if you had low power nodes in your environment, they would still be X% utilised even though they do less work than the higher spec ones.

BUSY= (exec+serv) divided by (exec+serv+idle+iowait)

monisiqbal 119 posts Joined 07/09
05 Oct 2009

But the high power nodes would be contributing the CPU % according to their NodeNormFactor (which would be higher).
Consider an example where we have 20% CPU util. for a node of 5550 and let's say 5550 is twice as fast as 5500 which contributes 20%. Then the total CPU contribution should not be 20% because only 5550 would be equaling to 40% w.r.t. 5500.

leo.issac 184 posts Joined 07/06
08 Oct 2009

Can we compare the CPU calculated from Resusage table with one shown in DBQLogtbl?

The above discussion, talks about systemwide CPU utilization on a day and Hour basis. DBQLogtbl also shows us TotalCPU per vproc and other details, can we corelate that CPU calculation with the calculations we make with resusage data?

Which is more precise? DBQLlog or Resusgae?

Does DBQL account for Waittime and Idletime w.r.t CPU utilization?

Random_Thought 87 posts Joined 06/09
09 Oct 2009

That is an interesting question..
I believe that the AMPCPUtime is the total USER CPU by the statement ignore the system, wait and idle time.

The DBQL records one record for each statement; so its difficult to use just that information to generate a usage profile. I think it also only records the CPU used in running the query upto first response time, I don't believe it includes the CPU involved in returning data to the client.

I use the AMPCPUtime for recharging to departments and the resusage for capacity planning.

Random_Thought 87 posts Joined 06/09
09 Oct 2009

Which is more precise? I would imagine that both are as accurate as each other, but I use them for different purposes. Resusagepma is good for providing general graphs on AWT CPU utilisation monitoring the relationships between AWT usage and CPU usage to provide best guess limits for restricting work etc. I also use it to manage BYNET contention and Physical/Logical IO etc.

DBQL is better for identifying the rogue queries, grouping CPU by User/Department etc.

does this help?

leo.issac 184 posts Joined 07/06
10 Oct 2009

Thanks for the information Random_thoughts! This is very useful.

I would like to ask few more questions regarding the CPU Usage by a Query per user.Hope you will help me understand.

(1)Will the CPU Usage of a Query remain same (excluding wait time and idle time) across all types of loads on the system for eg:- I have a Query that is run by a userid. an account string is assigned to this userID. The allocated weight for this user is different during different performance periods. Will I see same CPU Usage by this query during different performance periods.Where should I be checking the CPU Usage of this query, DBQL or Resusage?

(2)I have a SQL that deals with aggregation based on keys. I have 3 combination of keys and My SQL can be run in a single shot or in multiple phases(3 different SQLs).When the allocated weight for my performance group is high, which is the best scenario to run the SQL (a) Run the SQL in single shot (b) Run the SQL in 3 parallel sessions? What is the impact on CPU Usage in (a) and in (b)

Can anybody advise me and enlighten me on this

Random_Thought 87 posts Joined 06/09
13 Oct 2009

Leo.

(1) You should check CPUUsAGE for queries in DBQL, one statement per record. its easier I think.

(2) a) b) depends on the query. the intensity of reading data from the disk and the concurrency of other jobs in the system. RUnning together you could find you have issues with Cache pollution as one queries rows are discarded from memory to be reread in a couple of seconds, or it could benefit the other queries as the data is in cache somewhere and no physical read is required. You should run the queries and monitor the cache. Post if you need advice on this, I don't know if its covered elsewhere

I don't know if this will help answer your question but ...

The CPU usage reported by DBQL should be the same each time your query is run. Assuming data is the same, your statistics are upto date etc. you may notice some slight variation but ingeneral we will assume it to be constant.

What can alter is the elapsed time for the query?
from when the query is submitted, to when the data starts to return to when the data finishes returning are all measures you should be concerned about.

When a query is submitted DBQL records a timestamp in starttime. When it starts to run, a timestamp is written into firstStepTime. The difference is the Delay time. This is governed by the TASM group the query is put in. if only 2 queries from that group can run at once. This query will wait.

When a query completes, as far as teradata is concerned then the timestamp is written into FirstRespTime. The difference between FirststepTime and FirstRespTime is the elapsed time of the query. Therefore the AMPCPUtime cpu seconds are consumed here. For Example you may have consumed 5 seconds of CPUtime and taken 15 seconds to execute. The factors that can affect the elapsed time are CPU weightings this is where your allocated weight comes into play. Your weightings may not work in the same way as you would expect. In this example I will have a batch and a adhoc group and set the weightings so I have a 80/20 split between them. This split does not mean that one gets 80% of the CPU and the other 20%. All it means is that the underlying Linux scheduler under the teradata covers will send 80% of the time slices to Batch and 20% to adhoc. In this case 1 query running in batch can be swamped by several running in the adhoc. Why? because One query cannot by itself usually consume the whole allocation. The process has to wait for iO (Disk or network)) so does not require CPU, and relinquishes it. If the adhoc has more queries then there is a greater chance that one will require CPU, and thye will use their whole timeslice. Then if you have several large adhoc queries reading loads of data, the caches become filled with their data, and not the data the batch process needs to run with thus exagerating the IO wait for the batch query.

The last part of the query is the data return. This is from when the FirstRespTime is recorded to the last byte that is returned to your invoking program. This can be affected by Network traffic, disk load (remember the data is coming from spool, which is on physical disk. this will still need CPU, but I am not sure where this is recorded. if your query is an aggregation then you many not be returning millions of rows. NOte Teradata as standard does not return the LastResponse time, so measuring this is not as easy as on other platforms.

Does this help? or confuse. Please ask if you require more information or clarification. For every person asking the question there are often loads of people wanting to know that wont post!

Random

monisiqbal 119 posts Joined 07/09
13 Oct 2009

@Random Very good explanation. Cleared up quite a few things.

Thanks,

leo.issac 184 posts Joined 07/06
13 Oct 2009

Random_thoughts, you rock! Great information!

Random_Thought 87 posts Joined 06/09
14 Oct 2009

Monis, Leo, Thanks for your kind words.

In answer to another question earlier in the thread that I overlooked from Monis. The different Node sizes are catered for because you will run more AMPs on the more powerful ones and fewer on the less powerful ones, to even things out. In general a parallel system will run to the speed of the slowest component, so doing it this way would ensure that the more powerful NODES get more work. Having said that all our systems have the same nodes in the configurations and I do not have experience of different node sizes.

Random

sandipan23 2 posts Joined 05/10
12 Jan 2011

Sometime I see my system goes beyond 40% usage on CPU_Serv. Is it normal for Teradata on Linux ?

vincent91 14 posts Joined 02/10
21 Jun 2011

Hi all,

Can someone help me to understand the AMPCPUTime define in dbc.dbcqlogtbl ?

Is it correctif I try this approach :
AMPCPUTime = CPUUExec + CPUUServ ?

thanks for your help

Puja 1 post Joined 11/11
14 Nov 2011

Hi all,

When im running a query,some tables/views are involved in this.I want to derive some view level information from it.Can any one suggest any idea.

Thanks

Inna 1 post Joined 11/13
06 Nov 2013

Hi All,
 
I was wondering how to modify this query (posted above by @Random) to account for different generations of hardware in a coexistent environment ?
select thedate,
cast(cast(cast(TheTime as format '99:99:99.99')
as char(11)) as time(6)) as time_of_day,
((sum(CPUUExec)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_USR,
((sum(CPUUServ)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_SYS,
((sum(CPUIoWait)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_WIO,
((sum(CPUIdle)) / (sum(CPUIoWait+CPUUExec+CPUUServ+CPUIdle)) ) *100 as CPU_IDLE
from dbc.resusagespma
where TheDate > (CURRENT_DATE - interval '1' DAY)
group by thedate,thetime
order by thedate,thetime;

Santanu84 122 posts Joined 04/13
03 Jun 2014

Hi All
One quick question. What does the below query signify in terms of user cpu calculation? I know it is different than the above CPUUEXEC calculation from resusagespma table.
 
SELECT thedate , timetime, (CPUUEXEC / NULLIFZERO(NCPUS)) CPUOPUSER
FROM dbc.resusagespma WHERE TheDate = CURRENT_DATE ORDER BY thedate , thetime DESC ;

 

Thanking You

Santanu

Ekladios 13 posts Joined 02/12
29 Jul 2014

Hi All,
I am using DBQLogTbl  SUM ( AMPCPUTime ) per hour in a specific date. The maximum value is 174550.98.
We have one node and 9 CPUs. The total available cpu seconds in one hour is 60*60*9 = 32400 seconds
If DBQLogTbl AMPCPUTime is in seconds, this is my understanding,  then this exceeds  the total available cpu seconds.
If DBQLogTbl AMPCPUTime is in 100 milliseconds then max is 17455 seconds  then the ratio to total seconds is 1.8 -- The system CPU ( sar 5 10 ) was showing 0% idle 
Can someone advice please?
Thanks

krishaneesh 140 posts Joined 04/13
29 Jul 2014

(sum(CPUUExec)+sum(CPUUServ)+sum(CPUIoWait)+sum(CPUIdle).

This is used to calculate the total available cpu secs on the system.
 

VBurmist 96 posts Joined 12/09
29 Jul 2014

Hi Ekladios, 
there could be queries that run more than one hour, or at least span two hours.  If AMPCpuTime is summed up to "per hour", which is probably a query with a GROUP BY on a timestamp (starttime, firstresptime, etc), then those queries are charged to a particular hour, rather than splitting among hours.    
 
That could be a reason why a DBQLogTbl SUM (AMPCpuTime) per hour could show high numbers.
 
Regards,
Vlad.

You must sign in to leave a comment.