19 June, 2009

CPU Load Monitoring

We've recently had to explore collecting and analyzing CPU load metrics for our project. The host OS is based on Linux 2.6, and rather than install a dedicated set of performance utilities we aimed to use readily available tools.

Ask any Linux weenie what CPU load they have been seeing and most will respond by starting the top utility. Top is available on most Unix distributions, is well known, and has pretty much stood the test of time. For those reasons it seemed an obvious choice for a performance monitoring data collection and analysis tool suite.

The following Tcl script takes two arguments: the first is an output file that will be populated with the redirected top output, and the second is the duration (in seconds) to monitor. The script runs top in batch mode, sampling once per second. Once the raw collection has completed, it parses the log, extracts the CPU load metrics, and outputs the mean and median CPU load expressed as a percentage of utilization.
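To make the parsing step concrete, here is a minimal sketch of how one summary line turns into a utilization figure. It assumes the comma-separated "Cpu(s):" line format printed by the top of that era; the sample values are made up for illustration.

```tcl
# Sample summary line (values invented for illustration).
set line "Cpu(s):  3.0%us,  1.5%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"

# The fourth comma-separated field holds the idle percentage, e.g. " 95.5%id".
set idleField [lindex [split $line ,] 3]

# Keep the text in front of the percent sign and strip surrounding blanks.
set idle [string trim [lindex [split $idleField %] 0]]

# Utilization is whatever is not idle.
set load [expr {100.0 - $idle}]
puts "cpu load: $load"
```

Each sample yields one such number; the script collects them in a list and reports the mean and median.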

It's worth noting that the script searches for the glob pattern *Cpu* to find the line of top output holding the CPU metrics. As a result, if you have a process running whose entry in the top output matches *Cpu*, the script will error out while processing the log file. I'll leave it as an exercise for the reader to correct (wink).
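For the impatient, one possible approach (a sketch, not the only fix) is to anchor the pattern to the start of the line. It assumes the summary line begins with the literal text "Cpu(s):", as it does in the top output this script was written against; other top versions may print a different prefix. The helper name isCpuSummary is hypothetical.

```tcl
# Hypothetical helper: true only for top's CPU summary line, which is
# assumed to begin with the literal text "Cpu(s):". Process rows begin
# with a PID, so a command name containing "Cpu" no longer matches.
proc isCpuSummary { line } {
    return [string match "Cpu(s):*" $line]
}

puts [isCpuSummary "Cpu(s):  3.0%us,  1.5%sy,  0.0%ni, 95.5%id"]
puts [isCpuSummary " 4242 root  20  0  1234  567  890 S  0.0  0.1  0:00.01 CpuHog"]
```

The first call prints 1 and the second prints 0. The same anchored pattern could be used in place of *Cpu* in the switch statement of the script.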

We've extended the same principle to monitor network activity.

Have fun.


#!/usr/bin/tclsh

# debug logging; uncomment the puts to trace the parsing
proc log { msg } {
    # puts stderr __$msg
}

# arithmetic mean of a list of numbers
proc mean { L } {
    set sum 0.0
    foreach e $L {
        set sum [expr {$sum + $e}]
    }
    return [expr {$sum / [llength $L]}]
}

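# median of a list of numbers; for an even-length list this
# returns the upper of the two middle values
proc median { L } {
    set L1 [lsort -real $L]
    return [lindex $L1 [expr {[llength $L1] / 2}]]
}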

proc process { fileName } {
    set L {}
    set fp [open $fileName r]
    while { [gets $fp line] >= 0 } {
        switch -glob -- $line {
            *Cpu* {
                log "processing line '$line'"
                # fields are comma separated: user, system, nice, idle, ...
                set el [split $line ,]
                set userLoad [string trimleft [lindex $el 0] "Cpu(s):"]
                set sysLoad [lindex $el 1]
                set niceLoad [lindex $el 2]
                set idleLoad [lindex $el 3]
                log "parsing $userLoad"
                log "parsing $sysLoad"
                log "parsing $niceLoad"
                log "parsing $idleLoad"
                # keep only the number in front of the percent sign
                set userLoad [string trim [lindex [split $userLoad %] 0]]
                set sysLoad [string trim [lindex [split $sysLoad %] 0]]
                set niceLoad [string trim [lindex [split $niceLoad %] 0]]
                set idleLoad [string trim [lindex [split $idleLoad %] 0]]
                log "extracted $userLoad"
                log "extracted $sysLoad"
                log "extracted $niceLoad"
                log "extracted $idleLoad"
                # puts [expr {$userLoad+$sysLoad+$niceLoad+$idleLoad}]
                # utilization is everything that is not idle
                lappend L [expr {100.0 - $idleLoad}]
            }
        }
    }
    close $fp
    if { [llength $L] == 0 } {
        puts stderr "no Cpu lines found in $fileName"
        return
    }
    puts "mean cpu load: [mean $L]"
    puts "median cpu load: [median $L]"
}

proc monitorCpuLoad { fileName duration } {
    # batch mode, one sample per second, $duration samples in total
    catch { exec top -b -n $duration -d 1 > $fileName }
    process $fileName
}

proc help { } {
    global argv0
    # both arguments are required
    puts stderr "usage: $argv0 <logFileName> <duration>"
    exit 1
}

#---main---
if { [llength $argv] != 2 } {
    help
}
monitorCpuLoad [lindex $argv 0] [lindex $argv 1]


