A more detailed threshold we should monitor on UNIX side

Original post · Linux operating systems · Author: netbanker · Posted: 2008-04-22 04:49:44

This is an example based on Sun Solaris data; details may differ slightly on other platforms.

FYI:

The iostat output contains summary information for all devices.

| Field | Description |
| --- | --- |
| r/s | Shows the number of reads/second |
| w/s | Shows the number of writes/second |
| kr/s | Shows the number of kilobytes read/second |
| kw/s | Shows the number of kilobytes written/second |
| wait | Average number of transactions waiting for service (queue length) |
| actv | Average number of transactions actively being serviced |
| wsvc_t | Average service time in wait queue, in milliseconds |
| asvc_t | Average service time of active transactions, in milliseconds |
| %w | Percent of time there are transactions waiting for service |
| %b | Percent of time the disk is busy |
| device | Device name |

What to look for

* Average service times (asvc_t) greater than 20 ms over a sustained period.

* High average wait times (wsvc_t).
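
As a rough illustration of the service-time guideline above, the following Python sketch parses iostat data lines and reports devices whose asvc_t stays above 20 ms for several intervals in a row. The names `DiskSample`, `parse_iostat_line`, `slow_devices` and the `min_consecutive` parameter are hypothetical, and the column order is assumed to match the field table above; adjust it to the actual output of your iostat invocation.

```python
from dataclasses import dataclass

# Guideline from the text above: average service time above 20 ms.
SERVICE_TIME_LIMIT_MS = 20.0


@dataclass
class DiskSample:
    device: str
    wsvc_t: float  # average time spent in the wait queue, ms
    asvc_t: float  # average service time of active transactions, ms


def parse_iostat_line(line):
    """Parse one data line, assuming the column order of the table above:
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    """
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    return DiskSample(device=fields[10],
                      wsvc_t=float(fields[6]),
                      asvc_t=float(fields[7]))


def slow_devices(intervals, min_consecutive=3):
    """Return devices whose asvc_t stayed above the limit for at least
    min_consecutive intervals in a row.

    intervals is a list of lists of DiskSample, one inner list per interval.
    min_consecutive is an assumed stand-in for "a long duration".
    """
    streak = {}
    flagged = set()
    for samples in intervals:
        for sample in samples:
            if sample.asvc_t > SERVICE_TIME_LIMIT_MS:
                streak[sample.device] = streak.get(sample.device, 0) + 1
                if streak[sample.device] >= min_consecutive:
                    flagged.add(sample.device)
            else:
                # A single good interval resets the streak for that device.
                streak[sample.device] = 0
    return sorted(flagged)
```

Requiring a streak rather than a single reading keeps one short burst of slow I/O from being reported as a problem, which matches the "long duration" wording of the guideline.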

mpstat Field Descriptions

| Field | Description |
| --- | --- |
| CPU | Processor ID |
| minf | Minor faults |
| mjf | Major faults |
| xcal | Processor cross-calls (when one CPU wakes up another by interrupting it) |
| intr | Interrupts |
| ithr | Interrupts as threads (except clock) |
| csw | Context switches |
| icsw | Involuntary context switches |
| migr | Thread migrations to another processor |
| smtx | Number of times a CPU failed to obtain a mutex on the first try |
| srw | Number of times a CPU failed to obtain a read/write lock on the first try |
| syscl | Number of system calls |
| usr | Percentage of CPU cycles spent on user processes |
| sys | Percentage of CPU cycles spent in system (kernel) code |
| wt | Percentage of time the CPU spent waiting on an event (typically I/O) |
| idl | Percentage of unused CPU cycles, that is, idle time when the CPU is doing nothing |

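What to look for

* Involuntary context switches (icsw). This is probably the more relevant of the two context-switch statistics when examining performance issues.

* Number of times a CPU failed to obtain a mutex (smtx). Values consistently greater than 200 per CPU cause system time to increase.

* xcal is very important: it counts cross-calls between processors. Thread migration between processors is reported separately in the migr column.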
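
The mutex guideline above can be checked mechanically. The sketch below is only an illustration: `parse_mpstat_line` and `contended_cpus` are hypothetical names, the 16-column order is assumed to be the one mpstat prints on Solaris, and "consistently" is interpreted as "in every sampled interval".

```python
SMTX_LIMIT = 200  # per-CPU guideline quoted in the text above


def parse_mpstat_line(line):
    """Return (cpu_id, icsw, smtx) from one mpstat data line.

    Assumes the 16-column order as mpstat prints it:
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    """
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f"expected 16 columns, got {len(fields)}")
    return int(fields[0]), int(fields[7]), int(fields[9])


def contended_cpus(intervals, limit=SMTX_LIMIT):
    """Return CPU ids whose smtx exceeded limit in every interval.

    intervals is a list of dicts mapping cpu_id -> smtx for one interval.
    "Every interval" is an assumed reading of "consistently".
    """
    if not intervals:
        return []
    cpus = set(intervals[0])
    for sample in intervals:
        # Keep only the CPUs that are still above the limit in this interval.
        cpus = {cpu for cpu in cpus if sample.get(cpu, 0) > limit}
    return sorted(cpus)
```

For example, feeding it three intervals in which only CPU 0 stays above 200 returns `[0]`; a CPU that spikes once and then drops back is not reported.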

Section 1: netstat -ain

| Field | Description |
| --- | --- |
| Name | Device name of interface |
| Mtu | Maximum transmission unit |
| Net | Network segment address |
| Address | Network address of the device |
| Ipkts | Input packets |
| Ierrs | Input errors |
| Opkts | Output packets |
| Oerrs | Output errors |
| Collis | Collisions |
| Queue | Number in the queue |

The information in Section 1 will help diagnose network problems when there is connectivity but response is slow.

Values to look at:

* Collisions (Collis)

* Output packets (Opkts)

* Input errors (Ierrs)

* Input packets (Ipkts)

* Network collision rate = Output collisions / Output packets (Collis / Opkts)

For a switched network, collisions should be 0.1 percent or less of the output packets (see the Cisco web site as a reference).
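
The collision-rate formula above is simple enough to compute by hand, but a small Python sketch makes the percentage conversion explicit. The function names and the `limit_percent` parameter are hypothetical; the 0.1 percent default is the switched-network figure quoted above.

```python
def collision_rate_percent(collisions, output_packets):
    """Collision rate as a percentage of output packets (Collis / Opkts * 100)."""
    if output_packets <= 0:
        # No traffic sent yet, so there is nothing to compare against.
        return 0.0
    return 100.0 * collisions / output_packets


def exceeds_switched_guideline(collisions, output_packets, limit_percent=0.1):
    """True when the collision rate is above the quoted 0.1 percent guideline."""
    return collision_rate_percent(collisions, output_packets) > limit_percent
```

With made-up counters of 50 collisions against 10,000 output packets the rate is 0.5 percent, which is above the guideline; 5 collisions against the same traffic gives 0.05 percent, which is within it.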

vmstat output is actually broken up into six sections: procs, memory, page, disk, faults and CPU. Each section is outlined in the following table.

| Section | Field | Description |
| --- | --- | --- |
| procs | r | Number of runnable processes (kernel threads) waiting in the run queue for a CPU |
| procs | b | Number of blocked processes (kernel threads) waiting for resources such as I/O or paging |
| procs | w | Number of swapped-out lightweight processes (LWPs) waiting for processing resources |
| memory | swap | Amount of swap space currently available (Kbytes) |
| memory | free | Size of the free list (Kbytes) |
| page | re | Page reclaims |
| page | mf | Minor faults |
| page | pi | Kilobytes paged in |
| page | po | Kilobytes paged out |
| page | fr | Kilobytes freed |
| page | de | Anticipated short-term memory shortfall (Kbytes) |
| page | sr | Pages scanned by the clock algorithm |
| disk | bi | Disk blocks sent to disk devices, in blocks per second (a Linux-style column; Solaris vmstat instead lists up to four disks, such as s0, with operations per second) |
| faults | in | Interrupts per second, including the CPU clocks |
| faults | sy | System calls |
| faults | cs | Context switches per second within the kernel |
| cpu | us | Percentage of CPU cycles spent on user processes |
| cpu | sy | Percentage of CPU cycles spent in system (kernel) code |
| cpu | id | Percentage of unused CPU cycles, that is, idle time when the CPU is doing nothing |

What to look for

The following information should be used as a guideline, not as hard and fast rules. It comes from Adrian Cockcroft's book, Sun Performance Tuning. Other operating systems, such as HP-UX and Linux, may have different thresholds.

* Large run queue. Adrian Cockcroft defines anything over 4 processes per CPU on the run queue as the threshold for CPU saturation. This is certainly a problem if it lasts for any long period of time.

* CPU utilization. The amount of time spent running system code should not exceed 30%, especially if idle time is close to 0%.

* A combination of a large run queue with no idle CPU is an indication that the system has insufficient CPU capacity.

* Memory bottlenecks are determined by the scan rate (sr). The scan rate is the number of pages scanned by the clock algorithm per second. If the scan rate (sr) is continuously over 200 pages per second, then there is a memory shortage.

* Disk problems may be identified if the number of processes blocked exceeds the number of processes on run queue.
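
The guidelines above can be combined into a single check per vmstat sample. This is a minimal sketch, not a monitoring tool: `VmstatSample`, `vmstat_warnings`, the warning names and the `idle_floor` cut-off for "no idle CPU" are all assumptions made for illustration, and a real check should look at several consecutive samples before raising an alarm, since the guidelines speak of sustained conditions.

```python
from dataclasses import dataclass


@dataclass
class VmstatSample:
    r: int     # processes on the run queue
    b: int     # blocked processes
    sr: int    # pages scanned per second by the clock algorithm
    sy: int    # percent system time (the cpu "sy" column)
    idle: int  # percent idle time (the cpu "id" column)


def vmstat_warnings(sample, ncpus, idle_floor=5):
    """Return the names of the guidelines that one sample breaks.

    idle_floor is an assumed cut-off for "no idle CPU"; the source text
    does not give a number for it.
    """
    no_idle = sample.idle <= idle_floor
    long_queue = sample.r > 4 * ncpus  # more than 4 processes per CPU
    warnings = []
    if long_queue:
        warnings.append("run_queue")
    if sample.sy > 30:
        warnings.append("system_time")
    if long_queue and no_idle:
        warnings.append("cpu_capacity")
    if sample.sr > 200:
        warnings.append("scan_rate")
    if sample.b > sample.r:
        warnings.append("blocked")
    return warnings
```

A sample such as `VmstatSample(r=10, b=12, sr=350, sy=40, idle=0)` on a two-CPU machine breaks every guideline, while a lightly loaded sample returns an empty list.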

Source: ITPUB Blog, link: http://blog.itpub.net/67/viewspace-1002812/. If you repost this article, please credit the source; otherwise legal action may be taken.
