Analyzing resource usage and system performance on Linux

Yves Junqueira <yves@cetico.org>

Ragazza che corre sul balcone (1912)

October 5, 2006, Festival Software Livre-DF

Resource usage and performance

Scarce resources (but not that scarce)

A complex topic

Common scenarios for performance analysis

Processor time usage

Processor activity

# mpstat -P ALL
03:18:26     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
03:18:30     all    8,36    0,00    2,46    0,16    0,00    3,28    0,00   85,74    677,30
03:18:30       0    2,96    0,00    4,93    0,33    0,00    6,25    0,00   85,53    425,00
03:18:30       1   13,82    0,00    0,33    0,00    0,00    0,00    0,00   86,51    252,63
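
The percentages that mpstat prints can be derived from the per-CPU counters in /proc/stat. The following is a minimal sketch of that calculation, not mpstat's actual implementation: it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are made up for illustration.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: [tick counters]} from the contents of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def cpu_percentages(before, after):
    """Share of time spent in each state between two snapshots of one CPU."""
    # Assumed field order; guest counters (if present) are ignored because
    # they are already included in user time.
    names = ["user", "nice", "sys", "idle", "iowait", "irq", "soft", "steal"]
    deltas = [b - a for a, b in zip(before, after)][:len(names)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in names[:len(deltas)]}
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for cpu in sorted(second):
        pct = cpu_percentages(first[cpu], second[cpu])
        print(cpu, " ".join("%s=%.1f" % item for item in pct.items()))
```

Taking the difference between two snapshots matters: the raw counters are totals since boot, so a single reading only gives a long-term average.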

Important concepts

Virtual memory - vmstat

Scenario: a normal, healthy system

$ vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 214000  12832  14920 102328    4    3    39    25  205  183  7  3 89  1
 2  0 214000  10808  14920 102332    0    0     0     5  331  829  3  1 96  0
 1  0 214000  12700  14928 102332    0    0     0    16  346  977  5  1 94  0
 2  0 214000  12724  14928 102332    0    0     0     0  334 1074  6  3 91  0
 3  0 214000  21352  14936 101644    0    0     1     3  336 1244  7  1 81 11
 1  0 214000  21360  14948 101644    0    0     0    77  336  865  0  0 100  0
 0  0 213984  19204  14992 103412    0    0   605    53  411 1527 12  5 67 16
 1  0 213984  19060  14996 103468   20    0    39     0  337  790  1  1 96  3
 3  0 213984  19052  15000 103468    0    0     0     1  326  782  1  0 99  0
 2  0 213984  19052  15000 103468    0    0     0     0  326  789  1  0 99  0
 2  0 213984  19052  15004 103468    0    0     0     1  328  873  3  1 97  0
 2  0 213984  19084  15004 103468    0    0     0     0  339  917  1  1 98  0

Memory

Scenario: thrashing

Processes competing for memory, degrading overall performance

$ vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  7 1056184  48128   9096   9324   11   71    13    83    0    1  2  1 97  0  0
 0  6 1093104  48136  10432   9252 1565 12307  2009 12313 1353  616  3 11  0 86  0
 0  5 980708 190292  12120   9120 2981 9260  3544  9264 1453  863  1  9  0 90  0
 0  4 867396 785244  13152   9108 2368    0  2711     4 1304  562  2  5  0 93  0
 0  7 953220  48076  12708   5940 1193 28651  1287 28657 1200  321 32 48  0 19  0
 1  7 1022564  48096  13396   6308  883 23115  1421 23119 1243  509  3 30  0 67  0
 0  8 1049588  49996  14448   7696 1899 9008  2711  9013 1353  609  2 15  0 83  0
 0  8 1100532  48408  15400   8992 1388 17015  2280 17017 1338  619  6 19  0 75  0
 0  8 1142004  47984  15996   9776 1109 13861  1725 13873 1307  615  7 14  0 79  0
 0  9 1181016  48052  16776  12620  901 13004  2156 13017 1281  621  3 16  0 81  0
 0  8 661588 443400  17700  14224 1900 4993  2567  5003 1333  607 10 14  0 77  0
 2  2 659736 397848  18040  14328  789    0   923   283 1206  312 44 48  0  8  0
 3  6 689432  48120  10224   8440  795 14457   988 14561 1229  428 17 78  0  5  0
 0  5 747040  48024   1188   5688  407 21473   713 21497 1275  450  3 33  0 64  0
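
The difference between the two scenarios shows up mainly in the si/so (swap in/out) and wa (I/O wait) columns. As a rough sketch, a script could parse vmstat output and flag samples that look like thrashing. The thresholds below are arbitrary assumptions chosen to separate the two examples above, not recommended values, and the function names are hypothetical.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of {column_name: value} samples."""
    samples, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column header line: " r  b   swpd   free ..."
            columns = fields
        elif columns and fields[0].isdigit():
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def is_thrashing(sample, swap_rate=1000, iowait_pct=50):
    """Heuristic: heavy swap traffic while the CPU mostly waits for I/O.

    swap_rate and iowait_pct are illustrative thresholds (assumptions).
    """
    swap_traffic = sample["si"] + sample["so"]
    return swap_traffic >= swap_rate and sample["wa"] >= iowait_pct


if __name__ == "__main__":
    import subprocess
    # "vmstat 3 5" prints five samples, three seconds apart.
    out = subprocess.run(["vmstat", "3", "5"], capture_output=True, text=True).stdout
    # Skip the first sample: it reports averages since boot.
    for sample in parse_vmstat(out)[1:]:
        print("thrashing" if is_thrashing(sample) else "ok", sample)
```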

Process states

Processes can be in several states

Useful commands:

Process table

Checking process state - /proc

$ ls -la /proc/self/
total 0
dr-xr-xr-x   6 yves yves 0 2007-10-05 03:35 .
dr-xr-xr-x 155 root root 0 2007-10-04 09:19 ..
dr-xr-xr-x   2 yves yves 0 2007-10-05 03:35 attr
-r--------   1 yves yves 0 2007-10-05 03:35 auxv
--w-------   1 yves yves 0 2007-10-05 03:35 clear_refs
-r--r--r--   1 yves yves 0 2007-10-05 03:35 cmdline
-r--r--r--   1 yves yves 0 2007-10-05 03:35 cpuset
lrwxrwxrwx   1 yves yves 0 2007-10-05 03:35 cwd -> /usr/src/linux-2.6.22-rc7/Documentation
-r--------   1 yves yves 0 2007-10-05 03:35 environ
lrwxrwxrwx   1 yves yves 0 2007-10-05 03:35 exe -> /bin/ls
dr-x------   2 yves yves 0 2007-10-05 03:35 fd
dr-x------   2 yves yves 0 2007-10-05 03:35 fdinfo
-r--r--r--   1 yves yves 0 2007-10-05 03:35 io
-rw-r--r--   1 yves yves 0 2007-10-05 03:35 loginuid
-rw-r--r--   1 yves yves 0 2007-10-05 03:35 make-it-fail
-r--r--r--   1 yves yves 0 2007-10-05 03:35 maps
-rw-------   1 yves yves 0 2007-10-05 03:35 mem
-r--r--r--   1 yves yves 0 2007-10-05 03:35 mounts
-r--------   1 yves yves 0 2007-10-05 03:35 mountstats
-rw-r--r--   1 yves yves 0 2007-10-05 03:35 oom_adj
-r--r--r--   1 yves yves 0 2007-10-05 03:35 oom_score
lrwxrwxrwx   1 yves yves 0 2007-10-05 03:35 root -> /
-r--r--r--   1 yves yves 0 2007-10-05 03:35 schedstat
-rw-------   1 yves yves 0 2007-10-05 03:35 seccomp
-r--r--r--   1 yves yves 0 2007-10-05 03:35 smaps
-r--r--r--   1 yves yves 0 2007-10-05 03:35 stat
-r--r--r--   1 yves yves 0 2007-10-05 03:35 statm
-r--r--r--   1 yves yves 0 2007-10-05 03:35 status
dr-xr-xr-x   3 yves yves 0 2007-10-05 03:35 task
-r--r--r--   1 yves yves 0 2007-10-05 03:35 wchan
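
Most of these files are plain text, so they are easy to read from a script. The sketch below parses the `status` file to get the process state and resident memory. The helper names are hypothetical, and the state table covers only the most common state codes.

```python
# Common one-letter process state codes, as shown by ps and top.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def parse_status(text):
    """Parse the "Key:<tab>value" lines of /proc/<pid>/status into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def describe_state(status):
    """Return (state code, description) from a parsed status dict."""
    code = status["State"].split()[0]
    return code, STATE_NAMES.get(code, "other")


def rss_kb(status):
    """Resident set size in kB, or None (kernel threads have no VmRSS)."""
    value = status.get("VmRSS")
    return int(value.split()[0]) if value else None


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        status = parse_status(f.read())
    print(status["Name"], describe_state(status), rss_kb(status))
```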

Process table

Checking process state - ps

Customizable output format

$ ps -e -o rss,s,cmd|sort -n|tail -n 10
15252 S gnome-panel --sm-client-id default1
17056 S nautilus --no-default-window --sm-client-id default2
24496 S gnome-terminal
24928 S /usr/bin/python /usr/bin/gnochm /home/yves/Desktop/Inbox/Textos/Optimizing Linux Performance.chm
26884 S mono /usr/lib/tomboy/Tomboy.exe --panel-applet --oaf-activate-iid=OAFIID:TomboyApplet_Factory --oaf-ior-fd=26
28328 S gaim
36148 S python
36220 S /home/yves/bin/thunderbird/thunderbird-bin -P yourbase
41936 R /usr/X11R6/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
139084 S /usr/lib/firefox/firefox-bin

Total physical memory (RSS, in KiB) used by a given program

$ WHAT=apache2; ps -C "$WHAT" -o rss= | awk '{ m += $1 } END { print m + 0 }'
74320
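
The same sum can be written as a small script, which makes it easier to match the command name exactly instead of by substring (a substring match can also count the `grep` process itself). This is only a sketch: the function name is made up, and adding up RSS counts pages shared between processes once per process, so the total overestimates real memory use.

```python
import subprocess


def total_rss_kb(ps_output, program):
    """Sum the RSS column for every process whose command name is `program`.

    `ps_output` is expected to have two columns per line: RSS and command name.
    """
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == program:
            total += int(parts[0])
    return total


if __name__ == "__main__":
    # A trailing "=" suppresses each column header; separate -o options avoid
    # ambiguity in how "rss=,comm=" would be interpreted.
    out = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    print(total_rss_kb(out, "apache2"))
```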

Process table

Checking process state - top

top - 03:10:55 up 48 days,  3:45,  4 users,  load average: 0.50, 0.58, 0.58
Tasks:  87 total,   1 running,  86 sleeping,   0 stopped,   0 zombie
Cpu0  :  4.4%us,  4.4%sy,  0.0%ni, 81.8%id,  2.4%wa,  0.0%hi,  7.1%si,  0.0%st
Cpu1  :  8.1%us,  1.0%sy,  0.0%ni, 90.2%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2058956k total,  1997820k used,    61136k free,    12564k buffers
Swap:  2048276k total,    44404k used,  2003872k free,  1097504k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  nFLT COMMAND                
27727 mysql     20   0  131m  53m 4184 S    0  2.6   1641:03 6503 mysqld                 
 2477 root      18   0  4928  516  428 S    0  0.0   0:11.28   86 sshd                   
11887 postfix   15   0  4880  864  724 S    0  0.0   0:00.28   84 tlsmgr                 
    1 root      15   0  1952  444  384 S    0  0.0   0:05.71   53 init                   
 2515 ntp       15   0  4128  672  608 S    0  0.0   0:03.92   48 ntpd                   
 1383 root      15  -4  2180  636  448 S    0  0.0   0:00.04   43 udevd                  
18324 root      18   0 22296 6944 4128 S    0  0.3   0:00.40   43 apache2                
 2246 root      18   0  1628  480  388 S    0  0.0   0:19.75   42 syslogd                
 2451 root      19   0  4820  704  608 S    0  0.0   0:01.78   29 master                 
18335 root      16   0  7704 2308 1872 S    0  0.1   0:00.06   16 sshd                   
 2252 root      16   0  1584  268  264 S    0  0.0   0:00.00   15 klogd                  
18187 root      15   0 59648  27m 2464 S    0  1.4   0:53.44   14 userapp1          
19532 postfix   19   0  4824 1556 1264 S    0  0.1   0:00.00   14 pickup                 
19396 root      15   0  7864 2388 1936 S    0  0.1   0:00.05   12 sshd                   
19352 root      15   0  789m 747m 8148 S   18 37.2  17:13.98   11 userapp2
 2458 postfix   15   0  4860  588  520 S    0  0.0   0:00.23    8 qmgr                   
 2526 root      16   0  1920  444  376 S    0  0.0   0:02.15    6 mdadm                  
 2628 root      18   0  1576  288  284 S    0  0.0   0:00.00    6 getty                  
 2630 root      18   0  1580  288  284 S    0  0.0   0:00.00    6 getty                  
 2631 root      18   0  1576  288  284 S    0  0.0   0:00.00    6 getty                  
 2633 root      18   0  1576  288  284 S    0  0.0   0:00.00    6 getty                  
 2634 root      18   0  1576  288  284 S    0  0.0   0:00.00    6 getty                  
 2636 root      18   0  1580  288  284 S    0  0.0   0:00.00    6 getty                  
27690 root      25   0  2808 1052 1048 S    0  0.1   0:00.00    3 mysqld_safe            
27728 root      18   0  1880  484  480 S    0  0.0   0:00.00    3 logger                 
 3311 root      16   0  7704 2312 1872 S    0  0.1   0:00.07    3 sshd                   
 6492 snmp      15   0  7276 2236 1468 S    0  0.1   0:30.64    2 snmpd                  

I/O operations - disks

Other tips

cache contents (e.g. holding TCP contexts) are not reused optimally as packets from devices are sent to different processors on every interrupt. @intel.com

I/O operations - disks

# iostat -xk
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9,90    0,00    7,72   20,47    0,00   61,91

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               5,05     0,00 47,14  0,00   880,81     0,00    37,37     0,26    5,51   4,86  22,90
sdb               0,34     0,00 38,38  0,00   184,51     0,00     9,61     0,24    6,32   6,25  23,97
md0               0,00     0,00  0,00  0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
md1               0,00     0,00 90,91  0,00  1065,32     0,00    23,44     0,00    0,00   0,00   0,00

# vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  44404  52148   7860 1096704    0    0     1     4    0    2 17  9 67  7
 1  0  44404  54112   7820 1093044    0    0  1303   433  741  771  8  7 73 12
 2  0  44404  53732   7844 1093388    0    0     1  1213  774 1036  8  6 85  1
 1  0  44404  53732   7852 1093872    0    0     5   345  720  705  7  6 85  2
 0  0  44404  51064   7868 1094480    0    0  1032   421  759  975  8  7 81  3
 0  0  44404  53484   7876 1093124    0    0   413   236  724  704  7  5 85  2
 0  0  44404  52096   7908 1090884    0    0   515   711  719  661  7  6 86  2

That is not the case here, but watch out for excessive write operations: they may be a sign of excessive logging.

Load average

Quite sensitive to I/O

$ uptime
 04:15:01 up 97 days, 14:09,  2 users,  load average: 0.64, 0.10, 0.00

- a good excuse to peek at the kernel code (nr_active)

unsigned long nr_active(void)
{
        unsigned long i, running = 0, uninterruptible = 0;

        for_each_online_cpu(i) {
                running += cpu_rq(i)->nr_running;
                uninterruptible += cpu_rq(i)->nr_uninterruptible;
        }

        if (unlikely((long)uninterruptible < 0))
                uninterruptible = 0;

        return running + uninterruptible;
}
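
Because nr_active() counts uninterruptible tasks (typically blocked on I/O) together with runnable ones, processes stuck on disk raise the load average even when the CPU is idle. The sketch below shows how such a count could be folded into an exponentially decaying average. It uses floating point and a 5-second sampling interval as simplifying assumptions, while the kernel uses fixed-point arithmetic; the function names are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)


def update_load(load, nr_active, window_seconds):
    """One step of the exponentially decaying average over `window_seconds`."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return load * decay + nr_active * (1.0 - decay)


def simulate(active_counts, window_seconds=60.0, load=0.0):
    """Feed a sequence of nr_active samples and return the resulting load."""
    for n in active_counts:
        load = update_load(load, n, window_seconds)
    return load


if __name__ == "__main__":
    # Two tasks blocked on I/O for one minute (12 samples of 5 seconds each).
    one_minute_of_io = [2] * 12
    for window in (60.0, 300.0, 900.0):
        print("%4d s window: %.2f" % (window, simulate(one_minute_of_io, window)))
```

The shorter the window, the faster the average reacts, which is why the three numbers printed by `uptime` can differ so much after a burst of activity.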

New tools

SystemTap

Designed to be the Linux equivalent of Solaris DTrace (the Holy Grail of sysadmins)

Uses kprobes

Very interesting, but still poorly integrated into distributions

$ cat socket-trace.stp 
probe kernel.function("*@net/socket.c").call {
  printf ("%s -> %s\n", thread_indent(1), probefunc())
}
probe kernel.function("*@net/socket.c").return {
  printf ("%s <- %s\n", thread_indent(-1), probefunc())
}

/proc/pid/io

Wheee! - available since kernel 2.6.20

CONFIG_TASK_IO_ACCOUNTING=y

$ cat /proc/11626/io
rchar: 32756789
wchar: 18496037
syscr: 32369
syscw: 13689
read_bytes: 9089024
write_bytes: 1138688
cancelled_write_bytes: 8192
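
Since the file is a list of `name: value` lines, it is simple to read from a script. The sketch below parses it and compares `read_bytes` (bytes actually fetched from storage) with `rchar` (bytes requested through read system calls, including those served from the page cache). The helper names are hypothetical.

```python
def parse_io(text):
    """Parse the contents of /proc/<pid>/io into {counter_name: int}."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def storage_read_ratio(counters):
    """Fraction of the bytes requested by reads that came from storage."""
    if counters["rchar"] == 0:
        return 0.0
    return counters["read_bytes"] / counters["rchar"]


if __name__ == "__main__":
    # Reading another user's process usually requires root.
    with open("/proc/self/io") as f:
        counters = parse_io(f.read())
    print(counters)
    print("read from storage: %.1f%%" % (100 * storage_read_ratio(counters)))
```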

Thank you!

Cheers!