[20200430]Monitoring server room temperature.txt

Original post | Category: Large-scale network operations | Author: lfree | Date: 2020-04-30 09:49:15

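--//An earlier requirement was to monitor the temperature inside the server room. In practice, reading the hard disk temperature is an indirect way to get a rough idea of the room temperature.
--//Someone else now has the same requirement, so I tried it out in my new test environment.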

# fdisk -l

Disk /dev/cciss/c0d0: 1800.2 GB, 1800280694784 bytes
255 heads, 63 sectors/track, 218871 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *           1        1305    10482381   83  Linux
/dev/cciss/c0d0p2            1306        7832    52428127+  83  Linux
/dev/cciss/c0d0p3            7833       11748    31455270   82  Linux swap / Solaris
/dev/cciss/c0d0p4           11749      218871  1663715497+   5  Extended
/dev/cciss/c0d0p5           11749       13799    16474626   8e  Linux LVM
/dev/cciss/c0d0p6           13800      218871  1647240808+  83  Linux

# smartctl -a /dev/cciss/c0d0 -d cciss,0
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.39-300.26.1.el5uek] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               HP
Product:              EG0600FCVBK
Revision:             HPD5
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c500763520db
Serial number:        S0M2Q6EW0000B4449XDF
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu Apr 30 09:21:37 2020 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     32 C
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Drive Trip Temperature:        60 C
Manufactured in week 19 of year 2014
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  133
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2198
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 37741738
  Blocks received from initiator = 1549759033
  Blocks read from cache and sent to initiator = 2597055257
  Number of read and write commands whose size <= segment size = 3182526893
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 49708.35
  number of minutes until next internal SMART test = 7

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0    3196278.151           0
write:         0        0         0         0          0      29020.030           0

Non-medium error count:       36

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -       9                 - [-   -    -]
# 2  Background short  Completed                   -       2                 - [-   -    -]
# 3  Background short  Completed                   -       1                 - [-   -    -]

Long (extended) Self Test duration: 3870 seconds [64.5 minutes]

--//Running the following is enough to capture the disk temperature.
# smartctl -a /dev/cciss/c0d0 -d cciss,0 | grep "Current Drive Temperature" | cut -f2 -d:
     32 C
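
--//The grep/cut pipeline above still leaves spaces and the trailing "C". As a minimal sketch, the numeric value alone can be pulled out with awk.
--//parse_drive_temp is a hypothetical helper name of mine, not part of smartmontools, and it assumes the "Current Drive Temperature" line format shown in the output above.

```shell
#!/bin/bash
# Extract the numeric drive temperature (Celsius) from `smartctl -a` output on stdin.
# parse_drive_temp is a hypothetical helper name, not part of smartmontools.
parse_drive_temp() {
    awk -F: '/^Current Drive Temperature/ {
        gsub(/[^0-9]/, "", $2)   # keep digits only, dropping spaces and the trailing "C"
        print $2
        exit
    }'
}

# Usage on the real device (needs root):
#   smartctl -a /dev/cciss/c0d0 -d cciss,0 | parse_drive_temp
```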

--//On some machine models the CPU temperature can be read as well:
$ cat /sys/class/thermal/thermal_zone0/temp
8300

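--//This value is not degrees Celsius as-is. The kernel's sysfs thermal documentation defines it as millidegrees Celsius, but that would make 8300 only 8.3 ℃, which is implausible for a running CPU.
--//On this machine I treat it as degrees Fahrenheit x 100 (i.e. 83 F), which gives a believable reading; this is an assumption about this particular hardware, not a general rule. Conversion formula: ℃=(F-32)×5/9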

$ echo "scale=2;($(cat  /sys/class/thermal/thermal_zone0/temp)/100 - 32 )*5/9" | bc -l
28.33
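
--//A small sketch of the conversion as reusable functions. Both function names are hypothetical. f100_to_c follows the Fahrenheit x 100 reading used in this post;
--//millic_to_c follows the millidegree-Celsius unit that the kernel documentation specifies, for machines where that gives a sensible number.

```shell
#!/bin/bash
# Both helper names below are hypothetical.

# Interpretation used in this post: raw value = degrees Fahrenheit * 100.
f100_to_c() {
    awk -v raw="$1" 'BEGIN { printf "%.2f\n", (raw / 100 - 32) * 5 / 9 }'
}

# Interpretation per the kernel sysfs thermal docs: raw value = millidegrees Celsius.
millic_to_c() {
    awk -v raw="$1" 'BEGIN { printf "%.2f\n", raw / 1000 }'
}

f100_to_c 8300     # 28.33
millic_to_c 45000  # 45.00
```

--//Note that printf rounds, while bc with scale=2 truncates, so the last digit can differ from the bc one-liner for some inputs.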

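--//Back then a colleague asked for the readings to be inserted into a database. The other side would pull them on a schedule and send an SMS alert when the temperature got too high. As I recall, the SMS threshold for the disk temperature was 35 or 36 degrees.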

SCOTT@book> create table Temperature (t date,h_temp  number(5,2),c_temp number(5,2));
Table created.

SCOTT@book> create unique index i_Temperature_t on Temperature(t);
Index created.

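#!/bin/bash
# Disk temperature in Celsius: take the text after the colon and keep only the number.
hard_temp_value=$( /usr/sbin/smartctl -a /dev/cciss/c0d0 -d cciss,0| grep '^Current Drive Temperature'| cut -f2 -d: | cut -f1 -d"C" | sed 's/ //g')
# CPU temperature, converted with the same formula as above.
cpu_temp_value=$(echo "scale=2;($(cat  /sys/class/thermal/thermal_zone0/temp)/100 - 32 )*5/9" | bc -l)

# echo $hard_temp_value $cpu_temp_value
# Skip the insert if either reading is empty; otherwise the SQL below would be malformed.
[ -n "$hard_temp_value" ] && [ -n "$cpu_temp_value" ] || exit 1

# The here-document is passed through su to sqlplus on stdin; quit commits the insert.
su - oracle -c 'sqlplus -S  scott/book'   <<EOF
set feedback off
set termout off
insert into  Temperature values(sysdate,$hard_temp_value,$cpu_temp_value);
set feedback on
quit
EOF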

--//Then schedule it to run once every 5 minutes, recording the readings in the table.
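
--//One way to schedule it is cron. A minimal sketch follows; the script path /usr/local/bin/record_temp.sh is a hypothetical location of mine, the post does not say where the script was saved.
--//It would go into root's crontab, since smartctl and su both need root.

```shell
#!/bin/bash
# Hypothetical script path; adjust to wherever the script above is saved.
script=/usr/local/bin/record_temp.sh

# Build a crontab line that runs the script every 5 minutes and discards its output.
cron_entry="*/5 * * * * $script >/dev/null 2>&1"
echo "$cron_entry"

# To append it to the current user's crontab, uncomment:
# ( crontab -l 2>/dev/null; echo "$cron_entry" ) | crontab -
```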

SCOTT@book> select * from Temperature ;
T                       H_TEMP     C_TEMP
------------------- ---------- ----------
2020-04-30 09:40:15         32      28.33
2020-04-30 09:41:32         32      28.33
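
--//For the SMS alert mentioned earlier, the check itself is just a threshold comparison. A sketch under assumptions: temp_exceeds is a hypothetical helper, 35 is the threshold as I remember it,
--//and the actual SMS sending (which was handled by the other side) is replaced here by a plain echo.

```shell
#!/bin/bash
# temp_exceeds is a hypothetical helper; it succeeds when reading ($1) >= limit ($2).
# awk is used so that decimal readings such as 28.33 compare correctly.
temp_exceeds() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t >= limit) }'
}

if temp_exceeds 32 35; then
    echo "ALERT: disk temperature too high"
else
    echo "OK"
fi
```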

--//Of course, none of this is needed any more ^_^.

Source: "ITPUB blog", link: http://blog.itpub.net/267265/viewspace-2689379/. If you repost this, please credit the source; otherwise legal action may be pursued.
