Latest release: safte-monitor-0.0.5.tar.gz
Renamed to safte-monitor due to a trademark issue: "SAFTEMON" is a registered trademark of StorCase Technology Inc.
safte-monitor reads disk enclosure status information from SAF-TE capable enclosures (SCSI Accessible Fault Tolerant Enclosures). SAF-TE is a component of SES (SCSI Enclosure Services) which is common on most SCSI disk enclosures these days. safte-monitor can monitor multiple SAF-TE devices and will automatically probe and detect them.
The information retrieved includes power supply and fan status, temperature, audible alarm, drive faults, array critical/failed/rebuilding state and door lock status. safte-monitor logs changes in the status of these enclosure elements to syslog and can optionally execute an alert helper program with details of the component failure. This could send a pager message, for example. Temperature alert limits can also be set.
safte-monitor is especially useful if you have equipment deployed in remote locations, or left unattended over weekends when no one may hear an audible alarm.
By default safte-monitor will run in the background and log to syslog. It can be run in one-shot scan mode to print found SAF-TE device info using the -p flag. It will also respond to web requests on port 8123 and display HTML output of enclosure status.
usage: ./safte-monitor [-h] [-p] [-n] [-a] [-T] [-t <max_temp>] [-A <alert_prog>]
  -h      show this help message
  -p      print - print device scan information then exit
  -T      log temperature changes
  -N      alert for non critical state changes
  -A <f>  program to run for alerts
  -t <n>  max temperature (default 35.0 celcius)
  -n      numeric sg device names eg. /dev/sg0 (default)
  -a      alpha sg device names eg. /dev/sga
By default, temperatures and temperature limits are in Celsius. This can be changed to Fahrenheit by removing -DUSE_CELCIUS from the Makefile.
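If the program is built for Fahrenheit, the limit passed to -t would presumably have to be given in Fahrenheit too. As a small worked conversion, the default limit of 35.0 degrees Celsius corresponds to 95.0 degrees Fahrenheit. The helper name c_to_f below is hypothetical and not part of safte-monitor.

```shell
#!/bin/sh
# Hypothetical helper (not part of safte-monitor): convert a Celsius
# temperature to Fahrenheit, printed with one decimal place.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

# The default -t limit of 35.0 Celsius expressed in Fahrenheit.
c_to_f 35.0
```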
#!/bin/sh
echo ALERT device=$1 message=$2 system=$3 partno=$4 code=$5 | \
    mail -s 'safte alert' root
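A slightly more defensive variant of the alert script above might look like the following. This is only a sketch: it assumes the same five positional arguments (device, message, system, part number, code) and quotes them so that values containing spaces stay intact. The function name format_alert and the device name in the mail subject are illustrative choices, not something safte-monitor requires.

```shell
#!/bin/sh
# Sketch of an alert helper for use with -A. Assumes the five positional
# arguments shown in the example script: device, message, system, partno, code.

# Build the one-line alert text. format_alert is a hypothetical name.
format_alert() {
    printf 'ALERT device=%s message=%s system=%s partno=%s code=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Only send mail when all five arguments were supplied.
if [ "$#" -ge 5 ]; then
    format_alert "$@" | mail -s "safte alert: $1" root
fi
```

Such a script would be passed to safte-monitor with the -A option.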
# ./safte-monitor -p
SAF-TE Device ESG-SHV SCA HSBP M10 (0:0:6:0)
no. of fans = 0
no. of power supplies = 1
no. of device slots = 4
door lock installed = 0
no. of temp sensors = 1
audible alarm = 0
no. of thermostats = 0
power supply 0 is okay and on
device slot 0 disk present,active,unconfigured
device slot 1 insert ready
device slot 2 insert ready
device slot 3 insert ready
temp sensor 0 is 25.0 celcius and okay
overall temperature is okay

SAF-TE Device CNSi G8324 (2:0:0:50)
no. of fans = 4
no. of power supplies = 2
no. of device slots = 6
door lock installed = 1
no. of temp sensors = 3
audible alarm = 1
no. of thermostats = 0
power supply 0 is okay and on
power supply 1 is okay and on
fan 0 is operational
fan 1 is operational
fan 2 is operational
fan 3 is operational
device slot 0 disk not present,no error
device slot 1 disk not present,no error
device slot 2 disk not present,no error
device slot 3 disk present,no error
device slot 4 disk present,no error
device slot 5 disk present,no error
door lock is locked
speaker is off
temp sensor 0 is 25.6 celcius and okay
temp sensor 1 is 25.6 celcius and okay
temp sensor 2 is 21.7 celcius and okay
overall temperature is okay
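The one-shot scan output can also be post-processed from a script, for example in a cron job. The sketch below lists temperature sensors reading above a given threshold. It assumes that the scan prints one status item per line and that sensor lines have the form "temp sensor N is T celcius and okay", as in the sample above. The function name hot_sensors is hypothetical.

```shell
#!/bin/sh
# Sketch: filter "safte-monitor -p" output read from stdin and print every
# temperature sensor whose reading is above the threshold given as $1.
# Assumes one status item per line, as in the sample scan output.
hot_sensors() {
    awk -v max="$1" '
        # Remember which enclosure the following lines belong to.
        $1 == "SAF-TE" && $2 == "Device" { device = $0 }
        # Sensor lines look like: temp sensor 0 is 25.6 celcius and okay
        $1 == "temp" && $2 == "sensor" && ($5 + 0) > (max + 0) {
            print device ": " $0
        }'
}

# Example use (not run here): ./safte-monitor -p | hot_sensors 30
```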
a temperature change (if option is selected to log temp changes)
SAF-TE Device CNSi G8324 (2:0:0:52): temp sensor 2 changed from '23.9 degrees' to '22.8 degrees'
a power supply failing
SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'okay and on' to 'malfunctioning and commanded on'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT power supply 1 malfunctioning and commanded on
the power supply back up again
SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'malfunctioning and commanded on' to 'okay and on'
a drive in an array fails
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,no error' to 'disk present,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,critical array
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,no error' to 'disk present,faulty'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,faulty
after a rebuild has been started
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,critical array' to 'disk present,rebuilding,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,rebuilding,critical array
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,faulty' to 'disk present,rebuilding,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,rebuilding,critical array
and now the array is okay again
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,rebuilding,critical array' to 'disk present,no error'
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,rebuilding,critical array' to 'disk present,no error'
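To review past alerts, the ALERT lines can be pulled out of the system log. The following is a minimal sketch: it assumes the messages appear in the log with the "SAF-TE Device ... : ALERT ..." text shown above, possibly preceded by a syslog timestamp and host prefix. The function name safte_alerts and the log path in the comment are assumptions that depend on the local syslog configuration.

```shell
#!/bin/sh
# Sketch: read syslog lines on stdin and print only the safte-monitor ALERT
# messages, with any leading syslog prefix (timestamp, host, tag) removed.
safte_alerts() {
    sed -n 's/^.*\(SAF-TE Device .*: ALERT .*\)$/\1/p'
}

# Example use (not run here; the log file location varies between systems):
#   safte_alerts < /var/log/messages
```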
Please send bug reports to michael@metaparadigm.com
The SAF-TE spec can be found here.
Copyright Metaparadigm Pte. Ltd. 2001. Michael Clark
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.