MonkeyBrains.net/~rudy/example Random examples

  3ware monitoring  
nagios_3ware using check_by_inet
Nagios is a quick way to consolidate all those monitoring crons into one place. Once they are spread across more than 2 machines, they can really bog you down. :) We often install 3ware cards. In a cluster (with an internal LAN) it is easier (for me) to set up inetd to answer queries than to set up SSH with keys and then turn on sudo for the 3ware scripts (tw_cli needs root access). It is probably more secure as well: opening up one port to run one script as root is better than opening up SSH access from your monitoring box. OK, all that said:

I found a comprehensive Nagios_3ware checking script on roedie.nl. Download it, plop it in /root/, make a couple of edits, set up inetd, test it, and set up Nagios to do check_by_inet. The same trick works for any script you want to run as root (or as any other user) without any login credentials, so think "internal network" or "firewall that port" to keep the big ol' Internet out.

Edit the script

You downloaded check_3ware from roedie.nl, but it needs a couple of edits to work as an inetd script. inetd is not a shell and does not give the script a login environment, so set PATH in the script yourself; PATH needs to include awk and grep. I also hard-code the location of tw_cli.
# changes to check_3ware.sh 

... Change the part that defines TWCLI ...
PATH=/usr/bin:/bin
TWCLI=/usr/local/bin/tw_cli
 ... Add this right BEFORE the 'case "$EXITCODE" in' part of check_3ware.sh ...
# if we are calling via check_by_inet, prefix the exitcode
if [ "X$inetd_dummy" != "X" ]; then
        echo -n "$EXITCODE "
fi
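The whole trick rests on one convention: inetd connects the script's stdout straight to the TCP client, so the exit code has to travel as text (the Nagios code, a space, then the status line). Here is a minimal POSIX sh sketch of that convention on its own, assuming you write your own inetd check rather than patching check_3ware.sh; inet_reply is a hypothetical helper name, not something from the roedie.nl script.

```shell
#!/bin/sh
# Sketch of the "exit code first" reply format that check_by_inet expects.
# inet_reply is a hypothetical helper, not part of check_3ware.sh.

# inetd does not hand us a login environment, so set PATH explicitly.
PATH=/usr/bin:/bin
export PATH

# inet_reply CODE MESSAGE...
# Prints one line: the Nagios code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN),
# a space, then the status text. Any other code is reported as 3 (UNKNOWN).
inet_reply() {
    code=$1          # plain sh has no "local"; this variable is global
    shift
    case "$code" in
        0|1|2|3) ;;
        *) code=3 ;;
    esac
    printf '%s %s\n' "$code" "$*"
}
```

For example, `inet_reply 0 "UNITS OK:  /c0/u0 OK -"` produces the same kind of line you will see in the telnet test below. Using printf instead of `echo -n` sidesteps differences between shells' echo builtins.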

Setup inetd

Look through /etc/services and pick a service whose name you like and whose port isn't already in use. Let's say you pick venus: that means check_3ware.sh will listen on port 2430 (the venus port). Here is your /etc/inetd.conf config:
# /etc/inetd.conf
venus stream  tcp     nowait.4  root    /root/check_3ware.sh
The .4 means: don't let inetd start this script more than 4 times per minute. Really, you should only be checking every hour or so; there is no point in getting alerts about RAID every 10 minutes. You get one alert, and you head down to the data center and swap out the disk as soon as you can. Filling up your mailbox will just train you to ignore alerts. Side rant: a well-configured monitoring system ONLY emails you when you actually have to do something. Spurious alerts train the brain to treat alerts like spam.
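If you want to double-check which TCP port a service name in /etc/services maps to (venus and 2430 in the example above), glibc systems can answer with `getent services venus`. Where that is not available, a small awk lookup does the job. This is a sketch assuming the usual "name port/proto aliases # comment" layout; service_port is a hypothetical helper name.

```shell
#!/bin/sh
# service_port NAME [FILE]
# Print the TCP port for service NAME (or one of its aliases) from an
# /etc/services-style file. Prints nothing if the name is not found.
service_port() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")                 # strip comments (fields are re-split)
            if (NF < 2) next               # blank or comment-only line
            n = split($2, pp, "/")         # "2430/tcp" -> pp[1]=2430, pp[2]=tcp
            if (n != 2 || pp[2] != "tcp") next
            for (i = 1; i <= NF; i++)      # field 1 is the name, 3.. are aliases
                if (i != 2 && $i == name) { print pp[1]; exit }
        }' "${2:-/etc/services}"
}
```

Running `service_port venus` should print the port inetd will bind for the venus entry. Finding a name in /etc/services does not prove nothing is listening on that port, so it is still worth checking your running daemons.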

Assuming you don't have any other inetd services running, you need to start inetd now (/etc/init.d/openbsd-inetd restart). To make it start on reboot, hook it into your boot scripts (rc3.d in CentOS, rc2.d in Debian, rc.d in BSD, good old rc.local, whatever).

Test!

#telnet host-with-3ware.example.com venus 
Trying 10.10.10.2...
Connected to gibbon.
Escape character is '^]'.
0 UNITS OK:  /c0/u0 OK -
Connection closed by foreign host.
The 0 will be our exit code used by check_by_inet.
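To script this test, or just to see exactly what check_by_inet will do with that line, here is a POSIX sh sketch of the same parsing rule: a leading digit 0 to 3 followed by a space is the exit code, and anything else is reported as UNKNOWN (3). parse_inet_reply is a hypothetical helper for illustration; the real plugin is the Perl script in the next section.

```shell
#!/bin/sh
# parse_inet_reply LINE
# Print the status text and return the Nagios code, using the same
# "digit 0-3, then a space" convention as check_by_inet.
parse_inet_reply() {
    line=$1
    case "$line" in
        [0-3]\ *)
            code=${line%% *}             # text before the first space
            printf '%s\n' "${line#* }"   # text after the first space
            return "$code" ;;
        *)
            printf '%s\n' "$line"
            return 3 ;;                  # no prefix: UNKNOWN
    esac
}
```

Something like `parse_inet_reply "$(nc -w 10 host-with-3ware.example.com 2430 </dev/null | head -n 1)"; echo $?` should then print the status text and the exit code, assuming your netcat accepts the `-w` timeout flag.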

check_by_inet

I saw there was a check_by_ssh, but it comes with lots of admin overhead: SSH keys, setting up sudo for scripts to run as root, etc. That is where I got the idea to spend an hour writing check_by_inet: because I am LAZY! I figure an hour spent on this script will pay for itself once I deploy to at least 20 boxes, since I save 3 minutes per box by using check_by_inet (I hope)! You, my friend, are saving even more time by just copying and pasting this schiznit.

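Drop this in /usr/lib/nagios/plugins on the main Nagios server (not the client end). Save it as check_by_inet, the name the command definition below uses, and make it executable (chmod 755).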

#! /usr/bin/perl -wT
# -----------------------------------------------------------------------------
# File Name: check_by_inet.pl
#
# Author: Rudy Rucker - Berkeley, California
#
# Date: 2010/03/24
#
# Description: This script will check any port for scripts you have
#              bound to ports via inetd.
#
# Email: crapsh@monkeybrains.net (rarely check that address)
#
# -----------------------------------------------------------------------------
# Beerware 2010 (bw) Rudy Rucker
#
# Credits go to Ethan Galstad for coding Nagios
# Credits go to Richard Mayhew for coding check_ircd.pl
#
# Lots of code taken from check_ircd.pl
#
# -----------------------------------------------------------------------------

# ----------------------------------------------------------------[ Require ]--
require 5.004;

# -------------------------------------------------------------------[ Uses ]--
use Socket;
use strict;
use Getopt::Long;
use vars qw($opt_V $opt_h $opt_t $opt_p $opt_H $opt_w $opt_c $verbose);
use lib "/usr/lib/nagios/plugins";
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);

# ----------------------------------------------------[ Function Prototypes ]--
sub print_help ();
sub print_usage ();
sub bindRemote ($$);

# ------------------------------------------------------------[ Environment ]--
# Taint mode (-T) refuses to run external commands with an unsafe environment.
$ENV{PATH} = "";
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# -------------------------------------------------------------[ print_help ]--
sub print_help ()
{
  print "Beerware (c) 2010 Rudy Rucker\n\nPerl check_by_inet plugin for Nagios\n\n";
  print "Usage: $0 -H <host> -p <port> [-t <timeout>]\n";
  print "
-H, --hostname=HOST
   Name or IP address of host to check
-p, --port=INTEGER
   Port that you wish to query (Default: none)
-v, --verbose
   Print extra debugging information
-t, --timeout=INTEGER
   Maximum time for connection and remote response
";
}

# -------------------------------------------------------------[ bindRemote ]--
sub bindRemote ($$)
{
  my ($in_remotehost, $in_remoteport) = @_;
  my $proto = getprotobyname('tcp');
  # gethostbyname() returns ($name, $aliases, $addrtype, $length, @addrs)
  my $thataddr = (gethostbyname($in_remotehost))[4];

  if (!defined $thataddr) {
    print "PORT UNKNOWN: Could not resolve host $in_remotehost\n";
    exit $ERRORS{"UNKNOWN"};
  }
  if (!socket(ClientSocket, AF_INET, SOCK_STREAM, $proto)) {
    print "PORT UNKNOWN: Could not start socket ($!)\n";
    exit $ERRORS{"UNKNOWN"};
  }
  # sockaddr_in() from Socket builds the address portably, instead of a
  # hand-rolled pack('S n a4 x8', ...) that assumes the Linux struct layout.
  my $this = sockaddr_in(0, INADDR_ANY);
  my $that = sockaddr_in($in_remoteport, $thataddr);
  if (!bind(ClientSocket, $this)) {
    print "PORT UNKNOWN: Could not bind socket ($!)\n";
    exit $ERRORS{"UNKNOWN"};
  }
  if (!connect(ClientSocket, $that)) {
    print "PORT UNKNOWN: Could not connect socket to port $in_remoteport ($!)\n";
    exit $ERRORS{"UNKNOWN"};
  }
  select(ClientSocket); $| = 1; select(STDOUT);
  return \*ClientSocket;
}

# ===================================================================[ MAIN ]==
MAIN:
{
  my $hostname;

  Getopt::Long::Configure('bundling');
  GetOptions
   ("h"   => \$opt_h,   "help"       => \$opt_h,
    "v"   => \$verbose, "verbose"    => \$verbose,
    "t=i" => \$opt_t,   "timeout=i"  => \$opt_t,
    "p=i" => \$opt_p,   "port=i"     => \$opt_p,
    "H=s" => \$opt_H,   "hostname=s" => \$opt_H);

  if ($opt_h) {print_help(); exit $ERRORS{'OK'};}

  ($opt_H) || ($opt_H = shift) || usage("Remote hostname not specified\n");
  # Untaint: accept only hostname/IP characters. (A "my" declaration with a
  # trailing "if" modifier has undefined behaviour in Perl.)
  my ($remotehost) = ($opt_H =~ /^([-.A-Za-z0-9]+)$/);
  ($remotehost) || usage("Invalid host: $opt_H\n");

  ($opt_p) || ($opt_p = shift) || ($opt_p = 'UndefinedPort');
  my ($remoteport) = ($opt_p =~ /^([0-9]+)$/);
  ($remoteport) || usage("Remote port not specified: $opt_p is invalid\n");

  if ($opt_t && $opt_t =~ /^([0-9]+)$/) { $TIMEOUT = $1; }

  # Just in case of problems, let's not hang Nagios
  $SIG{'ALRM'} = sub {
    print "Something is Taking a Long Time, Increase Your TIMEOUT (Currently Set At $TIMEOUT Seconds)\n";
    exit $ERRORS{"UNKNOWN"};
  };
  alarm($TIMEOUT);

  if ($verbose) {
    print "MAIN(debug): Timeout set to $TIMEOUT seconds\n";
    #sleep $TIMEOUT; # uncomment this if you want to see a timeout error

    # Don't bother checking hostname unless we are in verbose mode
    chomp($hostname = `/bin/hostname`);
    $hostname = $1 if ($hostname =~ /([-.a-zA-Z0-9]+)/);
    print "MAIN(debug): hostname = $hostname\n";
    print "MAIN(debug): binding to remote host: $remotehost -> $remoteport -> $hostname\n";
  }

  my $ClientSocket = bindRemote($remotehost, $remoteport);

  while (<$ClientSocket>) {
    my $remoteExitCode = $ERRORS{"UNKNOWN"};   # unknown state
    # policy: prefix of digit then space on remote side will be exit code.
    if (s/^([0-3]) //) {
      $remoteExitCode = $1;
    }
    print "$_";   # this is the one line output from the other side!
    alarm(0);
    exit($remoteExitCode);
  }

  print "PORT UNKNOWN: Unknown error - maybe inetd is not up on other end...\n";
  exit $ERRORS{"UNKNOWN"};
}

Next, set up the 'command' on the nagios server:

# /etc/nagios3/conf.d/check_by_inet.cfg 
# The check by inet command!  Takes the PORT number as its argument.
define command {
    command_name    check_by_inet
    command_line    /usr/lib/nagios/plugins/check_by_inet -p '$ARG1$' -H '$HOSTADDRESS$' -t 10
}
# check that 3ware card state via inet
define service {
        hostgroup_name                  3ware-servers
        service_description             3WARE
        check_command                   check_by_inet!2430
        use                             generic-service
        normal_check_interval           30
        retry_check_interval            10
        notification_interval           720 ; 720 minutes = 12 hours (with the default interval_length of 60 seconds)
}
Goodness, it actually works. Took me more like 3 hours to figure all this out. :)