May 08, 2013
Monitoring the kingdom… ZABBIX!
For a while, we’ve been trying to get proper monitoring in place at TakeLAN, on my home servers, and for customers. After a lot of work, tune-ups, and changes, we finally made it: we got the perfect monitoring setup.
Monitoring…

In the datacenters we own, we’ve installed DHT22 sensors to monitor the room’s intake and exhaust temperatures, which gives us an idea of the heat produced and how it circulates. There’s another sensor above some of the servers for extra monitoring, but we don’t pay much attention to it since we also monitor the temperature of the servers themselves.

Those and our friendly Raspberry Pi 4 were enough to monitor both UPSs and the temperature of one of the rooms.
The changes gave us peace of mind at low cost, and the Pi also lets us keep historical data and serves as the primary jump host for techs and admins.
Setting up the DHT22 sensors was easy enough: we wrote some apps to monitor both and to calibrate them against the actual temperature and humidity inside the room.
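To give an idea of what the calibration step can look like, here is a minimal sketch that applies a fixed offset to a raw reading. It is not our actual app: the function name and the offset values are made up for illustration, and it assumes a simple additive correction found by comparing the sensor against a reference instrument.

```shell
#!/bin/bash
# Hypothetical offsets, e.g. found by comparing the DHT22 against a reference
# thermometer/hygrometer in the same room. These values are placeholders.
TEMP_OFFSET_C="-0.8"
HUMIDITY_OFFSET_PCT="2.5"

# calibrate RAW OFFSET: prints RAW + OFFSET rounded to one decimal place.
calibrate() {
    awk -v raw="$1" -v off="$2" 'BEGIN { printf "%.1f\n", raw + off }'
}

# Example calls with made-up raw readings:
#   calibrate 24.6 "$TEMP_OFFSET_C"        # corrected temperature
#   calibrate 41.0 "$HUMIDITY_OFFSET_PCT"  # corrected humidity
```

A single additive offset is the simplest possible correction; if a sensor drifts differently at different temperatures, a per-range offset would be the next thing to try.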
This Raspberry Pi is also in charge of reporting the temperature from other sensors, so it must stay operational, and we should get rid of this single point of failure. Still, the UPSs don’t support this kind of operation over USB, so I’m looking into multiplexing options to detect a failure of this device and switch over.
Backups!
It’s no secret that I run multiple servers in my house, and I also have a detached garage, which comes in very handy for remote backups!
The backup happens over powerline (yeah yeah, I have two gigabit network cables running to the garage, but powerline was more convenient!).
Again, it’s a Raspberry Pi 4, this time with two 10 TB hard drives plugged in, which receives a copy of my zpool, media, storage, LXC containers, and QEMU backups.
I guess if we need to grow, we’ll continue to use USB, since we don’t write full blast to it and it’s mainly archival. If we ever have to recover, more than likely we’ll pick up the drives and plug them straight into a machine.

For this to happen, we had to retrofit the garage with an exhaust fan on the roof to keep it cool enough for this device’s regular operation. I also hooked the Pi and drives up in the roof gable to avoid vibrations from the air compressor and whatnot.
With this, we have covered the main points: monitoring and backup.
Sync between datacenters
We opted for rsync + lsyncd.
This is the example we use to sync the TX01 server to LA, KS, and SEC.
-- General Settings
local sourcesandtargets = require('syncfolders')

settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/run/lsyncd/lsyncd.status",
    pidfile = "/var/run/lsyncd/lsyncd.pid",
    maxDelays = 4000,
    insist = true,
    maxProcesses = 20
}

--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS --
--------------------------------------------------------------------------

----------------------------------------------------------------- Bind Transfers
sync {
    default.rsyncssh,
    source = "/var/lib/bind",
    targetdir = "/var/lib/bind",
    host = "lsyncd-vmin.la.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    settings { maxProcesses = 1 },
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/var/lib/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/bind",
    targetdir = "/etc/bind",
    host = "lsyncd-vmin.la.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    settings { maxProcesses = 1 },
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS --
--------------------------------------------------------------------------

--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS --
--------------------------------------------------------------------------

----------------------------------------------------------------- Bind Transfers
sync {
    default.rsyncssh,
    source = "/var/lib/bind",
    targetdir = "/var/lib/bind",
    host = "lsyncd-vmin.ks.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    settings { maxProcesses = 1 },
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/var/lib/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/bind",
    targetdir = "/etc/bind",
    host = "lsyncd-vmin.ks.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    settings { maxProcesses = 1 },
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS --
--------------------------------------------------------------------------

--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS --
--------------------------------------------------------------------------

----------------------------------------------------------------- Default to transfer home files (mail, websites, etc)
for _, sourcesandtargets in ipairs( sourcesandtargets ) do
    sync {
        default.rsyncssh,
        source = sourcesandtargets,
        targetdir = sourcesandtargets,
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        settings { maxProcesses = 1 },
        delay = 300,
        delete = "running",
        rsync = {
            binary = "/etc/lsyncd/locking_rsync.sh",
            backup = true,
            backup_dir = "/var/lsyncdbackup/",
            archive = true,
            links = true,
            update = true,
            append_verify = true,
            temp_dir = "/tmp/",
        },
    }
end

----------------------------------------------------------------- Bind Transfers
sync {
    default.rsyncssh,
    source = "/var/lib/bind",
    targetdir = "/var/lib/bind",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/var/lib/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/bind",
    targetdir = "/etc/bind",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    delay = 0,
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/bind/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

----------------------------------------------------------------- Apache Transfers
sync {
    default.rsyncssh,
    source = "/etc/apache2",
    targetdir = "/etc/apache2",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/apache2/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

----------------------------------------------------------------- Logrotate Transfers
sync {
    default.rsyncssh,
    source = "/etc/logrotate.d",
    targetdir = "/etc/logrotate.d",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/logrotate.d/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

----------------------------------------------------------------- Cronjobs Transfers
sync {
    default.rsyncssh,
    source = "/var/spool/cron/crontabs",
    targetdir = "/var/spool/cron/crontabs",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/var/spool/cron/crontabs/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/cron.d",
    targetdir = "/etc/cron.d",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/cron.d/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/cron.daily",
    targetdir = "/etc/cron.daily",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/cron.daily/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/cron.hourly",
    targetdir = "/etc/cron.hourly",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/cron.hourly/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/cron.monthly",
    targetdir = "/etc/cron.monthly",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/cron.monthly/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

sync {
    default.rsyncssh,
    source = "/etc/cron.weekly",
    targetdir = "/etc/cron.weekly",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/cron.weekly/",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

------- THIS ONE MUST USE RSYNC DIRECTLY!
sync {
    default.rsyncssh,
    source = "/etc",
    targetdir = "/etc",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/usr/bin/rsync",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/crontab/",
        archive = true,
        links = true,
        update = true,
        _extra = { "--include=crontab", "--exclude=*" },
        temp_dir = "/tmp/",
    },
}
------- THIS ONE MUST USE RSYNC DIRECTLY!

----------------------------------------------------------------- Webmin Transfers
sync {
    default.rsyncssh,
    source = "/etc/webmin",
    targetdir = "/etc/webmin",
    host = "lsyncd-vmin.nj.takelan.com",
    excludeFrom = "/etc/lsyncd/exclude",
    exclude = { "*.log", "*.tmp", "*~", "*.swp" },
    delete = "running",
    rsync = {
        binary = "/etc/lsyncd/locking_rsync.sh",
        backup = true,
        backup_dir = "/var/lsyncdbackup/etc/webmin",
        archive = true,
        links = true,
        update = true,
        temp_dir = "/tmp/",
    },
}

--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS --
--------------------------------------------------------------------------
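Before trusting a new sync block, it can help to try the same options by hand. The sketch below only prints a one-shot rsync command that mirrors the options in the config above; it is an approximation for manual testing, not the exact command line lsyncd builds internally. The function name is made up, and it assumes the backup directory follows the /var/lsyncdbackup plus source path pattern used in most of the blocks.

```shell
#!/bin/bash
# print_rsync_cmd DIR HOST: print (do not run) a dry-run rsync command that
# roughly matches one sync block. --delete is left out on purpose: the config
# uses delete = "running", which does not delete during the initial sync.
print_rsync_cmd() {
    local dir="$1" host="$2"
    printf 'rsync --dry-run --archive --links --update'
    printf ' --backup --backup-dir=%s' "/var/lsyncdbackup${dir}/"
    printf ' --temp-dir=/tmp/ --exclude-from=/etc/lsyncd/exclude'
    # printf repeats the format once per argument, giving one --exclude each.
    printf " --exclude='%s'" '*.log' '*.tmp' '*~' '*.swp'
    printf ' %s/ %s:%s/\n' "$dir" "$host" "$dir"
}

# Example call:
#   print_rsync_cmd /etc/bind lsyncd-vmin.la.takelan.com
```

Printing the command instead of running it keeps the sketch harmless; paste the output into a shell and only drop --dry-run once the file list looks right.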
This image may explain the sync.

As you can see, we keep some servers (namely SEC01 and TX01) in sync for critical services, to make sure that if a location fails, we can still respond.
This way, we ensure that changes on one server reach the others. Now for a classic sync problem: should we copy open files? No, not really, so for that we created a wrapper around rsync.
#!/bin/bash
# REMEMBER TO MOUNT THIS FOLDER!
# sshfs#root@vmin01.tx.takelan.com:/opt/scripts/locks /opt/scripts/locks fuse delay_connect,defaults,idmap=user,IdentityFile=/root/.ssh/id_rsa,port=22,uid=0,gid=0,allow_other 0 0

### Definitions
OPENFILES_SLEEP_TIME=5
SLEEP_TIME=25
MAX_WAIT=3600
LOCKFILE_FOLDER='/opt/scripts/locks'
LOCK_FILE="$LOCKFILE_FOLDER/rsync-lock"
FOLDER_HOST='vmin01'
HOSTNAME=$(hostname)

# Rsync wrapper to avoid copying partial - opened files...
RSYNC_BINARY="/usr/bin/rsync"

echo "Running locking with $*" >> /root/params.log
# The last two arguments are the source and the destination
source=("${@: -2}")

#---------------------------------------------------------------> Functions
# Call this function to decide the final destiny of the sync.
function checkOrDie() {
    mountpoint "$LOCKFILE_FOLDER" > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Failure... exiting due to mountpoint failure."
        exit 0
    fi
    return 0
}

# Check if the mountpoint is mounted and working...
function checkMountpoint() {
    mountpoint "$LOCKFILE_FOLDER" > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        echo "Success: Everything is okay."
    else
        umount -f "$LOCKFILE_FOLDER"
        mount "$LOCKFILE_FOLDER"
        checkOrDie
    fi
    return 0
}

# Only check the mount when we are not the host that owns the lock folder
function checkMountpointWorks() {
    if [ "$HOSTNAME" == "$FOLDER_HOST" ]; then
        echo "I'm the folder host... so skipping"
    else
        checkMountpoint
    fi
    return 0
}

# Remove lockfile if exists and it belongs to this host
function removeLockfile() {
    if [ -f "$LOCK_FILE" ]; then
        grep "$HOSTNAME" "$LOCK_FILE" > /dev/null 2>&1
        if [ $? -eq 0 ]; then
            echo "Lockfile exists and its mine, removing"
            rm "$LOCK_FILE"
        fi
    fi
    return 0
}

# Removes the lock even if it belongs to someone else
function forceRemoveLock() {
    if [ -f "$LOCK_FILE" ]; then
        LOCKCONTENT=$(cat "$LOCK_FILE")
        echo "Lockfile exists and belongs to $LOCKCONTENT! since im forcing... removing"
        rm "$LOCK_FILE"
    fi
    return 0
}

# Create the lockfile with my hostname
function createLockfile() {
    echo "$HOSTNAME" > "$LOCK_FILE"
    return 0
}

# Check if source has open files
function checkForOpenFiles() {
    echo "Checking ${source[0]} for open files..."
    lsof +D "${source[0]}" | tail -n +2 | awk '{ print $4 ";" $9 }' | grep -v 'cwd;' | grep -v 'dovecot' > /dev/null 2>&1
    hasOpenFiles=$?
    echo "Has open files, returned $hasOpenFiles"
    while [ $hasOpenFiles -eq 0 ]; do
        echo "It seems like we do have open files... blocking sync!"
        # This could take a bit...
        lsof +D "${source[0]}" | tail -n +2 | awk '{ print $4 ";" $9 }' | grep -v 'cwd;' | grep -v 'dovecot' > /dev/null 2>&1
        hasOpenFiles=$?
        echo "Has open files, returned $hasOpenFiles"
        # Avoid CPU Pinning
        sleep $OPENFILES_SLEEP_TIME
        echo "Sleeping for $OPENFILES_SLEEP_TIME"
        # Differential sleep
        sleep "$(( (RANDOM % 5) + 1 ))s"
    done
    echo "Done checking... calling rsync!"
    return 0
}
#---------------------------------------------------------------> Functions

#---------------------------------------------------------------> Main Program
# Make sure the mount is working properly
checkMountpointWorks

# Check if file exists, wait for global lock to go away
if [ ! -f "$LOCK_FILE" ]; then
    echo "Executing lock..."
    createLockfile
    echo "Checking for open files..."
    echo "Syncing!"
# Wait for the lock file to expire or until removed.
else
    NUM_SECS=$(( $(date +%s) - $(stat -c %Y "$LOCK_FILE") ))
    while [ -f "$LOCK_FILE" ] && (( NUM_SECS < MAX_WAIT )); do
        sleep $SLEEP_TIME
        # Differential sleep
        sleep "$(( (RANDOM % 10) + 1 ))s"
        if [ -f "$LOCK_FILE" ]; then
            NUM_SECS=$(( $(date +%s) - $(stat -c %Y "$LOCK_FILE") ))
            echo "Lockfile exists for: $NUM_SECS seconds..."
        else
            break
        fi
    done
    echo "Maximum wait reached or file removed on remote end"
    forceRemoveLock
    createLockfile
    echo "Syncing!"
fi

# Stop to check for open files
checkForOpenFiles

# Start syncing
$RSYNC_BINARY "$@"
rsync_res=$?

# Cleanup
echo "Rsync finished with status $rsync_res..."
removeLockfile
exit $rsync_res
#---------------------------------------------------------------> Main Program
This script wraps the rsync call made by lsyncd and checks that the source doesn’t currently have open files before we initiate a copy.
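If you want to play with the open-file check on its own, here is a small sketch of the same filter pulled out into a function that reads lsof output from stdin. The function name is made up for this example; the filter itself mirrors the wrapper (skip the header line, ignore cwd entries, ignore paths containing "dovecot").

```shell
#!/bin/bash
# has_blocking_open_files: reads `lsof +D <dir>` output on stdin.
# Returns 0 if at least one entry should block the sync, non-zero otherwise.
# Column 4 of lsof's default output is the FD, column 9 is the file name.
has_blocking_open_files() {
    tail -n +2 \
        | awk '{ print $4 ";" $9 }' \
        | grep -v 'cwd;' \
        | grep -v 'dovecot' > /dev/null
}

# Typical call (not executed here):
#   lsof +D /var/lib/bind | has_blocking_open_files && echo "open files, wait"
```

Reading from stdin means you can feed it a saved lsof listing and see how the filter behaves without touching a live server.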
Yes, this won’t guarantee that at that EXACT moment there are no open files, but remember, it’s just a bandwidth-saving operation, not a critical one.
I got tired of typing, but this covers a large part of our monitoring and maintenance.
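The lock has a similar "good enough" flavor: the wrapper checks for the lock file and then creates it in two separate steps, so two hosts could both pass the check at the same moment. If that ever matters, one possible tightening is to let the shell refuse to overwrite an existing file. This is only a sketch with made-up function names, and whether exclusive creation really holds across an sshfs mount is an assumption you would need to verify on your own setup.

```shell
#!/bin/bash
# try_lock FILE OWNER: create FILE containing OWNER only if it does not exist.
# With noclobber set, the redirection fails when the file is already there,
# so the check and the creation happen in a single step.
try_lock() {
    local file="$1" owner="$2"
    ( set -o noclobber; echo "$owner" > "$file" ) 2> /dev/null
}

# lock_age FILE: seconds since FILE was last modified (GNU stat, as in the wrapper).
lock_age() {
    echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}

# Example flow (not executed here):
#   if try_lock /opt/scripts/locks/rsync-lock "$(hostname)"; then
#       echo "got the lock"
#   else
#       echo "lock held for $(lock_age /opt/scripts/locks/rsync-lock) seconds"
#   fi
```

The stale-lock timeout from the wrapper still applies on top of this: a host that dies while holding the lock never removes it, so something has to force it away eventually.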
See ya in post 2!