February 27, 2014
For a while, we’ve been trying to get proper monitoring at TakeLAN, on my home servers, and for customers. After a lot of work, tuning, and changes, we finally made it: we got the perfect monitoring setup.
Monitoring…


In the datacenters we own, we’ve installed DHT22 sensors to monitor the intake and exhaust temperature of the room; with that we get an idea of the heat produced and its circulation. (There’s another sensor above some servers for extra monitoring, but we don’t pay much attention to that one since we also monitor the temperature of the servers themselves.)
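As a rough illustration of the idea, the heat picked up by the air in the room shows up as the exhaust reading minus the intake reading. The sketch below assumes the two readings are already available as numbers in °C; the function name `temp_delta` and the sample values are made up for the example, and reading the DHT22 itself is left out.

```bash
#!/bin/bash
# Hypothetical helper: difference between exhaust and intake temperature.
# Both arguments are assumed to be degrees Celsius as decimal numbers.
temp_delta() {
    local intake="$1" exhaust="$2"
    # bash has no floating point arithmetic, so awk does the subtraction.
    awk -v i="$intake" -v e="$exhaust" 'BEGIN { printf "%.1f\n", e - i }'
}

# Sample values, not real measurements.
temp_delta 21.5 29.0
```

A delta that keeps growing while the intake stays flat would point at more heat being produced or at worse circulation, which is the kind of trend the historical data is good for.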

Those and our friendly Raspberry Pi 4 were enough to monitor both UPSes and the temperature of one of the rooms.
The changes gave us peace of mind at low cost; we can also keep historical data on the Pi and use it as the primary jump host for techs and admins.
Setting up those DHT22s was easy enough: we wrote some apps to read both and to calibrate them precisely against the temperature and humidity inside the room.
This Raspberry Pi is also in charge of reporting the temperature from other sensors, so it must stay operational; it is a single point of failure that we should remove. Still, the UPSes don’t support this operation over USB, and I’m looking into multiplexing options to detect a failure of this device and switch over.
Backups!
It’s no secret: I run multiple servers in my house, and I also have a detached garage, which comes in very handy for remote backups!
The backup happens over powerline (yeah yeah, I have two gigabit network cables running to the garage, but powerline was more convenient!).
Again, it’s a Raspberry Pi 4 with two 10 TB hard drives plugged in, which receive a copy of my zpool, media, storage, LXC containers, and QEMU backups.
I guess if we need to grow, we’ll continue to use USB since we don’t write to it at full blast and it’s mainly archival. If we ever have to recover, more than likely we’ll pick up the drives and plug them straight into a machine.

For this to happen, we had to retrofit the garage with an exhaust fan on the roof to keep it cool enough for this device’s regular operation, and I also hooked the Pi and drives up in the roof gable to keep them away from the vibrations of the air compressor and whatnot.
With this, we have covered the main points: monitoring and backup.
Sync between datacenters
We opted for rsync + lsyncd.
This is the example we use to sync the TX01 server to LA, KS, and SEC.
-- General Settings
local sourcesandtargets = require('syncfolders')
settings {
logfile = "/var/log/lsyncd.log",
statusFile = "/var/run/lsyncd/lsyncd.status",
pidfile = "/var/run/lsyncd/lsyncd.pid",
maxDelays = 4000,
insist = true,
maxProcesses = 20
}
--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS --
--------------------------------------------------------------------------
----------------------------------------------------------------- Bind Transfers
sync {
default.rsyncssh,
source = "/var/lib/bind",
targetdir = "/var/lib/bind",
host = "lsyncd-vmin.la.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
settings { maxProcesses = 1 },
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/var/lib/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/bind",
targetdir = "/etc/bind",
host = "lsyncd-vmin.la.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
settings { maxProcesses = 1 },
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS --
--------------------------------------------------------------------------
--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS --
--------------------------------------------------------------------------
----------------------------------------------------------------- Bind Transfers
sync {
default.rsyncssh,
source = "/var/lib/bind",
targetdir = "/var/lib/bind",
host = "lsyncd-vmin.ks.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
settings { maxProcesses = 1 },
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/var/lib/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/bind",
targetdir = "/etc/bind",
host = "lsyncd-vmin.ks.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
settings { maxProcesses = 1 },
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS --
--------------------------------------------------------------------------
--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS --
--------------------------------------------------------------------------
----------------------------------------------------------------- Default to transfer home files (mail, websites, etc)
for _, folder in ipairs( sourcesandtargets )
do
sync {
default.rsyncssh,
source = folder,
targetdir = folder,
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
settings { maxProcesses = 1 },
delay = 300,
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/",
archive = true,
links = true,
update = true,
append_verify = true,
temp_dir = "/tmp/",
},
}
end
----------------------------------------------------------------- Bind Transfers
sync {
default.rsyncssh,
source = "/var/lib/bind",
targetdir = "/var/lib/bind",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/var/lib/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/bind",
targetdir = "/etc/bind",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
delay = 0,
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/bind/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
----------------------------------------------------------------- Apache Transfers
sync {
default.rsyncssh,
source = "/etc/apache2",
targetdir = "/etc/apache2",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/apache2/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
----------------------------------------------------------------- Logrotate Transfers
sync {
default.rsyncssh,
source = "/etc/logrotate.d",
targetdir = "/etc/logrotate.d",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/logrotate.d/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
----------------------------------------------------------------- Cronjobs Transfers
sync {
default.rsyncssh,
source = "/var/spool/cron/crontabs",
targetdir = "/var/spool/cron/crontabs",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/var/spool/cron/crontabs/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/cron.d",
targetdir = "/etc/cron.d",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/cron.d/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/cron.daily",
targetdir = "/etc/cron.daily",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/cron.daily/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/cron.hourly",
targetdir = "/etc/cron.hourly",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/cron.hourly/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/cron.monthly",
targetdir = "/etc/cron.monthly",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/cron.monthly/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
sync {
default.rsyncssh,
source = "/etc/cron.weekly",
targetdir = "/etc/cron.weekly",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/cron.weekly/",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
------- THIS ONE MUST USE RSYNC DIRECTLY!
sync {
default.rsyncssh,
source = "/etc",
targetdir = "/etc",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/usr/bin/rsync",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/crontab/",
archive = true,
links = true,
update = true,
_extra = { "--include=crontab", "--exclude=*" },
temp_dir = "/tmp/",
},
}
------- THIS ONE MUST USE RSYNC DIRECTLY!
----------------------------------------------------------------- Webmin Transfers
sync {
default.rsyncssh,
source = "/etc/webmin",
targetdir = "/etc/webmin",
host = "lsyncd-vmin.nj.takelan.com",
excludeFrom = "/etc/lsyncd/exclude",
exclude = { "*.log", "*.tmp", "*~", "*.swp" },
delete = "running",
rsync = {
binary = "/etc/lsyncd/locking_rsync.sh",
backup = true,
backup_dir = "/var/lsyncdbackup/etc/webmin",
archive = true,
links = true,
update = true,
temp_dir = "/tmp/",
},
}
--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS --
--------------------------------------------------------------------------
This image may explain the sync.

As you can see, we keep servers in sync for some critical services to make sure that if a location fails, we can still respond, namely SEC01 and TX01.
This way, we ensure that changes on one server reach the others. Now, a classic sync problem: should we copy open files? No, not really, so for that we created a wrapper around rsync.
#!/bin/bash
# REMEMBER TO MOUNT THIS FOLDER!
# sshfs#root@vmin01.tx.takelan.com:/opt/scripts/locks /opt/scripts/locks fuse delay_connect,defaults,idmap=user,IdentityFile=/root/.ssh/id_rsa,port=22,uid=0,gid=0,allow_other 0 0
### Definitions
OPENFILES_SLEEP_TIME=5
SLEEP_TIME=25
MAX_WAIT=3600
LOCKFILE_FOLDER='/opt/scripts/locks'
LOCK_FILE="$LOCKFILE_FOLDER/rsync-lock"
FOLDER_HOST='vmin01'
HOSTNAME=`hostname`
# Rsync wrapper to avoid copying partial - opened files...
RSYNC_BINARY="/usr/bin/rsync"
echo "Running locking with $@" >> /root/params.log
source=(${@: -2})
#---------------------------------------------------------------> Functions
# Call this function to decide the final destiny of the sync.
function checkOrDie() {
mountpoint $LOCKFILE_FOLDER > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "Failure... exiting due to mountpoint failure."
exit 0
fi
return 0
}
# Check if the mountpoint is mounted and working...
function checkMountpoint() {
mountpoint $LOCKFILE_FOLDER > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Success: Everything is okay."
else
umount -f $LOCKFILE_FOLDER
mount $LOCKFILE_FOLDER
checkOrDie
fi
return 0
}
# The folder host has the lock folder locally; every other host must have it mounted
function checkMountpointWorks() {
if [ $HOSTNAME == $FOLDER_HOST ]; then
echo "I'm the folder host... so skipping"
else
checkMountpoint
fi
return 0
}
# Remove lockfile if exists
function removeLockfile() {
if [ -f $LOCK_FILE ]; then
cat $LOCK_FILE | grep $HOSTNAME > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Lockfile exists and its mine, removing"
rm $LOCK_FILE
fi
fi
return 0
}
# Removes the lock even if it belongs to someone else
function forceRemoveLock(){
if [ -f $LOCK_FILE ]; then
LOCKCONTENT=`cat $LOCK_FILE`
echo "Lockfile exists and belongs to $LOCKCONTENT! since im forcing... removing"
rm $LOCK_FILE
fi
return 0
}
# Create the lockfile with my hostname
function createLockfile() {
echo $HOSTNAME > $LOCK_FILE
return 0
}
# Check if source has open files
function checkForOpenFiles() {
echo "Checking ${source[0]} for open files..."
lsof +D "${source[0]}" | tail -n +2 | awk '{ print $4 ";" $9 }' | grep -v 'cwd;' | grep -v 'dovecot' > /dev/null 2>&1
hasOpenFiles=$?
echo "Has open files, returned $hasOpenFiles"
while [ $hasOpenFiles -eq 0 ]; do
echo "It seems like we do have open files... blocking sync!"
# This could take a bit...
lsof +D "${source[0]}" | tail -n +2 | awk '{ print $4 ";" $9 }' | grep -v 'cwd;' | grep -v 'dovecot' > /dev/null 2>&1
hasOpenFiles=$?
echo "Has open files, returned $hasOpenFiles"
# Avoid CPU Pinning
sleep $OPENFILES_SLEEP_TIME
echo "Sleeping for $OPENFILES_SLEEP_TIME"
# Differential sleep
sleep $[ ( $RANDOM % 5 ) + 1 ]s
done
echo "Done checking... calling rsync!"
return 0
}
#---------------------------------------------------------------> Functions
#---------------------------------------------------------------> Main Program
# Make sure the mount is working properly
checkMountpointWorks
# Check if file exists, wait for global lock to go away
if [ ! -f $LOCK_FILE ]; then
echo "Executing lock..."
createLockfile
echo "Checking for open files..."
echo "Syncing!"
# Wait for the lock file to expire or until removed.
else
NUM_SECS=$(( $(date +%s) - $(stat -c %Y $LOCK_FILE) ))
while [ -f $LOCK_FILE ] && (( $NUM_SECS < $MAX_WAIT )); do
sleep $SLEEP_TIME
# Differential sleep
sleep $[ ( $RANDOM % 10 ) + 1 ]s
if [ -f $LOCK_FILE ]; then
NUM_SECS=$(( $(date +%s) - $(stat -c %Y $LOCK_FILE) ))
echo "Lockfile exists for: $NUM_SECS seconds..."
else
break
fi
done
echo "Maximum wait reached or file removed on remote end"
forceRemoveLock
createLockfile
echo "Syncing!"
fi
# Stop to check for open files
checkForOpenFiles
# Start syncing
$RSYNC_BINARY "$@"
rsync_res=$?
# Cleanup
echo "Rsync finished with status $rsync_res..."
removeLockfile
exit $rsync_res
#---------------------------------------------------------------> Main Program
This script wraps the rsync call that lsyncd makes and checks that the source currently has no open files before we start a copy.
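The open-file test is the `lsof` pipeline inside `checkForOpenFiles`. To see what that filter keeps and drops without touching a live server, the same stages can be fed a canned listing. This is only a sketch: the function name `filter_open_files` and the sample listing are invented for illustration, and only the pipeline stages come from the script above.

```bash
#!/bin/bash
# Same filter stages as checkForOpenFiles, reading lsof-style text from stdin.
# Prints "FD;NAME" for every entry that would block the sync.
filter_open_files() {
    tail -n +2 |                      # drop the lsof header line
        awk '{ print $4 ";" $9 }' |   # keep the FD and NAME columns
        grep -v 'cwd;' |              # a process merely sitting in the folder does not count
        grep -v 'dovecot'             # ignore paths that mention dovecot
}

# Invented sample in the default lsof column layout.
sample='COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 100 root cwd DIR 8,1 4096 2 /var/lib/bind
named 200 bind 5w REG 8,1 1024 3 /var/lib/bind/zone.db'

# The last grep exits 0 only when at least one line survives the filters.
if printf '%s\n' "$sample" | filter_open_files > /dev/null; then
    echo "open files found, the wrapper would wait"
else
    echo "no open files, the wrapper would call rsync"
fi
```

The exit status of the last `grep -v` is what the wrapper stores in `hasOpenFiles`, so a status of 0 means "something is still open" and the loop keeps sleeping.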
Yes, this won’t guarantee that at that EXACT moment there are no open files, but remember, it’s just a bandwidth-saving operation, not a critical one.
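The lock follows the same best-effort approach: a lock file older than `MAX_WAIT` seconds is treated as abandoned and removed. Here is a minimal sketch of that age test, using the same `date` and GNU `stat -c %Y` calls as the wrapper. The function names `lock_age` and `lock_is_stale` and the temporary file are assumptions made for the example.

```bash
#!/bin/bash
# Seconds since the lock file was last modified (GNU stat, as in the wrapper).
lock_age() {
    echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}

# Succeeds when the lock is at least $2 seconds old.
lock_is_stale() {
    [ "$(lock_age "$1")" -ge "$2" ]
}

# Throwaway lock for the demo; the real one lives in /opt/scripts/locks.
demo_lock=$(mktemp)
echo "${HOSTNAME:-demo-host}" > "$demo_lock"

if lock_is_stale "$demo_lock" 3600; then
    echo "lock expired, would force-remove it"
else
    echo "lock is fresh, would keep waiting"
fi
rm -f "$demo_lock"
```

With a 3600 second limit, a host that crashed mid-sync blocks the others for at most an hour before someone else takes the lock over.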
I got tired of typing, but this covers a large part of our monitoring and maintenance.
See ya in post 2!