
For a while, we’ve been trying to get proper monitoring in place at TakeLAN, on my home servers, and for our customers. After a lot of work, tuning, and changes, we finally made it: we got the perfect monitoring setup.

Monitoring…

DHT22 sensor.

In the datacenters we own, we’ve installed DHT22 sensors to monitor the room’s intake and exhaust temperature. With that, we can get an idea of the heat produced and how it circulates. (There’s another sensor above some servers for extra monitoring, but we don’t pay much attention to it since we also monitor the temperature of the servers themselves.)
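As a rough illustration of how two readings turn into a "heat produced" figure, here is a minimal sketch. The function name `heat_delta` and the sample readings are made up for this example; it is not the app we actually run.

```shell
#!/bin/bash
# Hypothetical helper: exhaust minus intake, in degrees Celsius.
# A growing delta suggests the room is adding more heat to the air passing through it.
heat_delta() {
    # $1 = intake temperature, $2 = exhaust temperature
    LC_ALL=C awk -v intake="$1" -v exhaust="$2" 'BEGIN { printf "%.1f\n", exhaust - intake }'
}

# Sample values, not real measurements
heat_delta 21.5 30.5
```

Graphing that single number over time is usually easier to read than two separate temperature lines.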

This exact aluminum case with a Raspi4

Those and our friendly Raspberry Pi 4 were enough to monitor both UPSes and the temperature of one of the rooms.

These changes gave us peace of mind at a low cost. We can also keep historical data on the Pi and use it as the primary jump host for techs and admins.

Setting up the DHT22s was easy enough: we wrote some apps to read both sensors and to calibrate them precisely against the actual temperature and humidity inside the room.
This Raspberry Pi is also in charge of reporting the temperature from other sensors, so it must stay operational. That makes it a single point of failure, which we should fix. Still, the UPSes don’t support this kind of operation over USB, so I’m looking into multiplexing options to detect a failure of this device and switch over.
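The calibration idea can be sketched in a few lines. This assumes a simple fixed-offset calibration against a trusted reference instrument placed next to the sensor. The helper names `derive_offset` and `apply_offset` and all the values are hypothetical, and our real apps may do this differently.

```shell
#!/bin/bash
# Hypothetical helpers; the reference value comes from a trusted thermometer or
# hygrometer placed next to the DHT22.

# Offset to add to future raw readings: reference minus raw.
derive_offset() {
    # $1 = raw sensor reading, $2 = reference reading
    LC_ALL=C awk -v raw="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", ref - raw }'
}

# Apply a previously derived offset to a raw reading.
apply_offset() {
    # $1 = raw sensor reading, $2 = offset
    LC_ALL=C awk -v raw="$1" -v offset="$2" 'BEGIN { printf "%.1f\n", raw + offset }'
}

# Sample values, not real measurements
temp_offset=$(derive_offset 22.5 22.0)
apply_offset 23.0 "$temp_offset"
```

The same two helpers work for humidity, with a separate offset per sensor and per quantity.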

Backups!

It’s no secret: I run multiple servers in my house, and I also have a detached garage, which comes in very handy for remote backups!

The backup happens over powerline (yeah, yeah, I have two gigabit network cables running to the garage, but powerline was more convenient!).

Again, it’s a Raspberry Pi 4, this time with two 10 TB hard drives plugged in, which receive a copy of my zpool, media, storage, LXC containers, and QEMU backups.
I guess if we need to grow, we’ll continue to use USB, since we don’t write full blast to it and it’s mainly archival. If we ever have to recover, more than likely we’ll pick up the drives and plug them straight into a machine.

Exact model being used for backups.

For this to happen, we had to retrofit the garage with an exhaust fan on the roof to keep it cool enough for this device’s regular operation. I also hooked the hardware up in the roof gable to avoid vibrations from the air compressor and whatnot.

With this, we have covered the two main points: monitoring and backup.

Sync between datacenters

We opted for rsync + lsyncd.

This is the config we use to sync the TX01 server to LA, KS and SEC.

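-- General Settings
-- syncfolders.lua must return a list of the directories to sync (it is iterated with ipairs below)
local sourcesandtargets = require('syncfolders')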

settings {
        logfile = "/var/log/lsyncd.log",
        statusFile = "/var/run/lsyncd/lsyncd.status",
        pidfile = "/var/run/lsyncd/lsyncd.pid",
        maxDelays = 4000,
        insist = true,
        maxProcesses = 20
}

--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------
----------------------------------------------------------------- Bind Transfers

sync {
        default.rsyncssh,
        source = "/var/lib/bind",
        targetdir = "/var/lib/bind",
        host = "lsyncd-vmin.la.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        settings { maxProcesses = 1 },
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/var/lib/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/bind",
        targetdir = "/etc/bind",
        host = "lsyncd-vmin.la.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        settings { maxProcesses = 1 },
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}
--------------------------------------------------------------------------
-- LAX TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------

--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------
----------------------------------------------------------------- Bind Transfers

sync {
        default.rsyncssh,
        source = "/var/lib/bind",
        targetdir = "/var/lib/bind",
        host = "lsyncd-vmin.ks.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        settings { maxProcesses = 1 },
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/var/lib/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/bind",
        targetdir = "/etc/bind",
        host = "lsyncd-vmin.ks.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        settings { maxProcesses = 1 },
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}
--------------------------------------------------------------------------
-- KAN TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------

--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------
----------------------------------------------------------------- Default to transfer home files (mail, websites, etc)

-- Create one sync per folder listed in syncfolders.lua
for _, folder in ipairs( sourcesandtargets )
do
        sync {
                default.rsyncssh,
                source = folder,
                targetdir = folder,
                host = "lsyncd-vmin.nj.takelan.com",
                excludeFrom = "/etc/lsyncd/exclude",
                exclude = { "*.log", "*.tmp", "*~", "*.swp" },
                settings { maxProcesses = 1 },
                delay = 300,
                delete = "running",
                rsync = {
                        binary = "/etc/lsyncd/locking_rsync.sh",
                        backup = true,
                        backup_dir = "/var/lsyncdbackup/",
                        archive = true,
                        links = true,
                        update = true,
                        append_verify = true,
                        temp_dir = "/tmp/",
                },
        }
end

----------------------------------------------------------------- Bind Transfers

sync {
        default.rsyncssh,
        source = "/var/lib/bind",
        targetdir = "/var/lib/bind",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/var/lib/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/bind",
        targetdir = "/etc/bind",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        delay = 0,
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/bind/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

----------------------------------------------------------------- Apache Transfers

sync {
        default.rsyncssh,
        source = "/etc/apache2",
        targetdir = "/etc/apache2",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/apache2/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

----------------------------------------------------------------- Logrotate Transfers

sync {
        default.rsyncssh,
        source = "/etc/logrotate.d",
        targetdir = "/etc/logrotate.d",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/logrotate.d/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

----------------------------------------------------------------- Cronjobs Transfers

sync {
        default.rsyncssh,
        source = "/var/spool/cron/crontabs",
        targetdir = "/var/spool/cron/crontabs",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/var/spool/cron/crontabs/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/cron.d",
        targetdir = "/etc/cron.d",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/cron.d/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/cron.daily",
        targetdir = "/etc/cron.daily",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/cron.daily/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/cron.hourly",
        targetdir = "/etc/cron.hourly",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/cron.hourly/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/cron.monthly",
        targetdir = "/etc/cron.monthly",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/cron.monthly/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

sync {
        default.rsyncssh,
        source = "/etc/cron.weekly",
        targetdir = "/etc/cron.weekly",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/cron.weekly/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}

------- THIS ONE MUST USE RSYNC DIRECTLY!
sync {
        default.rsyncssh,
        source = "/etc",
        targetdir = "/etc",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/usr/bin/rsync",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/crontab/",
                archive = true,
                links = true,
                update = true,
                _extra = { "--include=crontab", "--exclude=*" },
                temp_dir = "/tmp/",
        },
}
------- THIS ONE MUST USE RSYNC DIRECTLY!
----------------------------------------------------------------- Webmin Transfers

sync {
        default.rsyncssh,
        source = "/etc/webmin",
        targetdir = "/etc/webmin",
        host = "lsyncd-vmin.nj.takelan.com",
        excludeFrom = "/etc/lsyncd/exclude",
        exclude = { "*.log", "*.tmp", "*~", "*.swp" },
        delete = "running",
        rsync = {
                binary = "/etc/lsyncd/locking_rsync.sh",
                backup = true,
                backup_dir = "/var/lsyncdbackup/etc/webmin/",
                archive = true,
                links = true,
                update = true,
                temp_dir = "/tmp/",
        },
}
--------------------------------------------------------------------------
-- SEC TRANSFER DETAILS                                                 --
--------------------------------------------------------------------------

This image may explain the sync.

Picture from Zabbix showing the interconnection between datacenters.

As you can see, we keep some servers (namely SEC01 and TX01) in sync for critical services, so that if a location fails, we can still respond.

This way, we ensure that changes on one server reach the others. That brings up a classic sync problem: should we copy open files? No, not really, so for that we created a wrapper around rsync.

#!/bin/bash
# REMEMBER TO MOUNT THIS FOLDER!
# sshfs#root@vmin01.tx.takelan.com:/opt/scripts/locks /opt/scripts/locks fuse delay_connect,defaults,idmap=user,IdentityFile=/root/.ssh/id_rsa,port=22,uid=0,gid=0,allow_other 0 0

### Definitions
OPENFILES_SLEEP_TIME=5
SLEEP_TIME=25
MAX_WAIT=3600
LOCKFILE_FOLDER='/opt/scripts/locks'
LOCK_FILE="$LOCKFILE_FOLDER/rsync-lock"
FOLDER_HOST='vmin01'
HOSTNAME=$(hostname)
# Rsync wrapper to avoid copying partial - opened files...
RSYNC_BINARY="/usr/bin/rsync"

echo "Running locking with $*" >> /root/params.log
# Keep the last two arguments; the first of them is used as the source path below
source=("${@: -2}")

#---------------------------------------------------------------> Functions

# Call this function to decide the final destiny of the sync.
function checkOrDie() {
    mountpoint $LOCKFILE_FOLDER > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Failure... exiting due to mountpoint failure."
        exit 0
    fi
    return 0
}

# Check if the mountpoint is mounted and working...
function checkMountpoint() {
    mountpoint $LOCKFILE_FOLDER > /dev/null 2>&1

    if [ $? -eq 0 ]; then
        echo "Success: Everything is okay."
    else
        umount -f $LOCKFILE_FOLDER
        mount $LOCKFILE_FOLDER
        checkOrDie
    fi
    return 0
}

# Only check the mount when this host is not the one that owns the lock folder
function checkMountpointWorks() {
    if [ "$HOSTNAME" == "$FOLDER_HOST" ]; then
        echo "I'm the folder host... so skipping"
    else
        checkMountpoint
    fi
    return 0
}

# Remove lockfile if exists
function removeLockfile() {
    if [ -f $LOCK_FILE ]; then
        grep "$HOSTNAME" "$LOCK_FILE" > /dev/null 2>&1
        if [ $? -eq 0 ]; then
            echo "Lockfile exists and its mine, removing"
            rm $LOCK_FILE
        fi
    fi
    return 0
}

# Removes the lock even if it belongs to someone else
function forceRemoveLock(){
    if [ -f $LOCK_FILE ]; then
        LOCKCONTENT=$(cat "$LOCK_FILE")
        echo "Lockfile exists and belongs to $LOCKCONTENT! Since I'm forcing... removing"
        rm $LOCK_FILE
    fi
    return 0
}

# Create the lockfile with my hostname
function createLockfile() {
    echo $HOSTNAME > $LOCK_FILE
    return 0
}

# Succeeds (returns 0) while something other than a cwd entry or dovecot
# holds a file open under the source folder
function sourceHasOpenFiles() {
    lsof +D "${source[0]}" | tail -n +2 | awk '{ print $4 ";" $9 }' | grep -v 'cwd;' | grep -v 'dovecot' > /dev/null 2>&1
}

# Block until the source has no open files
function checkForOpenFiles() {
    echo "Checking ${source[0]} for open files..."
    # This could take a bit...
    while sourceHasOpenFiles; do
        echo "It seems like we do have open files... blocking sync!"
        # Avoid CPU pinning
        echo "Sleeping for $OPENFILES_SLEEP_TIME"
        sleep $OPENFILES_SLEEP_TIME
        # Differential sleep
        sleep $(( ( RANDOM % 5 ) + 1 ))s
    done
    echo "Done checking... calling rsync!"
    return 0
}

#---------------------------------------------------------------> Functions

#---------------------------------------------------------------> Main Program

# Make sure the mount is working properly
checkMountpointWorks

# If there is no lock, take it; otherwise wait for the global lock to go away
if [ ! -f "$LOCK_FILE" ]; then
    echo "Executing lock..."
    createLockfile
    echo "Checking for open files..."
    echo "Syncing!"

# Wait for the lock file to expire or until removed.
else
    NUM_SECS=$(( $(date +%s) - $(stat -c %Y "$LOCK_FILE") ))
    while [ -f "$LOCK_FILE" ] && (( NUM_SECS < MAX_WAIT )); do
        sleep $SLEEP_TIME
        # Differential sleep
        sleep $(( ( RANDOM % 10 ) + 1 ))s
        if [ -f "$LOCK_FILE" ]; then
            NUM_SECS=$(( $(date +%s) - $(stat -c %Y "$LOCK_FILE") ))
            echo "Lockfile exists for: $NUM_SECS seconds..."
        else
            break
        fi
    done
    echo "Maximum wait reached or file removed on remote end"
    forceRemoveLock
    createLockfile
    echo "Syncing!"
fi

# Stop to check for open files
checkForOpenFiles

# Start syncing
$RSYNC_BINARY "$@"
rsync_res=$?

# Cleanup
echo "Rsync finished with status $rsync_res..."
removeLockfile
exit $rsync_res
#---------------------------------------------------------------> Main Program

This script wraps the rsync call that lsyncd makes and ensures that the source doesn’t currently have open files before we initiate a copy.
Yes, this won’t guarantee that at that EXACT moment there are no open files, but remember, it’s just a bandwidth-saving operation, not a critical one.
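One thing worth noting about the wrapper is that checking for the lock file and creating it are two separate steps, so two hosts could both see "no lock" and both proceed. A possible tightening, shown here only as a sketch, is to let the shell's `noclobber` option do the check and the create in one step. The function names `acquire_lock` and `lock_is_stale` are made up for this example, and whether the exclusive create stays atomic across an sshfs mount depends on the server and mount options, so I haven't relied on it.

```shell
#!/bin/bash
# Try to take the lock in one step. With noclobber set, the redirection fails
# when the file already exists, so the function returns non-zero.
acquire_lock() {
    # $1 = lock file path, $2 = owner name written into the lock
    ( set -o noclobber; echo "$2" > "$1" ) 2> /dev/null
}

# Succeeds (returns 0) if the lock file is at least the given number of seconds old.
lock_is_stale() {
    # $1 = lock file path, $2 = maximum age in seconds
    age=$(( $(date +%s) - $(stat -c %Y "$1") ))
    [ "$age" -ge "$2" ]
}

# Hypothetical usage with a throwaway lock in a temporary directory
demo_dir=$(mktemp -d)
if acquire_lock "$demo_dir/rsync-lock" "tx01"; then
    echo "Lock acquired"
fi
rm -rf "$demo_dir"
```

In the wrapper, `lock_is_stale` would play the role of the `MAX_WAIT` comparison, and a failed `acquire_lock` would send the script into the wait loop instead of overwriting someone else's lock.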

I got tired of typing, but this covers a large part of our monitoring and maintenance.

See ya in post 2!

ddemuro
administrator

Sr. Software Engineer with over 10 years of experience. Hobbyist photographer and mechanic. Tinkering soul in an endeavor to better understand this world. Love traveling, drinking coffee, and investments.
