Jun 02, 2017

Upgrading ssh to mosh with UDP hole punching to connect to a host behind NAT

There are way more tools that happily forward TCP ports than ones for UDP.

Case in point - it's usually easy to forward ssh port through a bunch of hosts and NATs, with direct and reverse ssh tunnels, ProxyCommand stuff, tools like pwnat, etc, but for mosh UDP connection it's not that trivial.

Which sucks, because its performance and input prediction stuff is exactly what's lacking in super-laggy multi-hop ssh connections forwarded back-and-forth between continents through such tunnels.

There are quite a few long-standing discussions on how to solve it properly in mosh, which didn't get any traction so far, unfortunately:

One obvious way to make it work, is to make some tunnel (like OpenVPN) from destination host (server) to a client, and use mosh over that.

But that's some extra tools and configuration to keep around on both sides, and there is much easier way that works perfectly for most cases - knowing both server and client IPs, pre-pick ports for mosh-server and mosh-client, then punch hole in the NAT for these before starting both.

How it works:

  • Pick some UDP ports that server and client will be using, e.g. 34700 for server and 34701 for client.
  • Send UDP packet from server:34700 to client:34701.
  • Start mosh-server, listening on server:34700.
  • Connect to that with mosh-client, using client:34701 as a UDP source port.

NAT on the router(s) in-between the two will see this exchange as a server establishing "udp connection" to a client, and will allow packets in both directions to flow through between these two ports.

Once mosh-client establishes the connection and keepalive packets will start bouncing there all the time, it will be up indefinitely.

mosh is generally well-suited for running manually from an existing console, so all that's needed to connect in a simple case is:

server% mosh-server new

client% MOSH_KEY=NN07GbGqQya1bqM+ZNY+eA mosh-client <server-ip> 60001

With hole-punching, two additional wrappers are required with the current mosh version (1.3.0):

  • One for mosh-server to send UDP packet to the client IP, using same port on which server will then be started: mosh-nat
  • And a wrapper for mosh-client to force its socket to bind to specified local UDP port, which was used as a dst by mosh-server wrapper above: mosh-nat-bind.c

Making connection using these two is as easy as with stock mosh above:

server% ./mosh-nat
mosh-client command:
  MNB_PORT=34730 LD_PRELOAD=./mnb.so
    MOSH_KEY=rYt2QFJapgKN5GUqKJH2NQ mosh-client <server-addr> 34730

client% MNB_PORT=34730 LD_PRELOAD=./mnb.so \
  MOSH_KEY=rYt2QFJapgKN5GUqKJH2NQ mosh-client 34730

(with server at, client at and using port 34730 on both ends in this example)

Extra notes:

  • "mnb.so" used with LD_PRELOAD is that mosh-nat-bind.c wrapper, which can be compiled using: gcc -nostartfiles -fpic -shared -ldl -D_GNU_SOURCE mosh-nat-bind.c -o mnb.so
  • Both mnb.so and mosh-nat only work with IPv4, IPv6 shouldn't use NAT anyway.
  • 34730 is the default port for -c/--client-port and -s/--server-port opts in mosh-nat script.
  • Started mosh-server waits for 60s (default) for mosh-client to connect.
  • Continous operation relies on mosh keepalive packets without interruption, as mentioned, and should break on (long enough) net hiccups, unlike direct mosh connections established to server that has no NAT in front of it (or with a dedicated port forwarding).
  • No roaming of any kind is possible here, again, unlike with original mosh - if src IP/port changes, connection will break.
  • New MOSH_KEY is generated by mosh-server on every run, and is only good for one connection, as server should rotate it after connection gets established, so is pretty safe/easy to use.
  • If client is behind NAT as well, its visible IP should be used, not internal one.
  • Should only work when NAT on either side doesn't rewrite source ports.

Last point can be a bummer with some "Carrier-grade" NATs, which do rewrite src ports out of necessity, but can be still worked around if it's only on the server side by checking src port of the hole-punching packet in tcpdump and using that instead of whatever it was supposed to be originally.

Requires just python to run wrapper script on the server and no additional configuration of any kind.

Both linked wrappers are from here:

May 14, 2017

ssh reverse tunnel ("ssh -R") caveats and tricks

"ssh -R" a is kinda obvious way to setup reverse access tunnel from something remote that one'd need to access, e.g. raspberry pi booted from supplied img file somewhere behind the router on the other side of the world.

Being part of OpenSSH, it's available on any base linux system, and trivial to automate on startup via loop in a shell script, crontab or a systemd unit, e.g.:


ExecStart=/usr/bin/ssh -oControlPath=none -oControlMaster=no \
  -oServerAliveInterval=6 -oServerAliveCountMax=10 -oConnectTimeout=180 \
  -oPasswordAuthentication=no -oNumberOfPasswordPrompts=0 \
  -NnT -R "1234:localhost:22" tun-user@tun-host


On the other side, ideally in a dedicated container or VM, there'll be sshd "tun-user" with an access like this (as a single line):

command="echo >&2 'No shell access!'; exit 1",
  no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 ...

Or have sshd_config section with same restrictions and only keys in authorized_keys, e.g.:

Match User tun-*
 # GatewayPorts yes
 X11Forwarding no
 AllowAgentForwarding no
 PermitTTY no
 ForceCommand echo 'no shell access!'; exit 1

And that's it, right?

No additional stuff needed, "ssh -R" will connect reliably on boot, keep restarting and reconnecting in case of any errors, even with keepalives to detect dead connections and restart asap.


There's a bunch of common pitfalls listed below.

  • Problem 1:

    When device with a tunnel suddenly dies for whatever reason - power or network issues, kernel panic, stray "kill -9" or what have you - connection on sshd machine will hang around for a while, as keepalive options are only used by the client.

    Along with (dead) connection, listening port will stay open as well, and "ssh -R" from e.g. power-cycled device will not be able to bind it, and that client won't treat it as a fatal error either!

    Result: reverse-tunnels don't survive any kind of non-clean reconnects.


    • TCPKeepAlive in sshd_config - to detect dead connections there faster, though probably still not fast enough for e.g. emergency reboot.
    • Detect and kill sshd pids without listening socket, forcing "ssh -R" to reconnect until it can actually bind one.
    • If TCPKeepAlive is not good or reliable enough, kill all sshd pids associated with listening sockets that don't produce the usual "SSH-2.0-OpenSSH_7.4" greeting line.
  • Problem 2:

    Running sshd on any reasonably modern linux, you get systemd session for each connection, and killing sshd pids as suggested above will leave logind sessions from these, potentially creating hundreds or thousands of them over time.


    • "UsePAM no" to disable pam_systemd.so along with the rest of the PAM.
    • Dedicated PAM setup for ssh tunnel logins on this dedicated system, not using pam_systemd.
    • Occasional cleanup via loginctl list-sessions/terminate-session for ones that are in "closing"/"abandoned" state.

    Killing sshd pids might be hard to avoid on fast non-clean reconnect, since reconnected "ssh -R" will hang around without a listening port forever, as mentioned.

  • Problem 3:

    If these tunnels are not configured on per-system basis, but shipped in some img file to use with multiple devices, they'll all try to bind same listening port for reverse-tunnels, so only one of these will work.


    • More complex script to generate listening port for "ssh -R" based on machine id, i.e. serial, MAC, local IP address, etc.

    • Get free port to bind to out-of-band from the server somehow.

      Can be through same ssh connection, by checking ss/netstat output or /proc/net/tcp there, if commands are allowed there (probably a bad idea for random remote devices).

  • Problem 4:

    Device identification in the same "mutliple devices" scenario.

    I.e. when someone sets up 5 RPi boards on the other end, how to tell which tunnel leads to each specific board?

    Can usually be solved by:

    • Knowing/checking quirks specific to each board, like dhcp hostname, IP address, connected hardware, stored files, power-on/off timing, etc.
    • Blinking LEDs via /sys/class/leds, ethtool --identify or GPIO pins.
    • Output on connected display - just "wall" some unique number (e.g. reverse-tunnel port) or put it to whatever graphical desktop.
  • Problem 5:

    Round-trip through some third-party VPS can add significant console lag, making it rather hard to use.

    More general problem than with just "ssh -R", but when doing e.g. "EU -> US -> RU" trip and back, console becomes quite unusable without something like mosh, which can't be used over that forwarded tcp port anyway!

    Kinda defeats the purpose of the whole thing, though laggy console (with an option to upgrade it, once connected) is still better than nothing.

Not an exhaustive or universally applicable list, of course, but hopefully shows that it's actually more hassle than "just run ssh -R on boot" to have something robust here.

So choice of ubiquitous / out-of-the-box "ssh -R" over installing some dedicated tunneling thing like OpenVPN is not as clear-cut in favor of the former as it would seem, taking all such quirks (handled well by dedicated tunneling apps) into consideration.

As I've bumped into all of these by now, addressed them by:

  • ssh-tunnels-cleanup script to (optionally) do three things, in order:

    • Find/kill sshd pids without associated listening socket ("ssh -R" that re-connected quickly and couldn't bind one).
    • Probe all sshd listening sockets with ncat (nc that comes with nmap) and make sure there's an "SSH-2.0-..." banner there, otherwise kill.
    • Cleanup all dead loginctl sessions, if any.

    Only affects/checks sshd pids for specific user prefix (e.g. "tun-"), to avoid touching anything but dedicated tunnels.

  • ssh-reverse-mux-server / ssh-reverse-mux-client scripts.

    For listening port negotiation with ssh server, using bunch of (authenticated) UDP packets.

    Essentially a wrapper for "ssh -R" on the client, to also pass all the required options, replacing ExecStart= line in above systemd example with e.g.:

    ExecStart=/usr/local/bin/ssh-reverse-mux-client \
      --mux-port=2200 --ident-rpi -s pkt-mac-key.aGPwhpya tun-user@tun-host

    ssh-reverse-mux-server on the other side will keep .db file of --ident strings (--ident-rpi uses hash of RPi board serial from /proc/cpuinfo) and allocate persistent port number (from specified range) to each one, which client will use with actual ssh command.

    Simple symmetric key (not very secret) is used to put MAC into packets and ignore any noise traffic on either side that way.


  • Hook in ssh-reverse-mux-client above to blink bits of allocated port on some available LED.

    Sample script to do the morse-code-like bit-blinking:

    And additional hook option for command above:

    ... -c 'sudo -n led-blink-arg -f -l led0 -n 2/4-2200'

    (with last arg-num / bits - decrement spec there to blink only last 4 bits of the second passed argument, which is listening port, e.g. "1011" for "2213")

Given how much OpenSSH does already, having all this functionality there (or even some hooks for that) would probably be too much to ask.

...at least until it gets rewritten as some systemd-accessd component :P

Jan 29, 2017

Proxying ssh user connections to gitolite host transparently

Recently bumped into apparently not well-supported scenario of accessing gitolite instance transparently on a host that is only accessible through some other gateway (often called "bastion" in ssh context) host.

Something like this:

|               |   git@myhost.net:myrepo
|  dev-machine  ---------------------------+
|               |                          |
+---------------+                          |
      git@gitolite:myrepo     |                   |
  +----------------------------  myhost.net (gw)  |
  |                           |                   |
+-v-------------------+       +-------------------+
|                     |
|    gitolite (gl)    |
|  host/container/vm  |
|                     |

Here gitolite instance might be running on a separate machine, or on the same "myhost.net", but inside a container or vm with separate sshd daemon.

From any dev-machine you want to simply use git@myhost.net:myrepo to access repositories, but naturally that won't work because in normal configuration you'd hit sshd on gw host (myhost.net) and not on gl host.

There are quite a few common options to work around this:

  • Use separate public host/IP for gitolite, e.g. git.myhost.net (!= myhost.net).

  • TCP port forwarding or similar tricks.

    E.g. simply forward ssh port connections in a "gw:22 -> gl:22" fashion, and have gw-specific sshd listen on some other port, if necessary.

    This can be fairly easy to use with something like this for odd-port sshd in ~/.ssh/config:

    Host myhost.net
      Port 1234
    Host git.myhost.net
      Port 1235

    Can also be configured in git via remote urls like ssh://git@myhost.net:1235/myrepo.

  • Use ssh port forwarding to essentially do same thing as above, but with resulting git port accessible on localhost.

  • Configure ssh to use ProxyCommand, which will login to gw host and setup forwarding through it.

All of these, while suitable for some scenarios, are still nowhere near what I'd call "transparent", and require some additional configuration for each git client beyond just git add remote origin git@myhost.net:myrepo.

One advantage of such lower-level forwarding is that ssh authentication to gitolite is only handled on gitolite host, gw host has no clue about that.

If dropping this is not a big deal (e.g. because gw host has root access to everything in gl container anyway), there is a rather easy way to forward only git@myhost.net connections from gw to gl host, authenticating them only on gw instead, described below.

Gitolite works by building ~/.ssh/authorized_keys file with essentially command="gitolite-shell gl-key-id" <gl-key> for each public key pushed to gitolite-admin repository.

Hence to proxy connections from gw, similar key-list should be available there, with key-commands ssh'ing into gitolite user/host and running above commands there (with original git commands also passed through SSH_ORIGINAL_COMMAND env-var).

To keep such list up-to-date, post-update trigger/hook for gitolite-admin repo is needed, which can use same git@gw login (with special "gl auth admin" key) to update key-list on gw host.

Steps to implement/deploy whole thing:

  • useradd -m git on gw and run ssh-keygen -t ed25519 on both gw and gl hosts for git/gitolite user.

  • Setup all connections for git@gw to be processed via single "gitolite proxy" command, disallowing anything else, exactly like gitolite does for its users on gl host.

    gitolite-proxy.py script (python3) that I came up with for this purpose can be found here: https://github.com/mk-fg/gitolite-ssh-user-proxy/

    It's rather simple and does two things:

    • When run with --auth-update argument, receives gitolite authorized_keys list, and builds local ~/.ssh/authorized_keys from it and authorized_keys.base file.

    • Similar to gitolite-shell, when run as gitolite-proxy key-id, ssh'es into gl host, passing key-id and git command to it.

      This is done in a straightforward os.execlp('ssh', 'ssh', '-qT', ...) manner, no extra processing or any error-prone stuff like that.

    When installing it (to e.g. /usr/local/bin/gitolite-proxy as used below), be sure to set/update "gl_host_login = ..." line at the top there.

    For --auth-update, ~/.ssh/authorized_keys.base (note .base) file on gw should have this single line (split over two lines for readability, must be all on one line for ssh!):

    command="/usr/local/bin/gitolite-proxy --auth-update",no-port-forwarding
      ,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA...4u3FI git@gl

    Here ssh-ed25519 AAA...4u3FI git@gl is the key from ~git/.ssh/id_ed25519.pub on gitolite host.

    Also run:

    # install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,}
    # install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,.old}
    To have initial auth-file, not yet populated with gitolite-specific keys/commands.
    Note that only these two files need to be writable for git user on gw host.
  • From gitolite (gl) host and user, run: ssh -qT git@gw < ~/.ssh/authorized_keys

    This is to test gitolite-proxy setup above - should populate ~git/.ssh/authorized_keys on gw host and print back gw host key and proxy script to run as command="..." for it (ignore them, will be installed by trigger).

  • Add trigger that'd run after gitolite-admin repository updates on gl host.

    • On gl host, put this to ~git/.gitolite.rc right before ENABLE line:

      LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",
      POST_COMPILE => ['push-authkeys'],
    • Commit/push push-authkeys trigger script (also from gitolite-ssh-user-proxy repo) to gitolite-admin repo as local/triggers/push-authkeys, updating gw_proxy_login line in there.

    gitolite docs on adding triggers: http://gitolite.com/gitolite/gitolite.html#triggers

Once proxy-command is in place on gw and gitolite-admin hook runs at least once (to setup gw->gl access and proxy-command), git@gw (git@myhost.net) ssh login spec can be used in exactly same way as git@gl.

That is, fully transparent access to gitolite on a different host through that one user, while otherwise allowing to use sshd on a gw host, without any forwarding tricks necessary for git clients.

Whole project, with maybe a bit more refined process description and/or whatever fixes can be found on github here: https://github.com/mk-fg/gitolite-ssh-user-proxy/

Huge thanks to sitaramc (gitolite author) for suggesting how to best setup gitolite triggers for this purpose on the ML.

Oct 22, 2015

Arch Linux chroot and sshd from boot on rooted Android with SuperSU

Got myself Android device only recently, and first thing I wanted to do, of course, was to ssh into it.

But quick look at the current F-Droid and Play apps shows that they either based on a quite limited dropbear (though perfectly fine for one-off shell access) or "based on openssh code", where infamous Debian OpenSSL code patch comes to mind, not to mention that most look like ad-ridden proprietary piece of crap.

Plus I'd want a proper package manager, shell (zsh), tools and other stuff in there, not just some baseline bash and busybox, and chroot with regular linux distro is a way to get all that plus standard OpenSSH daemon.

Under the hood, modern Android (5.11 in my case, with CM 12) phone is not much more than Java VM running on top of SELinux-enabled (which I opted to keep for Android stuff) Linux kernel, on top of some multi-core ARMv7 CPU (quadcore in my case, rather identical to one in RPi2).

Steps I took to have sshd running on boot with such device:

  • Flush stock firmware (usually loaded with adware and crapware) and install some pre-rooted firmware image from xda-developers.com or 4pda.ru.

    My phone is a second-hand Samsung Galaxy S3 Neo Duos (GT-I9300I), and I picked resurrection remix CM-based ROM for it, built for this phone from t3123799, with Open GApps arm 5.1 Pico (minimal set of google stuff).

    That ROM (like most of them, it appears) comes with bash, busybox and - most importantly - SuperSU, which runs its unrestricted init, and is crucial to getting chroot-script to start on boot. All three of these are needed.

    Under Windows, there's Odin suite for flashing zip with all the goodies to USB-connected phone booted into "download mode".

    On Linux, there's Heimdall (don't forget to install adb with that, e.g. via pacman -S android-tools android-udev on Arch), which can dd img files for diff phone partitions, but doesn't seem to support "one zip with everything" format that most firmwares come in.

    Instead of figuring out which stuff from zip to upload where with Heimdall, I'd suggest grabbing a must-have TWRP recovery.img (small os that boots in "recovery mode", grabbed one for my phone from t2906840), flashing it with e.g. heimdall flash --RECOVERY recovery.twrp- and then booting into it to install whatever main OS and apps from zip's on microSD card.

    TWRP (or similar CWM one) is really useful for a lot of OS-management stuff like app-packs installation (e.g. Open GApps) or removal, updates, backups, etc, so I'd definitely suggest installing one of these as recovery.img as the first thing on any Android device.

  • Get or make ARM chroot tarball.

    I did this by bootstrapping a stripped-down ARMv7 Arch Linux ARM chroot on a Raspberry Pi 2 that I have around:

    # pacman -Sg base
     ### Look through the list of packages,
     ###  drop all the stuff that won't ever be useful in chroot on the phone,
     ###  e.g. pciutils, usbutils, mdadm, lvm2, reiserfsprogs, xfsprogs, jfsutils, etc
     ### pacstrap chroot with whatever is left
    # mkdir droid-chroot
    # pacstrap -i -d droid-chroot bash bzip2 coreutils diffutils file filesystem \
        findutils gawk gcc-libs gettext glibc grep gzip iproute2 iputils less \
        licenses logrotate man-db man-pages nano pacman perl procps-ng psmisc \
        sed shadow sysfsutils tar texinfo util-linux which
     ### Install whatever was forgotten in pacstrap
    # pacman -r droid-chroot -S --needed atop busybox colordiff dash fping git \
        ipset iptables lz4 openssh patch pv rsync screen xz zsh fcron \
        python2 python2-pip python2-setuptools python2-virtualenv
     ### Make a tar.gz out of it all
    # rm -f droid-chroot/var/cache/pacman/pkg/*
    # tar -czf droid-chroot.tar.gz droid-chroot

    Same can be obviously done with debootstrap or whatever other distro-of-choice bootstrapping tool, but likely has to be done on a compatible architecture - something that can run ARMv7 binaries, like RPi2 in my case (though VM should do too) - to run whatever package hooks upon installs.

    Easier way (that won't require having spare ARM box or vm) would be to take pre-made image for any ARMv7 platform, from http://os.archlinuxarm.org/os/ list or such, but unless there's something very generic (which e.g. "ArchLinuxARM-armv7-latest.tar.gz" seem to be), there's likely be some platform-specific cruft like kernel, modules, firmware blobs, SoC-specific tools and such... nothing that can't be removed at any later point with package manager or simple "rm", of course.

    While architecture in my case is ARMv7, which is quite common nowadays (2015), other devices can have Intel SoCs with x86 or newer 64-bit ARMv8 CPUs. For x86, bootstrapping can obviously be done on pretty much any desktop/laptop machine, if needed.

  • Get root shell on device and unpack chroot tarball there.

    adb shell (pacman -S android-tools android-udev or other distro equivalent, if missing) should get "system" shell on a USB-connected phone.

    (btw, "adb" access have to be enabled on the phone via some common "tap 7 times on OS version in Settings-About then go to Developer-options" dance, if not already)

    With SuperSU (or similar "su" package) installed, next step would be running "su" there to get unrestricted root, which should work.

    /system and /data should be on ext4 (check mount | grep ext4 for proper list), f2fs or such "proper" filesystem, which is important to have for all the unix permission bits and uid/gid info which e.g. FAT can't handle (loopback img with ext4 can be created in that case, but shouldn't be necessary in case of internal flash storage, which should have proper fs'es already).

    /system is a bad place for anything custom, as it will be completely flushed on most main-OS changes (when updating ROM from zip with TWRP, for instance), and is mounted with "ro" on boot anyway.

    Any subdir in /data seem to work fine, though one obvious pre-existing place - /data/local - is probably a bad idea, as it is used by some Android dev tools already.

    With busybox and proper bash on the phone, unpacking tarball from e.g. microSD card should be easy:

    # mkdir -m700 /data/chroots
    # cd /data/chroots
    # tar -xpf /mnt/sdcard/droid-chroot.tar.gz

    It should already work, too, so...

    # cd droid-chroot
    # mount -o bind /dev dev \
        && mount -o bind /dev/pts dev/pts \
        && mount -t proc proc proc \
        && mount -t sysfs sysfs sys
    # env -i TERM=$TERM SHELL=/bin/zsh HOME=/root $(which chroot) . /bin/zsh

    ...should produce a proper shell in a proper OS, yay! \o/

    Furthermore, to be able to connect there directly, without adb or USB cable, env -i $(which chroot) . /bin/sshd should work too.

    For sshd in particular, one useful thing to do here is:

    # $(which chroot) . /bin/ssh-keygen -A

    ...to populate /etc/ssh with keys, which are required to start sshd.

  • Setup init script to run sshd or whatever init-stuff from that chroot on boot.

    Main trick here is to run it with unrestricted SELinux context (unless SELinux is disabled entirely, I guess).

    This makes /system/etc/init.d using "sysinit_exec" and /data/local/userinit.sh with "userinit_exec" unsuitable for the task, only something like "init" ("u:r:init:s0") will work.

    SELinux on Android is documented in Android docs, and everything about SELinux in general applies there, of course, but some su-related roles like above "userinit_exec" actually come with CyanogenMod or whatever similar hacks on top of the base Android OS.

    Most relevant info on this stuff comes with SuperSU though (or rather libsuperuser) - http://su.chainfire.eu/

    That doc has info on how to patch policies, to e.g. transition to unrestricted role for chroot init, setup sub-roles for stuff in there (to also use SELinux in a chroot), which contexts are used where, and - most useful in this case - which custom "init" dirs are used at which stages of the boot process.

    Among other useful stuff, it specifies/describes /system/su.d init-dir, from which SuperSU runs scripts/binaries with unrestricted "init" context, and very early in the process too, hence it is most suitable for starting chroot from.

    So, again, from root (after "su") shell:

    # mount -o remount,rw /system
    # mkdir -m700 /system/su.d
    # cat >/system/su.d/chroots.sh <<EOF
    exec /data/local/chroots.bash
    # chmod 700 /system/su.d/chroots.sh
    # cat >/data/local/chroots.bash <<EOF
    export PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
    [[ $(du -m "$log" | awk '{print $1}') -gt 20 ]] && mv "$log"{,.old}
    exec >>"$log" 2>&1
    echo " --- Started $(TZ=UTC date) --- "
    log -p i -t chroots "Starting chroot: droid-chroot"
    /data/chroots/droid-chroot.sh &
    log -p i -t chroots "Finished chroots init"
    echo " --- Finished $(TZ=UTC date) --- "
    # chmod 700 /data/local/chroots.bash
    # cd /data/chroots
    # mkdir -p droid-chroot/mnt/storage
    # ln -s droid-chroot/init.sh droid-chroot.sh
    # cat >droid-chroot/init.sh <<EOF
    set -e -o pipefail
    usage() {
      bin=$(basename $0)
      echo >&2 "Usage: $bin [ stop | chroot ]"
      exit ${1:-0}
    [[ "$#" -gt 1 || "$1" = -h || "$1" = --help ]] && usage
    cd /data/chroots/droid-chroot
    sshd_pid=$(cat run/sshd.pid 2>/dev/null ||:)
    mountpoint -q dev || mount -o bind /dev dev
    mountpoint -q dev/pts || mount -o bind /dev/pts dev/pts
    mountpoint -q proc || mount -t proc proc proc
    mountpoint -q sys || mount -t sysfs sysfs sys
    mountpoint -q tmp || mount -o nosuid,nodev,size=20%,mode=1777 -t tmpfs tmpfs tmp
    mountpoint -q run || mount -o nosuid,nodev,size=20% -t tmpfs tmpfs run
    mountpoint -q mnt/storage || mount -o bind /data/media/0 mnt/storage
    case "$1" in
        [[ -z "$sshd_pid" ]] || kill "$sshd_pid"
        exit 0 ;;
        exec env -i\
          TERM="$TERM" SHELL=/bin/zsh HOME=/root\
          /system/xbin/chroot . /bin/zsh ;;
      *) [[ -z "$1" ]] || usage 1 ;;
    [[ -n "$sshd_pid" ]]\
      && kill -0 "$sshd_pid" 2>/dev/null\
      || exec env -i /system/xbin/chroot . /bin/sshd
    # chmod 700 droid-chroot/init.sh

    To unpack all that wall-of-shell a bit:

    • Very simple /system/su.d/chroots.sh is created, so that it can easily be replaced if/when /system gets flushed by some update, and also so that it won't need to be edited (needing rw remount) ever.

    • /data/local/chroots.bash is an actual init script for whatever chroots, with Android logging stuff (accessible via e.g. adb logcat, useful to check if script was ever started) and simplier more reliable (and rotated) log in /data/local/chroots.log.

    • /data/chroots/droid-chroot.sh is a symlink to init script in /data/chroots/droid-chroot/init.sh, so that this script can be easily edited from inside of the chroot itself.

    • /data/chroots/droid-chroot/init.sh is the script that mounts all the stuff needed for the chroot and starts sshd there.

      Can also be run from adb root shell to do the same thing, with "stop" arg to kill that sshd, or with "chroot" arg to do all the mounts and chroot into the thing from whatever current sh.

      Basically everything to with that chroot from now on can/should be done through that script.

    "cat" commands can obviously be replaced with "nano" and copy-paste there, or copying same (or similar) scripts from card or whatever other paths (to avoid pasting them into shell, which might be less convenient than Ctrl+S in $EDITOR).

  • Reboot, test sshd, should work.

Anything other than sshd can also be added to that init script, to make some full-featured dns + web + mail + torrents server setup start in chroot.

With more than a few daemons, it'd probably be a good idea to start just one "daemon babysitter" app from there, such as runit, daemontools or whatever. Maybe even systemd will work, though unlikely, given how it needs udev, lots of kernel features and apis initialized in its own way, and such.

Obvious caveat for running a full-fledged linux separately from main OS is that it should probably be managed through local webui's or from some local terminal app, and won't care much about power management and playing nice with Android stuff.

Android shouldn't play nice with such parasite OS either, cutting network or suspending device when it feels convenient, without any regard for conventional apps running there, though can be easily configured not to.

As I'm unlikely to want this device as a phone ever (who needs these, anyway?), turning it into something more like wireless RPi2 with a connected management terminal (represented by Android userspace) sounds like the only good use for it so far.

Update 2016-05-16: Added note on ssh-keygen and rm for pacman package cache after pacstrap.

Aug 08, 2013

Encrypted root on a remote vds

Most advice wrt encryption on a remote hosts (VPS, VDS) don't seem to involve full-disk encryption as such, but is rather limited to encrypting /var and /home, so that machine will boot from non-crypted / and you'll be able to ssh to it, decrypt these parts manually, then start services that use data there.

That seem to be in contrast with what's generally used on local machines - make LUKS container right on top of physical disk device, except for /boot (if it's not on USB key) and don't let that encryption layer bother you anymore.

Two policies seem to differ in that former one is opt-in - you have to actively think which data to put onto encrypted part (e.g. /etc/ssl has private keys? move to /var, shred from /etc), while the latter is opt-out - everything is encrypted, period.

So, in spirit of that opt-out way, I thought it'd be a drag to go double-think wrt which data should be stored where and it'd be better to just go ahead and put everything possible to encrypted container for a remote host as well, leaving only /boot with kernel and initramfs in the clear.

Naturally, to enter encryption password and not have it stored alongside LUKS header, some remote login from the network is in order, and sshd seem to be the secure and easy way to go about it.
Initramfs in question should then also be able to setup network, which luckily dracut can. Openssh sshd is a bit too heavy for it though, but there are much lighter sshd's like dropbear.

Searching around for someone to tie the two things up, found a bit incomplete and non-packaged solutions like this RH enhancement proposal and a set of hacky scripts and instructions in dracut-crypt-wait repo on bitbucket.

Approach outlined in RH bugzilla is to make dracut "crypt" module to operate normally and let cryptsetup query for password in linux console, but also start sshd in the background, where one can login and use a simple tool to echo password to that console (without having it echoed).
dracut-crypt-wait does a clever hack of removing "crypt" module hook instead and basically creates "rescure" console on sshd, where user have to manually do all the decryption necessary and then signal initramfs to proceed with the boot.

I thought first way was rather more elegant and clever, allowing dracut to figure out which device to decrypt and start cryptsetup with all the necessary, configured and documented parameters, also still allowing to type passphrase into console - best of both worlds, so went along with that one, creating dracut-crypt-sshd project.

As README there explains, using it is as easy as adding it into dracut.conf (or passing to dracut on command line) and adding networking to grub.cfg, e.g.:

menuentry "My Linux" {
        linux /vmlinuz ro root=LABEL=root
                rd.luks.uuid=7a476ea0 rd.lvm.vg=lvmcrypt rd.neednet=1
        initrd /dracut.xz

("ip=dhcp" might be simplier way to go, but doesn't yield default route in my case)

And there, you'll have sshd on that IP port 2222 (configurable), with pre-generated (during dracut build) keys, which might be a good idea to put into "known_hosts" for that ip/port somewhere. "authorized_keys" is taken from /root/.ssh by default, but also configurable via dracut.conf, if necessary.

Apart from sshd, that module includes two tools for interaction with console - console_peek and console_auth (derived from auth.c in the bugzilla link above).

Logging in to that sshd then yields sequence like this:

[214] Aug 08 13:29:54 lastlog_perform_login: Couldn't stat /var/log/lastlog: No such file or directory
[214] Aug 08 13:29:54 lastlog_openseek: /var/log/lastlog is not a file or directory!

# console_peek
[    1.711778] Write protecting the kernel text: 4208k
[    1.711875] Write protecting the kernel read-only data: 1116k
[    1.735488] dracut: dracut-031
[    1.756132] systemd-udevd[137]: starting version 206
[    1.760022] tsc: Refined TSC clocksource calibration: 2199.749 MHz
[    1.760109] Switching to clocksource tsc
[    1.809905] systemd-udevd[145]: renamed network interface eth0 to enp0s9
[    1.974202] 8139too 0000:00:09.0 enp0s9: link up, 100Mbps, full-duplex, lpa 0x45E1
[    1.983151] dracut: sshd port: 2222
[    1.983254] dracut: sshd key fingerprint: 2048 0e:14:...:36:f9  root@congo (RSA)
[    1.983392] dracut: sshd key bubblebabble: 2048 xikak-...-poxix  root@congo (RSA)
[185] Aug 08 13:29:29 Failed reading '-', disabling DSS
[186] Aug 08 13:29:29 Running in background
[    2.093869] dracut: luksOpen /dev/sda3 luks-...
Enter passphrase for /dev/sda3:
[213] Aug 08 13:29:50 Child connection from
[213] Aug 08 13:29:54 Pubkey auth succeeded for 'root' with key md5 0b:97:bb:...

# console_auth

First command - "console_peek" - allows to see which password is requested (if any) and second one allows to login.
Note that fingerprints of host keys are also echoed to console on sshd start, in case one has access to console but still needs sshd later.
I quickly found out that such initramfs with sshd is also a great and robust rescue tool, especially if "debug" and/or "rescue" dracut modules are enabled.
And as it includes fairly comprehensive network-setup options, might be a good way to boot multiple different OS'es with same (machine-specific) network parameters,

Probably obligatory disclaimer for such post should mention that crypto above won't save you from malicious hoster or whatever three-letter-agency that will coerce it into cooperation, should it take interest in your poor machine - it'll just extract keys from RAM image (especially if it's a virtualized VPS) or backdoor kernel/initramfs and force a reboot.

Threat model here is more trivial - be able to turn off and decomission host without fear of disks/images then falling into some other party's hands, which might also happen if hoster eventually goes bust or sells/scraps disks due to age or bad blocks.

Also, even minor inconvenience like forcing to extract keys like outlined above might be helpful in case of quite well-known "we came fishing to a datacenter, shut everything down, give us all the hardware in these racks" tactic employed by some agencies.

Absolute security is a myth, but these measures are fairly trivial and practical to be employed casually to cut off at least some number of basic threats.

So, yay for dracut, the amazingly cool and hackable initramfs project, which made it that easy.

Code link: https://github.com/mk-fg/dracut-crypt-sshd

Jun 12, 2011

Using csync2 for security-sensitive paths

Usually I was using fabric to clone similar stuff to many machines, but since I've been deploying csync2 everywhere to sync some web templates and I'm not the only one introducing changes, it ocurred to me that it'd be great to use it for scripts as well.
Problem I see there is security - most scripts I need to sync are cronjobs executed as root, so updating some script one one (compromised) machine with "rm -Rf /*" and running csync2 to push this change to other machines will cause a lot of trouble.

So I came up with simple way to provide one-time keys to csync2 hosts, which will be valid only when I want them to.

Idea is to create FIFO socket in place of a key on remote hosts, then just pipe a key into each socket while script is running on my dev machine. Simpliest form of such "pipe" I could come up with is an "ssh host 'cat >remote.key.fifo'", no fancy sockets, queues or protocols.
That way, even if one host is compromised changes can't be propagnated to other hosts without access to fifo sockets there and knowing the right key. Plus running sync for that "privileged" group accidentally will just result in a hang 'till the script will push data to fifo socket - nothing will break down or crash horribly, just wait.
Key can be spoofed of course, and sync can be timed to the moment the keys are available, so the method is far from perfect, but it's insanely fast and convenient.
Implementation is fairly simple twisted eventloop, spawning ssh processes (guess twisted.conch or stuff like paramiko can be used for ssh implementation there, but neither performance nor flexibility is an issue with ssh binary).
Script also (by default) figures out the hosts to connect to from the provided group name(s) and the local copy of csync2 configuration file, so I don't have to specify keep separate list of these or specify them each time.
As always, twisted makes it insanely simple to write such IO-parallel loop.

csync2 can be configured like this:

group sbin_sync {
    host host1 host2;
    key /var/lib/csync2/privileged.key;
    include /usr/local/sbin/*.sh

And then I just run it with something like "./csync2_unlocker.py sbin_sync" when I need to replicate updates between hosts.


Feb 17, 2010

Listening to music over the 'net with authentication and cache

Having seen people really obsessed with the music, I don't consider myself to be much into it, yet I've managed to accumulate more than 70G of it, and counting. That's probably because I don't like to listen to something on a loop over and over, so, naturally, it's quite a bother to either keep the collection on every machine I use or copy the parts of it just to listen and replace.

Ideal solution for me is to mount whole hoard right from home server, and mounting it over the internet means that I need some kind of authentication.
Since I also use it at work, encryption is also nice, so I can always pass this bandwith as something work-friendly and really necessary, should it arouse any questions.
And while bandwith at work is pretty much unlimited, it's still controlled, so I wouldn't like to overuse it too much, and listening to oggs, mp3 and flacs for the whole work-day can generate traffic of 500-800 megs, and that's quite excessive to that end, in my own estimation.

The easiest way for me was trusty sshfs - it's got the best authentication, nice performance and encryption off-the-shelf with just one simple command. Problem here is the last aforementioned point - sshfs would generate as much bandwith as possible, caching content only temporarily in volatile RAM.

Persistent caching seem to be quite easy to implement in userspace with either fuse layer over network filesystem or something even simplier (and more hacky), like aufs and inotify, catching IN_OPEN events and pulling files in question to intermediate layer of fs-union.

Another thing I've considered was fs-cache in-kernel mechanism, which appeared in the main tree since around 2.6.30, but the bad thing about was that while being quite efficient, it only worked for NFS or AFS.
Second was clearly excessive for my purposes, and the first one I've come to hate for being extremely ureliable and limiting. In fact, NFS never gave me anything but trouble in the past, yet since I haven't found any decent implementations of the above ideas, I'd decided to give it (plus fs-cache) a try.
Setting up nfs server is no harder than sharing dir on windows - just write a line to /etc/exports and fire up nfs initscript. Since nfs4 seems superior than nfs in every way, I've used that version.
Trickier part is authentication. With nfs' "accept-all" auth model and kerberos being out of question, it has to be implemented on some transport layer in the middle.
Luckily, ssh is always there to provide a secure authenticated channel and nfs actually supports tcp these days. So the idea is to start nfs on localhost on server and use ssh tunnel to connecto to it from the client.

Setting up tunnel was quite straightforward, although I've put together a simple script to avoid re-typing the whole thing and to make sure there aren't any dead ssh processes laying around.

PID="/tmp/.$(basename $0).$(echo "$1.$2" | md5sum | cut -b-5)"
touch "$PID"
flock -n 3 3<"$PID" || exit 0
exec 3>"$PID"
( flock -n 3 || exit 0
  exec ssh\
   -qyTnN $3 -L "$1" "$2" ) &
echo $! >&3
exit 0

That way, ssh process is daemonized right away. Simple locking is also implemented, based on tunnel and ssh destination, so it might be put as a cronjob (just like "ssh_tunnel 2049: user@remote") to keep the link alive.

Then I've put a line like this to /etc/exports:

...and tried to "mount -t nfs4 localhost:/ /mnt/music" on the remote.
Guess what? "No such file or dir" error ;(
Ok, nfs3-way to "mount -t nfs4 localhost:/var/export/leech /mnt/music" doesn't work as well. No indication of why it is whatsoever.

Then it gets even better - "mount -t nfs localhost:/var/export/leech /mnt/music" actually works (locally, since nfs3 defaults to udp).
Completely useless errors and nothing on the issue in manpages was quite weird, but prehaps I haven't looked at it well enough.

Gotcha was in the fact that it wasn't allowed to mount nfs4 root, so tweaking exports file like this...


...and "mount -t nfs4 localhost:/music /mnt/music" actually solved the issue.

Why can't I use one-line exports and why the fuck it's not on the first (or any!) line of manpage escapes me completely, but at least it works now even from remote. Hallelujah.

Next thing is the cache layer. Luckily, it doesn't look as crappy as nfs and tying them together can be done with a single mount parameter. One extra thing needed, aside from the kernel part, here, is cachefilesd.
Strange thing it's not in gentoo portage yet (since it's kinda necessary for kernel mechanism and quite aged already), but there's an ebuild in b.g.o (now mirrored to my overlay, as well).
Setting it up is even simplier.
Config is well-documented and consists of five lines only, the only relevant of which is the path to fs-backend, oh, and the last one seem to need user_xattr support enabled.

fstab lines for me were these:

/dev/dump/nfscache /var/fscache ext4 user_xattr
localhost:/music /mnt/music nfs4 ro,nodev,noexec,intr,noauto,user,fsc

First two days got me 800+ megs in cache and from there it was even better bandwidth-wise, so, all-in-all, this nfs circus was worth it.

Another upside of nfs was that I could easily share it with workmates just by binding ssh tunnel endpoint to a non-local interface - all that's needed from them is to issue the mount command, although I didn't came to like to implementation any more than I did before.
Wonder if it's just me, but whatever...

Feb 14, 2010

My "simple" (ok, not quite) backup system - implementation (backup host)

According to the general plan, with backed-up side scripts in place, some backup-grab mechanism is needed on the backup host.

So far, sshd provides secure channel and authentication, launching control script as a shell, backed-up side script provides hostname:port for one-shot ssh link on the commandline, with private key to this link and backup-exclusion paths list piped in.

All that's left to do on this side is to read the data from a pipe and start rsync over this link, with a few preceding checks, like a free space check, so backup process won't be strangled by its abscence and as many as possible backups will be preserved for as long as possible, removing them right before receiving new ones.

Historically, this script also works with any specified host, interactively logging into it as root for rsync operation, so there's bit of interactive voodoo involved, which isn't relevant for the remotely-initiated backup case.

Ssh parameters for rsync transport are passed to rsync itself, since it starts ssh process, via "--rsh" option. Inside the script,these are accumulated in bak_src_ext variable

Note that in case then this script is started as a shell, user is not a root, yet it needs to store filesystem metadata like uids, gids, acls, etc.
To that end, rsync can employ user_xattr's, although it looks extremely unportable and inproper to me, since nothing but rsync will translate them back to original metadata, so rsync need to be able to change fs metadata directly, and to that end there's posix capabilities.

I use custom module for capability manipulation, as well as other convenience modules here and there, their purpose is quite obvious and replacing these with stdlib functions should be pretty straightforward, if necessary.

Activating the inherited capabilities:

bak_user = os.getuid()
if bak_user:
    from fgc.caps import Caps
    import pwd
    os.putenv('HOME', pwd.getpwuid(bak_user).pw_dir)

But first things first - there's data waiting on commandline and stdin. Getting the hostname and port...

bak_src = argz[0]
try: bak_src, bak_src_ext = bak_src.split(':')
except: bak_src_ext = tuple()
else: bak_src_ext = '-p', bak_src_ext

...and the key / exclusions:

bak_key = bak_sub('.key_{0}'.format(bak_host))
password, reply = it.imap(
    op.methodcaller('strip', spaces), sys.stdin.read().split('\n\n\n', 1) )
open(bak_key, 'w').write(password)
sh.chmod(bak_key, 0400)
bak_src_ext += '-i', os.path.realpath(bak_key)

Then, basic rsync invocation options can be constructed:

sync_optz = [ '-HaAXz',
    '--rsync-path=ionice -c3 rsync',
    '--rsh=ssh {0}'.format(' '.join(bak_src_ext)) ]

Excluded paths list here is written to a local file, to keep track which paths were excluded in each backup. "--super" option is actually necessary if local user is not root, rsync drops all the metadata otherwise. "HaAX" is like "preserve all" flags - Hardlinks, ownership/modes/times ("a" flag), Acl's, eXtended attrs. "--rsh" here is the ssh command, with parameters, determined above.

Aside from that, there's also need to specify hardlink destination path, which should be a previous backup, and that traditionnaly is the domain of ugly perlisms - regexps.

bakz_re = re.compile(r'^([^.].*)\.\d+-\d+-\d+.\d+$') # host.YYYY-mm-dd.unix_time
bakz = list( bak for bak in os.listdir(bak_root)
 if bakz_re.match(bak) ) # all backups
bakz_host = sorted(it.ifilter(op.methodcaller(
    'startswith', bak_host ), bakz), reverse=True)

So, the final sync options come to these:

src = '{0}:/'.format(src)
sync_optz = list(dta.chain( sync_optz, '--link-dest={0}'\
        .format(os.path.realpath(bakz_host[0])), src, bak_path ))\
    if bakz_host else list(dta.chain(sync_optz, src, bak_path))

The only interlude is to cleanup backup partition if it gets too crowded:

## Free disk space check / cleanup
ds, df = sh.df(bak_root)
min_free = ( max(min_free_avg( (ds-df) / len(bakz)), min_free_abs*G)
    if min_free_avg and bakz else min_free_abs*G )

def bakz_rmq():
    '''Iterator that returns bakz in order of removal'''
    bakz_queue = list( list(bakz) for host,bakz in it.groupby(sorted(bakz),
        key=lambda bak: bakz_re.match(bak).group(1)) )
    while bakz_queue:
        if len(bakz_queue[-1]) <= min_keep: break
        yield bakz_queue[-1].pop()

if df < min_free:
    for bak in bakz_rmq():
        log.info('Removing backup: {0}'.format(bak))
        sh.rr(bak, onerror=False)
        ds, df = sh.df(bak_root)
        if df >= min_free: break
        log.fatal( 'Not enough space on media:'
                ' {0:.1f}G, need {1:.1f}G, {2} backups min)'\
            .format( op.truediv(df, G),
                op.truediv(min_free, G), min_keep ), crash=2 )

And from here it's just to start rsync and wait 'till the job's done.

This thing works for months now, and saved my day on many occasions, but the most important thing here I think is the knowledge that the backup is there should you need one, so you never have to worry about breaking your system or losing anything important there, whatever you do.

Here's the full script.

Actually, there's more to the story, since just keeping backups on single local harddisk (raid1 of two disks, actually) isn't enough for me.
Call this paranoia, but setting up system from scratch and restoring all the data I have is a horrible nightmare, and there are possibility of fire, robbery, lighting, voltage surge or some other disaster that can easily take this disk(s) out of the picture, and few gigabytes of space in the web come almost for free these days - there are p2p storages like wuala, dropbox, google apps/mail with their unlimited quotas...
So, why not upload all this stuff there and be absolutely sure it'd never go down, whatever happens? Sure thing.
Guess I'll write a note on the topic as much to document it for myself as for the case someone might find it useful as well, plus the ability to link it instead of explaining ;)

Feb 13, 2010

My "simple" (ok, not quite) backup system - implementation (backed-up side)

As I've already outlined before, my idea of backups comes down to these points:

  • No direct access to backup storage from backed-up machine, no knowledge about backup storage layout there.
  • No any-time access from backup machine to backed-up one. Access should be granted on the basis of request from backed-up host, for one connection only.
  • Read-only access to filesystem only, no shell or network access.
  • Secure transfer channel.
  • Incremental, yet independent backups, retaining all fs metadata.
  • No extra strain on network (wireless) or local disk space.
  • Non-interactive usage (from cron).
  • No root involved on any side at any time.

And the idea is to implement these with openssh, rsync and a pair of scripts.

Ok, the process is initiated by backed-up host, which will spawn sshd for single secure backup channel, so first thing to do is to invoke of ssh-keygen and get the pair of one-time keys from it.

As an extra precaution, there's no need to write private key to local filesystem, as it's only needed by ssh-client on a remote (backup) host.
Funny thing is that ssh-keygen doesn't actually allow that, although it's possible to make it use fifo socket instead of file.
FIFO socket implies blocking I/O however, so one more precaution should be taken for script not to hang indefinitely.

A few convenience functions here and there are imported from fgc module, but can be replaced by standard counterparts (POpen, unlink, etc) without problem - no magic there.

Here we go:

def unjam(sig, frm):
    raise RuntimeError, 'no data from ssh-keygen'
signal.signal(signal.SIGALRM, unjam)

keygen = exe.proc( 'ssh-keygen', '-q',
    '-t', 'rsa', '-b', '2048', '-N', '', '-f', key )

key_sub = open(key).read()
sh.rm(key, onerror=False)
if keygen.wait(): raise RuntimeError, 'ssh-keygen has failed'

Public key can then be used to generate one-time ACL file, aka "authorized_hosts" file:

keygen = open(key_pub, 'r').read().strip(spaces)
open(key_pub, 'w').write(
    'from="{0}" {1}\n'.format(remote_ip, keygen) )

So, we have an ACL file and matching private key. It's time to start sshd:

sshd = exe.proc( '/usr/sbin/sshd', '-6', '-de', '-p{0}'.format(port),
    '-oChallengeResponseAuthentication=no', # no password prompt
    '-oAllowAgentForwarding=no', # no need for this
    '-oAllowTcpForwarding=no', # no port-forwarding
    '-oPermitTunnel=no', # no tunneling
    '-oCompression=no', # minor speedup, since it's handled by rsync
    '-oForceCommand=/usr/bin/ppy {0} -c'\
        .format(os.path.realpath(__file__)), # enforce this script checks
        .format(os.path.realpath(key_pub)), silent=True )

A bit of an explaination here.

"silent" keyword here just eats verbose stdout/stderr, since it's not needed for these purposes.

According to original plan, I use "ForceCommand" to start the same initiator-script (but with "-c" parameter), so it will invoke rsync (and rsync only) with some optional checks and scheduling priority enforcements.

Plus, since initial script and sshd are started by ordinary user, we'd need to get dac_read_search capability for rsync to be able to read (and only read) every single file on local filesystem.
That's where ppy binary comes in, launching this script with additional capabilities, defined for the script file.
Script itself doesn't need to make the caps effective - just pass as inherited further to rsync binary, and that's where it, and I mean cap_dac_read_search, should be activated and used.
To that end, system should have aforementioned wrapper (ppy) with permitted-effective caps, to provide them in the first place, python binary with "cap_dac_read_search=i" and rsync with "cap_dac_read_search=ei" (since it doesn't have option to activate caps from it's code).
This may look like an awful lot of privileged bits, but it's absolutely not! Inheritable caps are just that - inheritable, they won't get set by this bit by itself.
In fact, one can think of whole fs as suid-inheritable, and here inheritance only works for a small fragment of root's power and that only for three files, w/o capability to propagnate anywhere else, if there'd be some exec in a bogus commandline.

Anyway, everything's set and ready for backup host to go ahead and grab local fs.

Note that backup of every file isn't really necessary, since sometimes most heavy ones are just caches, games or media content, readily available for downloading from the net, so I just glance at my fs with xdiskusage tool (which is awesome, btw, even for remote servers' df monitoring: "ssh remote du -k / | xdiskusage") to see if it's in need of cleanup and to add largest paths to backup-exclude list.

Actually, I thought of dynamically excluding pretty much everything that can be easily rebuilt by package manager (portage in my case), but decided that I have space for these, and backing it all up makes "rm -rf", updates or compiler errors (since I'm going to try icc) much less scary anyway.

Ok, here goes the backup request:

ssh = exe.proc( 'ssh', remote,
    '{0}:{1}'.format(os.uname()[1], port), stdin=exe.PIPE )

if ssh.wait(): raise RuntimeError, 'Remote call failed'

"remote" here is some unprivileged user on a backup host with backup-grab script set as a shell. Pubkey auth is used, so no interaction is required.

And that actually concludes locally-initiated operations - it's just wait to confirm that the task's completed.
Now backup host have the request, to-be-backed-up hostname and port on the commandline, with private key and paths-to-exclude list piped through.

One more thing done locally though is the invocation of this script when backup host will try to grab fs, but it's simple and straightforward as well:

cmd = os.getenv('SSH_ORIGINAL_COMMAND')
if not cmd: parser.error('No SSH_ORIGINAL_COMMAND in ENV')
if not re.match(
        r'^(ionice -c\d( -n\d)? )?rsync --server', cmd ):
    parser.error('Disallowed command: {0}'.format(cmd))
try: cmd, argz = cmd.split(' ', 1)
except ValueError: argz = ''
os.execlp(cmd, os.path.basename(cmd), *argz.split())

Rsync takes control from here and reads fs tree, checking files and their attributes against previous backups with it's handy rolling-checksums, creating hardlinks on match and transferring only mismatching pieces, if any, but more on that later, in the next post about implementation of the other side of this operation.

Full version of this script can be found here.

Member of The Internet Defense League