Apr 06, 2013

Fighting storage bitrot and decay

Everyone is probably aware that bits do flip here and there in the supposedly rock-solid, predictable and deterministic hardware, yet somehow every single data-management layer assumes it's not its responsibility to detect, let alone fix, these flukes.

Bitrot in RAM is a known source of bugs, but short of ECC, I don't know what one can do about it without a huge impact on performance.

Disks, on the other hand, seem to have a lot of software layers above them, handling whatever data arrangement, compression, encryption, etc, and the fact that bits do flip in magnetic media seems to be just as well-known (study1, study2, study3, ...).
In fact, these very issues seem to be the main idea behind the well-known storage behemoth ZFS.
So it has really bugged me for quite a while that any modern linux system seems to be completely oblivious to the issue.

Consider a typical linux storage stack on commodity hardware:

  • You have a closed-box proprietary hdd brick at the bottom, with no way to tell what it does to protect your data - aside from vendor marketing pitches, that is.

  • Then you have a well-tested and robust linux driver for some ICH storage controller.

    I wouldn't expect it to corrupt anything at this point, but it doesn't do much else with the data either - it just passes along whatever it gets from the flaky device.

  • Linux blkdev layer above, presenting /dev/sdX. No checks, just simple mapping.

  • device-mapper.

    Here things get more interesting.

    I tend to use lvm wherever possible, but it's just a convenience layer (or a set of nice tools to set up mappings) on top of dm - no checks of any kind, though at least it doesn't make things much worse either: lvm metadata is fairly redundant and easy to back up and recover.

    dm-crypt gives no noticeable performance overhead, exists either above or under lvm in the stack, and is nice hygiene against accidental leaks (selling or leasing hw, theft, bugs, etc), but lacking authenticated encryption modes it doesn't do anything to detect bit-flips.
    Worse, it amplifies the issue.
    In the common CBC mode, a single flipped ciphertext bit garbles the whole 16-byte cipher block it lands in and flips the corresponding bit in the next one, turning one bad bit into a corrupted chunk of plaintext (see the sketch right after this list).
    The current dm-crypt default (since the latest cryptsetup-1.6.X, iirc) is the XTS block encryption mode, which limits the damage to a single cipher block, but dm-crypt has little support for changing modes on-the-fly, so tough luck.
    But hey, there is dm-verity, which sounds like exactly what I want, except it's read-only, damn.
    Its read-only nature is heavily ingrained in the "hash tree" model of integrity protection - it's hashes-of-hashes all the way up to the root hash, which you specify on mount, immutable by design.

    Block-layer integrity protection is a bit weird anyway - there's a lot of potential for unnecessary work: free space (which can probably be somewhat addressed by TRIM), data that's already journaled/checksummed by the fs, and plain transient block changes that aren't exposed for long and which one might not care about at all.

  • The filesystem layer above does the right thing, sometimes.

    COW fs'es like btrfs and zfs have checksums and scrubbing, so they seem to be good options.
    btrfs was slow as hell on rotating plates last time I checked, but the zfs port might be worth a try - though I'd really be amazed if a single cow fs worked fine in all the scenarios where I currently use ext4 (mid-sized files), xfs (glusterfs backend) and reiserfs (hard-linked backups, caches, tiny-file subtrees).

    Other fs'es plain suck at this. No care for that sort of thing at all.

  • Above-fs syscall-hook kernel layers.

    IMA/EVM sound great, but are also for immutable security ("integrity") purposes ;(

    In fact, this layer is heavily populated by security stuff like LSM's, which I can't imagine being sanely used for bitrot-detection purposes.
    Security tools are generally oriented towards detecting any changes, intentional tampering included, and are bound to produce a lot of false-positives instead of legitimate and actionable alerts.

    Plus, upon detecting some sort of failure, these tools generally stop caring about the data, effectively acting as a denial-of-service attack on you - which is survivable (everything can be circumvented), but fighting your own tools doesn't sound too great.

  • Userspace.

    There is tripwire, but it's also a security tool, unsuitable for the task.

    Some rare discussions of the problem pop up here and there, but alas, I failed to salvage anything usable from these, aside from ideas and links to subject-relevant papers.
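
Just to illustrate the CBC point from the dm-crypt item above, here's a quick python sketch (using the pyca "cryptography" module, a throwaway key/IV and plain CBC, not dm-crypt's actual per-sector IV scheme) of how far a single flipped ciphertext bit spreads:

    # Encrypt one fake 512-byte "sector", flip one ciphertext bit, decrypt,
    # and see which plaintext bytes came back wrong.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(32), os.urandom(16)
    sector = os.urandom(512)

    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ct = bytearray(enc.update(sector) + enc.finalize())
    ct[100] ^= 0x01  # the single flipped bit

    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    pt = dec.update(bytes(ct)) + dec.finalize()

    bad = [i for i in range(len(sector)) if pt[i] != sector[i]]
    # Prints the whole 16-byte block around offset 100 plus one byte in the next block.
    print('corrupted byte offsets:', bad)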

Scanning github, bitbucket and xmpp popped up a bitrot script and a proof-of-concept md-checksums md layer, which apparently hasn't even made it to lkml.

So, naturally, following the long-standing "... then do it yourself" motto, introducing the fs-bitrot-scrubber tool for all the scrubbing needs.

It should be fairly well-described in the readme, but the gist is that it's just a simple userspace script to checksum file contents and track changes there over time, taking into account all the signs of legitimate file modification and the fact that it isn't the only thing in the system that needs i/o.
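
The core of it boils down to something like the sketch below - not the actual code, just the gist, with a made-up state-file format and path:

    # Checksum files under a path and flag the suspicious case: content hash
    # changed while mtime and size stayed the same (i.e. no legit modification).
    import hashlib, json, os

    DB = 'checksums.json'  # hypothetical state file, not the real tool's storage

    def file_hash(path, bs=1 << 20):
        csum = hashlib.sha256()
        with open(path, 'rb') as src:
            for chunk in iter(lambda: src.read(bs), b''):
                csum.update(chunk)
        return csum.hexdigest()

    def scrub(top):
        state = json.load(open(DB)) if os.path.exists(DB) else {}
        for root, dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                st = os.stat(path)
                meta = dict(mtime=st.st_mtime, size=st.st_size, csum=file_hash(path))
                old = state.get(path)
                if old and old['csum'] != meta['csum'] \
                        and (old['mtime'], old['size']) == (meta['mtime'], meta['size']):
                    print('Possible bitrot:', path)
                state[path] = meta
        with open(DB, 'w') as dst:
            json.dump(state, dst)

    scrub('/srv/important-stuff')  # example path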

The main goal is not to provide any sort of redundancy or backups, but rather to notify of the issue before all the old backups (or some cluster-fs mirrors, in my case) that could be used to fix it are rotated out of existence or overwritten.

I don't suppose I'll see such decay phenomena often (if ever), but I don't like playing the odds, especially with an easy "most cases" fix within reach.

If I kept a lot of important stuff compressed (think about what happens when a single bit flips in the middle of a few-gigabyte .xz file) or naively encrypted (without storage specifics and corruption in mind) in cbc mode or something else to the same effect, I'd be much more worried about the issue.
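
The .xz case is easy to reproduce - one flipped bit and the rest of the stream is toast (at least the format's built-in crc detects it), e.g. with python's lzma module:

    # Flip a single bit in the middle of an xz blob and watch decompression fail.
    import lzma, os

    data = os.urandom(1024) * 1024  # ~1 MiB of filler
    blob = bytearray(lzma.compress(data))
    blob[len(blob) // 2] ^= 0x01  # the single flipped bit

    try:
        lzma.decompress(bytes(blob))
    except lzma.LZMAError as err:
        print('decompression failed:', err)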

Wish there was something common and out-of-the-box for this in the linux world, but I guess it's just not the time yet (hell, there's not even one clear term for it in techie slang!) - with hdd storage sizes still increasing and ssd's being much more vulnerable, some more low-level solution should materialize eventually.

Here's me hoping to raise awareness, if only by a tiny bit.

github project link

Mar 25, 2013

Secure cloud backups with Tahoe-LAFS

There's plenty of public cloud storage these days, but trusting any of them with any kind of data seems reckless - the service is free to corrupt, monetize, leak, hold hostage or just drop it at any point.
Given that these services are provided at no cost, and generally without many ads, I guess reputation and ToS are the only things stopping them from acting like that.
Not trusting any single one of these services looks like a sane safeguard against them suddenly collapsing or blocking one's account.
And not trusting any of them with the plaintext of sensitive data seems to be a good way to protect it from all the shady things that can be done to it.

Tahoe-LAFS is a great capability-based secure distributed storage system, where you basically do "tahoe put somefile" and get a capability string like "URI:CHK:iqfgzp3ouul7tqtvgn54u3ejee:...u2lgztmbkdiuwzuqcufq:1:1:680" in return.

That string is sufficient to find, decrypt and check the integrity of the file (or directory tree) - basically to get it back in what's guaranteed to be the same state.
Neither tahoe node state nor stored data can be used to recover that cap.
Retrieving the file afterwards is as simple as a GET with that cap in the url.
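
The same put/get cycle works over the node's local web api too - roughly like this (a sketch assuming a node with the webapi on the default 127.0.0.1:3456, using the requests module):

    # Store a file through a local tahoe node, then fetch it back by cap alone.
    import requests

    node = 'http://127.0.0.1:3456'

    with open('somefile', 'rb') as src:
        cap = requests.put(node + '/uri', data=src).text.strip()
    print('capability:', cap)  # e.g. URI:CHK:...

    # Anyone holding the cap (plus access to some node) can get the file back.
    data = requests.get('{}/uri/{}'.format(node, cap)).content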

With remote storage providers, the tahoe node works as a client, so with all the crypto being client-side, the actual cloud provider is clueless about the stuff you store - quite an important thing, I find, especially if you stripe data across many of these leaky and/or plain evil services.

Finally got around to connecting a third backend (box.net) to tahoe today, so wanted to share a few links on the subject:

Oct 23, 2011

dm-crypt password caching between dracut and systemd, systemd password agent

Update 2015-11-25: with "ask-password" caching implemented as of systemd-227 (2015-10-07), a better way would be to use that in-kernel caching, though it likely requires systemd running in initramfs (e.g. dracut has had that for a while).

Up until now I've used lvm on top of a single full-disk dm-crypt partition.
It seems easiest to work with - no need to decrypt individual lv's, no confusion between what's encrypted (everything but /boot!) and what's not, etc.
The main problem with it though is that it's harder to have non-encrypted parts, everything is encrypted with the same keys (unless there are several dm-crypt layers), and it's bad for SSDs - dm-crypt still (as of 3.0) doesn't pass any TRIM requests through, leading to a nasty write amplification effect, even more so with the full disk given to dm-crypt+lvm.
While there's hope that the SSD issues will be kinda-solved (with an optional security trade-off) in 3.1, it's still much easier to keep different distros or some decrypted-when-needed partitions with dm-crypt after lvm, so I've decided to go with the latter for the new 120G SSD.
Also, such a scheme allows re-creating encrypted lvs, issuing TRIM for the old ones, thus recycling the blocks even without support for this in dm-crypt.
Same as with the previous initramfs, I've had a simple "openct" module (udev there makes it even easier) in dracut to find an inserted smartcard and use it to obtain the encryption key, which is used once to decrypt the only partition on which everything resides.
Since the only goal of dracut is to find root and get-the-hell-outta-the-way, it won't even try to decrypt all the /var and /home stuff without serious ideological changes.
The problem is actually solved in generic distros by plymouth, which gets the password(s), caches them, and provides them to dracut and systemd (or whatever comes as the real "init"). I don't need a splash screen, and actually hate it for hiding all the info that scrolls in its place, so plymouth is a no-go for me.

Having a hack to obtain and cache the key for dracut by non-conventional means anyway, I just needed to pass it further to systemd, and since they share a common /run tmpfs these days, that basically means not rm'ing it in dracut after use.

Luckily, the system-wide password handling mechanism in systemd is well-documented and easily extensible beyond plymouth and the default console prompt.

So the whole key management in my system goes like this now:

  • dracut.cmdline: create udev rule to generate key.
  • dracut.udev.openct: find smartcard, run rule to generate and cache key in /run/initramfs.
  • dracut.udev.crypt: check for cached key or prompt for it (caching result), decrypt root, run systemd.
  • systemd: start post-dracut-crypt.path unit to monitor /run/systemd/ask-password for password prompts, along with default .path units for fallback prompts via wall/console.
  • systemd.udev: discover encrypted devices, create key requests.
  • systemd.post-dracut-crypt.path: start post-dracut-crypt.service to read cached passwords from /run/initramfs and use these to satisfy requests.
  • systemd.post-dracut-crypt-cleanup.service (after local-fs.target is activated): stop post-dracut-crypt.service, flush caches, generate new one-time keys for decrypted partitions.

The end result is a passwordless boot with this new layout, which seems only possible to spoof by somehow getting root during that process, with altering the unencrypted /boot to run some extra code and then revert it back being the most obvious possibility.
It's kinda weird that there doesn't seem to be any caching in place already - surely not everyone with dm-crypt uses plymouth?

The most complicated piece here is probably the password agent (in python), which actually could've been simpler if I hadn't followed the proper guidelines and had thought a bit around them instead.

For example, the whole inotify handling thing (I've used it via ctypes) can be dropped in favor of a .path unit with a DirectoryNotEmpty= activation condition - it's there already; PolicyKit authorization just isn't working at such an early stage; there doesn't seem to be much need to check request validity, since sending replies to sockets is racy anyway; etc.
Still, a good exercise.
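
For reference, the bare-bones core of such an agent - minus inotify/.path activation, request-expiry checks and all the cleanup - looks roughly like this (a sketch of the documented ask-password protocol, not the actual agent linked below):

    # For every pending ask.* request, read its [Ask] section and push the
    # cached password (prefixed with "+") into the AF_UNIX datagram socket
    # the request names.
    import configparser, glob, socket

    def answer_requests(password):
        for ask in glob.glob('/run/systemd/ask-password/ask.*'):
            req = configparser.ConfigParser()
            req.read(ask)
            reply = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            reply.sendto(b'+' + password, req['Ask']['Socket'])
            reply.close()

    # e.g.: answer_requests(open('/run/initramfs/cached-key', 'rb').read())
    # (the cache path above is just an example, not the actual one)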

Python password agent for systemd. Unit files to start and stop it on demand.

Jun 14, 2010

No IPSec on-a-stick for me ;(

Guess being a long-time user of stuff like OpenSSH, iproute2 and VDE rots your brain - you start thinking that building any sort of tunnel is bliss. Well, it's not. At least not "any sort".

I've dedicated this day to setting up some basic IPSec tunnel, and at first it seemed an easy task - it's been in the kernel for a long time (the kame implementation, and it's not the only one for linux), it's native for IPv6 (which I use in the local network), it has quite a lot of publicity (and guides), it's open (and quite simple, even with the IKE magic), and there are at least three major userspace implementations: openswan, ipsec-tools (racoon, kame) and Isakmpd. Hell, it's even supported on Windows. What more is there to ask for?
Well, perhaps I made a bad decision starting with openswan and the "native" kame NETKEY, but my experience wasn't quite a nice one.

I chose openswan because it looks like a more extensive implementation than the rest, is supported by folks like Red Hat, and is fairly up to date and constantly developed. Another cherry on top was the apparent smartcard support - via nss now and opensc in the past.

The first alarm bell should've been the fact that openswan actually doesn't compile without quite extensive patching.
The latest version of it in ebuild form (which isn't quite enough for me anyway, since I use exheres these days) is 2.6.23. That's more than half a year old, and even that one is masked in gentoo due to apparent bugs, with the ebuild obviously blind-bumped from some previous version, since it doesn't take things like the opensc->nss move (finalized in 2.6.23) into account.
Okay, hacking up my own ebuild and exheres for it was fun enough - at least I've got a firm grasp of what it's capable of - but seeing a pure-Makefile build system and hard-coded paths in such a package was a bit unexpected. Took me some time to deal with include paths, then lib paths, then it turned out to have an open bug which prevents its build on linux (wtf!?), and then it crashes in the install phase due to some ever-crappy XML stuff.
At least the docs are good enough (even though it's not easy to build them), so I set up an nss db, linked the smartcard to it, and got a... segfault? Right on ipsec showhostkey? Right, there's this bug in 2.6.26, although in my case it's probably another one, since the patch doesn't fix the problem. Great!
Ok, gdb showed that it's something like get-nss-password failing (although it should be quite a generic interface, delegated from nss), even with nsspassword in place and nss itself working perfectly.
Scratch that; with simple nss-generated keys (not even certificates), as described in the most basic tutorial, it's now the pluto daemon crashing with just a "Jun 14 15:40:25 daemon.err<27> ipsec__plutorun[-]: /usr/lib/ipsec/_plutorun: line 244: 6229 Aborted ..." line in syslog as soon as both ends of the tunnel are up.
Oh, and of course it messes up the connection between the hosts in question, so it wouldn't be too easy to ssh between them and debug the problem.

Compared to ssh or pretty much any tunneling I've encountered up to this point, it's quite a remarkably epic fail. Guess I'll waste a bit more time on this crap, since success seems so close, but it's quite amazing how crappy such projects can still be these days. Of course, at least it's free, right?

Jun 13, 2010

Drop-in ccrypt replacement for bournal

There's one great app - bournal ("when nobody cares what you have to say!"). Essentially it's a bash script providing a simple interface to edit and encrypt journal entries.
The idea behind it is quite the opposite of blogging - keep your thoughts as far away from everyone as possible. I've used the app for quite a while, ever since I noticed it among freshmeat release announcements. It's useful for keeping some thoughts or secrets (like keys or passwords) somewhere aside from one's head, even if you'd never read them again.

Anyway, encryption there is done by means of the ccrypt utility, which is sorta a CLI for openssl. I don't get the rationale behind using it instead of openssl directly (like "openssl enc ..."), and there are actually even better options, like gnupg, which wouldn't need the special logic bournal has for keeping a separate stream-cipher password.

So today, as I needed bournal on the exherbo laptop, I once again faced the need to get the ccrypt binary just for that purpose. Worse yet, I'd have to recall and enter the password I've used there, even though I don't actually need it just to encrypt an entry... as if asymmetric encryption, gpg-agent, smartcards and all the other cool santa helpers don't exist yet.
So I've decided to hack up my own "ccrypt", which would use the all-too-familiar gpg and wouldn't ask me for any passwords my agent or scd already knows, and in an hour or so I've succeeded.
And here goes - ccrypt, relying only on "gpg -e -r $EMAIL" and "gpg -d". EMAIL should be in the env, btw.
It actually works as ccencrypt, ccdecrypt, ccat as well, and can do recursive ops just like vanilla ccrypt, which is enough for bournal.
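
The actual drop-in is a shell script, but the gist of it - mimicking ccrypt's "file <-> file.cpt" behavior on top of gpg - fits into a few lines of python as well:

    # Rough equivalent of the wrapper's core: asymmetric gpg encryption to
    # $EMAIL instead of ccrypt's symmetric cipher, replacing files in-place.
    import os, subprocess, sys

    email = os.environ['EMAIL']

    def encrypt(path):
        subprocess.check_call(['gpg', '-e', '-r', email, '-o', path + '.cpt', path])
        os.unlink(path)

    def decrypt(path):  # expects a .cpt path
        subprocess.check_call(['gpg', '-d', '-o', path[:-4], path])
        os.unlink(path)

    cmd, paths = sys.argv[1], sys.argv[2:]
    for path in paths:
        (encrypt if cmd == '-e' else decrypt)(path)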

Apr 25, 2010

LUKS + dm-crypt rootfs without password via smartcard

While I'm on vacation, I've decided to try out a new distro I've been meaning to for quite a while - exherbo.
Mostly it's the same kind of source-based gentoo-linux derivative, yet it's not cloned from gentoo like funtoo or sabayon, but built from scratch by guys who've come to see gentoo and its core concepts (like portage or baselayout) as quite a stagnant thing.
While I don't share much of the disgust they have for the gentoo legacy, the ideas incorporated in that distro sound quite interesting, but I digress...

I don't believe in the fairly common practice of "trying out" something new in a VM - it just doesn't work for me, probably because I see it as a stupid and pointless thing on some subconscious level, so I've decided to put it onto one of my two laptops, which kinda needed a good cleanup anyway.

While at it, I thought it'd be a good idea to finally dump that stupid practice of entering an fs password on boot, yet I did like the idea of an encrypted fs, especially in the case of a laptop, so I needed to devise a reasonably secure yet passwordless boot method.

I use in-kernel LUKS-enabled dm-crypt (with the help of the cryptsetup tool), and I need some initrd (or init-fs) for the LVM root anyway.

There are lots of guides on how to do that with a key from a flash drive, but I don't see it as particularly secure, since the key can always be copied off the drive just about anywhere, plus I don't trust flash drives much, as they seem to fail me quite often.
As an alternative to that, I have a smartcard token, which can hold a key that can't be copied in any way.
The problem is, of course, that I still need some key to decrypt the filesystem, so my idea was to use the on-card key to sign some temporary data, which is then used as the encryption secret.
Furthermore, I thought it'd be nice to have a "dynamic key" that changes on every bootup, so even if something managed to snatch it from the fs and use the token to sign it, that data would be useless after a single reboot.
The initrd software is obviously busybox, lvm and smartcard-related stuff.
The smartcard I have is an Aladdin eToken PRO 64k; it works fine with OpenSC, but not via pcsc-lite, which seems to be the preferred hardware abstraction - only with openct, which seems a bit of an obsolete way. I haven't tried pcsc-lite in quite a while though, so maybe it supports the eToken now as well, but since openct works fairly stably for me, I thought I'd stick with it anyway.

The boot sequence comes down to this:

  • Mount pseudofs like proc/sys, get encrypted partition dev and real-rootfs signature (for findfs tool, like label or uuid) from cmdline.
  • Init openct, find smartcard in /sys by hardcoded product id and attach it to openct.
  • Mount persistent key-material storage (same /boot in my case).
  • Read "old" key, replace it with a hashed version, aka "new key".
  • Sign old key using smartcard, open fs with the resulting key.
  • Drop this key from LUKS storage, add a signed "new" key to it.
  • Kill openct processes, effectively severing link with smartcard.
  • Detect and activate LVM volume groups.
  • Find (findfs) and mount rootfs among currently-available partitions.
  • Umount proc/sys, pivot_root, chroot.
  • Here comes the target OS' init.
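
The interesting part is the key rotation (the read/hash/sign/re-add steps above); stripped of the initrd plumbing, it boils down to roughly the following - just a sketch, the real thing is busybox shell, and the exact pkcs15-crypt/cryptsetup invocations (and PIN handling) here are assumptions:

    # Old key material opens the fs via its smartcard signature, then gets
    # replaced by its own hash, whose signature becomes the LUKS key for the
    # next boot - so a snatched key file is useless after one reboot.
    import hashlib, subprocess

    KEY_FILE = '/boot/keymaterial'   # persistent key-material storage
    DEV, NAME = '/dev/sda2', 'root'  # example encrypted partition / dm name

    def sign(data, tmp='/tmp/blob'):
        # Sign a blob with the on-card key (the key itself never leaves the token).
        open(tmp, 'wb').write(data)
        subprocess.run(['pkcs15-crypt', '--sign',
            '--input', tmp, '--output', tmp + '.sig'], check=True)
        return open(tmp + '.sig', 'rb').read()

    old = open(KEY_FILE, 'rb').read()
    new = hashlib.sha256(old).digest()
    open(KEY_FILE, 'wb').write(new)  # next boot will start from this one

    open('/tmp/key.old', 'wb').write(sign(old))
    open('/tmp/key.new', 'wb').write(sign(new))

    # Open rootfs with the old signature, then rotate the LUKS slot to the new one.
    subprocess.run(['cryptsetup', '--key-file', '/tmp/key.old', 'luksOpen', DEV, NAME], check=True)
    subprocess.run(['cryptsetup', '--key-file', '/tmp/key.old', 'luksAddKey', DEV, '/tmp/key.new'], check=True)
    subprocess.run(['cryptsetup', 'luksRemoveKey', DEV, '/tmp/key.old'], check=True)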

Took me some time to assemble and test this stuff, although it was fun playing with a linux+busybox mini-OS. Makes me somewhat wonder what takes up several GiBs of space in a full-fledged OS when BB contains pretty much everything in less than one MiB ;)

And it's probably a good idea to put some early check of the /boot partition (hashes, mounts, whatever) into the booted OS init-scripts, to see that it wasn't altered in any significant way. It's not really a guarantee that something nasty wasn't done to it (and then cleaned up, for example), plus there's no proof that the actual OS was booted from it and that the kernel isn't tainted in some malicious way, but it should be enough against some lame tampering or pranks, should these ever happen.
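
A naive version of such a check could be as simple as hashing everything on /boot and comparing it against a manifest kept on the encrypted rootfs (paths here are made up):

    # Compare current /boot file hashes against a stored manifest and complain
    # about anything that changed, appeared or disappeared.
    import hashlib, json, os

    MANIFEST = '/etc/boot-manifest.json'  # hypothetical location

    def boot_hashes(top='/boot'):
        hashes = {}
        for root, dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                with open(path, 'rb') as src:
                    hashes[path] = hashlib.sha256(src.read()).hexdigest()
        return hashes

    stored, current = json.load(open(MANIFEST)), boot_hashes()
    for path in sorted(set(stored) | set(current)):
        if stored.get(path) != current.get(path):
            print('ALERT: /boot changed:', path)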

Anyway, here's the repo with all the initrd stuff, should anyone need it.
