Jul 16, 2014

(yet another) Dynamic DNS thing for tinydns (djbdns)

Tried to find any simple script to update tinydns (part of djbdns) zones that'd be better than ssh dns_update@remote_host update.sh, but failed - they all seem to be hacky php scripts, doomed to run behind httpds, send passwords in url, query random "myip" hosts or something like that.

What I want instead is something that won't be making http, tls or ssh connections (and stirring all the crap behind these), but would rather just send udp or even icmp pings to remotes, which should be enough for update, given source IPs of these packets and some authentication payload.

So yep, wrote my own scripts for that - tinydns-dynamic-dns-updater project.

Tool sends UDP packets with 100 bytes of "( key_id || timestamp ) || Ed25519_sig" from clients, authenticating and distinguishing these server-side by their signing keys ("key_id" there is to avoid iterating over them all, checking which matches signature).

Server zone files can have "# dynamic: ts key1 key2 ..." comments before records (separated from static records after these by comments or empty lines), which says that any source IPs of packets with correct signatures (and more recent timestamps) will be recorded in A/AAAA records (depending on source AF) that follow instead of what's already there, leaving anything else in the file intact.

Zone file only gets replaced if something is actually updated and it's possible to use dynamic IP for server as well, using dynamic hostname on client (which is resolved for each delayed packet).

Lossy nature of UDP can be easily mitigated by passing e.g. "-n5" to the client script, so it'd send 5 packets (with exponential delays by default, configurable via --send-delay), plus just having the thing on fairly regular intervals in crontab.

Putting server script into socket-activated systemd service file also makes all daemon-specific pains like using privileged ports (and most other security/access things), startup/daemonization, restarts, auto-suspend timeout and logging woes just go away, so there's --systemd flag for that too.

Given how easy it is to run djbdns/tinydns instance, there really doesn't seem to be any compelling reason not to use your own dynamic dns stuff for every single machine or device that can run simple python scripts.

Github link: tinydns-dynamic-dns-updater

May 12, 2014

My Firefox Homepage

Wanted to have some sort of "homepage with my fav stuff, arranged as I want to" in firefox for a while, and finally got resolve to do something about it - just finished a (first version of) script to generate the thing - firefox-homepage-generator.

Default "grid of page screenshots" never worked for me, and while there are other projects that do other layouts for different stuff, they just aren't flexible enough to do whatever horrible thing I want.

In this particular case, I wanted to experiment with chaotic tag cloud of bookmarks (so they won't ever be in the same place), relations graph for these tags and random picks for "links to read" from backlog.

Result is a dynamic d3 + d3.layout.cloud (interactive example of this layout) page without much style:

homepage screenshot
"Mark of Chaos" button in the corner can fly/re-pack tags around.
Clicking tag shows bookmarks tagged as such and fades all other tags out in proportion to how they're related to the clicked one (i.e. how many links share the tag with others).

Started using FF bookmarks again in a meaningful way only recently, so not much stuff there yet, but it does seem to help a lot, especially with these handy awesome bar tricks.

Not entirely sure how useful the cloud visualization or actually having a homepage would be, but it's a fun experiment and a nice place to collect any useful web-surfing-related stuff I might think of in the future.

Repo link: firefox-homepage-generator

Sep 26, 2013

FAT32 driver in python

Wrote a driver for still common FAT32 recently, while solving the issue with shuffle on cheap "usb stick with microsd slot" mp3 player.

It's kinda amazing how crappy firmware in these things can be.

Guess one should know better than to get such crap with 1-line display, gapful playback, weak battery, rewind at non-accelerating ~3x speed, no ability to pick tracks while playing and plenty of other annoying "features", but the main issue I've had with the thing by far is missing shuffle functionality - it only plays tracks in static order in which they were uploaded (i.e. how they're stored on fs).

Seems like whoever built the thing made it deliberately hard to shuffle the tracks offline - just one sort by name would've made things a lot easier, and it's clear that device reads the full dir listing from the time it spends opening dirs with lots of files.

Most obvious way to do such "offline shuffle", given how the thing orders files, is to re-upload tracks in different order, which is way too slow and wears out flash ram.

Second obvious for me was to dig into FAT32 and just reorder entries there, which is what the script does.

It's based off example of a simplier fat16 parser in construct module repo and processes all the necessary metadata structures like PDRs, FATs (cluster maps) and directory tables with vfat long-name entries inside.

Given that directory table on FAT32 is just an array (with vfat entries linked to dentries after them though), it's kinda easy just to shuffle entries there and write data back to clusters from where it was read.

One less obvious solution to shuffle, coming from understanding how vfat lfn entries work, is that one can actually force fs driver to reorder them by randomizing filename length, as it'll be forced to move longer entries to the end of the directory table.

But that idea came a bit too late, and such parser can be useful for extending FAT32 to whatever custom fs (with e.g. FUSE or 9p interface) or implementing some of the more complex hacks.

It's interestng that fat dentries can (and apparently known to) store unix-like modes and uid/gid instead of some other less-useful attrs, but linux driver doesn't seem to make use of it.

OS'es also don't allow symlinks or hardlinks on fat, while technically it's possible, as long as you keep these read-only - just create dentries that point to the same cluster.

Should probably work for both files and dirs and allow to create multiple hierarchies of the same files, like several dirs where same tracks are shuffled with different seed, alongside dirs where they're separated by artist/album/genre or whatever other tags.

It's very fast and cheap to create these, as each is basically about "(name_length + 32B) * file_count" in size, which is like just 8 KiB for dir structure holding 100+ files.

So plan is to extend this small hack to use mutagen to auto-generate such hierarchies in the future, or maybe hook it directly into beets as an export plugin, combined with transcoding, webui and convenient music-db there.

Also, can finally tick off "write proper on-disk fs driver" from "things to do in life" list ;)

Apr 08, 2013

TCP Hijacking for The Greater Good

As discordian folk celebrated Jake Day yesterday, decided that I've had it with random hanging userspace state-machines, stuck forever with tcp connections that are not legitimately dead, just waiting on both sides.

And since pretty much every tool can handle transient connection failures and reconnects, decided to come up with some simple and robust-enough solution to break such links without (or rather before) patching all the apps to behave.

One last straw was davfs2 failing after a brief net-hiccup, with my options limited to killing everything that uses (and is hanging dead on) its mount, then going kill/remount way.
As it uses stateless http connections, I bet it's not even an issue for it to repeat whatever request it tried last and it sure as hell handles network failures, just not well in all cases.

I've used such technique to test some twisted-things in the past, so it was easy to dig scapy-automata code for doing that, though the real trick is not to craft/send FIN or RST packet, but rather to guess TCP seq/ack numbers to stamp it with.

Alas, none of the existing tools (e.g. tcpkill) seem to do anything clever in this regard.

cutter states that

There is a feature of the TCP/IP protocol that we could use to good effect here - if a packet (other than an RST) is received on a connection that has the wrong sequence number, then the host responds by sending a corrective "ACK" packet back.

But neither the tool itself nor the technique described seem to work, and I actually failed to find (or recall) any mentions (or other uses) of such corrective behavior. Maybe it was so waaay back, dunno.

Naturally, as I can run such tool on the host where socket endpoint is, local kernel has these numbers stored, but apparently no one really cared (or had a legitimate enough use-case) to expose these to the userspace... until very recently, that is.

Recent work of Parallels folks on CRIU landed getsockopt(sk, SOL_TCP, TCP_QUEUE_SEQ, ...) in one the latest mainline kernel releases.
Trick is then just to run that syscall in the pid that holds the socket fd, which looks like a trivial enough task, but looking over crtools (which unfortunately doesn't seem to work with vanilla kernel yet) and ptrace-parasite tricks of compiling and injecting shellcode, decided that it's just too much work for me, plus they share the same x86_64-only codebase, and I'd like to have the thing working on ia32 machines as well.
Caching all the "seen" seq numbers in advance looks tempting, especially since for most cases, relevant traffic is processed already by nflog-zmq-pcap-pipe and Snort, which can potentially dump "(endpoint1-endpoint2, seq, len)" tuples to some fast key-value backend.
Invalidation of these might be a minor issue, but I'm not too thrilled about having some dissection code to pre-cache stuff that's already cached in every kernel anyway.

Patching kernel to just expose stuff via /proc looks like bit of a burden as well, though an isolated module code would probably do the job well. Weird that there doesn't seem to be one of these around already, closest one being tcp_probe.c code, which hooks into tcp_recv code-path and doesn't really get seqs without some traffic either.

One interesting idea that got my attention and didn't require a single line of extra code was proposed on the local xmpp channel - to use tcp keepalives.

Sure, they won't make kernel drop connection when it's userspace that hangs on both ends, with connection itself being perfectly healthy, but every one of these carries a seq number that can be spoofed and used to destroy that "healthy" state.

Pity these are optional and can't be just turned on for all sockets system-wide on linux (unlike some BSD systems, apparently), and nothing uses these much by choice (which can be seen in netstat --timer).

Luckily, there's a dead-simple LD_PRELOAD code of libkeepalive which can be used to enforce system-wide opt-out behavior for these (at least for non-static binaries).
For suid stuff (like mount.davfs, mentioned above), it has to be in /etc/ld.so.preload, not just env, but as I need it "just in case" for all the connections, that seems fine in my case.

And tuning keepalives to be frequent-enough seem to be a no-brainer and shouldn't have any effect on 99% of legitimate connections at all, as they probably pass some traffic every other second, not after minutes or hours.

net.ipv4.tcp_keepalive_time = 900
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 156

(default is to send empty keepalive packet after 2 hours of idleness)

With that, tool has to run ~7 min on average to kill any tcp connection in the system, which totally acceptable, and no fragile non-portable ptrace-shellcode magic involved (at least yet, I bet it'd be much easier to do in the future).

Code and some docs for the tool/approach can be found on github.

More of the same (update 2013-08-11):

Actually, lacking some better way to send RST/FIN from a machine to itself than swapping MACs (and hoping that router is misconfigured enough to bounce packet "from itself" back) or "-j REJECT --reject-with tcp-reset" (plus a "recent" match or transient-port matching, to avoid blocking reconnect as well), countdown for a connection should be ~7 + 15 min, as only next keepalive will reliably produce RST response.

With a bit of ipset/iptables/nflog magic, it was easy to make the one-time REJECT rule, snatching seq from dropped packet via NFLOG and using that to produce RST for the other side as well.

Whole magic there goes like this:

-A conn_cutter ! -p tcp -j RETURN
-A conn_cutter -m set ! --match-set conn_cutter src,src -j RETURN
-A conn_cutter -p tcp -m recent --set --name conn_cutter --rsource
-A conn_cutter -p tcp -m recent ! --rcheck --seconds 20\
        --hitcount 2 --name conn_cutter --rsource -j NFLOG
-A conn_cutter -p tcp -m recent ! --rcheck --seconds 20\
        --hitcount 2 --name conn_cutter --rsource -j REJECT --reject-with tcp-reset

-I OUTPUT -j conn_cutter

"recent" matcher there is a bit redundant in most cases, as outgoing connections usually use transient-range tcp ports, which shouldn't match for different attempts, but some apps might bind these explicitly.

ipset turned out to be quite a neat thing to avoid iptables manipulations (to add/remove match).

It's interesting that this set of rules handles RST to both ends all by itself if packet arrives from remote first - response (e.g. ACK) from local socket will get RST but won't reach remote, and retransmit from remote will get RST because local port is legitimately closed by then.

Current code allows to optionally specify ipset name, whether to use nflog (via spin-off scapy-nflog-capture driver) or raw sockets, and doesn't do any mac-swapping, only sending RST to remote (which, again, should still be sufficient with frequent-enough keepalives).

Now, if only some decade-old undocumented code didn't explicitly disable these nice keepalives...

Apr 06, 2013

Fighting storage bitrot and decay

Everyone is probably aware that bits do flip here and there in the supposedly rock-solid, predictable and deterministic hardware, but somehow every single data-management layer assumes that it's not its responsibility to fix or even detect these flukes.

Bitrot in RAM is a known source of bugs, but short of ECC, dunno what one can do without huge impact on performance.

Disks, on the other hand, seem to have a lot of software layers above them, handling whatever data arrangement, compression, encryption, etc, and the fact that bits do flip in magnetic media seem to be just as well-known (study1, study2, study3, ...).
In fact, these very issues seem to be the main idea behind well known storage behemoth ZFS.
So it really bugged me for quite a while that any modern linux system seem to be completely oblivious to the issue.

Consider typical linux storage stack on a commodity hardware:

  • You have closed-box proprietary hdd brick at the bottom, with no way to tell what it does to protect your data - aside from vendor marketing pitches, that is.

  • Then you have well-tested and robust linux driver for some ICH storage controller.

    I wouldn't bet that it will corrupt anything at this point, but it doesn't do much else to the data but pass around whatever it gets from the flaky device either.

  • Linux blkdev layer above, presenting /dev/sdX. No checks, just simple mapping.

  • device-mapper.

    Here things get more interesting.

    I tend to use lvm wherever possible, but it's just a convenience layer (or a set of nice tools to setup mappings) on top of dm, no checks of any kind, but at least it doesn't make things much worse either - lvm metadata is fairly redundant and easy to backup/recover.

    dm-crypt gives no noticeable performance overhead, exists either above or under lvm in the stack, and is nice hygiene against accidental leaks (selling or leasing hw, theft, bugs, etc), but lacking authenticated encryption modes it doesn't do anything to detect bit-flips.
    Worse, it amplifies the issue.
    In the most common CBC mode one flipped bit in the ciphertext will affect a few other bits of data until the end of the dm block.
    Current dm-crypt default (since the latest cryptsetup-1.6.X, iirc) is XTS block encryption mode, which somewhat limits the damage, but dm-crypt has little support for changing modes on-the-fly, so tough luck.
    But hey, there is dm-verity, which sounds like exactly what I want, except it's read-only, damn.
    Read-only nature is heavily ingrained in its "hash tree" model of integrity protection - it is hashes-of-hashes all the way up to the root hash, which you specify on mount, immutable by design.

    Block-layer integrity protection is a bit weird anyway - lots of unnecessary work potential there with free space (can probably be somewhat solved by TRIM), data that's already journaled/checksummed by fs and just plain transient block changes which aren't exposed for long and one might not care about at all.

  • Filesystem layer above does the right thing sometimes.

    COW fs'es like btrfs and zfs have checksums and scrubbing, so seem to be a good options.
    btrfs was slow as hell on rotating plates last time I checked, but zfs port might be worth a try, though if a single cow fs works fine on all kinds of scenarios where I use ext4 (mid-sized files), xfs (glusterfs backend) and reiserfs (hard-linked backups, caches, tiny-file sub trees), then I'd really be amazed.

    Other fs'es plain suck at this. No care for that sort of thing at all.

  • Above-fs syscall-hooks kernel layers.

    IMA/EVM sound great, but are also for immutable security ("integrity") purposes ;(

    In fact, this layer is heavily populated by security stuff like LSM's, which I can't imagine being sanely used for bitrot-detection purposes.
    Security tools are generally oriented towards detecting any changes, intentional tampering included, and are bound to produce a lot of false-positives instead of legitimate and actionable alerts.

    Plus, upon detecting some sort of failure, these tools generally don't care about the data anymore acting as a Denial-of-Service attack on you, which is survivable (everything can be circumvented), but fighting your own tools doesn't sound too great.

  • Userspace.

    There is tripwire, but it's also a security tool, unsuitable for the task.

    Some rare discussions of the problem pop up here and there, but alas, I failed to salvage anything useable from these, aside from ideas and links to subject-relevant papers.

Scanning github, bitbucket and xmpp popped up bitrot script and a proof-of-concept md-checksums md layer, which apparently haven't even made it to lkml.

So, naturally, following long-standing "... then do it yourself" motto, introducing fs-bitrot-scrubber tool for all the scrubbing needs.

It should be fairly well-described in the readme, but the gist is that it's just a simple userspace script to checksum file contents and check changes there over time, taking all the signs of legitimate file modifications and the fact that it isn't the only thing that needs i/o in the system into account.

Main goal is not to provide any sort of redundancy or backups, but rather notify of the issue before all the old backups (or some cluster-fs mirrors in my case) that can be used to fix it are rotated out of existance or overidden.

Don't suppose I'll see such decay phenomena often (if ever), but I don't like having the odds, especially with an easy "most cases" fix within grasp.

If I'd keep lot of important stuff compressed (think what will happen if a single bit is flipped in the middle of few-gigabytes .xz file) or naively (without storage specifics and corruption in mind) encrypted in cbc mode (or something else to the same effect), I'd be worried about the issue so much more.

Wish there'd be something common out-of-the-box in the linux world, but I guess it's just not the time yet (hell, there's not even one clear term in the techie slang for it!) - with still increasing hdd storage sizes and much more vulnerable ssd's, some more low-level solution should materialize eventually.

Here's me hoping to raise awareness, if only by a tiny bit.

github project link

Mar 25, 2013

Secure cloud backups with Tahoe-LAFS

There's plenty of public cloud storage these days, but trusting any of them with any kind of data seem reckless - service is free to corrupt, monetize, leak, hold hostage or just drop it then.
Given that these services are provided at no cost, and generally without much ads, guess reputation and ToS are the things stopping them from acting like that.
Not trusting any single one of these services looks like a sane safeguard against them suddenly collapsing or blocking one's account.
And not trusting any of them with plaintext of the sensitive data seem to be a good way to protect it from all the shady things that can be done to it.

Tahoe-LAFS is a great capability-based secure distributed storage system, where you basically do "tahoe put somefile" and get capability string like "URI:CHK:iqfgzp3ouul7tqtvgn54u3ejee:...u2lgztmbkdiuwzuqcufq:1:1:680" in return.

That string is sufficient to find, decrypt and check integrity of the file (or directory tree) - basically to get it back in what guaranteed to be the same state.
Neither tahoe node state nor stored data can be used to recover that cap.
Retreiving the file afterwards is as simple as GET with that cap in the url.

With remote storage providers, tahoe node works as a client, so all crypto being client-side, actual cloud provider is clueless about the stuff you store, which I find to be quite important thing, especially if you stripe data across many of these leaky and/or plain evil things.

Finally got around to connecting a third backend (box.net) to tahoe today, so wanted to share a few links on the subject:

Feb 08, 2013

Headless Skype to IRC gateway part 4 - skyped bikeshed

As suspected before, ended up rewriting skyped glue daemon.

There were just way too many bad practices (from my point of view) accumulated there (incomplete list can be found in the github issue #7, as well as some PRs I've submitted), and I'm quite puzzled why the thing actually works, given quite weird socket handling going on there, but one thing should be said: it's there and it works.
As software goes, that's the most important metric by far.

But as I'm currently purely a remote worker (not sure if I qualify for "freelancer", being just a drone), and skype is being quite critical for comms in this field, just working thing that silently drops errors and messages is not good enough.

Rewritten version is a generic eventloop with non-blocking sockets and standard handle_in/handle_out low-level recv/send/buffer handlers, with handle_<event> and dispatch_<event> callbacks on higher level and explicit conn_state var.
It also features full-fledged and configurable python logging, with debug options, (at least) warnings emitted on every unexpected event and proper non-broad exception handling.

Regardless of whether the thing will be useful upstream, it should finally put a final dot into skype setup story for me, as the whole setup seem to be robust and reliable enough for my purposes now.

Unless vmiklos will find it useful enough to merge, I'll probably maintain the script in this bitlbee fork, rebasing it on top of stable upstream bitlbee.

Jan 28, 2013

Headless Skype to IRC gateway part 3 - bitlbee + skyped

As per previous entry, with mock-desktop setup of Xvfb, fluxbox, x11vnc and skype in place, the only thing left is to use skype interfaces (e.g. dbus) to hook it up with existing IRC setup and maybe insulate skype process from the rest of the system.

Last bit is even easier than usual, since all the 32-bit libs skype needs are collected in one path, so no need to allow it to scan whatever system paths. Decided to go with the usual simplistic apparmor-way here - apparmor.profile, don't see much reason to be more paranoid here.

Also, libasound, used in skype gets quite noisy log-wise about not having the actual hardware on the system, but I felt bad about supressing the whole stderr stream from skype (to not miss the crash/hang info there), so had to look up a way to /dev/null alsa-lib output.
General way seem to be having "null" module as "default" sink
pcm.!default {
  type null
ctl.!default {
  type null

(libasound can be pointed to a local config by ALSA_CONFIG_PATH env var)

That "null" module is actually a dynamically-loaded .so, but alsa prints just a single line about it being missing instead of an endless stream of complaints for missing hw, so the thing works, by accident.

Luckily, bitlbee has support for skype, thanks to vmiklos, with sane way to run bitlbee and skype setup on different hosts (as it actually is in my case) through "skyped" daemon talking to skype and bitlbee connecting to its tcp (tls-wrapped) socket.

Using skyped shipped with bitlbee (which is a bit newer than on bitlbee-skype github) somewhat worked, with no ability to reconnect to it (hangs after handling first connection), ~1/4 chance of connection from bitlbee failing, it's persistence in starting skype (even though it's completely unnecessary in my case - systemd can do it way better) and such.

It's fairly simple python script though, based on somewhat unconventional Skype4Py module, so was able to fix most annoying of these issues (code can be found in the skype-space repo).
Will try to get these merged into bitlbee as I'm not the only one having these issues, apparently (e.g. #966), but so many things seem to be broken in that code (esp. wrt socket-handling), I think some major rewrite is in order, but that might be much harder to push upstream.
One interesting quirk of skyped is that it uses TLS to protect connections (allowing full control of the skype account) between bitlbee module and the daemon, but it doesn't bothers with any authorization, making that traffic as secure as plaintext to anyone in-between.
Quite a bit worse is that it's documented that the traffic is "encrypted", which might get one to think "ok, so running that thing on vps I don't need ssh-tunnel wrapping", which is kinda sad.
Add to that the added complexity it brings, segfaults in the plugin (crashing bitlbee), unhandled errors like
Traceback (most recent call last):
  File "./skyped", line 209, in listener
  File "/usr/lib64/python2.7/ssl.py", line 381, in wrap_socket
  File "/usr/lib64/python2.7/ssl.py", line 143, in __init__
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
error: [Errno 104] Connection reset by peer
...and it seem to be classic "doing it wrong" pattern.
Not that much of an issue in my case, but I guess there should at least be a big red warning for that.

Functionality-wise, pretty much all I needed is there - one-to-one chats, bookmarked channels (as irc channels!), file transfers (just set "accept all" for these) with notifications about them, user info, contact list (add/remove with allow/deny queries),

But the most important thing by far is that it works at all, saving me plenty of work to code whatever skype-control interface over irc, though I'm very tempted to rewrite "skyped" component, which is still a lot easier with bitlbee plugin on the other end.

Units and configs for the whole final setup can be found on github.

Jan 21, 2013

PyParsing vs Yapps

As I've been decompiling dynamic E config in the past anyway, wanted to back it up to git repo along with the rest of them.

Quickly stumbled upon a problem though - while E doesn't really modify it without me making some conscious changes, it reorders (or at least eet produces such) sections and values there, making straight dump to git a bit more difficult.
Plus, I have a pet project to update background, and it also introduces transient changes, so some pre-git processing was in order.

e.cfg looks like this:

group "E_Config" struct {
  group "xkb.used_options" list {
    group "E_Config_XKB_Option" struct {
      value "name" string: "grp:caps_toggle";
  group "xkb.used_layouts" list {
    group "E_Config_XKB_Layout" struct {
      value "name" string: "us";

Simple way to make it "canonical" is just to order groups/values there alphabetically, blanking-out some transient ones.

That needs a parser, and while regexps aren't really suited to that kind of thing, pyparsing should work:

number = pp.Regex(r'[+-]?\d+(\.\d+)?')
string = pp.QuotedString('"') | pp.QuotedString("'")
value_type = pp.Regex(r'\w+:')
group_type = pp.Regex(r'struct|list')

value = number | string
block_value = pp.Keyword('value')\
  + string + value_type + value + pp.Literal(';')

block = pp.Forward()
block_group = pp.Keyword('group') + string\
  + group_type + pp.Literal('{') + pp.OneOrMore(block) + pp.Literal('}')
block << (block_group | block_value)

config = pp.StringStart() + block + pp.StringEnd()

Fun fact: this parser doesn't work.

Bails with some error in the middle of the large (~8k lines) real-world config, while working for all the smaller pet samples.

I guess some buffer size must be tweaked (kinda unusual for python module though), maybe I made a mistake there, or something like that.

So, yapps2-based parser:

parser eet_cfg:
  ignore: r'[ \t\r\n]+'
  token END: r'$'
  token N: r'[+\-]?[\d.]+'
  token S: r'"([^"\\]*(\\.[^"\\]*)*)"'
  token VT: r'\w+:'
  token GT: r'struct|list'

  rule config: block END {{ return block }}

  rule block: block_group {{ return block_group }}
    | block_value {{ return block_value }}
  rule block_group:
    'group' S GT r'\{' {{ contents = list() }}
    ( block {{ contents.append(block) }} )*
    r'\}' {{ return Group(S, GT, contents) }}

  rule value: S {{ return S }} | N {{ return N }}
  rule block_value: 'value' S VT value ';' {{ return Value(S, VT, value) }}

Less verbose (even with more processing logic here) and works.

Embedded in a python code (doing the actual sorting), it all looks like this (might be useful to work with E configs, btw).

yapps2 actually generates quite readable code from it, and it was just simplier (and apparently more bugproof) to write grammar rules in it.

ymmv, but it's a bit of a shame that pyparsing seem to be the about the only developed parser-generator of such kind for python these days.

Had to package yapps2 runtime to install it properly, applying some community patches (from debian package) in process and replacing some scary cli code from 2003. Here's a fork.

Feb 28, 2012

Late adventures with time-series data collection and representation

When something is wrong and you look at the system, most often you'll see that... well, it works. There's some cpu, disk, ram usage, some number of requests per second on different services, some stuff piling up, something in short supply here and there...

And there's just no way of telling what's wrong without answers to the questions like "so, what's the usual load average here?", "is the disk always loaded with requests 80% of time?", "is it much more requests than usual?", etc, otherwise you might be off to some wild chase just to find out that load has always been that high, or solve the mystery of some unoptimized code that's been there for ages, without doing anything about the problem in question.

Historical data is the answer, and having used rrdtool with stuff like (customized) cacti and snmpd (with some my hacks on top) in the past, I was overjoyed when I stumbled upon a graphite project at some point.

From then on, I strived to collect as much metrics as possible, to be able to look at history of anything I want (and lots of values can be a reasonable symptom for the actual problem), without any kind of limitations.
carbon-cache does magic by batching writes and carbon-aggregator does a great job at relieving you of having to push aggregate metrics along with a granular ones or sum all these on graphs.

Initially, I started using it with just collectd (and still using it), but there's a need for something to convert metric names to a graphite hierarcy.

After looking over quite a few solutions to collecd-carbon bridge, decided to use bucky, with a few fixes of my own and quite large translation config.

Bucky can work anywhere, just receiving data from collectd network plugin, understands collectd types and properly translates counter increments to N/s rates. It also includes statsd daemon, which is brilliant at handling data from non-collector daemons and scripts and more powerful metricsd implementation.
Downside is that it's only maintained in forks, has bugs in less-used code (like metricsd), quite resource-hungry (but can be easily scaled-out) and there's kinda-official collectd-carbon plugin now (although I found it buggy as well, not to mention much less featureful, but hopefully that'll be addressed in future collectd versions).

Some of the problems I've noticed with such collectd setup:

  • Disk I/O metrics are godawful or just doesn't work - collected metrics of read/write either for processes of device are either zeroes, have weird values detached from reality (judging by actual problems and tools like atop and sysstat provide) or just useless.
  • Lots of metrics for network and memory (vmem, slab) and from various plugins have naming, inconsistent with linux /proc or documentation names.
  • Some useful metrics that are in, say, sysstat doesn't seem to work with collectd, like sensor data, nfsv4, some paging and socket counters.
  • Some metrics need non-trivial post-processing to be useful - disk utilization % time is one good example.
  • Python plugins leak memory on every returned value. Some plugins (ping, for example) make collectd segfault several times a day.
  • One of the most useful info is the metrics from per-service cgroup hierarchies, created by systemd - there you can compare resource usage of various user-space components, totally pinpointing exactly what caused the spikes on all the other graphs at some time.
  • Second most useful info by far is produced from logs and while collectd has a damn powerful tail plugin, I still found it to be too limited or just too complicated to use, while simple log-tailing code does the better job and is actually simplier due to more powerful language than collectd configuration. Same problem with table plugin and /proc.
  • There's still a need for lagre post-processing chunk of code and pushing the values to carbon.
Of course, I wanted to add systemd cgroup metrics, some log values and missing (and just properly-named) /proc tables data, and initially I wrote a collectd plugin for that. It worked, leaked memory, occasionally crashed (with collectd itself), used some custom data types, had to have some metric-name post-processing code chunk in bucky...
Um, what the hell for, when sending metric value directly takes just "echo some.metric.name $val $(printf %(%s)T -1) >/dev/tcp/carbon_host/2003"?

So off with collectd for all the custom metrics.

Wrote a simple "while True: collect_and_send() && sleep(till_deadline);" loop in python, along with the cgroup data collectors (there are even proper "block io" and "syscall io" per-service values!), log tailer and sysstat data processor (mainly for disk and network metrics which have batshit-crazy values in collectd plugins).

Another interesting data-collection alternative I've explored recently is ganglia.
Redundant gmond collectors and aggregators, communicating efficiently over multicast are nice. It has support for python plugins, and is very easy to use - pulling data from gmond node network can be done with one telnet or nc command, and it's fairly comprehensible xml, not some binary protocol. Another nice feature is that it can re-publish values only on some significant changes (where you define what "significant" is), thus probably eliminating traffic for 90% of "still 0" updates.
But as I found out while trying to use it as a collectd replacement (forwarding data to graphite through amqp via custom scripts), there's a fatal flaw - gmond plugins can't handle dynamic number of values, so writing a plugin that collects metrics from systemd services' cgroups without knowing how many of these will be started in advance is just impossible.
Also it has no concept for timestamps of values - it only has "current" ones, making plugins like "sysstat data parser" impossible to implement as well.
collectd, in contrast, has no constraint on how many values plugin returns and has timestamps, but with limitations on how far backwards they are.

Pity, gmond looked like a nice, solid and resilent thing otherwise.

I still like the idea to pipe graphite metrics through AMQP (like rocksteady does), routing them there not only to graphite, but also to some proper threshold-monitoring daemon like shinken (basically nagios, but distributed and more powerful), with alerts, escalations, trending and flapping detection, etc, but most of the existing solutions all seem to use graphite and whisper directly, which seem kinda wasteful.

Looking forward, I'm actually deciding between replacing collectd completely for a few most basic metrics it now collects, pulling them from sysstat or just /proc directly or maybe integrating my collectors back into collectd as plugins, extending collectd-carbon as needed and using collectd threshold monitoring and matches/filters to generate and export events to nagios/shinken... somehow first option seem to be more effort-effective, even in the long run, but then maybe I should just work more with collectd upstream, not hack around it.

← Previous Next → Page 2 of 5
Member of The Internet Defense League