Mar 25, 2015

gnuplot for live "last 30 seconds" sliding window of "free" (memory) data

Was looking at a weird memleak-like issue somewhere in the system when changing the desktop background (a surprisingly complex operation, btw) and wanted to get a nice graph of the last 30s of free -m output, with some labels and easy access to the data.

A simple enough task for gnuplot, but it resulted in a somewhat complicated solution, as neither "free" nor gnuplot is a perfect tool for the job.

First thing is that free -m -s1 doesn't actually produce machine-readable data, and I was too lazy to find something better (should've used sysstat and sar!), so I thought "let's just parse that with awk":

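# note: $interval (e.g. 1) and $points (e.g. 30) are expanded by the enclosing shell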
free -m -s $interval |
  awk '
    BEGIN {
      exports="total used free shared available"
      agg="free_plot.datz"
      dst="free_plot.dat"}
    $1=="total" {
      for (n=1;n<=NF;n++)
        if (index(exports,$n)) headers[n+1]=$n }
    $1=="Mem:" {
      first=1
      printf "" >dst
      for (n in headers) {
        if (!first) {
          printf " " >>agg
          printf " " >>dst }
        printf "%d", $n >>agg
        printf "%s", headers[n] >>dst
        first=0 }
      printf "\n" >>agg
      printf "\n" >>dst
      fflush(agg)
      close(dst)
      system("tail -n '$points' " agg " >>" dst) }'

That might be more awk than one ever wants to see, but I don't think there's much room to wiggle around it, as gnuplot is also somewhat picky about its input (either that, or you end up writing the same kind of script in gnuplot itself).

I thought that visualizing a "live" stream of data/measurements would be a fairly typical task for any graphing/visualization solution, but apparently not so much for gnuplot, as I haven't found a better way to do it than the "reread" command.

To be fair, that command seems to do what I want, just not in an obvious way, seamlessly updating the output in the same single window.

Next surprising quirk was "how to plot only the last 30 points from a big file", as it seems to be all-or-nothing with gnuplot, and googling around, I only found that people do it via the usual "tail" before the plotting.

Whatever - added that "tail" hack right to the awk script (as seen above), since column headers need to be written there anyway.

Then I also wanted nice labels - i.e.:

  • How much available memory was there at the start of the graph.
  • How much of it is at the end.
  • Min for that parameter on the graph.
  • Same, but max.

stats apparently won't give first/last values, unless I missed those in the PDF (the only available format for up-to-date docs, le sigh), so one solution I came up with is to do a dry-run plot command with set terminal unknown and "grab first value" / "grab last value" functions to "plot".
That's not really a huge deal, as it's just a preprocessed batch of 30 points, not a huge array of data.

Ok, so without further ado...

src='free_plot.dat'
y0=100; y1=2000;
set xrange [1:30]
set yrange [y0:y1]

# --------------------
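# dry-run pass on a discarded terminal - only to collect stats and first/last values of column 5 ("available")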
set terminal unknown
stats src using 5 name 'y' nooutput

is_NaN(v) = v+0 != v
y_first=0
grab_first_y(y) = y_first = y_first!=0 && !is_NaN(y_first) ? y_first : y
grab_last_y(y) = y_last = y

plot src u (grab_first_y(grab_last_y($5)))
x_first=GPVAL_DATA_X_MIN
x_last=GPVAL_DATA_X_MAX

# --------------------
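# labels for the first/last/min/max values gathered above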
set label 1 sprintf('first: %d', y_first) at x_first,y_first left offset 5,-1
set label 2 sprintf('last: %d', y_last) at x_last,y_last right offset 0,1
set label 3 sprintf('min: %d', y_min) at 0,y0-(y1-y0)/15 left offset 5,0
set label 4 sprintf('max: %d', y_max) at 0,y0-(y1-y0)/15 left offset 5,1

# --------------------
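# the actual visible plot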
set terminal x11 nopersist noraise enhanced
set xlabel 'n'
set ylabel 'megs'

set style line 1 lt 1 lw 1 pt 2 pi -1 ps 1.5
set pointintervalbox 2

plot\
  src u 5 w linespoints linestyle 1 t columnheader,\
  src u 1 w lines title columnheader,\
  src u 2 w lines title columnheader,\
  src u 3 w lines title columnheader,\
  src u 4 w lines title columnheader

# --------------------
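# wait a second, then re-read this whole script file from the top - same window, updated data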
pause 1
reread

Probably the most complex gnuplot script I composed to date.

Yeah, maybe I should've just googled around for an app that does the same thing, though I like how this lore potentially gives the ability to plot whatever other stuff in a similar fashion.

That, and I love all the weird stuff gnuplot can do.

For instance, xterm apparently emulates the weird "plotter" interface that hardware terminals had in the past:

gnuplot and Xterm Tektronix 4014 Mode

And there's also the famous "dumb" terminal for pseudographics.

Regular x11 output looks nice and clean enough though:

gnuplot x11 output

It updates smoothly, with the line crawling left-to-right at the start and then neatly flowing through the window. There's a lot of styling one can do to make it prettier, but I think I've spent enough time on such a trivial thing.

Didn't really help much with debugging though. Oh well...

Full "free | awk | gnuplot" script is here on github.

May 19, 2014

Displaying any lm_sensors data (temperature, fan speeds, voltage, etc) in conky

Conky sure has a ton of sensor-related hw-monitoring options, but they still don't seem to be enough to represent even just the temperatures from this "sensors" output:

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      +1.39 V  (min =  +0.80 V, max =  +1.60 V)
+3.3V Voltage:      +3.36 V  (min =  +2.97 V, max =  +3.63 V)
+5V Voltage:        +5.08 V  (min =  +4.50 V, max =  +5.50 V)
+12V Voltage:      +12.21 V  (min = +10.20 V, max = +13.80 V)
CPU Fan Speed:     2008 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis Fan Speed:    0 RPM  (min =  600 RPM, max = 7200 RPM)
Power Fan Speed:      0 RPM  (min =  600 RPM, max = 7200 RPM)
CPU Temperature:    +42.0°C  (high = +60.0°C, crit = +95.0°C)
MB Temperature:     +43.0°C  (high = +45.0°C, crit = +75.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +30.6°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +88.0°C)

radeon-pci-0400
Adapter: PCI adapter
temp1:        +51.0°C

Given the summertime and faulty, noisy cooling fans, I decided it'd be nice to have an idea of what kind of temperatures the hw operates at under all sorts of routine tasks.

Conky is extensible via lua, which - among other awesome things - makes it possible to cache expensive operations (instead of just repeating them every other second) and to parse the output of whatever tools efficiently (i.e. without forking five extra binaries plus perl).

Output of "sensors" though not only is kinda expensive to get, but also hardly parseable, likely unstable, and tool doesn't seem to have any "machine data" option.

lm_sensors includes libsensors, which still doesn't seem possible to call from conky lua directly (that would need some kind of ffi), but it's easy to write a wrapper around it - i.e. this sens.c 50-liner, dumping the info in a useful way:

atk0110-0-0__in0_input 1.392000
atk0110-0-0__in0_min 0.800000
atk0110-0-0__in0_max 1.600000
atk0110-0-0__in1_input 3.360000
...
atk0110-0-0__in3_max 13.800000
atk0110-0-0__fan1_input 2002.000000
atk0110-0-0__fan1_min 600.000000
atk0110-0-0__fan1_max 7200.000000
atk0110-0-0__fan2_input 0.000000
...
atk0110-0-0__fan3_max 7200.000000
atk0110-0-0__temp1_input 42.000000
atk0110-0-0__temp1_max 60.000000
atk0110-0-0__temp1_crit 95.000000
atk0110-0-0__temp2_input 43.000000
atk0110-0-0__temp2_max 45.000000
atk0110-0-0__temp2_crit 75.000000
k10temp-0-c3__temp1_input 31.500000
k10temp-0-c3__temp1_max 70.000000
k10temp-0-c3__temp1_crit 90.000000
k10temp-0-c3__temp1_crit_hyst 88.000000
radeon-0-400__temp1_input 51.000000

That's everything lm_sensors seems to know about the hw, in a simple key-value form.

Still not keen on running that on every conky tick, hence the lua cache:

sensors = {
  values=nil,
  cmd="sens",
  ts_read_i=120, ts_read=0,
}

function conky_sens_read(name, precision)
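  -- re-read values from the "sens" binary at most once per ts_read_i (120s) - cheap cache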
  local ts = os.time()
  if os.difftime(ts, sensors.ts_read) > sensors.ts_read_i then
    local sh = io.popen(sensors.cmd, 'r')
    sensors.values = {}
    for p in string.gmatch(sh:read('*a'), '(%S+ %S+)\n') do
      local n = string.find(p, ' ')
      sensors.values[string.sub(p, 0, n-1)] = string.sub(p, n)
    end
    sh:close()
    sensors.ts_read = ts
  end

  if sensors.values[name] then
    local fmt = string.format('%%.%sf', precision or 0)
    return string.format(fmt, sensors.values[name])
  end
  return ''
end

This runs the actual "sens" command at most every 120s, which is perfectly fine with me, since I don't consider conky to be an "early warning" system, but more of a "have an idea of what's the norm here" one.

Config-wise, it'd be just cpu temp: ${lua sens_read atk0110-0-0__temp1_input}C, or a fancier template version with a flashing warning, hidden for missing sensors:

template3 ${color lightgrey}${if_empty ${lua sens_read \2}}${else}\
${if_match ${lua sens_read \2} > \3}${color red}\1: ${lua sens_read \2}C${blink !!!}\
${else}\1: ${color}${lua sens_read \2}C${endif}${endif}

It can then be used simply as ${template3 cpu atk0110-0-0__temp1_input 60} or ${template3 gpu radeon-0-400__temp1_input 80}, with 60 and 80 being manually-specified thresholds beyond which the indicator turns red and gets a blinking "!!!" to draw more attention.

Overall result in my case is something like this:

conky sensors display

sens.c (plus a Makefile with gcc -Wall -lsensors for it) and my conky config where it's used can all be found in the de-setup repo on github (or my git mirror, ofc).

Jun 09, 2013

cjdns per-IP (i.e. per-peer) traffic accounting

I've been using the Hyperboria darknet for about a month now, and after the late influx of russian users there (following this article) got plenty of peers, so the node is forwarding a fair bit of network traffic.

Being a proper darknet, of course, it doesn't let you see what kind of traffic it is or whom it goes to (though cjdns doesn't have anonymity as a goal), but I thought it'd be nice to at least know when my internet lags due to someone launching a DoS flood or abusing torrents.

Over the Internet (called "clearnet" here), cjdns peers over udp, but linux conntrack seems to be good enough to track these "connections" just as if they were stateful tcp flows.

Simple-ish traffic accounting on vanilla linux usually boils down to ulogd2, which can use packet-capturing interfaces (raw sockets via libpcap, netfilter ULOG and NFLOG targets), but that's kinda heavy-handed here - the traffic is opaque and only the endpoints matter, so another one of its interfaces seems to be a better fit - conntrack tables/events.

The handy conntrack-tools (or /proc/net/{ip,nf}_conntrack) can track all the connections, including simple udp-based ones (like cjdns uses), producing entries like:

udp 17 179 \
        src=110.133.5.117 dst=188.226.51.71 sport=52728 dport=8131 \
        src=188.226.51.71 dst=110.133.5.117 sport=8131 dport=52728 \
        [ASSURED] mark=16 use=1

First trick is to enable the packet/byte counters there, which is a simple, but default-off sysctl knob:

# sysctl -w net.netfilter.nf_conntrack_acct=1

That will add "bytes=" and "packets=" values there for both directions.

Of course, polling the table is a good way to introduce extra hangs into the system (/proc files are basically hooks that tend to lock stuff to get consistent reads) and to lose stuff in-between polls, so luckily there's an event-based netlink interface and the ulogd2 daemon to monitor that.

One easy way to pick both incoming and outgoing udp flows in ulogd2 is to add connmarks to these:

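# mark both directions of cjdns udp flows ($cjdns_port is the local cjdns udp port, 0x10 is an arbitrary mark bit)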
-A INPUT -p udp --dport $cjdns_port -j CONNMARK --set-xmark 0x10/0x10
-A OUTPUT -p udp --sport $cjdns_port -j CONNMARK --set-xmark 0x10/0x10

Then setup filtering by these in ulogd.conf:

...

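# conntrack flow events -> filter by connmark -> IPs to strings -> flow formatting -> file output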
stack=log:NFCT,mark:MARK,ip2str:IP2STR,print:PRINTFLOW,out:GPRINT

[log]
accept_proto_filter=udp

[mark]
mark=0x10
mask=0x10

[out]
file="/var/log/ulogd2/cjdns.log"

That should produce a parseable log of all the traffic flows, with IPs, counters and such.

A fairly simple script can then be used to push this data to graphite, munin, ganglia, cacti or whatever other time-series graphing/processing tool. The linked script is for the graphite "carbon" interface.
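
Not the linked script, but a rough python sketch of the same idea, under a couple of assumptions: that GPRINT writes one flow per line as comma-separated key=value pairs (the key names like orig.ip.daddr / orig.raw.pktlen below are guesses - check what it actually emits), and that a carbon daemon listens on its plaintext port 2003 at "carbon_host":

#!/usr/bin/env python
# Rough sketch: tail the ulogd2 output file and push per-peer byte counters
# to carbon's plaintext protocol. Key names here are assumptions.
import socket, time

carbon_addr = 'carbon_host', 2003

def parse_flow(line):
    # expects one flow per line as comma-separated key=value pairs
    flow = dict()
    for kv in line.strip().split(','):
        if '=' in kv:
            k, v = kv.split('=', 1)
            flow[k] = v
    return flow

def send_metric(sock, name, value, ts=None):
    sock.sendall('{} {} {}\n'.format(name, value, int(ts or time.time())).encode())

def main(log_path='/var/log/ulogd2/cjdns.log'):
    sock = socket.create_connection(carbon_addr)
    with open(log_path) as log:
        log.seek(0, 2)  # skip old entries, only push new ones
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            flow = parse_flow(line)
            peer = flow.get('orig.ip.daddr')
            if not peer:
                continue
            prefix = 'cjdns.peers.{}'.format(peer.replace('.', '_'))
            for d in 'orig', 'reply':
                key = '{}.raw.pktlen'.format(d)
                if key in flow:
                    send_metric(sock, '{}.bytes_{}'.format(prefix, d), flow[key])

if __name__ == '__main__': main()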

Update: obsoleted/superseded by cjdns "InterfaceController_peerStats" admin api function and graphite-metrics cjdns_peer_stats collector.

Apr 06, 2013

Fighting storage bitrot and decay

Everyone is probably aware that bits do flip here and there in the supposedly rock-solid, predictable and deterministic hardware, but somehow every single data-management layer assumes that it's not its responsibility to fix or even detect these flukes.

Bitrot in RAM is a known source of bugs, but short of ECC, I don't know what one can do about it without a huge performance impact.

Disks, on the other hand, seem to have a lot of software layers above them, handling whatever data arrangement, compression, encryption, etc, and the fact that bits do flip in magnetic media seems to be just as well-known (study1, study2, study3, ...).
In fact, these very issues seem to be the main idea behind the well-known storage behemoth ZFS.
So it has really bugged me for quite a while that any modern linux system seems to be completely oblivious to the issue.

Consider typical linux storage stack on a commodity hardware:

  • You have a closed-box proprietary hdd brick at the bottom, with no way to tell what it does to protect your data - aside from vendor marketing pitches, that is.

  • Then you have a well-tested and robust linux driver for some ICH storage controller.

    I wouldn't bet on it corrupting anything at this point, but it doesn't do much else with the data either - it just passes along whatever it gets from the flaky device below.

  • Linux blkdev layer above, presenting /dev/sdX. No checks, just simple mapping.

  • device-mapper.

    Here things get more interesting.

    I tend to use lvm wherever possible, but it's just a convenience layer (or a set of nice tools to setup mappings) on top of dm - no checks of any kind, but at least it doesn't make things much worse either, since lvm metadata is fairly redundant and easy to backup/recover.

    dm-crypt gives no noticeable performance overhead, exists either above or under lvm in the stack, and is nice hygiene against accidental leaks (selling or leasing hw, theft, bugs, etc), but lacking authenticated encryption modes it doesn't do anything to detect bit-flips.
    Worse, it amplifies the issue.
    In the most common CBC mode, one flipped bit in the ciphertext garbles the whole cipher block it belongs to and flips a bit in the next one.
    Current dm-crypt default (since the latest cryptsetup-1.6.X, iirc) is XTS block encryption mode, which somewhat limits the damage, but dm-crypt has little support for changing modes on-the-fly, so tough luck.
    But hey, there is dm-verity, which sounds like exactly what I want, except it's read-only, damn.
    Read-only nature is heavily ingrained in its "hash tree" model of integrity protection - it is hashes-of-hashes all the way up to the root hash, which you specify on mount, immutable by design.

    Block-layer integrity protection is a bit weird anyway - lots of potential for unnecessary work there: free space (can probably be somewhat mitigated by TRIM), data that's already journaled/checksummed by the fs, and plain transient block changes that aren't exposed for long and one might not care about at all.

  • Filesystem layer above does the right thing sometimes.

    COW fs'es like btrfs and zfs have checksums and scrubbing, so they seem to be good options.
    btrfs was slow as hell on rotating plates last time I checked, but the zfs port might be worth a try, though I'd really be amazed if a single cow fs worked fine in all the scenarios where I use ext4 (mid-sized files), xfs (glusterfs backend) and reiserfs (hard-linked backups, caches, tiny-file subtrees).

    Other fs'es plain suck at this. No care for that sort of thing at all.

  • Above-fs syscall-hooks kernel layers.

    IMA/EVM sound great, but are also for immutable security ("integrity") purposes ;(

    In fact, this layer is heavily populated by security stuff like LSMs, which I can't imagine being sanely used for bitrot-detection purposes.
    Security tools are generally oriented towards detecting any changes, intentional tampering included, and are bound to produce a lot of false positives instead of legitimate and actionable alerts.

    Plus, upon detecting some sort of failure, these tools generally don't care about the data anymore, acting as a Denial-of-Service attack on you, which is survivable (everything can be circumvented), but fighting your own tools doesn't sound too great.

  • Userspace.

    There is tripwire, but it's also a security tool, unsuitable for the task.

    Some rare discussions of the problem pop up here and there, but alas, I failed to salvage anything useable from these, aside from ideas and links to subject-relevant papers.

Scanning github, bitbucket and xmpp turned up a bitrot script and a proof-of-concept md-checksums md layer, which apparently hasn't even made it to lkml.

So, naturally, following long-standing "... then do it yourself" motto, introducing fs-bitrot-scrubber tool for all the scrubbing needs.

It should be fairly well-described in the readme, but the gist is that it's just a simple userspace script to checksum file contents and check for changes there over time, taking into account all the signs of legitimate file modification and the fact that it isn't the only thing in the system that needs i/o.
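
Not the tool itself, just a minimal python sketch of the core check: hash file contents and flag files whose contents changed while mtime/size did not, on the assumption that legitimate writes bump mtime while silent decay doesn't (the manifest file name here is made up):

#!/usr/bin/env python
import os, json, hashlib

manifest_path = 'scrub_manifest.json'  # made-up name for the state file

def file_hash(path, bs=1 * 2**20):
    csum = hashlib.sha256()
    with open(path, 'rb') as src:
        for chunk in iter(lambda: src.read(bs), b''):
            csum.update(chunk)
    return csum.hexdigest()

def scrub(top):
    try:
        with open(manifest_path) as src: manifest = json.load(src)
    except (OSError, IOError, ValueError): manifest = dict()
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entry = dict(mtime=st.st_mtime, size=st.st_size, csum=file_hash(path))
            old = manifest.get(path)
            # same mtime/size but different checksum = nothing should've touched it
            if ( old and old['csum'] != entry['csum']
                    and old['mtime'] == entry['mtime'] and old['size'] == entry['size'] ):
                print('Possible bitrot detected: {}'.format(path))
            manifest[path] = entry
    with open(manifest_path, 'w') as dst: json.dump(manifest, dst)

if __name__ == '__main__': scrub('.')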

The main goal is not to provide any sort of redundancy or backups, but rather to notify of the issue before all the old backups (or some cluster-fs mirrors in my case) that could be used to fix it are rotated out of existence or overridden.

I don't suppose I'll see such decay phenomena often (if ever), but I don't like playing the odds, especially with an easy "most cases" fix within grasp.

If I kept a lot of important stuff compressed (think what happens if a single bit is flipped in the middle of a few-gigabytes .xz file) or naively (without storage specifics and corruption in mind) encrypted in cbc mode (or something else to the same effect), I'd be worried about the issue much more.

I wish there were something common out-of-the-box in the linux world, but I guess it's just not the time yet (hell, there's not even one clear term for it in techie slang!) - with still-increasing hdd storage sizes and much more vulnerable ssd's, some more low-level solution should materialize eventually.

Here's me hoping to raise awareness, if only by a tiny bit.

github project link

Feb 08, 2013

Headless Skype to IRC gateway part 4 - skyped bikeshed

As suspected before, I ended up rewriting the skyped glue daemon.

There were just way too many bad practices (from my point of view) accumulated there (an incomplete list can be found in github issue #7, as well as in some PRs I've submitted), and I'm quite puzzled why the thing actually works, given the quite weird socket handling going on there. But one thing should be said: it's there and it works.
As software goes, that's the most important metric by far.

But as I'm currently purely a remote worker (not sure if I qualify for "freelancer", being just a drone), and skype is quite critical for comms in this field, a just-barely-working thing that silently drops errors and messages is not good enough.

The rewritten version is a generic eventloop with non-blocking sockets and standard handle_in/handle_out low-level recv/send/buffer handlers, with handle_<event> and dispatch_<event> callbacks on a higher level and an explicit conn_state var.
It also features full-fledged and configurable python logging, with debug options, (at least) warnings emitted on every unexpected event and proper non-broad exception handling.
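
Not the actual skyped code, just a generic sketch of the pattern described above - a select()-based loop over non-blocking sockets with handle_in/handle_out buffer handlers, a higher-level dispatch callback and an explicit conn_state var (class and method names here are made up):

import select

class Conn:

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock, self.buff_in, self.buff_out = sock, b'', b''
        self.conn_state = 'connected'

    def handle_in(self):
        # low-level recv into a buffer, dispatching complete lines upwards
        chunk = self.sock.recv(8 * 2**10)
        if not chunk:
            self.conn_state = 'closed'
            return
        self.buff_in += chunk
        while b'\n' in self.buff_in:
            line, self.buff_in = self.buff_in.split(b'\n', 1)
            self.dispatch_line(line)

    def handle_out(self):
        # low-level send of whatever is queued in the out-buffer
        if self.buff_out:
            sent = self.sock.send(self.buff_out)
            self.buff_out = self.buff_out[sent:]

    def dispatch_line(self, line):
        # higher-level handle_<event> / dispatch_<event> logic would go here
        self.buff_out += b'ack: ' + line + b'\n'

def eventloop(conns):
    while conns:
        w_socks = [c.sock for c in conns if c.buff_out]
        r, w, x = select.select([c.sock for c in conns], w_socks, [])
        for c in conns:
            if c.sock in r: c.handle_in()
            if c.sock in w: c.handle_out()
        conns = [c for c in conns if c.conn_state != 'closed']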

Regardless of whether the thing will be useful upstream, it should finally close the skype setup story for me, as the whole setup seems to be robust and reliable enough for my purposes now.

Unless vmiklos finds it useful enough to merge, I'll probably maintain the script in this bitlbee fork, rebasing it on top of stable upstream bitlbee.

Feb 28, 2012

Late adventures with time-series data collection and representation

When something is wrong and you look at the system, most often you'll see that... well, it works. There's some cpu, disk, ram usage, some number of requests per second on different services, some stuff piling up, something in short supply here and there...

And there's just no way of telling what's wrong without answers to questions like "so, what's the usual load average here?", "is the disk always loaded with requests 80% of the time?", "is this much more requests than usual?", etc. Otherwise you might be off on some wild chase just to find out that the load has always been that high, or solve the mystery of some unoptimized code that's been there for ages, without doing anything about the actual problem in question.

Historical data is the answer, and having used rrdtool with stuff like (customized) cacti and snmpd (with some of my hacks on top) in the past, I was overjoyed when I stumbled upon the graphite project at some point.

From then on, I strove to collect as many metrics as possible, to be able to look at the history of anything I want (and lots of values can be a reasonable symptom of the actual problem), without any kind of limitations.
carbon-cache does magic by batching writes, and carbon-aggregator does a great job of relieving you of having to push aggregate metrics along with the granular ones, or summing all these on graphs.

Initially, I started using it with just collectd (and I'm still using it), but there's a need for something to convert metric names to a graphite hierarchy.

After looking over quite a few solutions for a collectd-carbon bridge, I decided to use bucky, with a few fixes of my own and a quite large translation config.

Bucky can work anywhere, just receiving data from the collectd network plugin; it understands collectd types and properly translates counter increments to N/s rates. It also includes a statsd daemon, which is brilliant at handling data from non-collector daemons and scripts, and a more powerful metricsd implementation.
The downside is that it's only maintained in forks, has bugs in less-used code (like metricsd), is quite resource-hungry (but can be easily scaled out), and there's a kinda-official collectd-carbon plugin now (although I found it buggy as well, not to mention much less featureful, but hopefully that'll be addressed in future collectd versions).

Some of the problems I've noticed with such collectd setup:

  • Disk I/O metrics are godawful or just don't work - collected read/write metrics, either for processes or devices, are either zeroes, have weird values detached from reality (judging by actual problems and by what tools like atop and sysstat report), or are just useless.
  • Lots of metrics for network and memory (vmem, slab) and from various plugins have naming inconsistent with linux /proc or documentation names.
  • Some useful metrics that are in, say, sysstat don't seem to work with collectd, like sensor data, nfsv4, some paging and socket counters.
  • Some metrics need non-trivial post-processing to be useful - disk utilization % time is one good example.
  • Python plugins leak memory on every returned value. Some plugins (ping, for example) make collectd segfault several times a day.
  • Some of the most useful info is the metrics from per-service cgroup hierarchies, created by systemd - there you can compare resource usage of various user-space components, pinpointing exactly what caused the spikes on all the other graphs at some point in time.
  • The second most useful info by far is produced from logs, and while collectd has a damn powerful tail plugin, I still found it to be too limited or just too complicated to use, while simple log-tailing code does a better job and is actually simpler, thanks to a more powerful language than collectd's configuration. Same problem with the table plugin and /proc.
  • There's still a need for a large chunk of post-processing code that pushes the values to carbon.

Of course, I wanted to add systemd cgroup metrics, some log values and missing (and just properly-named) /proc tables data, and initially I wrote a collectd plugin for that. It worked, leaked memory, occasionally crashed (along with collectd itself), used some custom data types, had to have a metric-name post-processing code chunk in bucky...
Um, what the hell for, when sending a metric value directly takes just "echo some.metric.name $val $(printf '%(%s)T' -1) >/dev/tcp/carbon_host/2003"?

So off with collectd for all the custom metrics.

So I wrote a simple "while True: collect_and_send() && sleep(till_deadline);" loop in python, along with the cgroup data collectors (there are even proper "block io" and "syscall io" per-service values!), a log tailer and a sysstat data processor (mainly for disk and network metrics, which have batshit-crazy values in collectd plugins).
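
Roughly the shape of that loop, with the actual collectors stubbed out ("carbon_host" and the metric name below are placeholders):

#!/usr/bin/env python
import socket, time

carbon_addr = 'carbon_host', 2003  # plaintext carbon-cache listener
interval = 60.0

def collect():
    # stub for the actual cgroup / log-tail / sysstat collectors
    return {'some.metric.name': 1.0}

def send(metrics):
    sock = socket.create_connection(carbon_addr)
    try:
        ts = int(time.time())
        for name, val in metrics.items():
            sock.sendall('{} {} {}\n'.format(name, val, ts).encode())
    finally:
        sock.close()

def main():
    deadline = time.time()
    while True:
        send(collect())
        # sleep until the next deadline, not for a fixed time,
        # so that collection duration doesn't skew the schedule
        deadline += interval
        time.sleep(max(0, deadline - time.time()))

if __name__ == '__main__': main()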

Another interesting data-collection alternative I've explored recently is ganglia.
Redundant gmond collectors and aggregators, communicating efficiently over multicast, are nice. It has support for python plugins, and is very easy to use - pulling data from a gmond node network can be done with one telnet or nc command, and it's fairly comprehensible xml, not some binary protocol. Another nice feature is that it can re-publish values only on significant changes (where you define what "significant" means), thus probably eliminating traffic for 90% of "still 0" updates.
But as I found out while trying to use it as a collectd replacement (forwarding data to graphite through amqp via custom scripts), there's a fatal flaw - gmond plugins can't handle a dynamic number of values, so writing a plugin that collects metrics from systemd services' cgroups without knowing in advance how many of these will be started is just impossible.
Also it has no concept of timestamps for values - it only has "current" ones, making plugins like a "sysstat data parser" impossible to implement as well.
collectd, in contrast, has no constraint on how many values a plugin returns, and has timestamps, but with limitations on how far back they can go.

A pity - gmond looked like a nice, solid and resilient thing otherwise.

I still like the idea of piping graphite metrics through AMQP (like rocksteady does), routing them there not only to graphite, but also to some proper threshold-monitoring daemon like shinken (basically nagios, but distributed and more powerful), with alerts, escalations, trending and flapping detection, etc, but most of the existing solutions all seem to use graphite and whisper directly, which seems kinda wasteful.

Looking forward, I'm actually deciding between replacing collectd completely for the few most basic metrics it now collects, pulling them from sysstat or just /proc directly, or maybe integrating my collectors back into collectd as plugins, extending collectd-carbon as needed and using collectd threshold monitoring and matches/filters to generate and export events to nagios/shinken... Somehow the first option seems more effort-effective, even in the long run, but then maybe I should just work more with collectd upstream, not hack around it.
