My blog_title

Jan 25, 2013

Static pelican blog

Ditched bloog engine here in favor of static pelican yesterday, and while I was able to remember about keeping legacy links working, pretty sure I forgot about guids on the feed, so apologies to anyone who might care. Guess it's pointless to fix these now.

All the entries can be found on github now in rst-format, though older ones might be a bit harder to read in the source, as they were mostly auto-converted by pandoc and I only checked if they're still rendered correctly to html.

As appengine also made me migrate from master-slave db replication to the shiny high-replication datastore, I wonder if hosting static html here now counts as abuse...

posted on 2013-01-25 12:46 YEKT

social

Jan 21, 2013

PyParsing vs Yapps

As I've been decompiling dynamic E config in the past anyway, wanted to back it up to git repo along with the rest of them.

Quickly stumbled upon a problem though - while E doesn't really modify it without me making some conscious changes, it reorders (or at least eet produces such) sections and values there, making straight dump to git a bit more difficult.

Plus, I have a pet project to update background, and it also introduces transient changes, so some pre-git processing was in order.

e.cfg looks like this:

group "E_Config" struct {
  group "xkb.used_options" list {
    group "E_Config_XKB_Option" struct {
      value "name" string: "grp:caps_toggle";
    }
  }
  group "xkb.used_layouts" list {
    group "E_Config_XKB_Layout" struct {
      value "name" string: "us";
...

Simple way to make it "canonical" is just to order groups/values there alphabetically, blanking-out some transient ones.

That needs a parser, and while regexps aren't really suited to that kind of thing, pyparsing should work:

import pyparsing as pp

# Terminals: numbers, quoted strings, "type:" tags and group kinds
number = pp.Regex(r'[+-]?\d+(\.\d+)?')
string = pp.QuotedString('"') | pp.QuotedString("'")
value_type = pp.Regex(r'\w+:')
group_type = pp.Regex(r'struct|list')

# value "name" type: data;
value = number | string
block_value = pp.Keyword('value')\
  + string + value_type + value + pp.Literal(';')

# group "name" struct|list { block ... } - recursive, hence Forward()
block = pp.Forward()
block_group = pp.Keyword('group') + string\
  + group_type + pp.Literal('{') + pp.OneOrMore(block) + pp.Literal('}')
block << (block_group | block_value)

# Whole file is one top-level block
config = pp.StringStart() + block + pp.StringEnd()

Fun fact: this parser doesn't work.

Bails with some error in the middle of the large (~8k lines) real-world config, while working for all the smaller pet samples.

I guess some buffer size must be tweaked (kinda unusual for python module though), maybe I made a mistake there, or something like that.

So, yapps2-based parser:

parser eet_cfg:
  ignore: r'[ \t\r\n]+'
  token END: r'$'
  token N: r'[+\-]?[\d.]+'
  token S: r'"([^"\\]*(\\.[^"\\]*)*)"'
  token VT: r'\w+:'
  token GT: r'struct|list'

  rule config: block END {{ return block }}

  rule block: block_group {{ return block_group }}
    | block_value {{ return block_value }}
  rule block_group:
    'group' S GT r'\{' {{ contents = list() }}
    ( block {{ contents.append(block) }} )*
    r'\}' {{ return Group(S, GT, contents) }}

  rule value: S {{ return S }} | N {{ return N }}
  rule block_value: 'value' S VT value ';' {{ return Value(S, VT, value) }}

Less verbose (even with more processing logic here) and works.

Embedded in a python code (doing the actual sorting), it all looks like this (might be useful to work with E configs, btw).
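For illustration, here's a minimal sketch of what the sorting step might look like. It's not the linked script: the Group/Value tuples are hypothetical stand-ins for whatever the grammar actions build (names stored without quotes for simplicity), and the set of "transient" value names is an assumption to be filled in per setup.

```python
from collections import namedtuple

# Hypothetical stand-ins for the objects returned by the grammar actions
Group = namedtuple('Group', 'name type contents')
Value = namedtuple('Value', 'name type value')


def sort_key(node):
    # Plain values first, then groups; ties broken by name, then by content
    return (isinstance(node, Group), node.name, repr(node))


def canonical(node, transient=frozenset()):
    """Return a copy of node with contents sorted and transient values blanked."""
    if isinstance(node, Value):
        if node.name in transient:
            return node._replace(value='""')
        return node
    contents = sorted(
        (canonical(child, transient) for child in node.contents), key=sort_key)
    return node._replace(contents=contents)


def dump(node, indent=0):
    """Serialize the tree back into the e.cfg-like text form."""
    pad = '  ' * indent
    if isinstance(node, Value):
        return '%svalue "%s" %s %s;' % (pad, node.name, node.type, node.value)
    lines = ['%sgroup "%s" %s {' % (pad, node.name, node.type)]
    lines.extend(dump(child, indent + 1) for child in node.contents)
    lines.append(pad + '}')
    return '\n'.join(lines)
```

Note that this also reorders items inside "list" groups, which is fine for a diff-friendly backup, but probably not something to feed back to E if the order there matters.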

yapps2 actually generates quite readable code from it, and it was just simpler (and apparently more bugproof) to write grammar rules in it.

ymmv, but it's a bit of a shame that pyparsing seems to be about the only actively developed parser-generator of such kind for python these days.

Had to package yapps2 runtime to install it properly, applying some community patches (from debian package) in process and replacing some scary cli code from 2003. Here's a fork.

posted on 2013-01-21 04:15 YEKT

python

Jan 16, 2013

Migrating configuration / settings to E17 (enlightenment) 0.17.0 from older E versions

It's a documented feature that 0.17.0 release (even if late pre-release version was used before) throws existing configuration out of the window.

I'm not sure what warranted such a drastic usability bomb, but it's not actually as bad as it seems - like 95% of configuration (and 100% of *important* parts of it) can be just re-used (even if you've already started new version!) with just a little bit of extra effort (thanks to ppurka in #e for pointing me in the right direction here).

Sooo wasn't looking forward to restore all the keyboard bindings, for one thing (that's why I actually did the update just one week ago or so).

E is a bit special (at least among wm's - fairly sure some de's do similar things as well) in that it keeps its settings on disk compiled and compressed (with eet) - but it's much easier to work with than it might sound like at first.

So, to get the bits of config migrated, one just has to pull the old (pre-zero) config out, then start zero-release e to generate new config, decompile both of these, pull compatible bits from old into the new one, then compile it and put it back into "~/.e/e/config".

Before zero update, config can be found in "~/.e/e/config/standard/e.cfg"

If release version was started already and dropped the config, then old one should be "~/.e/e/config/standard/e.1.cfg" (or any different number instead of "1" there, just mentally substitute it in examples below).

Note that "standard" there is a profile name, if yours might be called differently, check "~/.e/e/config/profile.cfg" (profile name should be readable there, or use "eet -x ~/.e/e/config/profile.cfg config").

"eet -d ~/.e/e/config/standard/e.cfg config" should produce perfectly readable version of the config to stdout.

Below is how I went about the whole process.

Make a repository to track changes (will help if the process might take more merge-test iterations than one):

% mkdir e_config_migration
% cd e_config_migration
% git init

Before zero update:

% cp ~/.e/e/config/standard/e.cfg e_pre_zero
% eet -d e_pre_zero config > e_pre_zero.cfg

Start E-release (wipes the config, produces a "default" new one there).

% cp ~/.e/e/config/standard/e.cfg e_zero
% eet -d e_zero config > e_zero.cfg
% git add e_*
% git commit -m "Initial pre/post configurations"
% emacs e_pre_zero.cfg e_zero.cfg

Then copy all the settings that were used in any way to e_zero.cfg.

I copied pretty much all the sections with relevant stuff, checking that the keys in them are the same - and they were, but I've used 0.17.0alpha8 before going for release, so if not, I'd just try "filling the blanks", or, failing that, just using old settings as a "what has to be setup through settings-panel" reference.

To be more specific - "xkb" options/layouts (have 2 of them setup), shelves/gadgets (didn't have these, and was lazy to click-remove existing ones), "remembers" (huge section, copied all of it, worked!), all "bindings" (pain to setup these).

After all these sections, there's a flat list of "value" things, which turned out to contain quite a lot of hard-to-find-in-menus parameters, so here's what I did:

copy that list (~200 lines) from old config to some file - say, "values.old", then from a new one to e.g. "values.new".
sort -u values.old > values.old.sorted; sort -u values.new > values.new.sorted
diff -uw values.{old,new}.sorted

Should show everything that might need to be changed in the new config with descriptive names and reveal all the genuinely new parameters.
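The same comparison can be sketched in python, if a structured result is preferred over reading the diff. This is just an illustration, assuming every line in the flat list looks like the value "name" type: data; entries shown in the e.cfg sample; the function names are made up for the example.

```python
import re

# Matches lines like:  value "name" string: "us";
VALUE_RE = re.compile(r'^\s*value\s+"([^"]+)"\s+(\w+):\s*(.*?);\s*$')


def parse_values(text):
    """Map value name -> (type, data) for every 'value' line in text."""
    values = {}
    for line in text.splitlines():
        match = VALUE_RE.match(line)
        if match:
            name, vtype, data = match.groups()
            values[name] = (vtype, data)
    return values


def compare(old_text, new_text):
    """Return (changed, only_in_old, only_in_new) between two value lists."""
    old, new = parse_values(old_text), parse_values(new_text)
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys() if old[name] != new[name]}
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return changed, only_old, only_new
```

Feeding it the contents of "values.old" and "values.new" should give the changed parameters, the ones that disappeared, and the genuinely new ones as three separate collections.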

Just don't touch "config_version" value, so E won't drop the resulting config.

After all the changes:

% eet -e e_zero config e_zero.cfg 1
% git commit -a -m Merged-1
% cp e_zero ~/.e/e/config/standard/e.cfg
% startx

New config worked for me for all the changes I've made - wasn't sure if I can copy *that* much from the start, but it turned out that almost no reconfiguration was necessary.

Caveat is, of course, that you should know what you're doing here, and be ready to handle issues / rollback, if any, that's why putting all these changes in git might be quite helpful.

posted on 2013-01-16 18:59 YEKT

desktop

Sep 16, 2012

Terms of Service - Didn't Read

Right now I was working on python-skydrive module and further integration of MS SkyDrive into tahoe-lafs as a cloud backend, to keep the stuff you really care about safe.

And even if you don't trust SkyDrive to keep stuff safe, you still have to register your app with these guys, especially if it's an open module, because "You are solely and entirely responsible for all uses of Live Connect occurring under your Client ID." and it's unlikely that a generic python interface author will vouch for all its uses like that.

What does "register app" mean? Agreeing to yet another "Terms of Service", of course!

Does anyone ever read these?

What the hell does the sentence "You may only use the Live SDK and Live Connect APIs to create software." mean there?
Did you know that "You are solely and entirely responsible for all uses of Live Connect occurring under your Client ID." (and that's an app-id, given out to the app developers, not users)?
How much more of such "interesting" stuff is there?

I hardly care enough to read, but there's an app for exactly that, and it's relatively well-known by now.

What might be not as well-known, is that there's now a campaign on IndieGoGo to keep the thing alive and make it better.

Please consider supporting the movement in any way, even just by spreading the word, right now, it's really one of the areas where filtering-out of all the legalese crap and noise is badly needed.

http://www.indiegogo.com/terms-of-service-didnt-read

posted on 2012-09-16 19:32 YEKT

policy documentation social

Aug 16, 2012

A new toy to play with - TI Launchpad with MSP430 MCU

A friend gave me this thing to play with (and eventually adapt to his purposes).

What's interesting here is that TI seems to give these things out for free.

Seriously, a box with a debug/programmer board and two microcontroller chips (which are basically your programmable computer with RAM, non-volatile flash memory, lots of interfaces, temp sensor, watchdog, etc that can be powered from 2 AA cells), to any part of the world with FedEx for a beer's worth - $5.

Guess it's time to put a computer into every doorknob indeed.

posted on 2012-08-16 09:02 YEKT

hardware

Aug 09, 2012

Unhosted remoteStorage idea

Having a bit of free time recently, worked a bit on feedjack web rss reader / aggregator project.

To keep track of what's already read and what's not, historically I've used js + client-side localStorage approach, which has quite a few advantages:

Works with multiple clients, i.e. everyone has its own state.
Server doesn't have to store any data for possible-infinite number of clients, not even session or login data.
Same pages still can be served to all clients, some will just hide unwanted content.
Previous point leads to pages being very cache-friendly.
No need to "recognize" client in any way, which is commonly achieved with authentication.
No interaction of "write" kind with the server means much less potential for abuse (DDoS, spam, other kinds of exploits).

Flip side of that rosy picture is that localStorage only works in one browser (or possibly several synced instances), which is quite a drag, because one advantage of a web-based reader is that it can be accessed from anywhere, not just single platform, where you might as well install specialized app.

To fix that unfortunate limitation, about a year ago I've added ad-hoc storage mechanism to just dump localStorage contents as json to some persistent storage on server, authenticated by special "magic" header from a browser.

It was never a public feature, requiring some browser tweaking and being a server admin, basically.

Recently, however, remoteStorage project from unhosted group has caught my attention.

Idea itself and the movement's goals are quite ambitious and otherwise awesome - to return to "decentralized web" idea, using simple already available mechanisms like webfinger for service discovery (reminds of Thimbl concept by telekommunisten.net), WebDAV for storage and OAuth2 for authorization (meaning no special per-service passwords or similar crap).

But the most interesting thing I've found about it is that it should be actually easier to use than writing an ad-hoc client syncer and server storage implementation - just put off-the-shelf remoteStorage.js on the page (it even includes "syncer" part to sync localStorage to remote server) and deploy or find any remoteStorage provider and you're all set.

In practice, it works as advertised, but will have quite significant changes soon (with the release of 0.7.0 js version) and had only ad-hoc proof-of-concept server implementation in python (though there's also ownCloud in php and node.js/ruby versions), so I wrote django-unhosted implementation, being basically a glue between simple WebDAV, oauth2app and Django Storage API (which has backends for everything).

Using that thing in feedjack now (here, for example) instead of that hacky json cache I've had with django-unhosted deployed on my server, allowing to also use it with all the apps with support out there.

Looks like a really neat way to provide some persistent storage for any webapp out there, guess that's one problem solved for any future webapps I might deploy that will need one.

With JS being able to even load and use binary blobs (like images) that way now, it becomes possible to write even unhosted facebook, with only events like status updates still aggregated and broadcasted through some central point.

I bet there's gotta be something similar, but with facebook, twitter or maybe github backends, but as proven in many cases, it's not quite sane to rely on these centralized platforms for any kind of service, which is especially a pain if implementation there is one-platform-specific, unlike one remoteStorage protocol for any of them.

Would be really great if they'd support some protocol like that at some point though.

But aside from the short-term "problem solved" thing, it's really nice to see such movements out there, even though the whole stack of market incentives (which heavily favors control over data, centralization and monopolies) is against them.

posted on 2012-08-09 06:09 YEKT

web p2p social

Jun 16, 2012

Proper(-ish) way to start long-running systemd service on udev event (device hotplug)

Update 2015-01-12: There's a follow-up post with a different way to do that, enabled by "systemd-escape" tool available in more recent systemd versions.

I use a smartcard token which requires long-running (while device is plugged) handler process to communicate with the chip.

Basically, udev has to start a daemon process when the device gets plugged in.

Until recently, udev didn't mind doing that via just RUN+="/path/to/binary ...", but in recent merged systemd-udevd versions this behavior was deprecated:

RUN
...
Starting daemons or other long running processes is not appropriate for
udev; the forked processes, detached or not, will be unconditionally killed
after the event handling has finished.

I think it's for the best - less accumulating cruft and unmanageable pids forked from udevd, but unfortunately it also breaks existing udev rule-files, the ones which use RUN+="..." to do just that.

One of the most obvious breakages for me was the smartcard failing, so decided to fix that. Documentation on the whole migration process is somewhat lacking (hence this post), even though docs on all the individual pieces are there (which are actually awesome).

Main doc here is systemd.device(5) for the reference on the udev attributes which systemd recognizes, and of course udev(7) for a generic syntax reference.

Also, there's this entry on Lennart's blog.

In my case, when device (usb smartcard token) gets plugged in, ifdhandler process should be started via openct-control (OpenCT sc middleware), which then creates unix socket through which openct libraries (used in turn by OpenSC PKCS#11 or PCSClite) can access the hardware.

So, basically I've had something like this (there are more rules for different hw, of course, but for the sake of clarity...):

SUBSYSTEM!="usb", GOTO="openct_rules_end"
ACTION!="add", GOTO="openct_rules_end"
PROGRAM="/bin/sleep 0.1"
...
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device",\
  ENV{ID_VENDOR_ID}=="0529", ENV{ID_MODEL_ID}=="0600",\
  GROUP="usb",\
  RUN+="/usr/sbin/openct-control attach usb:$env{PRODUCT} usb $env{DEVNAME}"
...
LABEL="openct_rules_end"

Instead of RUN here, ENV{SYSTEMD_WANTS} can be used to start a properly-handled service, but note that some hardware parameters are passed from udev properties and in general systemd unit can't reference these.

I.e. if just ENV{SYSTEMD_WANTS}="openct-handler.service" (or more generic smartcard.target) is started, it won't know which device to pass to "openct-control attach" command.

One way might be storing these parameters in some dir, where they'll be picked by some path unit, a bit more hacky way would be scanning usb bus in the handler, and yet another one (which I decided to go along with) is to use systemd unit-file templating to pass these parameters.

openct-handler@.service:

[Unit]
Requires=openct.service

[Service]
Type=forking
GuessMainPID=false
ExecStart=/bin/sh -c "exec openct-control attach %I"

Note that it requires openct.service, which basically does "openct-control init" once per boot to set up paths and whatnot:

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/openct-control init
ExecStop=/usr/sbin/openct-control shutdown

[Install]
WantedBy=multi-user.target

Another thing to note is that "sh" is used in the handler.

It's intentional, because just %I will be passed by systemd as a single argument, while it should be three of them after "attach".

Finally, udev rules file for the device:

SUBSYSTEM=="usb", ACTION=="add", ENV{DEVTYPE}=="usb_device", \
  ENV{ID_VENDOR_ID}=="0529", ENV{ID_MODEL_ID}=="0600", \
  GROUP="usb", TAG+="systemd", \
  ENV{SYSTEMD_WANTS}="openct-handler@\
    usb:$env{ID_VENDOR_ID}-$env{ID_MODEL_ID}-$env{ID_REVISION}\
    \x20usb\x20-dev-bus-usb-$env{BUSNUM}-$env{DEVNUM}.service"

(I highly doubt newline escaping in ENV{SYSTEMD_WANTS} above will work - added them just for readability, so pls strip these in your mind to a single line without spaces)

Systemd escaping in the rule above is described in systemd.unit(5) and produces a name - and start a service - like this one:

openct-handler@usb:0529-0600-0100\x20usb\x20-dev-bus-usb-002-003.service

Which then invokes:

sh -c "exec openct-control attach\
  usb:0529/0600/0100 usb /dev/bus/usb/002/003"

And it forks ifdhandler process, which works with smartcard from then on.
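To double-check the escaping, here's a rough python sketch of the string (non-path) escaping rules as I read them in systemd.unit(5): "/" becomes "-", anything outside of ASCII alphanumerics, ":", "_" and "." (plus a leading ".") becomes a \xNN sequence, and unescaping reverses that. Function names are made up for the example, and it's an approximation rather than the actual systemd code.

```python
import re
import shlex
import string

SAFE = string.ascii_letters + string.digits + ':_.'


def unit_escape(text):
    """Escape a string for use as a unit name / template instance."""
    out = []
    for i, byte in enumerate(text.encode('utf-8')):
        ch = chr(byte)
        if ch == '/':
            out.append('-')
        elif ch in SAFE and not (ch == '.' and i == 0):
            out.append(ch)
        else:
            out.append('\\x%02x' % byte)
    return ''.join(out)


def unit_unescape(name):
    """Reverse of unit_escape, roughly what %I expands to."""
    raw = name.replace('-', '/').encode('utf-8')
    raw = re.sub(
        rb'\\x([0-9a-fA-F]{2})',
        lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode('utf-8', 'replace')


args = 'usb:0529/0600/0100 usb /dev/bus/usb/002/003'
instance = unit_escape(args)
print('openct-handler@%s.service' % instance)
# The shell splits the single unescaped %I string into separate arguments
print(shlex.split(unit_unescape(instance)))
```

The last line also shows why "sh -c" is there: systemd passes the unescaped instance as one string, and it takes a shell (or equivalent splitting) to turn it into the three arguments after "attach".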

ifdhandler seems to be able to detect unplugging events and exits gracefully, but otherwise BindTo= unit directive can be used to stop the service when udev detects that device is unplugged.

Note that it might be more obvious to just do RUN+="systemctl start whatever.service", but it's a worse way to do it, because you don't bind that service to a device in any way, don't produce the "whatever.device" unit and there are a lot of complications due to systemctl being a tool for the user, not the API proper.

posted on 2012-06-16 12:26 YEKT

sysadmin systemd udev

Feb 28, 2012

Late adventures with time-series data collection and representation

When something is wrong and you look at the system, most often you'll see that... well, it works. There's some cpu, disk, ram usage, some number of requests per second on different services, some stuff piling up, something in short supply here and there...

And there's just no way of telling what's wrong without answers to the questions like "so, what's the usual load average here?", "is the disk always loaded with requests 80% of time?", "is it much more requests than usual?", etc, otherwise you might be off to some wild chase just to find out that load has always been that high, or solve the mystery of some unoptimized code that's been there for ages, without doing anything about the problem in question.

Historical data is the answer, and having used rrdtool with stuff like (customized) cacti and snmpd (with some hacks of my own on top) in the past, I was overjoyed when I stumbled upon the graphite project at some point.

From then on, I strived to collect as many metrics as possible, to be able to look at history of anything I want (and lots of values can be a reasonable symptom for the actual problem), without any kind of limitations.

carbon-cache does magic by batching writes and carbon-aggregator does a great job at relieving you of having to push aggregate metrics along with a granular ones or sum all these on graphs.

Initially, I started using it with just collectd (and still using it), but there's a need for something to convert metric names to a graphite hierarchy.

After looking over quite a few solutions to collectd-carbon bridge, decided to use bucky, with a few fixes of my own and quite large translation config.

Bucky can work anywhere, just receiving data from collectd network plugin, understands collectd types and properly translates counter increments to N/s rates. It also includes statsd daemon, which is brilliant at handling data from non-collector daemons and scripts and more powerful metricsd implementation.
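The counter-to-rate translation mentioned above boils down to something like the following. It's a generic sketch of the idea, not bucky's actual code, and treating any decrease as a reset (instead of trying to guess the counter width for wraparound) is my simplification.

```python
def counter_rate(prev, cur):
    """Turn two (timestamp, counter) samples into a per-second rate.

    Returns None if no time has passed or the counter went backwards
    (reset or wraparound), so that the sample can just be skipped.
    """
    (t0, v0), (t1, v1) = prev, cur
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)
```

E.g. a counter going from 5000 to 6500 over 10 seconds gives 150 units per second, which is what ends up on the graph instead of an ever-growing line.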

Downside is that it's only maintained in forks, has bugs in less-used code (like metricsd), is quite resource-hungry (but can be easily scaled-out) and there's a kinda-official collectd-carbon plugin now (although I found it buggy as well, not to mention much less featureful, but hopefully that'll be addressed in future collectd versions).

Some of the problems I've noticed with such collectd setup:

Disk I/O metrics are godawful or just don't work - collected read/write metrics, either for processes or devices, are either zeroes, have weird values detached from reality (judging by actual problems and what tools like atop and sysstat report) or are just useless.
Lots of metrics for network and memory (vmem, slab) and from various plugins have naming that's inconsistent with linux /proc or documentation names.
Some useful metrics that are in, say, sysstat don't seem to work with collectd, like sensor data, nfsv4, some paging and socket counters.
Some metrics need non-trivial post-processing to be useful - disk utilization % time is one good example.
Python plugins leak memory on every returned value. Some plugins (ping, for example) make collectd segfault several times a day.
One of the most useful info is the metrics from per-service cgroup hierarchies, created by systemd - there you can compare resource usage of various user-space components, totally pinpointing exactly what caused the spikes on all the other graphs at some time.
Second most useful info by far is produced from logs and while collectd has a damn powerful tail plugin, I still found it to be too limited or just too complicated to use, while simple log-tailing code does the better job and is actually simpler due to more powerful language than collectd configuration. Same problem with table plugin and /proc.
There's still a need for a large post-processing chunk of code and pushing the values to carbon.
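As an example of the post-processing mentioned in the list above, disk utilization % can be derived from the "time spent doing I/Os (ms)" counter, which is the 13th column in /proc/diskstats lines. A minimal sketch, with function names made up for the example and two snapshots of the file taken interval_sec apart:

```python
def parse_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O so far."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            # fields: major, minor, name, then the per-device counters
            ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(prev_ticks, cur_ticks, interval_sec):
    """Percentage of the interval each device was busy with I/O."""
    util = {}
    for dev, cur in cur_ticks.items():
        if dev in prev_ticks and interval_sec > 0:
            busy_ms = cur - prev_ticks[dev]
            percent = busy_ms / (interval_sec * 1000.0) * 100.0
            util[dev] = min(100.0, max(0.0, percent))
    return util
```

In practice the two snapshots would come from reading /proc/diskstats at the start and end of a collection interval.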

Of course, I wanted to add systemd cgroup metrics, some log values and missing (and just properly-named) /proc tables data, and initially I wrote a collectd plugin for that. It worked, leaked memory, occasionally crashed (with collectd itself), used some custom data types, had to have some metric-name post-processing code chunk in bucky...

Um, what the hell for, when sending metric value directly takes just "echo some.metric.name $val $(printf '%(%s)T' -1) >/dev/tcp/carbon_host/2003"?

So off with collectd for all the custom metrics.

Wrote a simple "while True: collect_and_send() && sleep(till_deadline);" loop in python, along with the cgroup data collectors (there are even proper "block io" and "syscall io" per-service values!), log tailer and sysstat data processor (mainly for disk and network metrics which have batshit-crazy values in collectd plugins).
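A stripped-down sketch of such a loop might look like this. It's not the actual collector code: it assumes carbon's plaintext protocol ("name value timestamp" lines on port 2003, same as in the echo one-liner above), and collect is a hypothetical callable returning (name, value) pairs.

```python
import socket
import time


def format_metric(name, value, ts=None):
    """One line of carbon plaintext protocol: 'name value timestamp'."""
    ts = int(time.time() if ts is None else ts)
    return '%s %s %d\n' % (name, value, ts)


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Push a batch of formatted lines to carbon over a single connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(''.join(lines).encode('ascii'))


def next_deadline(now, interval):
    """Next multiple of interval strictly after now, to keep runs aligned."""
    return (int(now // interval) + 1) * interval


def run(collect, host, interval=60):
    """Collect and send forever, sleeping until the next aligned deadline."""
    while True:
        send_metrics([format_metric(n, v) for n, v in collect()], host)
        time.sleep(max(0, next_deadline(time.time(), interval) - time.time()))
```

Sleeping until a deadline instead of a fixed delay keeps datapoints from drifting when a collection run takes a while, which matters for whisper's fixed-interval storage.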

Another interesting data-collection alternative I've explored recently is ganglia.

Redundant gmond collectors and aggregators, communicating efficiently over multicast are nice. It has support for python plugins, and is very easy to use - pulling data from gmond node network can be done with one telnet or nc command, and it's fairly comprehensible xml, not some binary protocol. Another nice feature is that it can re-publish values only on some significant changes (where you define what "significant" is), thus probably eliminating traffic for 90% of "still 0" updates.
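For reference, the "one nc command" pull can be sketched in python like this. Port 8649 and the HOST / METRIC element names with NAME and VAL attributes are what I'd expect from a default gmond setup, so treat these as assumptions and check against actual output.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump that gmond sends on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks)


def parse_metrics(xml_bytes):
    """Map (host name, metric name) -> value string from gmond XML."""
    root = ET.fromstring(xml_bytes)
    metrics = {}
    for host in root.iter('HOST'):
        for metric in host.iter('METRIC'):
            key = (host.get('NAME'), metric.get('NAME'))
            metrics[key] = metric.get('VAL')
    return metrics
```

Something like parse_metrics(fetch_gmond_xml('some-gmond-node')) should then give a flat dict of current values for the whole cluster.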

But as I found out while trying to use it as a collectd replacement (forwarding data to graphite through amqp via custom scripts), there's a fatal flaw - gmond plugins can't handle dynamic number of values, so writing a plugin that collects metrics from systemd services' cgroups without knowing how many of these will be started in advance is just impossible.

Also it has no concept for timestamps of values - it only has "current" ones, making plugins like "sysstat data parser" impossible to implement as well.

collectd, in contrast, has no constraint on how many values plugin returns and has timestamps, but with limitations on how far backwards they are.

Pity, gmond looked like a nice, solid and resilient thing otherwise.

I still like the idea to pipe graphite metrics through AMQP (like rocksteady does), routing them there not only to graphite, but also to some proper threshold-monitoring daemon like shinken (basically nagios, but distributed and more powerful), with alerts, escalations, trending and flapping detection, etc, but most of the existing solutions all seem to use graphite and whisper directly, which seems kinda wasteful.

Looking forward, I'm actually deciding between replacing collectd completely for a few most basic metrics it now collects, pulling them from sysstat or just /proc directly or maybe integrating my collectors back into collectd as plugins, extending collectd-carbon as needed and using collectd threshold monitoring and matches/filters to generate and export events to nagios/shinken... somehow first option seems to be more effort-effective, even in the long run, but then maybe I should just work more with collectd upstream, not hack around it.

posted on 2012-02-28 09:16 YEKT

monitoring sysadmin notification python unix

Feb 07, 2012

Phasing out fossil completely

Having used git extensively for the last few days, decided to ditch fossil scm at last.

All the stuff will be in git and mirrored on github (maybe later on bitbucket as well).

Will probably re-import meta stuff (issues, wikis) from there into the main tree, but still can't find nice-enough tool for that.

Closest thing seems to be Artemis, but it's for mercurial, so I'll probably need to port it to git first, shouldn't be too hard.

Also, I'm torn at this point between the thoughts along the lines of "selection of modern DVCS spoils us" and "damn, why is there no clear popular + works-for-everything thing", but it's probably normal, as I have (or had) similar thoughts about a lot of technologies.

posted on 2012-02-07 07:15 YEKT

scm fossil

Feb 03, 2012

On github as well now

Following another hiatus from a day job, I finally have enough spare time to read some of the internets and do something about them.

For quite a while I had lots of quite small scripts and projects, which I kinda documented here (and on the site pages before that).

I always kept them in some kind of scm - be it system-wide repo for configuration files, ~/.cFG repo for DE and misc user configuration and ~/bin scripts, or ~/hatch repo I keep for misc stuff, but as their number grows, as well as the size and complexity, I think maybe some of this stuff deserves some kind of repo, maybe attention, and best-case scenario, will even be useful to someone but me.

So I thought to gradually push all this stuff out to github and/or bitbucket (still need to learn or at least look at hg for that!). github being the most obvious and easiest choice, just created a few repos there and started the migration. More to come.

Still don't really trust a silo like github to keep anything reliably (besides it lags like hell here, especially compared to local servers I'm kinda used to), so need to devise some mirroring scheme asap.

Initial idea is to take some flexible tool (hg seems to be ideal, being python and scm proper) and build hooks into local repos to push stuff out to mirrors from there, ideally both bitbucket and github, also exploiting their metadata APIs to fetch stuff like tickets/issues and commit history of these into separate repo branch as well.

Effort should be somewhat justified by the fact that such repos will be geo-distributed backups, shareable links and I can learn more SCM internals by the way.

For now - me on github.

posted on 2012-02-03 20:57 YEKT

scm web social
