May 14, 2017

ssh reverse tunnel ("ssh -R") caveats and tricks

"ssh -R" a is kinda obvious way to setup reverse access tunnel from something remote that one'd need to access, e.g. raspberry pi booted from supplied img file somewhere behind the router on the other side of the world.

Being part of OpenSSH, it's available on any base linux system, and trivial to automate on startup via a loop in a shell script, crontab, or a systemd unit, e.g.:

[Unit]
Wants=network.service
After=network.service

[Service]
Type=simple
User=ssh-reverse-access-tunnel
Restart=always
RestartSec=10
ExecStart=/usr/bin/ssh -oControlPath=none -oControlMaster=no \
  -oServerAliveInterval=6 -oServerAliveCountMax=10 -oConnectTimeout=180 \
  -oPasswordAuthentication=no -oNumberOfPasswordPrompts=0 \
  -NnT -R "1234:localhost:22" tun-user@tun-host

[Install]
WantedBy=multi-user.target
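
Assuming that unit is saved as e.g. ssh-reverse-tunnel.service (name here is hypothetical), enabling it to start on boot is the usual:

# systemctl enable --now ssh-reverse-tunnel.service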

On the other side, ideally in a dedicated container or VM, there'll be an sshd "tun-user" with access locked down like this (as a single line):

command="echo >&2 'No shell access!'; exit 1",no-port-forwarding,
  no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 ...

Or have an sshd_config section with the same restrictions and only bare keys in authorized_keys, e.g.:

Match User tun-*
 # GatewayPorts yes
 AllowTcpForwarding no
 X11Forwarding no
 AllowAgentForwarding no
 PermitTTY no
 ForceCommand echo 'no shell access!'; exit 1

And that's it, right?

No additional stuff needed, "ssh -R" will connect reliably on boot, keep restarting and reconnecting in case of any errors, even with keepalives to detect dead connections and restart asap.

Nope!

There's a bunch of common pitfalls listed below.

  • Problem 1:

    When the device with a tunnel suddenly dies for whatever reason - power or network issues, kernel panic, stray "kill -9" or what have you - the connection on the sshd machine will hang around for a while, as keepalive options are only used by the client.

    Along with the (dead) connection, its listening port will stay open as well, so "ssh -R" from e.g. a power-cycled device will not be able to bind it - and that client won't treat this as a fatal error either!

    Result: reverse-tunnels don't survive any kind of non-clean reconnects.

    Fix:

    • ClientAliveInterval / ClientAliveCountMax in sshd_config - to detect dead connections there faster (plain TCP keepalives can take hours to kick in), though probably still not fast enough for e.g. emergency reboot.
    • Detect and kill sshd pids without an associated listening socket, forcing "ssh -R" to reconnect until it can actually bind one - see the cleanup sketch after this list.
    • If such keepalives are not good or reliable enough, kill all sshd pids associated with listening sockets that don't produce the usual "SSH-2.0-OpenSSH_7.4" greeting line.
  • Problem 2:

    Running sshd on any reasonably modern linux, you get a systemd-logind session for each connection, and killing sshd pids as suggested above will leave these sessions behind, potentially creating hundreds or thousands of stale ones over time.

    Solution:

    • "UsePAM no" to disable pam_systemd.so along with the rest of the PAM.
    • Dedicated PAM setup for ssh tunnel logins on this dedicated system, not using pam_systemd.
    • Occasional cleanup via loginctl list-sessions/terminate-session for ones that are in "closing"/"abandoned" state.

    Killing sshd pids might be hard to avoid on fast non-clean reconnect, since reconnected "ssh -R" will hang around without a listening port forever, as mentioned.

  • Problem 3:

    If these tunnels are not configured on a per-system basis, but shipped in some img file to use with multiple devices, they'll all try to bind the same listening port for reverse-tunnels, so only one of them will work.

    Fixes:

    • More complex script to generate the listening port for "ssh -R" based on machine id - e.g. serial, MAC, local IP address, etc - see the port-derivation sketch after this list.

    • Get free port to bind to out-of-band from the server somehow.

      Can be done through the same ssh connection, by checking ss/netstat output or /proc/net/tcp there, if running commands is allowed (probably a bad idea for random remote devices).

  • Problem 4:

    Device identification in the same "multiple devices" scenario.

    I.e. when someone sets up 5 RPi boards on the other end, how to tell which tunnel leads to each specific board?

    Can usually be solved by:

    • Knowing/checking quirks specific to each board, like dhcp hostname, IP address, connected hardware, stored files, power-on/off timing, etc.
    • Blinking LEDs via /sys/class/leds, ethtool --identify or GPIO pins.
    • Output on connected display - just "wall" some unique number (e.g. reverse-tunnel port) or put it to whatever graphical desktop.
  • Problem 5:

    Round-trip through some third-party VPS can add significant console lag, making it rather hard to use.

    A more general problem than with just "ssh -R", but when doing e.g. an "EU -> US -> RU" trip and back, console becomes quite unusable without something like mosh - which, being UDP-based, can't be used over that forwarded tcp port anyway!

    Kinda defeats the purpose of the whole thing, though laggy console (with an option to upgrade it, once connected) is still better than nothing.
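
As a minimal sketch of the detect-and-kill fixes for the first two problems above (assuming the "tun-" user prefix, bash, ss/pgrep tools and a recent-ish systemd for loginctl --value; the more complete ssh-tunnels-cleanup script is linked below):

#!/bin/bash
# Pids that own listening tcp sockets, per "pid=..." tokens in ss output (needs root)
listen_pids=$(ss -tlnp | grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -u)

# Kill tun-* sshd processes that failed to bind their reverse-tunnel port
for pid in $(pgrep -x sshd); do
  user=$(ps -o user= -p "$pid")
  [[ $user == tun-* ]] || continue
  grep -qw "$pid" <<< "$listen_pids" || kill "$pid"
done

# Terminate logind sessions that are stuck in "closing" state
loginctl --no-legend list-sessions | awk '{print $1}' |
  while read -r sid; do
    [[ "$(loginctl show-session -p State --value "$sid")" = closing ]] &&
      loginctl terminate-session "$sid"
  done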
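
And a sketch of the port-derivation idea from Problem 3, using the RPi board serial from /proc/cpuinfo (as --ident-rpi below does) - note that hash collisions between devices are still possible here, which is what the mux scripts below handle properly:

# Stable per-board listening port in the 22000-22999 range
serial=$(awk '/^Serial/ {print $3}' /proc/cpuinfo)
port=$(( 22000 + 0x$(printf %s "$serial" | sha256sum | cut -c1-4) % 1000 ))
exec ssh -NnT -R "${port}:localhost:22" tun-user@tun-host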

Not an exhaustive or universally applicable list, of course, but hopefully it shows that getting something robust here is actually more hassle than "just run ssh -R on boot".

So choice of ubiquitous / out-of-the-box "ssh -R" over installing some dedicated tunneling thing like OpenVPN is not as clear-cut in favor of the former as it would seem, taking all such quirks (handled well by dedicated tunneling apps) into consideration.

Having bumped into all of these by now, I've addressed them with:

  • ssh-tunnels-cleanup script to (optionally) do three things, in order:

    • Find/kill sshd pids without associated listening socket ("ssh -R" that re-connected quickly and couldn't bind one).
    • Probe all sshd listening sockets with nmap and make sure there's an "SSH-2.0-..." banner there, otherwise kill.
    • Cleanup all dead loginctl sessions, if any.

    Only affects/checks sshd pids for specific user prefix (e.g. "tun-"), to avoid touching anything but dedicated tunnels.

  • ssh-reverse-mux-server / ssh-reverse-mux-client scripts.

    For listening port negotiation with the ssh server, using a bunch of (authenticated) UDP packets.

    Essentially a wrapper for "ssh -R" on the client, to also pass all the required options, replacing ExecStart= line in above systemd example with e.g.:

    ExecStart=/usr/local/bin/ssh-reverse-mux-client \
      --mux-port=2200 --ident-rpi -s pkt-mac-key.aGPwhpya tun-user@tun-host
    

    ssh-reverse-mux-server on the other side will keep a .db file of --ident strings (--ident-rpi uses a hash of the RPi board serial from /proc/cpuinfo) and allocate a persistent port number (from a specified range) to each one, which the client then uses with the actual ssh command.

    A simple symmetric key (not very secret) is used to put a MAC into packets, and to ignore any noise traffic on either side that way.

    https://github.com/mk-fg/fgtk#ssh-reverse-mux

  • A hook in ssh-reverse-mux-client above to blink bits of the allocated port on some available LED.

    Sample script to do the morse-code-like bit-blinking is led-blink-arg (from the same mk-fg/fgtk repo), used via an additional hook option for the command above:

    ... -c 'sudo -n led-blink-arg -f -l led0 -n 2/4-2200'
    

    (the trailing "2/4-2200" is an arg-number / bit-count - decrement spec, to blink only the last 4 bits of the second passed argument - the listening port - e.g. "1011" for "2213")

Given how much OpenSSH does already, having all this functionality there (or even some hooks for that) would probably be too much to ask.

...at least until it gets rewritten as some systemd-accessd component :P

Feb 06, 2017

nftables dnat from loopback to somewhere else

Honestly didn't think NAT'ing traffic from the "lo" interface was even possible - traffic to the host's own IP doesn't go through *ROUTING chains with iptables - and I'd never used "-j DNAT" in OUTPUT, which apparently works there as well.

And then also, according to e.g. Netfilter-packet-flow.svg, unlike with nat-prerouting, nat-output goes after routing decision was made, so no point mangling IPs there, right?

Wrong, totally possible to redirect "OUT=lo" stuff to go out of e.g. "eth0" with the usual dnat/snat, with something like this:

table ip nat {
  chain in { type nat hook input priority -160; }
  chain out { type nat hook output priority -160; }
  chain pre { type nat hook prerouting priority -90; }
  chain post { type nat hook postrouting priority 110; }
}

add rule ip nat out oifname lo \
  ip saddr $own-ip ip daddr $own-ip \
  tcp dport {80, 443} dnat $somehost
add rule ip nat post oifname eth0 \
  ip saddr $own-ip ip daddr $somehost \
  tcp dport {80, 443} masquerade
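
The $own-ip / $somehost variables above come from nft define statements at the top of the ruleset, e.g. (addresses here are hypothetical):

define own-ip = 203.0.113.10
define somehost = 198.51.100.20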

Note the bizarre oifname lo ip saddr $own-ip ip daddr $own-ip thing.

One weird quirk - if "in" (arbitrary name, nat+input hook is the important bit) chain isn't defined, dnat will only work one-way, not rewriting IPs in response packets.

One explanation wrt the routing decision here might be the arbitrary priorities that nftables allows to set for hooks (and -160 is before iptables mangle stuff).

So, from-loopback-and-back forwarding, huh.
To think of all the redundant socats and haproxies I've seen and used for this purpose earlier...

Jan 29, 2017

Proxying ssh user connections to gitolite host transparently

Recently bumped into an apparently not-well-supported scenario of accessing a gitolite instance transparently on a host that is only reachable through some other gateway (often called "bastion" in ssh context) host.

Something like this:

+---------------+
|               |   git@myhost.net:myrepo
|  dev-machine  ---------------------------+
|               |                          |
+---------------+                          |
                              +------------v------+
      git@gitolite:myrepo     |                   |
  +----------------------------  myhost.net (gw)  |
  |                           |                   |
+-v-------------------+       +-------------------+
|                     |
|    gitolite (gl)    |
|  host/container/vm  |
|                     |
+---------------------+

Here gitolite instance might be running on a separate machine, or on the same "myhost.net", but inside a container or vm with separate sshd daemon.

From any dev-machine you want to simply use git@myhost.net:myrepo to access repositories, but naturally that won't work because in normal configuration you'd hit sshd on gw host (myhost.net) and not on gl host.

There are quite a few common options to work around this:

  • Use separate public host/IP for gitolite, e.g. git.myhost.net (!= myhost.net).

  • TCP port forwarding or similar tricks.

    E.g. simply forward ssh port connections in a "gw:22 -> gl:22" fashion, and have gw-specific sshd listen on some other port, if necessary.

    This can be fairly easy to use with something like this for odd-port sshd in ~/.ssh/config:

    Host myhost.net
      Port 1234
    Host git.myhost.net
      Port 1235
    

    Can also be configured in git via remote urls like ssh://git@myhost.net:1235/myrepo.

  • Use ssh port forwarding to essentially do same thing as above, but with resulting git port accessible on localhost.

  • Configure ssh to use ProxyCommand, which will login to gw host and setup forwarding through it.

All of these, while suitable for some scenarios, are still nowhere near what I'd call "transparent", and require some additional configuration for each git client beyond just git remote add origin git@myhost.net:myrepo.

One advantage of such lower-level forwarding is that ssh authentication to gitolite is only handled on the gitolite host - gw host has no clue about that.

If dropping this is not a big deal (e.g. because gw host has root access to everything in gl container anyway), there is a rather easy way to forward only git@myhost.net connections from gw to gl host, authenticating them only on gw instead, described below.


Gitolite works by building ~/.ssh/authorized_keys file with essentially command="gitolite-shell gl-key-id" <gl-key> for each public key pushed to gitolite-admin repository.

Hence to proxy connections from gw, similar key-list should be available there, with key-commands ssh'ing into gitolite user/host and running above commands there (with original git commands also passed through SSH_ORIGINAL_COMMAND env-var).

To keep such list up-to-date, post-update trigger/hook for gitolite-admin repo is needed, which can use same git@gw login (with special "gl auth admin" key) to update key-list on gw host.
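
To illustrate (key-id "alice" and paths here are hypothetical), an entry built by gitolite itself on the gl host looks like this (as a single line):

command="/usr/bin/gitolite-shell alice",no-port-forwarding,
  no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA... alice@dev

While its counterpart that --auth-update (described below) builds on the gw host swaps the command for the proxy, keeping the same key and restrictions:

command="/usr/local/bin/gitolite-proxy alice",no-port-forwarding,
  no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA... alice@dev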

Steps to implement/deploy whole thing:

  • useradd -m git on gw and run ssh-keygen -t ed25519 on both gw and gl hosts for git/gitolite user.

  • Setup all connections for git@gw to be processed via single "gitolite proxy" command, disallowing anything else, exactly like gitolite does for its users on gl host.

    gitolite-proxy.py script (python3) that I came up with for this purpose can be found here: https://github.com/mk-fg/gitolite-ssh-user-proxy/

    It's rather simple and does two things:

    • When run with --auth-update argument, receives gitolite authorized_keys list, and builds local ~/.ssh/authorized_keys from it and authorized_keys.base file.

    • Similar to gitolite-shell, when run as gitolite-proxy key-id, ssh'es into gl host, passing key-id and git command to it.

      This is done in a straightforward os.execlp('ssh', 'ssh', '-qT', ...) manner, no extra processing or any error-prone stuff like that.

    When installing it (to e.g. /usr/local/bin/gitolite-proxy as used below), be sure to set/update "gl_host_login = ..." line at the top there.

    For --auth-update, ~/.ssh/authorized_keys.base (note .base) file on gw should have this single line (split over two lines for readability, must be all on one line for ssh!):

    command="/usr/local/bin/gitolite-proxy --auth-update",no-port-forwarding
      ,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA...4u3FI git@gl
    

    Here ssh-ed25519 AAA...4u3FI git@gl is the key from ~git/.ssh/id_ed25519.pub on gitolite host.

    Also run:

    # install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,}
    # install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,.old}
    
    This is to have an initial auth-file, not yet populated with gitolite-specific keys/commands.
    Note that only these two files need to be writable for the git user on the gw host.
  • From gitolite (gl) host and user, run: ssh -qT git@gw < ~/.ssh/authorized_keys

    This is to test gitolite-proxy setup above - should populate ~git/.ssh/authorized_keys on gw host and print back gw host key and proxy script to run as command="..." for it (ignore them, will be installed by trigger).

  • Add trigger that'd run after gitolite-admin repository updates on gl host.

    • On gl host, put this to ~git/.gitolite.rc right before ENABLE line:

      LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",
      POST_COMPILE => ['push-authkeys'],
      
    • Commit/push push-authkeys trigger script (also from gitolite-ssh-user-proxy repo) to gitolite-admin repo as local/triggers/push-authkeys, updating gw_proxy_login line in there.

    gitolite docs on adding triggers: http://gitolite.com/gitolite/gitolite.html#triggers

Once proxy-command is in place on gw and the gitolite-admin hook has run at least once (to setup gw->gl access and proxy-command), the git@gw (git@myhost.net) ssh login spec can be used in exactly the same way as git@gl.

That is, fully transparent access to gitolite on a different host through that one user, while otherwise allowing normal sshd use on the gw host, without any forwarding tricks necessary for git clients.

The whole project, with maybe a bit more refined process description and/or whatever fixes, can be found on github here: https://github.com/mk-fg/gitolite-ssh-user-proxy/

Huge thanks to sitaramc (gitolite author) for suggesting how to best setup gitolite triggers for this purpose on the ML.

Oct 16, 2016

Redirecting hosts or replacing files just for one app with mount namespaces

My problem was this: how do you do essentially split-horizon DNS for different apps in the same desktop session?

E.g. have claws-mail mail client go to localhost for someserver.com (because it has port forwarded thru "ssh -L"), while the rest of them (e.g. browser and such) keep using normal public IP.

Usually one'd use /etc/hosts for something like that, but it applies to all apps on the machine, of course.

Next obvious option (mostly because it's been around forever) is to LD_PRELOAD something that'd override either getaddrinfo() or open() for /etc/hosts, but that sounds like work, and such a shim isn't included in util-linux (yet?).

Easiest and newest-ish way to do that is to run the thing in its own "mount namespace" - CLONE_NEWNS has been in the kernel since linux-2.4.19, though convenient tools for it like unshare are more recent - which sounds weird until you combine that with the fact that you can bind-mount files (like that /etc/hosts one).

So, the magic line is:

# unshare -m sh -c \
  'mount -o bind /etc/hosts.forwarding /etc/hosts \
    && exec sudo -EHin -u myuser -- exec claws-mail'

Needs /etc/hosts.forwarding replacement-file for this app, which it will see as a proper /etc/hosts, along with root privileges (or CAP_SYS_ADMIN) for CLONE_NEWNS.
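
For the claws-mail case above, /etc/hosts.forwarding can be as simple as this single line (assuming the matching ports are forwarded to localhost via "ssh -L"):

127.0.0.1 someserver.com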

Crazy "sudo -EHin" shebang is to tell sudo not to drop much env, but still behave kinda as if on login, run zshrc and all that. "su - myuser" or "machinectl shell myuser@ -- ..." can also be used there.

Replacing files like /etc/nsswitch.conf or /etc/{passwd,group} that way, one can also essentially do any kind of per-app id-mapping - cool stuff.

Of course, these days sufficiently paranoid or advanced people might as well run every app in its own set of namespaces anyway, and have pretty much everything per-app that way, why the hell not.

Sep 25, 2016

nftables re-injected IPSec matching without xt_policy

As of linux-4.8, something like xt_policy is still - unfortunately - on the nftables TODO list, so to match traffic pre-authenticated via IPSec, some workaround is needed.

The obvious one is to keep using iptables/ip6tables to mark IPSec packets with the old xt_policy module, as these rules interoperate with nftables just fine. The only important bit is the ordering of iptables hooks vs nft chain priorities, which is rather easy to find in the "netfilter_ipv{4,6}.h" kernel headers, e.g.:

enum nf_ip_hook_priorities {
  NF_IP_PRI_FIRST = INT_MIN,
  NF_IP_PRI_CONNTRACK_DEFRAG = -400,
  NF_IP_PRI_RAW = -300,
  NF_IP_PRI_SELINUX_FIRST = -225,
  NF_IP_PRI_CONNTRACK = -200,
  NF_IP_PRI_MANGLE = -150,
  NF_IP_PRI_NAT_DST = -100,
  NF_IP_PRI_FILTER = 0,
  NF_IP_PRI_SECURITY = 50,
  NF_IP_PRI_NAT_SRC = 100,
  NF_IP_PRI_SELINUX_LAST = 225,
  NF_IP_PRI_CONNTRACK_HELPER = 300,
  NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
  NF_IP_PRI_LAST = INT_MAX,
};

(see also Netfilter-packet-flow.svg by Jan Engelhardt for a general overview of the iptables hook positions - nftables allows defining any number of chains before/after these)

So marks from iptables/ip6tables rules like these:

*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -m policy --dir in --pol ipsec --mode transport -j MARK --or-mark 0x101
-A OUTPUT -m policy --dir out --pol ipsec --mode transport -j MARK --or-mark 0x101
COMMIT

Will be easy to match in priority=0 input/output hooks (as NF_IP_PRI_RAW=-300) of nft ip/ip6/inet tables (e.g. mark and 0x101 == 0x101 accept)

But that'd split firewall configuration between iptables/nftables, adding more hassle to keep whole "iptables" thing initialized just for one or two rules.

xfrm transformation (like ipsec esp decryption in this case) seems to preserve all information about the packet intact, including packet marks (but not conntrack states, which track the esp connection), which - as suggested by Florian Westphal in #netfilter - can be utilized to match post-xfrm packets in nftables by this preserved mark field.

E.g. having this (strictly before ct state {established, related} accept for stateful firewalls, as each packet has to be marked):

define cm.ipsec = 0x101
add rule inet filter input ip protocol esp mark set mark or $cm.ipsec
add rule inet filter input ip6 nexthdr esp mark set mark or $cm.ipsec
add rule inet filter input mark and $cm.ipsec == $cm.ipsec accept

Will mark and accept both still-encrypted esp packets (IPv4/IPv6) and their decrypted payload.
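
I.e. a minimal sketch of where these rules sit in a stateful inet filter (chain policy and the rest of the ruleset are placeholders here, with the cm.ipsec define from above in scope):

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ip protocol esp mark set mark or $cm.ipsec
    ip6 nexthdr esp mark set mark or $cm.ipsec
    mark and $cm.ipsec == $cm.ipsec accept
    ct state {established, related} accept
    # ...rest of the ruleset...
  }
}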

Note that this assumes that all IPSec connections are properly authenticated and trusted, so be sure not to use anything like that if e.g. opportunistic encryption is enabled.

Much simpler nft-only solution, though still not a full substitute for what xt_policy does, of course.

Aug 31, 2016

Handy tool to wait for remote TCP port to open - TCP "ping"

Lack of some basic tool to "wait for connection" in linux toolkit always annoyed me to no end.

root@alarm~:~# reboot
Shared connection to 10.0.1.75 closed.

% ssh root@10.0.1.75

...time passes, ssh doesn't do anything...

ssh: connect to host 10.0.1.75 port 22: No route to host

% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused

...[mashing Up/Enter] start it up already!...

% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75

root@alarm~:~#

...finally!
Working a lot with ARM boards, I can have this thing repeating a few dozen times a day.
Same happens on every power-up, after fiddling with sd cards, etc.

And I usually know for a fact that I'll want to reconnect to the thing in question asap and continue what I was doing there, but trying my luck a few times with unresponsive or insta-failing ssh is rather counter-productive and just annoying.

Instead:

% tping 10.0.1.75 && ssh root@10.0.1.75
root@alarm~:~#

That's it, no ssh timing-out or not retrying fast enough, no "Connection refused" nonsense.

tping (code link, name is ping + fping + tcp ping) is a trivial ad-hoc script that opens a new TCP connection to the specified host/port every second (default for -r/--retry-delay), polling these connections for success/error/timeout (configurable) in-between, and exits as soon as the first connection succeeds - which in the example above means that sshd is now ready for sure.

Doesn't need extra privileges like icmp pingers do, simple no-deps python3 script.
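
Lacking tping, a rough stand-in can be hacked together from e.g. OpenBSD netcat's -z (connection scan) and -w (timeout) flags, assuming nc is that flavor:

% until nc -zw3 10.0.1.75 22; do sleep 1; done && ssh root@10.0.1.75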

Used fping as fping -qr20 10.0.1.75 && ssh root@10.0.1.75 before finally taking the time to write that thing, but it does what it says on the tin - icmp ping - and usually results in a "Connection refused" error from ssh, as there's a gap between the network and sshd starting.

One of these "why the hell it's not in coreutils or util-linux" tools for me now.

May 15, 2016

Debounce bogus repeated mouse clicks in Xorg with xbindkeys

My current Razer E-Blue mouse has had this issue since I got it - Mouse-2 / BTN_MIDDLE / middle-click (useful mostly as "open new tab" in browsers and "paste" in X) sometimes produces two click events in rapid succession instead of one.

It was more rare before, but lately it feels like it's harder to make it click once than twice.

Seems to be either a hardware problem with debouncing circuitry or logic in the controller, or maybe the button itself not mashing switch contacts against each other hard enough... or soft enough (i.e. non-elastic), actually, given that they shouldn't "bounce" against each other.

Since there's no need to double-click that wheel-button ever, it looks rather easy to debounce the click on Xorg input level, by ignoring repeated button up/down events after producing the first full "click".

Easiest solution of that kind that I've found was to use guile (scheme) script with xbindkeys tool to keep that click-state data and perform clicks selectively, using xdotool:

(define razer-delay-min 0.2)
(define razer-wait-max 0.5)
(define razer-ts-start #f)
(define razer-ts-done #f)
(define razer-debug #f)

(define (mono-time)
  "Return monotonic timestamp in seconds as real."
  (+ 0.0 (/ (get-internal-real-time) internal-time-units-per-second)))

(xbindkey-function '("b:8") (lambda ()
  (let ((ts (mono-time)))
    (when
      ;; Enforce min ts diff between "done" and "start" of the next one
      (or (not razer-ts-done) (>= (- ts razer-ts-done) razer-delay-min))
      (set! razer-ts-start ts)))))

(xbindkey-function '(Release "b:8") (lambda ()
  (let ((ts (mono-time)))
    (when razer-debug
      (format #t "razer: ~a/~a delay=~a[~a] wait=~a[~a]\n"
        razer-ts-start razer-ts-done
        (and razer-ts-done (- ts razer-ts-done)) razer-delay-min
        (and razer-ts-start (- ts razer-ts-start)) razer-wait-max))
    (when
      (and
        ;; Enforce min ts diff between previous "done" and this one
        (or (not razer-ts-done) (>= (- ts razer-ts-done) razer-delay-min))
        ;; Enforce max "click" wait time
        (and razer-ts-start (<= (- ts razer-ts-start) razer-wait-max)))
      (set! razer-ts-done ts)
      (when razer-debug (format #t "razer: --- click!\n"))
      (run-command "xdotool click 2")))))

Note that xbindkeys actually grabs "b:8" here, which is "mouse button 8" - if it grabbed "b:2" instead, the "xdotool click 2" command would recurse right back into the same code - so the wheel-clicker should be re-bound to button 8 in X for this to work.

Rebinding buttons in X is trivial to do on-the-fly, using standard "xinput" tool - e.g. xinput set-button-map "My Mouse" 1 8 3 (xinitrc.input script can be used as an extended example).

Running "xdotool" to do actual clicks at the end seem a bit wasteful, as xbindkeys already hooks into similar functionality, but unfortunately there's no "send input event" calls exported to guile scripts (as of 1.8.6, at least).

Still, works well enough as it is, fixing that rather annoying issue.

[xbindkeysrc.scm on github]

Dec 29, 2015

Tool to interleave and colorize lines from multiple log (or any other) files

There's the multitail thing to tail multiple logs, potentially interleaved, in one curses window, which is painful-to-impossible to browse through the way you would with simple "less".

There's lnav for parsing and normalizing a bunch of logs, and continuously monitoring these, also interactive.

There's rainbow to color specific lines based on regexp, which can't really do any interleaving.

And this has been bugging me for a while - there seems to be no easy way to get this:

[interleaved and colorized output image]

This is an interleaved output from several timestamped log files, for events happening at nearly the same time (which can be used to establish the sequence between these and correlate output of multiple tools/instances/etc), browsable via the usual "less" (or whatever other $PAGER) in an xterm window.

In this case, logfiles are from "btmon" (bluetooth sniffer tool), "bluetoothd" (bluez) debug output and an output from gdb attached to that bluetoothd pid (showing stuff described in previous entry about gdb).

Output of neither of these tools has timestamps by default, but this is easy to fix by piping it through any tool that adds them to every line - svlogd, for example.

To be concrete (and to show one important thing about such log-from-output approach), here's how I got these particular logs:

# mkdir -p debug_logs/{gdb,bluetoothd,btmon}

# gdb -ex 'source gdb_device_c_ftrace.txt' -ex q --args\
        /usr/lib/bluetooth/bluetoothd --nodetach --debug\
        1> >(svlogd -r _ -ttt debug_logs/gdb)\
        2> >(svlogd -r _ -ttt debug_logs/bluetoothd)

# stdbuf -oL btmon |\
        svlogd -r _ -ttt debug_logs/btmon

Note that "btmon" runs via coreutils stdbuf tool, which can be critical for anything that writes to its stdout via libc's fwrite(), i.e. can have buffering enabled there, which causes stuff to be output delayed and in batches, not how it'd appear in the terminal (where line buffering is used), resulting in incorrect timestamps, unless stdbuf or any other option disabling such buffering is used.

With three separate logs from above snippet, natural thing you'd want is to see these all at the same time, so for each logical "event", there'd be output from btmon (network packet), bluetoothd (debug logging output) and gdb's function call traces.

It's easy to concatenate all three logs and sort them to get these interleaved, but then it can be visually hard to tell which line belongs to which file, especially if they are from several instances of the same app (not really the case here though).
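
E.g. with the svlogd "current" files from the snippet above (whose -ttt timestamps sort lexically), a plain merge-sort does the interleaving:

% sort -m debug_logs/*/current | less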

A simple fix is to add a distinct per-file color to each line of each log, but then you can't sort these anymore, as color sequences get in the way; it's non-trivial to do even that, and it all adds up to a script.

Seems to be hard to find any existing tool for the job, so I wrote a script to do it - liac (in the usual mk-fg/fgtk github repo), which was used to produce the output in the image above - that is, interleave lines (using any field for sorting, btw), tag lines belonging to different files with distinct ANSI colors, and add optional prefixes.

Thought it might be useful to leave a note for anyone looking for something similar.

[script source link]

Dec 29, 2015

Getting log of all function calls from specific source file using gdb

Maybe I'm doing debugging wrong, but messing with code written by other people, first question for me is usually not "what happens in function X" (done by setting a breakpoint on it), but rather "which file/func do I look into".

I.e. having an observable effect - like "GET_REPORT messages get sent on HID level to bluetooth device, and replies are ignored", it's easy to guess that it's either linux kernel or bluetoothd - part of BlueZ.

Question then becomes "which calls in app happen at the time of this observable effect", and luckily there's an easy, but not very well-documented (unless my google is that bad) way to see it via gdb for C apps.

For scripts, it's way easier of course, e.g. in python you can do python -m trace ... and it can dump even every single line of code it runs.

First of all, the app in question has to be compiled with the "-g" option and not "stripped", of course, which should usually be easy to set via CFLAGS, defining these in distro-specific ways if rebuilding a package to include that (e.g. for Arch - have debug !strip in the OPTIONS line of /etc/makepkg.conf).

Then running the app under gdb can be done via something like gdb --args someapp arg1 arg2 (and typing "r" there to start it), but if the goal is to get a log of all function calls from a specific file (and not just in the "call graph" way profilers like gprof do), first - interactivity has to go, second - breakpoints have to be set for all these funcs and logged when the app runs.

Alas, there seems to be no way to add a breakpoint for every func in a file.

One common suggestion (does NOT work, don't copy-paste!) I've seen is doing rbreak device\.c: ("rbreak" is a regexp version of "break") to match e.g. profiles/input/device.c:extract_hid_record (as well as all other funcs there) via such "filename:funcname" pattern, but it doesn't work and shouldn't work, as the "rbreak" regexp only matches function names, not these specs.

So a trivial script is needed to a) get a list of funcs in a source file (just the name is enough, as C has only one namespace), and b) put a breakpoint on each of them.

This is luckily quite easy to do via ctags, with this one-liner:

% ctags -x --c-kinds=fp profiles/input/device.c |
  awk 'BEGIN {print "set pagination off\nset height 0\nset logging on\n\n"}\
    {print "break", $1 "\ncommands\nbt 5\necho ------------------\\n\\n\nc\nend\n"}\
    END {print "\n\nrun"}' > gdb_device_c_ftrace.txt

Should generate a script for gdb, starting with "set pagination off" and whatever else is useful for logging, with "commands" block after every "break", running "bt 5" (displays backtrace) and echoing a nice-ish separator (bunch of hyphens), ending in "run" command to start the app.
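
E.g. for the device.c example above, with extract_hid_record and such funcs in it, the generated gdb_device_c_ftrace.txt will look like this (one break/commands block per function):

set pagination off
set height 0
set logging on

break extract_hid_record
commands
bt 5
echo ------------------\n\n
c
end

# ...same block for every other func...

run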

Resulting script can/should be fed into gdb with something like this:

% gdb -ex 'source gdb_device_c_ftrace.txt' -ex q --args\
  /usr/lib/bluetooth/bluetoothd --nodetach --debug

This will produce the needed log of all the calls to functions from that "device.c" file in "gdb.txt" (default "set logging on" destination), have output of the app interleaved with these on stdout/stderr (which can be redirected, or maybe closed with more gdb commands in the txt file or before it with "-ex"), and is non-interactive.

From here, seeing where exactly the issue seems to occur, one'd probably want to look through the code of the funcs in question, run gdb interactively and inspect what exactly is happening there.

Definitely nowhere near the magic some people script gdb with, but haven't found similar snippets neatly organized anywhere else, so here they go, in case someone might want to do the exact same thing.

Can also be used to log a bunch of calls from multiple files, of course, by giving "ctags" more files to parse.

Dec 09, 2015

Transparent buffer/file processing in emacs on load/save/whatever-io ops

Following-up on my gpg replacement endeavor, also needed to add transparent decryption for buffers loaded from *.ghg files, and encryption when writing stuff back to these.

git filters (defined via gitattributes file) do same thing when interacting with the repo.

Such a thing is already done by a few existing elisp modules, such as jka-compr.el for auto-compression-mode (opening/saving .gz and similar files as if they were plaintext), and epa.el for transparent gpg encryption.

While these modules do this The Right Way by adding a "file-name-handler-alist" entry, googling for a small ad-hoc boilerplate turns up quite a few examples that do it via hooks instead, which seem rather unreliable, with especially bad failure modes wrt transparent encryption.

So, in the interest of providing right-er boilerplate for the task (and because I tend to like elisp) - here's fg_sec.el example (from mk-fg/emacs-setup) of how it can be implemented cleaner, in similar fashion to epa and jka-compr.

Code calls ghg -do when visiting/reading files (with contents piped to stdin) and ghg -eo (with stdin/stdout buffers) when writing stuff back.
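
In shell terms, the round-trip it automates per-buffer would be just this (file names hypothetical, ghg flags as described):

% ghg -eo < notes.txt > notes.ghg
% ghg -do < notes.ghg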

Entry-point/hook there is "file-name-handler-alist", where a regexp to match *.ghg gets added to call "ghg-io-handler" for every i/o operation (including path ops like "expand-file-name" or "file-exists-p" btw), with only "insert-file-contents" (read) and "write-region" (write) being overridden.

Unlike jka-compr though, no temporary files are used in this implementation, only temp buffers, and "insert-file-contents" doesn't put unauthenticated data into target buffer as it arrives, patiently waiting for subprocess to exit with success code first.

Fairly sure that this bit of elisp can be used for any kind of processing, by replacing "ghg" binary with anything else that can work as a pipe (stdin -> processing -> stdout), which opens quite a lot of possibilities.

For example, all JSON files can be edited as a pretty YAML version, without strict syntax and all the brackets of JSON, or the need to process/convert them purely in elisp's json-mode or something - just plug python -m pyaml and python -m json.tool commands into these two i/o ops and it should work.

Suspect there's gotta be something that'd make such filters easier in MELPA already, but haven't been able to spot anything right away, maybe should put up a package there myself.

[fg_sec.el code link]
