Jan 16, 2025

nftables rate-limiting against low-effort DDoS attacks

Dunno what random weirdo found me this time around, but noticed 'net connection on home-server getting clogged by 100 Mbps of incoming traffic yesterday, which seemed to be just junk sent to every open protocol that accepts it, from some 5-10K IPs around the globe, with the bulk being pipelined requests over open nginx connections.

Seems very low-effort, and easily worked around by not responding to TCP SYN packets, as volume of those is relatively negligible (not a syn/icmp flood or any kind of amplification backscatter), and just nftables can deal with that, if configured to block the right IPs.

Actually, first of all, as nginx was a prime target here, and allows single connection to dump a lot of request traffic into it (what was happening), two things can be easily done there:

  • Tighten keepalive and request limits in general, e.g.:

    limit_req_zone $binary_remote_addr zone=perip:10m rate=20r/m;
    limit_req zone=perip burst=30 nodelay;
    keepalive_requests 3;
    keepalive_time 1m;
    keepalive_timeout 75 60;
    client_max_body_size 10k;
    client_header_buffer_size 1k;
    large_client_header_buffers 2 1k;
    

    Idea is to at least force bots to reconnect, which will work nicely with nftables rate-limiting below too.

  • If bots are simple and dumb, sending same 3-4 types of requests, grep those:

    tail -F access.log | stdbuf -oL awk '/.../ {print $1}' |
      while read addr; do nft add element inet filter set.inet-bots4 "{$addr}"; done
    

    Yeah, there's fail2ban and such for that as well, but why overcomplicate things when a trivial tail to grep/awk will do.
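If awk one-liner gets too cramped, same filter is easy to sketch in python. This is only an illustration under some assumptions: bot_addrs and nft_add_cmd helpers are made-up names, log lines are assumed to start with the client address (like nginx "combined" format does), and it only builds nft commands, leaving running them to the caller:

```python
import ipaddress
import re

def bot_addrs(lines, pattern):
    """Yield unique IPv4 client addresses from access.log lines matching pattern."""
    seen, rx = set(), re.compile(pattern)
    for line in lines:
        fields = line.split(None, 1)
        if not fields or not rx.search(line):
            continue
        addr = fields[0]  # assumption: first field is the client address
        try:
            ipaddress.IPv4Address(addr)  # reject junk before it gets to nft
        except ValueError:
            continue
        if addr not in seen:
            seen.add(addr)
            yield addr

def nft_add_cmd(addr, nft_set="set.inet-bots4"):
    """Build same command as the shell loop runs, as an argv list."""
    return ["nft", "add", "element", "inet", "filter", nft_set, f"{{ {addr} }}"]
```

Feeding it lines from tail -F and passing each nft_add_cmd() result to subprocess.run() would then do roughly the same thing as the shell pipeline.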

Tbf that takes care of bulk of the traffic in such simple scenario already, but nftables can add more generalized "block bots connecting to anything way more than is sane" limits, like these:

add set inet filter set.inet-bots4.rate \
  { type ipv4_addr; flags dynamic, timeout; timeout 10m; }
add set inet filter set.inet-bots4 \
  { type ipv4_addr; flags dynamic, timeout; counter; timeout 240m; }

add counter inet filter cnt.inet-bots.pass
add counter inet filter cnt.inet-bots.blackhole

add rule inet filter tc.pre \
  iifname $iface.wan ip daddr $ip.wan tcp flags syn jump tc.pre.ddos

add rule inet filter tc.pre.ddos \
  ip saddr @set.inet-bots4 counter name cnt.inet-bots.blackhole drop
add rule inet filter tc.pre.ddos \
  update @set.inet-bots4.rate { ip saddr limit rate over 3/minute burst 20 packets } \
  add @set.inet-bots4 { ip saddr } drop
add rule inet filter tc.pre.ddos counter name cnt.inet-bots.pass

(this is similar to an example under SET STATEMENT from "man nft")

Where $iface.wan and such vars should be define'd separately, as well as tc.pre hooks (somewhere like prerouting -350, before anything else). ip/ip6 addr selectors can also be used with separate IPv4/IPv6 sets.
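To get a feel for what "limit rate over 3/minute burst 20 packets" means for a single source address, here's a rough token-bucket model of it in python. It's a sketch and not how the kernel accounts for this exactly - OverLimit and first_blocked names are made-up, and timestamps are plain seconds:

```python
class OverLimit:
    """Rough model of 'limit rate over N/minute burst B packets' for one source IP."""

    def __init__(self, rate_per_min=3, burst=20):
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.burst = burst
        self.tokens = float(burst)  # bucket starts full
        self.ts = 0.0

    def over(self, now):
        """Account for one SYN packet at time now, return True if it is over the limit."""
        self.tokens = min(self.burst, self.tokens + (now - self.ts) * self.rate)
        self.ts = now
        if self.tokens >= 1:
            self.tokens -= 1
            return False
        return True

def first_blocked(times, **limit_opts):
    """Return index of the first packet that would trip the limit, or None."""
    limit = OverLimit(**limit_opts)
    for n, ts in enumerate(times):
        if limit.over(ts):
            return n
    return None
```

In this model, a client reconnecting a couple times per minute never gets anywhere near the limit, while a bot opening connections in a tight loop burns through the burst allowance right away and lands in the blackhole set.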

But the important things there IMO are:

  • To define persistent sets, like set.inet-bots4 blackhole one, and not flush/remove those on any configuration fine-tuning afterwards, only build it up until non-blocked botnet traffic is negligible.

    Rate limits like ip saddr limit rate over 3/minute burst 20 packets come from the rule, and are only tracked per-address in a separate short-lived set.inet-bots4.rate set, so can be adjusted on the fly anytime, without needing to touch the persistent blackhole set.

    Sets are easy to export/import in isolation as well:

    # nft list set inet filter set.inet-bots4 > bots4.nft
    # nft -f bots4.nft
    

    Last command adds set elements from bots4.nft, and as there's no "flush" in there, it effectively merges old set into the new one instead of replacing it. -j/--json input/output can be useful there to filter sets via scripts.

  • Always use separate chain like tc.pre.ddos for complicated rate-limiting and set-matching rules, so that those can be atomically flushed-replaced via e.g. a simple .sh script to change or tighten/relax the limits as-needed later:

    nft -f- <<EOF
    flush chain inet filter tc.pre.ddos
    add rule inet filter tc.pre.ddos \
      ip saddr @set.inet-bots4 counter name cnt.inet-bots.blackhole drop
    # ... more rate-limiting rule replacements here
    EOF
    

    These atomic updates are one of the greatest things about nftables - no need to nuke whole ruleset, just edit/replace and apply relevant chain(s) via script.

    It's also not hard to add such chains after the fact, but a bit fiddly - see e.g. "Managing tables, chains, and rules using nft commands" in RHEL docs for how to list all rules with their handles (use nft -at list ... with -t in there to avoid dumping large sets), insert/replace rules, etc.

    But the point is - it's a lot easier when pre-filtered traffic is already passing through dedicated chain to focus on, and edit it separately from the rest.

  • Counters are very useful to understand whether any of this helps, for example:

    # nft list counters table inet filter
    table inet filter {
      counter cnt.inet-bots.pass {
        packets 671 bytes 39772
      }
      counter cnt.inet-bots.blackhole {
        packets 368198 bytes 21603012
      }
    }
    

    So it's easy to see that rules are working, and blocking is applied correctly.

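    And even better - nft reset counters ... && sleep 100 && nft list counters ... command will give the number of packets passed or blocked over 100 seconds, which divided by 100 is effectively the rate of how many bot connection attempts get passed or blocked per second.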

    nginx also has similar metrics btw, without needing to remember any status-page URLs or monitoring APIs - tail -F access.log | pv -ralb >/dev/null (pv is a common unix "pipe viewer" tool, and can count line rates too).

  • Sets can have counters as well, like set.inet-bots4, defined with counter; in the example above.

    nft get element inet filter set.inet-bots4 '{ 103.115.243.145 }' will get info on blocked packets/bytes for specific bot, when it was added, etc.

    One missing "counter" on sets is the number of elements in those, which piping it through wc -l won't get, as nft dumps multiple elements on the same line, but jq or a trivial python script can get from -j/--json output:

    nft -j list set inet filter set.inet-bots4 | python /dev/fd/3 3<<'EOF'
    import sys, json
    for block in json.loads(sys.stdin.read())['nftables']:
      if not (nft_set := block.get('set')): continue
      print(f'{len(nft_set.get("elem", list())):,d}'); break
    EOF
    

    (jq syntax is harder to remember when using it rarely than python)

  • nftables sets can have tuples of multiple things too, e.g. ip + port, or even a verdict stored in there, but it hardly matters with such temporary bot blocks.

  • Feed any number of other easy-to-spot bot-patterns into same "blackhole" nftables sets.

    E.g. that tail -F access.log | awk is enough to match obviously-phony requests to same bogus host/URL, and same for malformed junk in error.log, auth.log, mail.log, etc - stream all those IPs into nft add element ... too, the more the merrier :)
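Counter-rate check from the list above is also simple to script, e.g. with something like this python sketch, which parses plain-text "nft list counters" output in the format shown earlier. parse_counters and rates are made-up helper names, and interval is assumed to be the number of seconds since "nft reset counters" was run:

```python
import re

def parse_counters(text):
    """Parse 'nft list counters' text output into {name: (packets, bytes)}."""
    rx = re.compile(
        r'counter\s+(\S+)\s*\{\s*packets\s+(\d+)\s+bytes\s+(\d+)\s*\}')
    return {name: (int(p), int(b)) for name, p, b in rx.findall(text)}

def rates(text, interval):
    """Per-second packet rates for counters that were reset interval seconds ago."""
    return {
        name: pkts / interval
        for name, (pkts, _) in parse_counters(text).items()}
```

Comparing cnt.inet-bots.pass and cnt.inet-bots.blackhole rates this way gives a quick idea of how much of the junk still gets through.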

It used to be more difficult to maintain such limits efficiently in userspace to sync into iptables, but nftables has this basic stuff built-in and very accessible.

Though probably won't help against commercial DDoS that's expected to get results instead of just a minor nuisance, against something more valuable than a static homepage on a $6/mo internet connection - bots might be a bit more sophisticated there, and numerous enough to clog the pipe by syn-flood or whatever icmp/udp junk, without distributed network like CloudFlare filtering it at multiple points.

This time I've finally decided to bother putting it all in the script too (as well as this blog post while at it), which can be found in the usual repo for scraps - mk-fg/fgtk/scraps/nft-ddos (or on codeberg and in local cgit).

Dec 13, 2024

Automated git release tags

For projects tracked in some package repositories, apparently it's worth tagging releases in git repos (as in git tag 24.12.1 HEAD && git push --tags), for distro packagers/maintainers to check/link/use/compare new release from git, which seems easy enough to automate if pkg versions are stored in a repo file already.

One modern way of doing that in larger projects can be CI/CD pipelines, but they imply a lot more than just release tagging, so for some tiny python module like pyaml, don't see a reason to bother with them atm, and I know how git hooks work.

For release to be pushed to a repository like PyPI in the first place, project repo almost certainly has a version stored in a file somewhere, e.g. pyproject.toml for PyPI:

[project]
name = "pyaml"
version = "24.12.1"
...

Updates to this version string can be automated on their own (I use simple git-version-bump-filter script for that in some projects), or done manually when pushing a new release to package repo, and git tags can easily follow that.

E.g. when pyproject.toml changes in git commit, and that change includes version= line - that's a commit that should have that updated version tag on it.

Best place to add/update that tag in git after commit is post-commit hook:

#!/bin/bash
set -eo pipefail

die() {
  echo >&2 $'\ngit-post-commit :: ----------------------------------------'
  echo >&2 "git-post-commit :: ERROR: $@"
  echo >&2 $'git-post-commit :: ----------------------------------------\n'; exit 1; }

ver=$( git show --no-color --diff-filter=M -aU0 pyproject.toml |
    gawk '/^\+version\s*=/ {
      split(substr($NF,2,length($NF)-2),v,".")
      print v[1]+0 "." v[2]+0 "." v[3]+0}' )

[[ "$ver" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] || {
  ver=$( gawk '/^version\s*=/ {
    split(substr($NF,2,length($NF)-2),v,".")
    print v[1]+0 "." v[2]+0 "." v[3]+0}' pyproject.toml )
  [[ "$ver" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] || \
    die 'Failed to get version from git-show and pyproject.toml file'
  ver_tag=$(git tag --sort=v:refname | tail -1)
  [[ -n "$ver" && "$ver" = "$ver_tag" ]] || die 'No new release to tag,'`
    `" and last git-tag [ $ver_tag ] does not match pyproject.toml version [ $ver ]"
  echo $'\ngit-post-commit :: no new tag, last one matches pyproject.toml\n'; exit 0; }

git tag -f "$ver" HEAD # can be reassigning tag after --amend
echo -e "\ngit-post-commit :: tagged new release [ $ver ]\n"

git-show there picks version update line from just-created commit, which is then checked against existing tag and assigned or updated as necessary.
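For illustration, version-parsing part of that hook can be sketched in python like this. It's not a drop-in replacement, just the same idea as the gawk snippet: version_from_diff is a made-up name, and unlike gawk it is strict about version being three dot-separated numbers in double quotes:

```python
import re

def version_from_diff(diff_text):
    """Return normalized 'X.Y.Z' from an added version= line in a diff, or None."""
    for line in diff_text.splitlines():
        m = re.match(r'\+version\s*=\s*"(\d+)\.(\d+)\.(\d+)"\s*$', line)
        if m:
            # int() drops leading zeroes, similar to v[n]+0 in gawk
            return ".".join(str(int(n)) for n in m.groups())
    return None
```

It'd be used on the output of the same git show --no-color --diff-filter=M -aU0 pyproject.toml command, with None result meaning that the commit didn't change version line.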

"Updated" part tends to be important too, as at least for me it's common to remember something that needs to be fixed/updated only when writing commit msg or even after git-push, so git commit --amend is common, and should update that same tag to a new commit hash.

Messages printed in this hook are nicely prepended to git's usual commit info output in the terminal, so that you remember when/where this stuff is happening, and any potential errors are fairly obvious.

Having tags assigned is not enough to actually have those on github/gitlab/codeberg and such, as git doesn't push those automatically.

There's --follow-tags option to push "annotated" tags only, but I don't see any reason why trivial version tags should have a message attached to them, so of course there's another way too - pre-push hook:

#!/bin/sh
set -e

# Push tags on any pushes to "master" branch, with stdout logging
# Re-assigns tags, but does not delete them, use "git push --delete remote tag" for that

push_remote=$1 push_url=$2

master_push= master_oid=$(git rev-parse master)
while read local_ref local_oid remote_ref remote_oid
do [ "$local_oid" != "$master_oid" ] || master_push=t; done
[ -n "$master_push" ] || exit 0

prefix=$(printf 'git-pre-push [ %s %s ] ::' "$push_remote" "$push_url")
printf '\n%s --- tags-push ---\n' "$prefix"
git push --no-verify --tags -f "$push_url" # specific URL in case remote has multiple of those
printf '%s --- tags-push success ---\n\n' "$prefix"

It has an extra check for whether it's a push for a master branch, where release tags presumably are, and auto-runs git push --tags -f to the same URL.

Again -f here is to be able to follow any tag reassignments after --amend's, although it doesn't delete tags that were removed locally, but don't think that should happen often enough to bother (if ever).
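That master-branch check in the hook is the only slightly non-obvious part, so here's the same logic as a python sketch. pushes_master is a made-up name, and input lines are assumed to be in the "<local ref> <local oid> <remote ref> <remote oid>" format that git feeds to pre-push hooks on stdin:

```python
def pushes_master(stdin_lines, master_oid):
    """Check pre-push hook stdin lines for a ref pushed from master's commit."""
    for line in stdin_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # not a "<local ref> <local oid> <remote ref> <remote oid>" line
        local_ref, local_oid, remote_ref, remote_oid = fields
        if local_oid == master_oid:
            return True
    return False
```

Like the shell version, it compares commit ids and not ref names, so pushing any other branch that points to the same commit as master will trigger tags-push as well.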

pre-push position of the hook should abort the push if there're any issues pushing tags, and pushing to specific URLs allows to use multiple repo URLs in e.g. default "origin" remote (used with no-args git push), like in github + codeberg + self-hosted URL-combo that I typically use for redundancy and to avoid depending on silly policies of "free" third-party services (which is also why maintaining service-specific CI/CD stuff on those seems like a wasted effort).

With both hooks in place (under .git/hooks/), there should be no manual work involved in managing/maintaining git tags anymore, to forget that they exist again for all practical purposes.

Made both hooks for pyaml project repo (apparently packaged in some distro), where maybe more recent versions of those can be found.

Don't think git or sh/bash/gawk used in those ever change to bother updating them, but maybe there'll be some new corner-case or useful git workflow to handle, which I haven't bumped into yet.

Sep 30, 2024

Adding color to tcpdump makes a ton of difference

Debugging the usual censorshit issues, finally got sick of looking at normal tcpdump output, and decided to pipe it through a simple translator/colorizer script.

I think it's one of these cases where a picture is worth a thousand words:

[image: tcpdump console with IPv6 traffic]

This is very hard to read, especially when it's scrolling, with long generated IPv6'es in there.

While this IMO is quite readable:

[image: tcpdump console with colorized traffic]

Immediately obvious who's talking to whom and when, where it's especially trivial to focus on packets from specific hosts by their name shape/color.

Difference between the two is this trivial config file:

2a01:4f8:c17:37c1: local.net: !gray
2a01:4f8:c17:37c1:8341:8768:e26:83ff [Container] !bo-ye

2a02:17d0:201:8b0 remote.net !gr
2a02:17d0:201:8b01::1 [Remote-A] !br-gn

2a02:17d0:201:8b00:2a10:6e67:1a0:60ae [Peer] !bold-cyan
2a02:17d0:201:8b00:f60:f2c3:5c:7702 [Desktop] !blue
2a02:17d0:201:8b00:de9a:11c8:e285:235e [Laptop] !wh

...which sets host/network/prefix labels to replace unreadable address parts with (hosts in brackets as a convention) and colors/highlighting for those (using either full or two-letter DIN 47100-like names for brevity).
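Core of such translation is just a prefix search-and-replace, which can be sketched in a few lines of python. This is not the actual script, only an illustration of the idea - parse_rules and translate are made-up names, "#" comments are an assumption, and COLORS is a small hypothetical subset with full color names only:

```python
COLORS = {"gray": "90", "blue": "34", "bold-cyan": "1;36"}  # small ANSI subset

def parse_rules(conf):
    """Parse '<prefix> <label> !<color>' lines into a list, longest prefix first."""
    rules = []
    for line in conf.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: #-comments
        if not line:
            continue
        prefix, label, color = line.split()
        rules.append((prefix, label, color.lstrip("!")))
    return sorted(rules, key=lambda r: len(r[0]), reverse=True)

def translate(line, rules, colorize=True):
    """Replace known address prefixes in a tcpdump line with (colored) labels."""
    for prefix, label, color in rules:
        code = COLORS.get(color)
        if colorize and code:
            label = f"\033[{code}m{label}\033[0m"
        line = line.replace(prefix, label)
    return line
```

Sorting rules longest-first is what makes full host addresses win over network prefixes that they start with.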

Plus the script to pipe that boring tcpdump output through - tcpdump-translate.

Another useful feature of such script turns out to be filtering - tcpdump command-line quickly gets unwieldy with "host ... && ..." specs, while in the config above it's trivial to comment/uncomment lines and filter by whatever network prefixes, instead of cramming it all into shell prompt.

tcpdump has some of this functionality via DNS reverse-lookups too, but I really don't want it resolving any addrs that I don't care to track specifically, which often makes output even more confusing, with long and misleading internal names assigned by someone else for their own purposes popping up in wrong places, while still remaining indistinct and lacking colors.

Aug 06, 2024

List connected processes for unix sockets on linux

For TCP connections, it seems pretty trivial - old netstat (from net-tools project) and modern ss (iproute2) tools do it fine, where you can easily grep both listening or connected end by IP:port they're using.

But ss -xp for unix sockets (AF_UNIX, aka "unix domain sockets") doesn't work like that - only prints socket path for listening end of the connection, which makes lookups by socket path not helpful, at least with the current iproute2-6.10.

"at least with the current iproute" because manpage actually suggests this:

ss -x src /tmp/.X11-unix/*
  Find all local processes connected to X server.

Where socket is wrong for modern X - easy to fix - and -p option seems to be omitted (to show actual processes), but the result is also not at all "local processes connected to X server" anyway:

# ss -xp src @/tmp/.X11-unix/X1

Netid State  Recv-Q Send-Q      Local Address:Port   Peer Address:Port  Process
u_str ESTAB  0      0      @/tmp/.X11-unix/X1 26800             * 25948  users:(("Xorg",pid=1519,fd=51))
u_str ESTAB  0      0      @/tmp/.X11-unix/X1 331064            * 332076 users:(("Xorg",pid=1519,fd=40))
u_str ESTAB  0      0      @/tmp/.X11-unix/X1 155940            * 149392 users:(("Xorg",pid=1519,fd=46))
...
u_str ESTAB  0      0      @/tmp/.X11-unix/X1 16326             * 20803  users:(("Xorg",pid=1519,fd=44))
u_str ESTAB  0      0      @/tmp/.X11-unix/X1 11106             * 27720  users:(("Xorg",pid=1519,fd=50))
u_str LISTEN 0      4096   @/tmp/.X11-unix/X1 12782             * 0      users:(("Xorg",pid=1519,fd=7))

It's just a long table listing same "Xorg" process on every line, which obviously isn't what example claims to fetch, or useful in any way. So maybe it worked fine earlier, but some changes to the tool or whatever data it grabs made this example obsolete and not work anymore.

But there are "ports" listed for unix sockets, which I think correspond to "inodes" in /proc/net/unix, and are global across host (or at least same netns), so two sides of connection - that socket-path + Xorg process info - and other end with connected process info - can be joined together by those port/inode numbers.
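That join by port/inode numbers can be sketched in python roughly like this. It's a simplified illustration and not the actual script: socket_links is a made-up name, input is assumed to be "ss -xp" output with columns as in the table above, and client-side lines are assumed to have "*" for local address with the two port numbers swapped:

```python
import re
from collections import defaultdict

def socket_links(ss_output):
    """Map socket path -> set of 'name[pid]' for processes on the other end."""
    by_port, paths = {}, {}
    for line in ss_output.splitlines():
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[1] != "ESTAB":
            continue  # header, listening sockets, or no process info
        local, port, peer_port, procs = fields[4], fields[5], fields[7], fields[8]
        by_port[port] = re.findall(r'\("([^"]*)",pid=(\d+)', procs)
        if local != "*":
            paths[peer_port] = local  # peer port is the client end of this conn
    links = defaultdict(set)
    for port, path in paths.items():
        for name, pid in by_port.get(port, []):
            links[path].add(f"{name}[{pid}]")
    return dict(links)
```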

I haven't been able to find a tool to do that for me easily atm, so went ahead to write my own script, mostly focused on listing per-socket pids on either end, e.g.:

# unix-socket-links
...
/run/dbus/system_bus_socket :: dbus-broker[1190] :: Xorg[1519] bluetoothd[1193]
  claws-mail[2203] dbus-broker-lau[1183] efreetd[1542] emacs[2160] enlightenment[1520]
  pulseaudio[1523] systemd-logind[1201] systemd-network[1363] systemd-timesyn[966]
  systemd[1366] systemd[1405] systemd[1] waterfox[2173]
...
/run/user/1000/bus :: dbus-broker[1526] :: dbus-broker-lau[1518] emacs[2160] enlightenment[1520]
  notification-th[1530] pulseaudio[1523] python3[1531] python3[5397] systemd[1405] waterfox[2173]

/run/user/1000/pulse/native :: pulseaudio[1523] :: claws-mail[2203] emacs[2160]
  enlightenment[1520] mpv[9115] notification-th[1530] python3[2063] waterfox[2173]

@/tmp/.X11-unix/X1 :: Xorg[1519] :: claws-mail[2203] conky[1666] conky[1671] emacs[2160]
  enlightenment[1520] notification-th[1530] python3[5397] redshift[1669] waterfox[2173]
  xdpms[7800] xterm[1843] xterm[2049] yeahconsole[2047]
...

Output format is <socket-path> :: <listening-pid> :: <clients...>, where it's trivial to see exactly what is connected to which socket (and what's listening there).

unix-socket-links @/tmp/.X11-unix/X1 can list only conns/pids for that socket, and adding -c/--conns can be used to disaggregate that list of processes back into specific connections (which can be shared between pids too), to get more like a regular netstat/ss output, but with procs on both ends, not weirdly broken one like ss -xp gives you.

Script is in the usual mk-fg/fgtk repo (also on codeberg and local git), with code link and a small doc here:

https://github.com/mk-fg/fgtk?tab=readme-ov-file#hdr-unix-socket-links

Was half-suspecting that I might need to parse /proc/net/unix or load eBPF for this, but nope, ss has all the info needed, just presents it in a silly way.

Also, unlike some other iproute2 tools where that was added (or lsfd below), it doesn't have --json output flag, but should be stable enough to parse anyway, I think, and easy enough to sanity-check by the header.


Oh, and also, one might be tempted to use lsof or lsfd for this, like I did, but it's more complicated and can be janky to get the right output out of these, and pretty sure lsof even has side-effects, where it connects to socket with +E (good luck figuring out what's that supposed to do btw), causing all sorts of unintended mayhem, but here are snippets that I've used for those in some past (checking where stalled ssh-agent socket connections are from in this example):

lsof -wt +E "$SSH_AUTH_SOCK" | awk '{print "\\<" $1 "\\>"}' | grep -3f- <(ps axlf)
lsfd -no PID -Q "UNIX.PATH == '$SSH_AUTH_SOCK'" | grep -f- <(ps axlf)

Don't think either of those work anymore, maybe for same reason as with ss not listing unix socket path for egress unix connections, and lsof in particular straight-up hangs without even kill -9 getting it, if socket on the other end doesn't process its (silly and pointless) connection, so maybe don't use that one at least - lsfd seems to be easier to use in general.

Jul 01, 2024

zsh fallback completions on empty results

Usually zsh does fine wrt tab-completion, but sometimes you just get nothing when pressing tab, either due to somewhat-broken completer, or it working as intended but there seemingly being "nothing" to complete.

Recently latter started happening after redirection characters, e.g. on cat myfile > <TAB>, and that finally prompted me to re-examine why I even put up with this crap.

Because in vast majority of cases, completion should use files, except for commands as the first thing on the line, and maybe some other stuff way more rarely, almost as an exception. But completing nothing at all seems like an obvious bug to me, as if I wanted nothing, wouldn't have pressed the damn tab key in the first place.

One common way to work around the lack of file-completions when needed, is to define special key for just those, like shift-tab:

zle -C complete-files complete-word _generic
zstyle ':completion:complete-files:*' completer _files
bindkey "\e[Z" complete-files

If using that becomes a habit every time one needs files, that'd be a good solution, but I still use generic "tab" by default, and expect file-completion from it in most cases, so why not have it fall back to file-completion if whatever special thing zsh has otherwise fails - i.e. suggest files/paths instead of nothing.

Looking at _complete_debug output (can be bound/used instead of tab-completion), it's easy to find where _main_complete dispatcher picks completer script, and that there is apparently no way to define fallback of any kind there, but easy enough to patch one in, at least.

Here's the hack I ended up with for /etc/zsh/zshrc:

## Make completion always fall back to next completer if current returns 0 results
# This allows to fall back to _files completion properly when fancy _complete fails
# Patch requires running zsh as root at least once, to apply it (or warn/err about it)

_patch_completion_fallbacks() {
  local patch= p=/usr/share/zsh/functions/Completion/Base/_main_complete
  [[ "$p".orig -nt "$p" ]] && return || {
    grep -Fq '## fallback-complete patch v1 ##' "$p" && touch "$p".orig && return ||:; }
  [[ -n "$(whence -p patch)" ]] || {
    echo >&2 'zshrc :: NOTE: missing "patch" tool to update completions-script'; return; }
  read -r -d '' patch <<'EOF'
--- _main_complete      2024-06-09 01:10:28.352215256 +0500
+++ _main_complete.new  2024-06-09 01:10:51.087404762 +0500
@@ -210,18 +210,20 @@
     fi

     _comp_mesg=
     if [[ -n "$call" ]]; then
       if "${(@)argv[3,-1]}"; then
         ret=0
         break 2
       fi
     elif "$tmp"; then
+      ## fallback-complete patch v1 ##
+      [[ $compstate[nmatches] -gt 0 ]] || continue
       ret=0
       break 2
     fi
     (( _matcher_num++ ))
   done
   [[ -n "$_comp_mesg" ]] && break

   (( _completer_num++ ))
 done
EOF
  patch --dry-run -stN "$p" <<< "$patch" &>/dev/null \
    || { echo >&2 "zshrc :: WARNING: zsh fallback-completions patch fails to apply"; return; }
  cp -a "$p" "$p".orig && patch -stN "$p" <<< "$patch" && touch "$p".orig \
    || { echo >&2 "zshrc :: ERROR: failed to apply zsh fallback-completions patch"; return; }
  echo >&2 'zshrc :: NOTE: patched zsh _main_complete routine to allow fallback-completions'
}
[[ "$UID" -ne 0 ]] || _patch_completion_fallbacks
unset -f _patch_completion_fallbacks

This would work with multiple completers defined like this:

zstyle ':completion:*' completer _complete _ignored _files

Where _complete _ignored is the default completer-chain, and will try whatever zsh has for the command first, and then if those return nothing, instead of being satisfied with that, patched-in continue will keep going and run next completer, which is _files in this case.
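As a toy model of what that patched-in line changes (python here, not actual zsh code - run_completers and its (ok, matches) return convention are made up for illustration):

```python
def run_completers(completers, fallback=True):
    """Model of completer loop, where each completer returns (ok, matches)."""
    for completer in completers:
        ok, matches = completer()
        if not ok:
            continue  # completer declined, next one gets tried (same as before)
        if fallback and not matches:
            continue  # patched-in check: "success" with zero matches is not enough
        return matches
    return []
```

Without that second check, a completer that claims success while adding no matches ends the loop, which is exactly the "nothing on tab" case.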

Patch has generous context in it to find the right place, and to bail if upstream code changes. Otherwise, whenever the shell is first run as root, it fixes the issue until next zsh package update (and then patch will run/fix it again).

Doubt it'd make sense upstream in this form, as presumably current behavior is locked-in over years, but an option for something like this would've been nice. I'm content with a hack for now though, it works too.

Jan 17, 2024

Basic markdown syntax/links checks after rst -> md migration

Some days ago, I've randomly noticed that github stopped rendering long rst (as in reStructuredText) README files on its repository pages. Happened in a couple repos, with no warning or anything, it just said "document takes too long to preview" below the list of files, with a link to view raw .rst file.

Sadly that's not the only issue with rst rendering, as codeberg (and pretty sure its Gitea / Forgejo base apps) had issues with some syntax there as well - didn't make any repo links correctly, didn't render table of contents, missed indented references for links, etc.

So thought to fix all that by converting these few long .rst READMEs to .md (markdown), which does indeed fix all issues above, as it's a much more popular format nowadays, and apparently well-tested and working fine in at least those git-forges.

One nice thing about rst however, is that it has one specification and a reference implementation of tools to parse/generate its syntax - python docutils - which can be used to go over .rst file in a strict manner and point out all syntax errors in it (rst2html does it nicely).

Good example of such errors that always gets me, is using links in the text with a reference-style URLs for those below (instead of inlining them), to avoid making plaintext really ugly, unreadable and hard to edit due to giant mostly-useless URLs in middle of it.

You have to remember to put all those references in, ideally not leave any unused ones around, and then keep them matched to tags in the text precisely, down to every single letter, which of course doesn't really work with typing stuff out by hand without some kind of machine-checks.

And then also, for git repo documentation specifically, all these links should point to files in the repo properly, and those get renamed, moved and removed often enough to be a constant problem as well.

Proper static-HTML doc-site generation tools like mkdocs (or its popular mkdocs-material theme) do some checking for issues like that (though confusingly not nearly enough), but require a bit of setup, with configuration and whole venv for them, which doesn't seem very practical for a quick README.md syntax check in every random repo.

MD linters apparently go the other way and check various garbage metrics like whether plaintext conforms to some style, while also (confusingly!) often not checking basic crap like whether it actually works as a markdown format.

Task itself seems ridiculously trivial - find all ... [some link] ... and [some link]: ... bits in the file and report any mismatches between the two.
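To show how trivial, here's a minimal python sketch of such check. It's not the actual tool, link_ref_mismatches is a made-up name, and it cuts corners by not skipping code blocks and treating link names as case-sensitive:

```python
import re

def link_ref_mismatches(md_text):
    """Return (used-but-undefined, defined-but-unused) sets of reference-link names."""
    # "[name]: URL" definition lines
    defined = set(re.findall(r'^\[([^\]]+)\]:\s+\S+', md_text, re.M))
    # "[name]" not followed by "(" (inline URL), "[" or ":" (definition)
    used = set(re.findall(r'(?<!\])\[([^\]]+)\](?![(\[:])', md_text))
    # "[text][name]" full-reference form
    used.update(re.findall(r'\]\[([^\]]+)\]', md_text))
    return used - defined, defined - used
```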

But looking at md linters a few times now, couldn't find any that do it nicely that I can use, so ended up writing my own one - markdown-checks tool - to detect all of the above problems with links in .md files, and some related quirks:

  • link-refs - Non-inline links like "[mylink]" have exactly one "[mylink]: URL" line for each.
  • link-refs-unneeded - Inline URLs like "[mylink](URL)" when "[mylink]: URL" is also in the md.
  • link-anchor - Not all headers have "<a name=hdr-...>" line. See also -a/--add-anchors option.
  • link-anchor-match - Mismatch between header-anchors and hashtag-links pointing to them.
  • link-files - Relative links point to an existing file (relative to them).
  • link-files-weird - Relative links that start with non-letter/digit/hashmark.
  • link-files-git - If .md file is in a git repo, warn if linked files are not under git control.
  • link-dups - Multiple same-title links with URLs.
  • ... - and probably a couple more by now
  • rx-in-code - Command-line-specified regexp (if any) detected inside code block(s).
  • tabs - Make sure md file contains no tab characters.
  • syntax - Any kind of incorrect syntax, e.g. blocks opened and not closed and such.

ReST also has a nice .. contents:: feature that automatically renders Table of Contents from all document headers, quite like mkdocs does for its sidebars, but afaik basic markdown does not have that. Maintaining such ToC manually, with all links in it working, is pretty much impossible without any kind of validation, and yet absolutely required for large enough documents with a non-autogenerated ToC.

So one interesting extra thing that I found needing to implement there was for script to automatically (with -a/--add-anchors option) insert/update <a name=hdr-some-section-header></a> anchor-tags before every header, because otherwise internal links within document are impossible to maintain either - github makes hashtag-links from headers according to its own inscrutable logic, gitlab/codeberg do their own thing, and there's no standard for any of that (which is a historical problem with .md in general - poor ad-hoc standards on various features, while .rst has internal links in its spec).

Thus making/maintaining table-of-contents kinda requires stable internal links and validating that they're all still there, and ideally that all headers have such internal link as well, i.e. new stuff isn't missing in the ToC section at the top.

Script addresses both parts by adding/updating those anchor-tags, and having them in the .md file itself indeed makes all internal hashtag-links "stable" and renderer-independent - you point to a name= set within the file, not guess at what name github or whatever platform generates in its html at the moment (which inevitably won't match, so kinda useless that way too). And those are easily validated as well, since both anchor and link pointing to it are in the file, so any mismatches are detected and reported.
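Validation side of that is similarly small, e.g. in this python sketch, where anchor_mismatches is a made-up name, and anchors are assumed to be exactly in <a name=hdr-...></a> form, with links to them being inline [text](#hdr-...) ones:

```python
import re

def anchor_mismatches(md_text):
    """Return (links-without-anchor, anchors-without-link) sets of anchor names."""
    anchors = set(re.findall(r'<a name=([\w-]+)></a>', md_text))
    links = set(re.findall(r'\]\(#([\w-]+)\)', md_text))
    return links - anchors, anchors - links
```

First set there is broken internal links, and second one is headers that are likely missing from the ToC section.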

I was also thinking about generating the table-of-contents section itself, same as it's done in rst, for which surely many tools exist already, but as long as it stays correct and checked for not missing anything, there's not much reason to bother - editing it manually allows for much greater flexibility, and it's not long enough for that to be any significant amount of work, either to make initially or add/remove a link there occasionally.

With all these checks for wobbly syntax bits in place, markdown READMEs seem to be as tidy, strict and manageable as rst ones. Both formats have rough feature parity for such simple purposes, but .md is definitely the only one with good-enough support on public code-forge sites, so a better option for public docs atm.

Jan 09, 2024

(Ab-)Using fanotify as a container event/message bus

Earlier, as I was setting-up filtering for ca-certificates on a host running a bunch of systemd-nspawn containers (similar to LXC), simplest way to handle configuration across all those consistently seemed to be just rsyncing filtered p11-kit bundle into them, and running (distro-specific) update-ca-trust there, to easily have same expected CA roots across them all.

But since these are mutable full-rootfs multi-app containers with init (systemd) in them, they update their filesystems separately, and routine package updates will overwrite cert bundles in /usr/share/, so they'd have to be rsynced again after that happens.

Good mechanism to handle this in linux is fanotify API, which in practice is used something like this:

# fatrace -tf 'WD+<>'

15:58:09.076427 rsyslogd(1228): W   /var/log/debug.log
15:58:10.574325 emacs(2318): W   /home/user/blog/content/2024-01-09.abusing-fanotify.rst
15:58:10.574391 emacs(2318): W   /home/user/blog/content/2024-01-09.abusing-fanotify.rst
15:58:10.575100 emacs(2318): CW  /home/user/blog/content/2024-01-09.abusing-fanotify.rst
15:58:10.576851 git(202726): W   /var/cache/atop.d/atop.acct
15:58:10.893904 rsyslogd(1228): W   /var/log/syslog/debug.log
15:58:26.139099 waterfox(85689): W   /home/user/.waterfox/general/places.sqlite-wal
15:58:26.139347 waterfox(85689): W   /home/user/.waterfox/general/places.sqlite-wal
...

Where fatrace in this case is used to report all write, delete, create and rename-in/out events for files and directories (that weird "-f WD+<>" mask), as it promptly does. It's useful to see what apps might abuse SSD/NVME writes, more generally to understand what's going on with filesystem under some load, which app is to blame for that and where it happens, or as a debugging/monitoring tool.

But also if you want to rsync/update files after they get changed under some dirs recursively, it's an awesome tool for that as well. With container updates above, can monitor /var/lib/machines fs, and it'll report when anything in <some-container>/usr/share/ca-certificates/trust-source/ gets changed under it, which is when aforementioned rsync hook should run again for that container/path.

To have something more robust and simpler than a hacky bash script around fatrace, I've made run_cmd_pipe.nim tool, that reads ini config file like this, with a list of input lines to match:

delay = 1_000 # 1s delay for any changes to settle
cooldown = 5_000 # min 5s interval between running same rule+run-group command

[ca-certs-sync]
regexp = : \S*[WD+<>]\S* +/var/lib/machines/(\w+)/usr/share/ca-certificates/trust-source(/.*)?$
regexp-env-group = 1
regexp-run-group = 1
run = ./_scripts/ca-certs-sync

And runs commands depending on regexp (PCRE) matches on whatever input gets piped into it, passing the regexp match through via env, with sane debouncing delays, deduplication, config reloads, tiny mem footprint and other proper-daemon stuff. Can also set up its pipe without shell, for an easy ExecStart=run_cmd_pipe rcp.conf -- fatrace -cf WD+<> systemd.service configuration.
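To illustrate the matching and cooldown parts, here's a minimal python sketch. It is not how run_cmd_pipe.nim is actually implemented - match_container and Cooldown are hypothetical names, the delay/settle logic is left out, and python's re module is used instead of PCRE (this specific pattern should work the same in both):

```python
import re

# Same pattern as in the [ca-certs-sync] rule above
CA_CERTS_RE = re.compile(
  r': \S*[WD+<>]\S* +/var/lib/machines/(\w+)'
  r'/usr/share/ca-certificates/trust-source(/.*)?$' )

def match_container(line):
  'Return container name (regexp group 1) for a matching fatrace line, or None.'
  m = CA_CERTS_RE.search(line.rstrip('\n'))
  return m.group(1) if m else None

class Cooldown:
  'Allows at most one run per key within each interval (in seconds).'
  def __init__(self, interval):
    self.interval, self.last_run = interval, dict()
  def ready(self, key, now):
    last = self.last_run.get(key)
    if last is not None and now - last < self.interval: return False
    self.last_run[key] = now # remember when this key was last allowed to run
    return True
```

Feeding fatrace output lines through match_container() and checking Cooldown.ready(name, time.monotonic()) before running the hook gives roughly the "same rule+run-group no more often than every 5s" behavior from the config above.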

Having this running for a bit now, and bumping into other container-related tasks, realized how it's useful for a lot of things even more generally, especially when multiple containers need to send some changes to host.

For example, if a bunch of containers should have custom network interfaces bridged between them (in a root netns), which e.g. systemd.nspawn Zone= doesn't adequately handle - just add whatever custom VirtualEthernetExtra=vx-br-containerA:vx-br into container, have a script that sets up those interfaces inside the container "touch" or create a file when it's done, and then run host-script for that event, to handle bridging on the other side:

[vx-bridges]
regexp = : \S*W\S* +/var/lib/machines/(\w+)/var/tmp/vx\.\S+\.ready$
regexp-env-group = 1
run = ./_scripts/vx-bridges

This seems to be incredibly simple (touch/create files to pick-up as events), very robust (as filesystems tend to be), and doesn't need to run anything more than ~600K of fatrace + run_cmd_pipe, with a very no-brainer configuration (which file[s] to handle by which script[s]).
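Container side of such an event can be as small as the sketch below. The mark_ready helper and its arguments are made up for illustration, with the file name picked to match the vx\.\S+\.ready pattern from the [vx-bridges] rule above:

```python
import pathlib, time

def mark_ready(name, base='/var/tmp'):
  '''Create or overwrite a vx.<name>.ready marker file under base dir,
    which the host-side rule is expected to pick up as an event.'''
  path = pathlib.Path(base) / f'vx.{name}.ready'
  # Put a timestamp in there instead of leaving the file empty,
  #  so that there is an actual write to the file, not just an open/close
  path.write_text(f'{time.time():.0f}\n')
  return path
```

Calling e.g. mark_ready('br-containerA') at the end of the interface setup script should then show up on the host as a write under /var/lib/machines/<container>/var/tmp/.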

Can be streamlined for any types and paths of containers themselves (incl. LXC and OCI app-containers like docker/podman) by bind-mounting dedicated filesystem/volume into those to pass such event-files around there, kinda like it's done in systemd with its agent plug-ins, e.g. for handling password inputs, so not really a novel idea either. systemd.path units can also handle simpler non-recursive "this one file changed" events.

Alternative with such shared filesystem can be to use any other IPC mechanisms, like append/tail file, fcntl locks, fifos or unix sockets, and tbf run_cmd_pipe.nim can handle all those too, by running e.g. tail -F shared.log instead of fatrace, but latter is way more convenient on the host side, and can act on incidental or out-of-control events (like pkg-mangler doing its thing in the initial ca-certs use-case).

Won't work for containers distributed beyond single machine or more self-contained VMs - that's where you'd probably want more complicated stuff like AMQP, MQTT, K8s and such - but for managing one host's own service containers, regardless of whatever they run and how they're configured, this seems to be a really neat way to do it.

Dec 28, 2023

Trimming-down list of trusted TLS ca-certificates system-wide using a whitelist approach

It's no secret that Web PKI was always a terrible mess.

Idk of anything that can explain it better than Moxie Marlinspike's old "SSL And The Future Of Authenticity" talk, which still pretty much holds up (and is kinda hilarious), as Web PKI for TLS is still up to >150 certs, couple of which get kicked-out after malicious misuse or gross malpractice every now and then, and it's actually more worrying when they don't.

And as of 2023, EU eIDAS proposal stands to make this PKI much worse in the near-future, adding whole bunch of random national authorities to everyone's list of trusted CAs, which of course have no rational business of being there on all levels.

(with all people/orgs on the internet seemingly in agreement on that - see e.g. EFF, Mozilla, Ryan Hurst's excellent writeup, etc - but it'll probably pass anyway, for whatever political reasons)

So in the spirit of at least putting some bandaid on that, I had a long-standing idea to write a logger for all CAs that my browser uses over time, then inspect it after a while and kick <1% CAs out of the browser at least. This is totally doable, and not that hard - e.g. cerdicator extension can be tweaked to log to a file instead of displaying CA info - but never got around to doing it myself.

Update 2024-01-03: there is now also CertInfo app to scrape local history and probe all sites there for certs, building a list of root and intermediate CAs to inspect.
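That "log CAs, then kick the rarely-used ones" idea boils down to a simple frequency count. A rough sketch, assuming a hypothetical log with one root CA name per observed connection (neither the extension nor the app mentioned above is known to produce exactly that):

```python
from collections import Counter

def rare_cas(ca_log, threshold=0.01):
  '''Return sorted names of CAs that account for
    less than threshold fraction of all logged connections.'''
  counts = Counter(ca_log)
  total = sum(counts.values())
  return sorted(ca for ca, n in counts.items() if n / total < threshold)
```

With threshold=0.01 this returns the "<1%" candidates for removal. Everything not on that list would be the whitelist.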

But recently, scrolling through Ryan Hurst's "eIDAS 2.0 Provisional Agreement Implications for Web Browsers and Digital Certificate Trust" open letter, pie chart on page-3 there jumped out to me, as it showed that 99% of certs use only 6-7 CAs - so why even bother logging those, there's a simple list of them, which should mostly work for me too.

I remember browsers and different apps using their own CA lists being a problem in the past, having to tweak mozilla nss database via its own tools, etc, but by now, as it turns out, this problem seems to have been long-solved on a typical linux, via distro-specific "ca-certificates" package/scripts and p11-kit (or at least it appears to be solved like that on my systems).

Gist is that /usr/share/ca-certificates/trust-source/ and its /etc counterpart have *.p11-kit CA bundles installed there by some package like ca-certificates-mozilla, and then package-manager runs update-ca-trust, which exports that to /etc/ssl/cert.pem and such places, where all other tools can pickup and use same CAs. Firefox (or at least my Waterfox build) even uses installed p11-kit bundle(s) directly and immediately. Those p11-kit bundles need to be altered or restricted somehow to affect everything on the system, only needing update-ca-trust at most - neat!

One problem I bumped into however, is that p11-kit tools only support masking specific individual CAs from the bundle via blacklist, and that will not be future-proof wrt upstream changes to that bundle, if the goal is to "only use these couple CAs and nothing else".

So ended up writing a simple script to go through .p11-kit bundle files and remove everything unnecessary from them on a whitelist basis - ca-certificates-whitelist-filter - which uses a simple one-per-line format with wildcards to match multiple certs:

Baltimore CyberTrust Root # CloudFlare
ISRG Root X* # Let's Encrypt
GlobalSign * # Google
DigiCert *
Sectigo *
Go Daddy *
Microsoft *
USERTrust *
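Matching against such a list can be sketched in python roughly like this. It's only an illustration, assuming shell-style glob semantics via the fnmatch module and "#" starting a comment - the actual script might parse or match things differently, and both function names here are hypothetical:

```python
import fnmatch

def parse_whitelist(text):
  'Return list of glob patterns, skipping empty lines and "# ..." comments.'
  patterns = list()
  for line in text.splitlines():
    line = line.split('#', 1)[0].strip()
    if line: patterns.append(line)
  return patterns

def is_whitelisted(label, patterns):
  'Check CA label against whitelist patterns, case-sensitively.'
  return any(fnmatch.fnmatchcase(label, pat) for pat in patterns)
```

So that e.g. "ISRG Root X1" would be kept due to "ISRG Root X*" pattern, while any CA label not matching any line gets dropped from the bundle.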

Picking whitelisted CAs from Ryan's list, found that GlobalSign should be added, and that it already signs Google's GTS CA's (so latter are unnecessary), while "Baltimore CyberTrust Root" seems to be a strange omission, as it signs CloudFlare's CA cert, which should've been a major thing on the pie chart in that eIDAS open letter.

But otherwise, that's pretty much it, leaving a couple of top-level CAs instead of a hundred, and couple days into it so far, everything seems to be working fine with just those. Occasional "missing root" error can be resolved easily by adding that root to the list, or ignoring it for whatever irrelevant one-off pages, though this really doesn't seem to be an issue at all.

This is definitely not a solution to Web PKI being a big pile of dung, made as an afterthought and then abused relentlessly and intentionally, with no apparent incentive or hope for fixes, but I think a good low-effort bandaid against clumsy mass-MitM by whatever random crooks on the network, in ISPs and idiot governments.

It still unfortunately leaves out two large issues in particular:

  • CAs on the list are still terrible mismanaged orgs.

    For example, Sectigo there is a renamed Comodo CA, after a series of incredible fuckups in all aspects of their "business", and I'm sure the rest of them are just as bad, but at least it's not a 100+ of those to multiply the risks.

  • Majority of signing CAs are so-called "intermediate" CAs (600+ vs 100+ roots), which have valid signing cert itself signed by one of the roots, and these are even more shady, operating with even less responsibility/transparency and no oversight.

    Hopefully this is a smaller list with fewer roots as well, though ideally all those should be whitelist-pruned exactly same as roots, which I think is easiest to do via cert chain/usage logs (from e.g. CertInfo app mentioned above), where actual first signing cert in the chain can be seen, not just top-level ones.

    But then such whitelist probably can't be enforced, as you'd need to say "trust CAs on this list, but NOT any CAs that they signed", which is not how most (all?) TLS implementations work ¯\_(ツ)_/¯

And a long-term problem with this approach, is that if used at any scale, it further shifts control over CA trust from e.g. Mozilla's p11-kit bundle to those dozen giant root CAs above, who will then realistically have to sign even more and more powerful intermediate CAs for other orgs and groups (as they're the only ones on the CA list), ossifying them to be in control of Web PKI in the future over time, and makes "trusting" them meaningless non-decision (as you can't avoid that, even as/if/when they have to sign sub-CAs for whatever shady bad actors in secret).

To be fair, there are proposals and movements to remedy this situation, like Certificate Transparency and various cert and TLS policies/parameters' pinning, but I'm not hugely optimistic, and just hope that a quick fix like this might be enough to be on the right side of "you don't need to outrun the bear, just the other guy" metaphor.

Link: ca-certificates-whitelist-filter script on github (codeberg, local git)

Nov 17, 2023

USB hub per-port power switching done right with a couple wires

Like probably most folks who are surrounded by tech, I have too many USB devices plugged into the usual desktop, to the point that it kinda bothers me.

For one thing, some of those doohickeys always draw current and noticeably heat up in the process, which can't be good on either side of the port. Good examples of this are WiFi dongles (with iface left in UP state), a cheap NFC reader I have (draws 300mA idling on the table 99.99% of the time), or anything with "battery" or "charging" in the description.

Other issue is that I don't want some devices to always be connected. Dual-booting into gaming Windows for instance, there's nothing good that comes from it poking at and spinning-up USB-HDDs, Yubikeys or various connectivity dongles' firmware, as well as jerking power on-and-off on those for reboots and whenever random apps/games probe those (yeah, not sure why either).

Unplugging stuff by hand is work, and leads to replacing usb cables/ports/devices eventually (more work), so toggling power on/off at USB hubs seems like an easy fix.

USB Hubs sometimes support that in one of two ways - either physical switches next to ports, or using USB Per-Port-Power-Switching (PPPS) protocol.

Problem with physical switches is that relying on yourself not to forget to do some on/off sequence manually for devices each time doesn't work well, and kinda silly when it can be automated - i.e. if you want to run ad-hoc AP, let the script running hostapd turn the power on-and-off around it as well.

But sadly, at least in my experience with it, USB Hub PPPS is also a bad solution, broken by two major issues, which are likely unfixable:

  • USB Hubs supporting per-port power toggling are impossible to find or identify.

    Vendors don't seem to care about and don't advertise this feature anywhere, its presence/support changes between hardware revisions (probably as a consequence of "don't care"), and is often half-implemented and dodgy.

    uhubctl project has a list of Compatible USB hubs for example, and note how hubs there have remarks like "DUB-H7 rev D,E (black). Rev B,C,F,G not supported" - shops and even product boxes mostly don't specify these revisions anywhere, or even list the wrong one.

    So good luck finding the right revision of one model even when you know it works, within a brief window while it's still in stock. And knowing which one works is pretty much only possible through testing - same list above is full of old devices that are not on the market, and that market seems to be too large and dynamic to track models/revisions accurately.

    On top of that, sometimes hubs toggle data lines and not power (VBUS), making feature marginally less useful for cases above, but further confusing the matter when reading specifications or even relying on reports from users.

    Pretty sure that hubs with support for this are usually higher-end vendors/models too, so it's expensive to buy a bunch of them to see what works, and kinda silly to overpay for even one of them anyway.

  • PPPS in USB Hubs has no memory and defaults to ON state.

    This is almost certainly by design - when someone plugs hub without obvious buttons, they might not care about power switching on ports, and just want it to work, so ports have to be turned-on by default.

    But that's also the opposite of what I want for all cases mentioned above - turning on all power-hungry devices on reboot (incl. USB-HDDs that can draw like 1A on spin-up!), all at once, in the "I'm starting up" max-power mode, is like the worst thing such hub can do!

    I.e. you disable these ports for a reason, maybe a power-related reason, which "per-port power switching" name might even hint at, and yet here you go, on every reboot or driver/hw/cable hiccup, this use-case gets thrown out of the window completely, in the dumbest and most destructive way possible.

    It also negates the other use-cases for the feature of course - when you simply don't want devices to be exposed, aside from power concerns - hub does the opposite of that and gives them all up whenever it bloody wants to.

In summary - even if controlling hub port power via PPPS USB control requests worked, and was easy to find (which it very much is not), it's pretty much useless anyway.

My simple solution, which I can emphatically recommend:

  • Grab robust USB Hub with switches next to ports, e.g. 4-port USB3 ones like that seem to be under $10 these days.

  • Get a couple of <$1 direct-current solid-state relays or mosfets, one per port.

    I use locally-made К293КП12АП ones, rated for toggling 0-60V 2A DC via 1.1-1.5V optocoupler input, just sandwiched together at the end - they don't heat up at all and are easy to solder wires to.

  • Some $3-5 microcontroller with the usual USB-TTY, like any Arduino or RP2040 (e.g. Waveshare RP2040-Zero from aliexpress).

  • Couple copper wires pulled from an ethernet cable for power, and M/F jumper pin wires to easily plug into an MCU board headers.

  • An hour or few with a soldering iron, multimeter and a nice podcast.

Open up USB Hub - cheap one probably doesn't even have any screws - probe which contacts switches connect in there, solder short thick-ish copper ethernet wires from their legs to mosfets/relays, and jumper wires from input pins of the latter to plug into a tiny rp2040/arduino control board on the other end.

I like SSRs instead of mosfets here to not worry about controller and hub being plugged into same power supply that way, and they're cheap and foolproof - pretty much can't connect them disastrously wrong, as they've diodes on both circuits. Optocoupler LED in such relays needs one 360R resistor on shared GND of control pins to drop 5V -> 1.3V input voltage there.
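For a rough idea of what that resistor does, here's the usual series-resistor arithmetic as a small python sketch. It assumes ~1.3V drop on the optocoupler LED (middle of the 1.1-1.5V range above) and only one relay input being active at a time - with a single shared resistor, current through it would be split between multiple lit LEDs:

```python
def led_current_ma(v_pin, v_led=1.3, r_ohm=360):
  'Current (mA) through series resistor with v_pin on control pin.'
  return (v_pin - v_led) / r_ohm * 1000

print(f'5.0V pin: {led_current_ma(5.0):.1f} mA')
print(f'3.3V pin: {led_current_ma(3.3):.1f} mA')
```

That works out to roughly 10 mA from a 5V arduino pin, or about half that from 3.3V logic like on RP2040. Whether that's enough to reliably switch a specific relay is something to check against its datasheet.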

This approach solves both issues above - components are easy to find, dirt-common and dirt-cheap, and are wired into default-OFF state, to only be toggled into ON via whatever code conditions you put into that controller.

Simplest way, with an RP2040 running the usual micropython firmware, would be to upload a main.py file of literally this:

import sys, machine

pins = dict(
  (str(n), machine.Pin(n, machine.Pin.OUT, value=0))
  for n in range(4) )

while True:
  try: port, state = sys.stdin.readline().strip()
  except ValueError: continue # not a 2-character line
  if port_pin := pins.get(port):
    print(f'Setting port {port} state = {state}')
    if state == '0': port_pin.off()
    elif state == '1': port_pin.on()
    else: print('ERROR: Port state value must be "0" or "1"')
  else: print(f'ERROR: Port {port} is out of range')

And now sending trivial "<port><0-or-1>" lines to /dev/ttyACM0 will toggle the corresponding pins 0-3 on the board to 0 (off) or 1 (on) state, along with USB hub ports connected to those, while otherwise leaving ports default-disabled.

From a linux machine, serial terminal is easy to talk to by running mpremote used with micropython fw (note - "mpremote run ..." won't connect stdin to tty), screen /dev/ttyACM0 or many other tools, incl. just "echo" from shell scripts:

stty -F /dev/ttyACM0 raw speed 115200 # only needed once for device
echo 01 >/dev/ttyACM0 # pin/port-0 enabled
echo 30 >/dev/ttyACM0 # pin/port-3 disabled
echo 21 >/dev/ttyACM0 # pin/port-2 enabled
...
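Same thing from a python script, as a hedged sketch - port_cmd, set_port and port_powered are hypothetical helper names, device path is assumed to be the same /dev/ttyACM0, and tty is assumed to be already configured via stty as above:

```python
import contextlib

def port_cmd(port, on):
  'Build "<port><0-or-1>" command line, as expected by main.py above.'
  if port not in range(4): raise ValueError(f'Port out of range: {port}')
  return f'{port}{int(bool(on))}\n'.encode()

def set_port(port, on, dev='/dev/ttyACM0'):
  'Send one command to the controller tty, without any buffering.'
  with open(dev, 'wb', buffering=0) as tty: tty.write(port_cmd(port, on))

@contextlib.contextmanager
def port_powered(port, dev='/dev/ttyACM0'):
  'Keep hub port enabled only while code under "with" is running.'
  set_port(port, True, dev)
  try: yield
  finally: set_port(port, False, dev)
```

Usage would be something like "with port_powered(2): run_hostapd()" (where run_hostapd is whatever needs the device), to have the port turned off again even if that code fails.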

I've started with finding a D-Link PPPS hub, quickly bumped into above limitations, and have been using this kind of solution instead for about a year now, migrating from old arduino uno to rp2040 mcu and hooking up a second 4-port hub recently, as this kind of control over USB peripherals from bash scripts that actually use those devices turns out to be very convenient.

So can highly recommend to not even bother with PPPS hubs from the start, and wire your own solution with whatever simple logic for controlling these ports that you need, instead of the silly braindead way that USB PPPS works.

An example of a bit more complicated control firmware that I use, with watchdog timeout/pings logic on a controller (to keep device up only while script using it is alive) and some other tricks can be found in mk-fg/hwctl repository (github/codeberg or a local mirror).

Sep 05, 2023

Auto-generated hash-petnames for things

Usually auto-generated names aim for being meaningful instead of distinct, e.g. LEAFAL01A-P281, LEAFAN02A-P281, LEAFAL01B-P282, LEAFEL01A-P281, LEAFEN01A-P281, etc, where single-letter diffs are common and decode to something like different location or purpose.

Sometimes they aren't even that, and are assigned sequentially or by hash, like in case of contents hashes, or interfaces/vlans/addresses in a network infrastructure.

You always have to squint and spend time mentally decoding such identifiers, as one letter/digit there can change whole meaning of the message, so working with them is unnecessarily tiring, especially if a system often presents many of those without any extra context.

Usual fix is naming things, i.e. assigning hostnames to separate hardware platforms/VMs, DNS names to addresses, and such, but that doesn't work well with modern devops approaches where components are typically generated with "reasonable" but less readable naming schemes as described above.

Manually naming such stuff up-front doesn't work, and even assigning petnames or descriptions by hand gets silly quickly (esp. with some churn in the system), and it's not always possible to store/share that extra metadata properly (e.g. on rebuilds in entirely different places).

Useful solution I found is hashing to automatically generated petnames, which seems to be kinda overlooked and underused - i.e. hashing the name to easily-distinct, readable and often memorable-enough strings:

  • LEAFAL01A-P281 [ Energetic Amethyst Zebra ]
  • LEAFAN02A-P281 [ Furry Linen Eagle ]
  • LEAFAL01B-P282 [ Suave Mulberry Woodpecker ]
  • LEAFEL01A-P281 [ Acidic Black Flamingo ]
  • LEAFEN01A-P281 [ Prehistoric Raspberry Pike ]

Even just different length of these names makes them visually stand apart from each other already, and usually you don't really need to memorize them in any way, it's enough to be able to tell them apart at a glance in some output.

I've bumped into only one de-facto standard scheme for generating those - "Angry Purple Tiger", with a long list of compatible implementations (e.g. https://github.com/search?type=repositories&q=Angry+Purple+Tiger ):

% angry_purple_tiger LEAFEL01A-P281
acidic-black-flamingo

% angry_purple_tiger LEAFEN01A-P281
prehistoric-raspberry-pike

(default output is good for identifiers, but can use proper spaces and capitalization to be more easily-readable, without changing the words)
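General idea is easy to sketch in python, though note that this example is NOT compatible with "Angry Purple Tiger" implementations and won't produce the same names - it uses tiny made-up word lists and an arbitrary way of slicing a sha256 digest, purely to illustrate the technique:

```python
import hashlib

# Tiny made-up word lists for illustration - real ones are much longer
ADJECTIVES = ['acidic', 'energetic', 'furry', 'prehistoric', 'suave']
COLORS = ['amethyst', 'black', 'linen', 'mulberry', 'raspberry']
ANIMALS = ['eagle', 'flamingo', 'pike', 'woodpecker', 'zebra']

def petname(name, sep='-'):
  'Deterministically map name to an adjective-color-animal string.'
  digest = hashlib.sha256(name.encode()).digest()
  words = list()
  for n, wordlist in enumerate([ADJECTIVES, COLORS, ANIMALS]):
    # Separate 4-byte chunk of the digest picks a word from each list
    idx = int.from_bytes(digest[n*4:n*4+4], 'big') % len(wordlist)
    words.append(wordlist[idx])
  return sep.join(words)

def petname_pretty(name):
  'Same words, but with spaces and capitalization, for display purposes.'
  return petname(name, ' ').title()
```

Nothing needs to be stored for these, as same input always hashes to same words. With word lists this short, collisions between different names are likely, which is why real lists have hundreds of entries each.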

It's not as high-entropy as "human hash" tools that use completely random words or babble (see z-tokens for that), but imo wins by orders of magnitude in readability and ease of memorization instead, and on the scale of names, it matters.

Since those names don't need to be stored anywhere, and can be generated anytime, it is often easier to add them in some wrapper around tools and APIs, without the need for the underlying system to know or care that they exist, while making a world of difference in usability.

Honorable mention here to occasional tools like docker that have those already, but imo it's more useful to remember about this trick for your own scripts and wrappers, as that tends to be the place where you get to pick how to print stuff, and can easily add an extra hash for that kind of accessibility.
