Jan 26, 2023

More FIDO2 hardware auth/key uses on a linux machine and their quirks

As I kinda went on to replace a lot of silly long and insecure passwords with FIDO2 USB devices - aka "yubikeys" - in various ways (e.g. earlier post about password/secret management), support for my use-cases was mostly good:

  • Webauthn - works ok, and been working well for me with U2F/FIDO2 on various important sites/services for quite a few years by now.

    Wish it worked with NFC reader in Firefox on Linux Desktop too, but oh well, maybe someday, if Mozilla doesn't implode before that.

  • pam-u2f to login with the token using much simplier and hw-rate-limited PIN (with pw fallback).

    Module itself worked effortlessly, but had to be added to various pam services properly, so that password fallback is available as well, e.g. system-local-login:

    # system-login
    auth required pam_shells.so
    auth requisite pam_nologin.so
    # system-auth + pam_u2f
    auth required pam_faillock.so preauth
    # auth_err=ignore will try same string as password for pam_unix
    -auth [success=2 authinfo_unavail=ignore auth_err=ignore] pam_u2f.so \
      origin=pam://my.host.net authfile=/etc/secure/pam-fido2.auth \
      userpresence=1 pinverification=1 cue
    auth [success=1 default=bad] pam_unix.so try_first_pass nullok
    auth [default=die] pam_faillock.so authfail
    auth optional pam_permit.so
    auth required pam_env.so
    auth required pam_faillock.so authsucc
    # auth    include   system-login
    account   include   system-login
    password  include   system-login
    session   include   system-login

    "auth" section is an exact copy of system-login and system-auth lines from the current Arch Linux, with pam_u2f.so line added in the middle, jumping over pam_unix.so on success, or ignoring failure result to allow for entered string to be tried as password there.

    Using Enlightenment Desktop Environment here, also needed to make a trivial "include system-local-login" file for its lock screen, which uses "enlightenment" PAM service by default, falling back to basic system-auth or something like that, instead of system-local-login.

  • sk-ssh-ed25519 keys work out of the box with OpenSSH.

    Part that gets loaded in ssh-agent is much less sensitive than the usual private-key - here it's just a cred-id blob that is useless without FIDO2 token, and even that can be stored on-device with Discoverable/Resident Creds, for some extra security or portability.

    SSH connections can easily be cached using ControlMaster / ControlPath / ControlPersist opts in the client config, so there's no need to repeat touch presence-check too often.

    One somewhat-annoying thing was with signing git commits - this can't be cached like ssh connections, and doing physical ack on every git commit/amend is too burdensome, but fix is easy too - add separate ssh key just for signing. Such key would naturally be less secure, but not as important as an access key anyway.

    Github supports adding "signing" ssh keys that don't allow access, but Codeberg (and its underlying Gitea) currently does not - access keys can be marked as "Verified", but can't be used for signing-only on the account, which will probably be fixed, eventually, not a huge deal.

  • Early-boot LUKS / dm-crypt disk encryption unlock with offline key and a simplier + properly rate-limited "pin", instead of a long and hard-to-type passphrase.

    systemd-cryptenroll can work for that, if you have typical "Full Disk Encryption" (FDE) setup, with one LUKS-encrypted SSD, but that's not the case for me.

    I have more flexible LUKS-on-LVM setup instead, where some LVs are encrypted and needed on boot, some aren't, some might have fscrypt, gocryptfs, some other distro or separate post-boot unlock, etc etc.

    systemd-cryptenroll does not support such use-case well, as it generates and stores different credentials for each LUKS volume, and then prompts for separate FIDO2 user verification/presence check for each of them, while I need something like 5 unlocks on boot - no way I'm doing same thing 5 times, but it is unavoidable with such implementation.

    So had to make my own key-derivation fido2-hmac-boot tool for this, described in more detail separately below.

  • Management of legacy passwords, passphrases, pins, other secrets and similar sensitive strings of information - described in a lot more detail in an earlier "FIDO2 hardware password/secret management" post.

    This works great, required an (simple) extra binary, and integrating it into emacs for my purposes, but also easy to setup in various other ways, and a lot better than all alternatives (memory + reuse, plaintext somewhere, crappy third-party services, paper, etc).

  • One notable problem with FIDO2 devices is that they don't really show what it is you are confirming, so as a user, I can think that it wants to authorize one thing, while whatever compromised code secretly requests something else from the token.

    But that's reasonably easy to mitigate by splitting usage by different security level and rarity, then using multiple separate U2F/FIDO2 tokens for those, given how tiny and affordable they are these days - I ended up having three of them (so far!).

    So using token with "ssh-git" label, you have a good idea what it'd authorize.

Aside from reasonably-minor quirks mentioned above, it all was pretty common sense and straightforward for me, so can easily recommend migrating to workflows built around cheap FIDO2 smartcards on modern linux as a basic InfoSec hygiene - it doesn't add much inconvenience, and should be vastly superior to outdated (but still common) practices/rituals involving passwords or keys-in-files.

Given how all modern PC hardware has TPM2 chips in motherboards, and these can be used as a regular smartcard via PKCS#11 wrapper, they might also be a somewhat nice malware/tamper-proof cryptographic backend for various use-cases above.

From my perspective, they seem to be strictly inferior to using portable FIDO2 devices however:

  • Soldered on the motherboard, so can't be easily used in multiple places.

  • Will live/die, and have to be replaced with the motherboard.

  • Non-removable and always-accessible, holding persistent keys in there.

    Booting random OS with access to this thing seem to be a really bad idea, as ideally such keys shouldn't even be physically connected most of the time, especially to some random likely-untrustworthy software.

  • There is no physical access confirmation mechanism, so no way to actually limit it - anything getting ahold of the PIN is really bad, as secret keys can then be used freely, without any further visibility, rate-limiting or confirmation.

  • Motherboard vendor firmware security has a bad track record, and I'd rather avoid trusting crappy code there with anything extra. In fact, part of the point with having separate FIDO2 device is to trust local machine a bit less, if possible, not more.

So given that grabbing FIDO2 device(s) is an easy option, don't think TPM2 is even worth considering as an alternative to those, for all the reasons above, and probably a bunch more that I'm forgetting at the moment.

Might be best to think of TPM2 to be in the domain and managed by the OS vendor, e.g. leave it to Windows 11 and Microsoft SSO system to do trusted/measured boot and store whatever OS-managed secrets, being entirely uninteresting and invisible to the end-user.

As also mentioned above, least well-supported FIDO2-backed thing for me was early-boot dm-crypt / LUKS volume init - systemd-cryptenroll requires unlocking each encrypted LUKS blkdev separately, re-entering PIN and re-doing the touch thing multiple times in a row, with a somewhat-uncommon LUKS-on-LVM setup like mine.

But of course that's easily fixable, having following steps with a typical systemd init process:

  • Starting early on boot or in initramfs, Before=cryptsetup-pre.target, run service to ask for FIDO2 token PIN via systemd-ask-password, then use that with FIDO2 token and its hmac-secret extension to produce secure high-entropy volume unlock key.

    If PIN or FIDO2 interaction won't work, print error and repeat the query, or exit if prompt is cancelled to fallback to default systemd passphrase unlocking.

  • Drop that key into /run/cryptsetup-keys.d/ dir for each volume that it needs to open, with whatever extra per-volume alterations/hashing.

  • Let systemd pass cryptsetup.target, where systemd-cryptsetup will automatically lookup volume keys in that dir and use them to unlock devices.

    If any keys won't work or missing, systemd will do the usual passphrase-prompting and caching, so there's always a well-supported first-class fallback unlock-path.

  • Run early-boot service to cleanup after cryptsetup.target, Before=sysinit.target, to remove /run/cryptsetup-keys.d/ directory, as everything should be unlocked by now and these keys are no longer needed.

I'm using common dracut initramfs generator with systemd here, where it's easy to add a custom module that'd do all necessary early steps outlined above.

fido2_hmac_boot.nim implements all actual asking and FIDO2 operations, and can be easily run from an initramfs systemd unit file like this (fhb.service):


# Should be ordered same as stock systemd-pcrphase-initrd.service
Conflicts=shutdown.target initrd-switch-root.target
Before=sysinit.target cryptsetup-pre.target cryptsetup.target
Before=shutdown.target initrd-switch-root.target systemd-sysext.service

ExecStart=/sbin/fhb /run/initramfs/fhb.key
ExecStart=/bin/sh -c '\
  key=/run/initramfs/fhb.key; [ -e "$key" ] || exit 0; \
  mkdir -p /run/cryptsetup-keys.d; while read dev line; \
  do cat "$key" >/run/cryptsetup-keys.d/"$dev".key; \
  done < /etc/fhb.devices; rm -f "$key"'

With that fhb.service file and compiled binary itself installed via module-setup.sh in the module dir:


check() {
  require_binaries /root/fhb || return 1
  return 255 # only include if asked for

depends() {
  echo 'systemd crypt fido2'
  return 0

install() {
  # fhb.service starts binary before cryptsetup-pre.target to create key-file
  inst_binary /root/fhb /sbin/fhb
  inst_multiple mkdir cat rm
  inst_simple "$moddir"/fhb.service "$systemdsystemunitdir"/fhb.service
  $SYSTEMCTL -q --root "$initdir" add-wants initrd.target fhb.service

  # Some custom rules might be relevant for making consistent /dev symlinks
  while read p
  do grep -qiP '\b(u2f|fido2)\b' "$p" && inst_rules "$p"
  done < <(find /etc/udev/rules.d -maxdepth 1 -type f)

  # List of devices that fhb.service will create key for in cryptsetup-keys.d
  # Should be safe to have all "auto" crypttab devices there, just in case
  while read luks dev key opts; do
    [[ "${opts//,/ }" =~ (^| )noauto( |$) ]] && continue
    echo "$luks"
  done <"$dracutsysrootdir"/etc/crypttab >"$initdir"/etc/fhb.devices
  mark_hostonly /etc/fhb.devices

Module would need to be enabled via e.g. add_dracutmodules+=" fhb " in dracut.conf.d, and will include the "fhb" binary, service file to run it, list of devices to generate unlock-keys for in /etc/fhb.devices there, and any udev rules mentioning u2f/fido2 from /etc/udev/rules.d, in case these might be relevant for consistent device path or whatever other basic device-related setup.

fido2_hmac_boot.nim "fhb" binary can be built (using C-like Nim compiler) with all parameters needed for its operation hardcoded via e.g. -d:FHB_CID=... compile-time options, to avoid needing to bother with any of those in systemd unit file or when running it anytime on its own later.

It runs same operation as fido2-assert tool, producing HMAC secret for specified Credential ID and Salt values. Credential ID should be created/secured prior to that using related fido2-token and fido2-cred binaries. All these tools come bundled with libfido2.

Since systemd doesn't nuke /run/cryptsetup-keys.d by default (keyfile-erase option in crypttab can help, but has to be used consistently for each volume), custom unit file to do that can be added/enabled to main systemd as well:


ExecStart=rm -rf /run/cryptsetup-keys.d


And that should do it for implementing above early-boot unlocking sequence.

To enroll the key produced by "fhb" binary into LUKS headers, simply run it, same as early-boot systemd would, and luksAddKey its output.

Couple additional notes on all this stuff:

  • HMAC key produced by "fhb" tool is a high-entropy uniformly-random 256-bit (32B) value, so unlike passwords, does not actually need any kind of KDF applied to it - it is the key, bruteforcing it should be about as infeasible as bruteforcing 128/256-bit master symmetric cipher key (and likely even harder).

    Afaik cryptsetup doesn't support disabling KDF for key-slot entirely, but --pbkdf pbkdf2 --pbkdf-force-iterations 1000 options can be used to set fastest parameters and get something close to disabling it.

  • cryptsetup config --key-slot N --priority prefer can be used to make systemd-cryptsetup try unlocking volume with this no-KDF keyslot quickly first, before trying other slots with memory/cpu-heavy argon2id and such proper PBKDF, which should almost always be a good idea to do in this order, as it should take almost no time to try 1K-rounds PBKDF2 slot.

  • Ideally each volume should have its own sub-key derived from one that fhb outputs, e.g. via simple HMAC-SHA256(volume-uuid, key=fhb.key) operation, which is omitted here for simplicitly.

    fhb binary includes --hmac option for that, to use instead of "cat" above:

    fhb --hmac "$key" "$dev" /run/cryptsetup-keys.d/"$dev".key

    Can be added to avoid any of LUKS keys/keyslots being leaked or broken (for some weird reason) to have any effect on other keys - reversing such HMAC back to fhb.key to use it for other volumes would still be cryptographically infeasible.

Custom fido2_hmac_boot.nim binary/code used here is somewhat similar to an earlier fido2-hmac-desalinate.c that I use for password management (see above), but a bit more complex, so is written in an easier and much nicer/safer language (Nim), while still being compiled through C to pretty much same result.

Jan 08, 2023

Pushing git-notes to one specific remote via pre-push hook

I've recently started using git notes as a good way to track metadata associated with the code that's likely of no interest to anyone else, and would only litter git-log if was comitted and tracked in the repo as some .txt file.

But that doesn't mean that they shouldn't be backed-up, shared and merged between different places where you yourself work on and use that code from.

Since I have a git mirror on my own host (as you do with distributed scm), and always clone from there first, adding other "free internet service" remotes like github, codeberg, etc later, it seems like a natural place to push such notes to, as you'd always pull them from there with the repo.

That is not straightforward to configure in git to do on basic "git push" however, because "push" operation there works with "[<repository> [<refspec>...]]" destination concept. I.e. you give it a single remote for where to push, and any number of specific things to update as "<src>[:<dst>]" refspecs.

So when "git push" is configured with "origin" having multiple "url =" lines under it in .git/config file (like home-url + github + codeberg), you don't get to specify "push main+notes to url-A, but only main to url-B" - all repo URLs get same refs, as they are under same remote.

Obvious fix conceptually is to run different "git push" commands to different remotes, but that's a hassle, and even if stored as an alias, it'd clash with muscle memory that'll keep typing "git push" out of habit.

Alternative is to maybe override git-push command itself with some alias, but git explicitly does not allow that, probably for good reasons, so that's out as well.

git-push does run hooks however, and those can do the extra pushes depending on the URL, so that's an easy solution I found for this:

set -e

notes_url=$(git remote get-url "$notes_remote")
notes_ref=$(git notes get-ref)

push_remote=$1 push_url=$2
[ "$push_url" = "$notes_url" ] || exit 0

master_push= master_oid=$(git rev-parse master)
while read local_ref local_oid remote_ref remote_oid; do
  [ "$local_oid" = "$master_oid" ] && master_push=t && break || continue
[ -n "$master_push" ] || exit 0

echo "--- notes-push [$notes_remote]: start -> $notes_ref ---"
git push --no-verify "$notes_remote" "$notes_ref"
echo "--- notes-push [$notes_remote]: success ---"

That's a "pre-push" hook, which pushes notes-branch only to "home" remote, when running a normal "git push" command to a "master" branch (to be replaced with "main" in some repos).

Idea is to only augment "normal" git-push, and don't bother running this on every weirder updates or tweaks, keeping git-notes generally in sync between different places where you can use them, with no cognitive overhead in a day-to-day usage.

As a side-note - while these notes are normally attached to commits, for something more global like "my todo-list for this project" not tied to specific ref that way, it's easy to attach it to some descriptive tag like "todo", and use with e.g. git notes edit todo, and track in the repo as well.

Jan 04, 2023

FIDO2 hardware password/secret management

Passwords are bad, and they leak, but services are slow to adopt other auth methods - even TOTP is better, and even for 1-factor-auth (e.g using oathtool).

But even without passwords, there are plenty of other easy-to-forget secrets to store in a big pile somewhere, like same TOTP seed values, API keys, government ID numbers, card PINs and financial info, security answers, encryption passphrases, and many other stuff.

  • Easiest thing is to dump all these into a big .txt file somewhere.

    Problem: any malware, accidental or deliberate copy ("evil maid"), or even a screen-scrape taken at an unfortunate time exposes everything!

    And these all seem to be reasonably common threats/issues.

  • Next best thing - store that file in some encrypted form.

    Even short-lived compromise can get the whole list along with the key from memory, and otherwise it's still reasonably easy to leak both key/passphrase and ciphertext over time separately, esp. with long-lived keys.

    It's also all on-screen when opened, can be exposed/scraped from there, but still an improvement over pure plaintext, at the expense of some added key-management hassle.

  • Encrypt whole file, but also have every actual secret in there encrypted separately, with unique key for each one:

      Apex Finance:
        url: online.apex-finance.com
        login: jmastodon
        pw: fhd.eop0.aE6H/VZc36ZPM5w+jMmI
        email: reg.apexfinance.vwuk@jmastodon.me
        current visa card:
          name: JOHN MASTODON
          no: 4012888888881881
          cvv2: fhd.KCaP.QHai
          expires: 01/28
          pin: fhd.y6En.tVMHWW+C
      Team Overdraft: ...
    google account:
      -- note: FIDO2 2FA required --
      login/email: j.x.mastodon789@gmail.com
      pw: fhd.PNgg.HdKpOLE2b3DejycUGQO35RrtiA==
      recovery email: reg.google.ce21@jmastodon.me
      API private key: fhd.pxdw.QOQrvLsCcLR1X275/Pn6LBWl72uwbXoo/YiY

    In this case, even relatively long-lived malware/compromise can only sniff secrets that were used during that time, and it's mostly fine if this ends up being opened and scrolled-through on a public video stream or some app screencaps it by accident (or not) - all important secrets are in encrypted "fhd.XXX.YYY" form.

    Downside of course is even more key management burden here, since simply storing all these unique keys in a config file or a list won't do, as it'll end up being equivalent to "encrypted file + key" case against leaks or machine compromise.

  • Storing encryption keys defeats the purpose of the whole thing, typing them is insecure vs various keyloggers, and there's also way too many to remember!

    FIDO2 USB token on a keychain

    Solution: get some cheap FIDO2 hardware key to do all key-management for you, and then just keep it physically secure, i.e. put it on the keychain.

    This does not require remembering anything (except maybe a single PIN, if you set one, and can remember it reliably within 8 attempts), is reasonably safe against all common digital threats, and pretty much as secure against physical ones as anything can be (assuming rubber-hose cryptoanalysis works uniformly well), if not more secure (e.g. permanent PIN attempts lockout).

Given the recent push for FIDO2 WebAuthn-compatible passkeys by major megacorps (Google/Apple/MS), and that you'd probably want to have such FIDO2 token for SSH keys and simple+secure full disk encryption anyway, there seems to be no good reason not to use it for securing passwords as well, in a much better way than with any memorized or stored-in-a-file schemes for secrets/keys, as outlined above.

There's no go-to way to do this yet (afaik), but all tools to implement it exist.

Filippo Valsorda described one way to do it via plugin for a common "age" encryption tool in "My age+YubiKeys Password Management Solution" blog post, using Yubikey-specific PIV-smartcard capability (present in some of Yubico tokens), and a shell script to create separate per-service encrypted files.

I did it a bit differently, with secrets stored alongside non-secret notes and other info/metadata, and with a common FIDO2-standard hmac-secret extension (supported by pretty much all such devices, I think?), used in the following way:

  • Store ciphertext as a "fhd.y6En.tVMHWW+C" string, which is:

    "fhd." || base64(salt) || "." || base64(wrapped-secret)

    And keep those in the common (encrypted) list of various important info, like in the "encrypted separately" example above, to view/edit with the usual emacs.

  • When specific secret or password is needed, point to it and press "copy decrypted" hotkey (as implemented by fhd-crypt in my emacs).

  • Parsing that "fhd. ..." string gets "y6En" salt value, and it is sent to USB/NFC token in the assertion operation (same as fido2-assert cli tool runs).

  • Hardware token user-presence/verification requires you to physically touch button on the device (or drop it onto NFC pad), and maybe also enter a PIN or pass whatever biometric check, depending on device and its configuration (see fido2-token tool for that).

  • Token/device returns "hmac-sha256(salt, key=secret-generated-on-device)", unique and unguessable for that salt value, which is then used to decrypt "tVMHWW+C" part of the fhd-string into original "secret" string (via simple XOR).

  • Resulting "secret" value is copied into clipboard, to use wherever it was needed.

This ensures that every single secret string in such password-list is only decryptable separately, also demanding a separate physical verification procedure, very visible and difficult to do unintentionally, same as with WebAuthn.

Only actual secret key in this case resides on a FIDO2 device, and is infeasible to extract from there, for any common threat model at least.

Encryption/wrapping of secret-string to fhd-string above works in roughly same way - generate salt value, send to token, get back HMAC and XOR it with the secret, cutting result down to that secret-string length.

Last part introduces a small info-leak - secret length - but don't think that should be an issue in practice (always use long random passwords), while producing nicer short ciphertexts.

There are also still some issues with using these physical dongles in a compomised environment, which can lie about what it is being authorized by a device, as they usually have no way to display that, but it's still a big improvement, and can be somewhat mitigated by using multiple tokens for different purposes.

I've wrapped all these crypto bits into a simple C fido2-hmac-desalinate tool here:


Which needs "Relying Party ID" value to compile - basically an unique hostname that ideally won't be used for anything else with that authenticator (e.g. "token1.fhd.jmastodon.me" for some owned domain name), which is itself not a secret of any kind.

FIDO2 "credential" can be generated and stored on device first, using cli tools that come with libfido2, for example:

% fido2-token -L
% fido2-cred -M -rh -i cred.req.txt -o cred.info.txt /dev/hidraw5 eddsa

Such credential would work well on different machines with authenticators that support FIDO2 Discoverable Credentials (aka Resident Keys), with HMAC key stored on the same portable authenticator, but for simplier tokens that don't support that and have no storage, static credential-id value (returned by fido2-cred tool without "-r" option) also needs to be built-in via -DFHD_CID= compile-time parameter (and is also not a secret).

(technically that last "credential-id value" has device-master-key-wrapped HMAC-key in it, but it's only possible to extract from there by the device itself, and it's never passed or exposed anywhere in plaintext at any point)

On the User Interface side, I use Emacs text editor to open/edit password-list (also transparently-encrypted/decrypted using ghg tool), and get encrypted stuff from it just by pointing at the needed secret and pushing the hotkey to copy its decrypted value, implemented by fhd-crypt routine here:


(also, with universal-arg, fhd-crypt encrypts/decrypts and replaces pointed-at or region-selected thing in-place, instead of copying into clipboard)

Separate binary built against common libfido2 ensures that it's easy to use such secret strings in any other way too, or fallback to manually decoding them via cli, if necessary.

At least until push for passkeys makes no-password WebAuthn ubiquitous enough, this seem to be the most convenient and secure way of password management for me, but auth passwords aren't the only secrets, so it likely will be useful way beyond that point as well.

One thing not mentioned above are (important!) backups for that secret-file. I.e. what if FIDO2 token in question gets broken or lost? And how to keep such backup up-to-date?

My simple fix is having a shell script that does basically this:

set -eo pipefail
echo "### Paste new entry, ^D after last line to end, ^C to cancel"
echo "### Make sure to include some context for it - headers at least"
chunk=$(ghg -eo -r some-public-key | base64 -w80)
echo -e "--- entry [ $(date -Is) ]\n${chunk}\n--- end\n" >>backup.log

Then on any updates, I run this script and paste the updated plaintext secret-block into it, before encrypting all secrets in that block for good.

It does one-way public-key encryption (using ghg tool, but common age or GnuPG will work just as well), to store those encrypted updates, which can then be safely backed-up alongside the main (also encrypted) list of secrets, and everything can be restored from these using corresponding secure private key (ideally not exposed or used anywhere for anything outside of such fallback-recovery purposes).

And one more aside - since plugging devices into USB rarely aligns correctly on the first try (USB curse), is somewhat tedious, and can potentially wear-out contacts or snap-off the device, I've grabbed a cheap PC/SC-compatible ACR122U NFC reader from aliexpress, and have been using it instead of a USB interface, as modern FIDO2 tokens tend to support NFC for use with smartphones.

It works great for this password-management purpose, placing the key on NFC pad works instead of the touch presence-check with USB (at least with cheap Yubico Security Key devices), with some short (<1 minute) timeout on the pad in which token stops responding with ERR_PIN, to avoid misuse if one forgets to remove it.

libfido2 supports PC/SC interface, and PCSC lite project providing it on typical linux distros seem to support pretty much all NFC readers in existance.

libfido2 is in turn used by systemd, OpenSSH, pam-u2f, its fido2-token/cred/assert cli, my fido2-hmac-desalinate password-management hack above, and many other tools. So through it, all these projects automatically have easy and ubiquitous NFC support too.

(libfido2 also supports linux kernel AF_NFC interface in addition to PC/SC one, which works for much narrower selection of card-readers implemented by in-kernel drivers, so PC/SC might be easier to use, but kernel interface doesn't need an extra pcscd dependency, if works for your specific reader)

Notable things that don't use that lib and have issues with NFC seem to be browsers - both Firefox and Chromium on desktop (and their forks, see e.g. mozbug-1669870) - which is a shame, but hopefully will be fixed there eventually.

Nov 30, 2022

How to reliably set MTU on a weird (batman-adv) interface

I like and use B.A.T.M.A.N. (batman-adv) mesh-networking protocol on the LAN, to not worry about how to connect local linuxy things over NICs and WiFi links into one shared network, and been using it for quite a few years now.

Everything sensitive should run over ssh/wg links anyway (or ipsec before wg was a thing), so it's not a problem to have any-to-any access in a sane environment.

But due to extra frame headers, batman-adv benefits from either lower MTU on the overlay interface or higher MTU on all interfaces which it runs over, to avoid fragmentation. Instead of remembering to tweak all other interfaces, I think it's easier to only bother with one batman-adv iface on each machine, but somehow that proved to be a surprising challenge.

MTU on iface like "bat0" jumps on its own when slave interfaces in it change state, so obvious places to set it, like networkd .network/.netdev files or random oneshot boot scripts don't work - it can/will randomly change later (usually immediately after these things set it on boot) and you'll only notice when ssh or other tcp conns start to hang mid-session.

One somewhat-reliable and sticky workaround for having issues is to mangle TCP MSS by the firewall (e.g. nftables), so that MTU changes are not an issue for almost all connections, but that still leaves room for issues and fragmentation in a few non-TCP things, and is obviously a hack - wrong MTU value is still there.

After experimenting with various "try to set mtu couple times after delay", "wait for iface state and routes then set mtu" and such half-measures - none of which worked reliably for that odd interface - here's what I ended up with:



Environment=IF=bat0 MTU=1440
ExecStartPre=/usr/lib/systemd/systemd-networkd-wait-online -qi ${IF}:off --timeout 30
ExecStart=bash -c 'rl=0 rl_win=100 rl_max=20 rx=" mtu [0-9]+ "; \
  while read ev; do [[ "$ev" =~ $rx ]] || continue; \
    printf -v ts "%%(%%s)T" -1; ((ts-=ts%%rl_win)); ((rld=++rl-ts)); \
    [[ $rld -gt $rl_max ]] && exit 59 || [[ $rld -lt 0 ]] && rl=ts; \
    ip link set dev $IF mtu $MTU || break; \
  done < <(ip -o link show dev $IF; exec stdbuf -oL ip -o monitor link dev $IF)'



It's a "F this sh*t" approach of "anytime you see mtu changing, change it back immediately", which seem to be the only thing that works reliably so far.

Couple weird things in there on top of "ip monitor" loop are:

  • systemd-networkd-wait-online -qi ${IF}:off --timeout 30

    Waits for interface to appear for some time before either restarting the .service, or failing when StartLimitBurst= is reached.

    The :off networkd "operational status" (see networkctl(1)) is the earliest one, and enough for "ip monitor" to latch onto interface, so good enough here.

  • rl=0 rl_win=100 rl_max=20 and couple lines with exit 59 on it.

    This is rate-limiting in case something else decides to manage interface' MTU in a similar "persistent" way (at last!), to avoid pulling the thing back-and-forth endlessly in a loop, or (over-)reacting to interface state flapping weirdly.

    I.e. stop service with failure on >20 relevant events within 100s.

  • Restart=on-success to only restart on "break" when "ip link set" fails if interface goes away, limited by StartLimit*= options to also fail eventually if it does not (re-)appear, or if that operation fails consistently for some other reason.

With various overlay tunnels becoming commonplace lately, MTU seem to be set incorrectly by default about 80% of the time, and I almost feel like I'm done fighting various tools with their way of setting it guessed/hidden somewhere (if implemented at all), and should just extend this loop into a more generic system-wide "mtud.service" that'd match interfaces by wildcard and enforce some admin-configured MTU values, regardless of whatever creating them (wrongly) thinks might be the right value.

As seem to be common with networking stuff - you either centralize configuration like that on a system, or deal with constant never-ending stream of app failures. Other good example here are in-app ACLs, connection settings and security measures vs system firewalls and wg tunnels, with only latter actually working, and former proven to be an utter disaster for decades now.

Nov 25, 2022

Information as a disaggregated buffet instead of firehose or a trough

Following information sources on the internet has long been compared to "drinking from a firehose", and trying to "keep up" with that is how I think people end up with hundreds of tabs in some kind of backlog, overflowing inboxes, feeds, podcast queues, and feeling overwhelmed in general.

Main problem for me with that (I think) was aggregation - I've used web-based feed reader in the past, aggregated feeds from slashdot/reddit/hn/lobsters (at different times), and followed "influencers" on twitter to get personally-curated feeds from there in one bundle - and none of it really worked well.

For example, even when following podcast feeds, you end up with a backlog (of things to listen to) that is hard to catch-up with - natually - but it also adds increasingly more tracking and balancing issues, as simply picking things in order will prioritize high-volume stuff, and sometimes you'll want to skip the queue and now track "exceptions", while "no-brainer" pick pulls increasingly-old and irrelevant items first.

Same thing tends to happen with any bundle of feeds that you try to "follow", which always ends up being overcrowded in my experience, and while you can "declare bankrupcy" resetting the follow-cursor to "now", skipping backlog, that doesn't solve fundamental issue with this "following" model - you'll just fall behind again, likely almost immediately - the approach itself is wrong/broken/misguided.

Obvious "fix" might be to curate feeds better so that you can catch-up, but if your interests are broad enough and changing, that's rarely an option, as sources tend to have their own "take it or leave it" flow rate, and narrowing scope to only a selection that you can fully follow is silly and unrepresentative for those interests, esp. if some of them are inconsistent or drown out others, even while being generally low-noise.

Easy workable approach, that seem to avoid all issues that I know of, and worked for me so far, goes something like this:

  • Bookmark all sources you find interesting individually.
  • When you want some podcast to listen-to, catch-up on news in some area, or just an interesting thing to read idly - remember it - as in "oh, would be nice to read/know-about/listen-to this right now".
  • Then pull-out a bookmark, and pick whatever is interesting and most relevant there, not necessarily latest or "next" item in any way.

This removes the mental burden of tracking and curating these sources, balancing high-traffic with more rarefied ones, re-weighting stuff according to your current interests, etc - and you don't loose on anything either!

I.e. with something relevant to my current interests I'll remember to go back to it for every update, but stuff that is getting noisy or falling off from that sphere, or just no longer entertaining or memorable, will naturally slip my mind more and more often, and eventually bookmark itself can be dropped as unused.

Things that I was kinda afraid-of with such model before - and what various RSS apps or twitter follows "help" to track:

  • I'll forget where/how to find important info sources.
  • Forget to return to them.
  • Miss out on some stuff there.
  • Work/overhead of re-checking for updates is significant.

None of these seem to be an issue in practice, as most interesting and relevant stuff will natually be the first thing that will pop-up in memory to check/grab/read, you always "miss out" on something when time is more limited than amount of interesting goodies (i.e. it's a problem of what to miss-out on, not whether you do or don't), and time spent checking couple bookmarks is a rounding error compared to processing the actual information (there's always a lot of new stuff, and for something you check obsessively, you'd know the rate well).

This kind of "disaggregated buffet" way of zero-effort "controlling" information intake is surprisingly simple, pretty much automatic (happens on its own), very liberating (no backlog anywhere), and can be easily applied to different content types:

  • Don't get podcast rss-tracking app, bookmark individual sites/feeds instead.

    I store RSS/Atom feed URLs under one bookmarks-dir in Waterfox, and when wanting for something to listen on a walk or while doing something monotonous, remember and pull out an URL via quickbar (bookmarks can be searched via * <query> there iirc, I just have completion/suggestions enabled for bookmarks only), run rss-get script on the link, pick specific items/ranges/links to download via its aria2c or yt-dlp (with its built-in SponsorBlock), play that.

  • Don't "follow" people/feeds/projects on twitter or fediverse/mastodon and then read through composite timeline, just bookmark all individual feeds (on whatever home instances, or via nitter) instead.

    This has a great added advantage of maintaining context in these platforms which are notoriously bad for that, i.e. you read through things as they're posted in order, not interleaved with all other stuff, or split over time.

    Also this doesn't require an account, running a fediverse instance, giving away your list of follows (aka social graph), or having your [current] interests being tracked anywhere (even if "only" for bamboozling you with ads on the subject to death).

    With many accounts to follow and during some temporary twitter/fediverse duplication, I've also found it useful (so far) to have a simple ff-cli script to "open N bookmarks matching /@", when really bored, and quickly catch-up on something random, yet interesting enough to end up being bookmarked.

  • Don't get locked into subscribing or following "news" media that is kinda shit.

    Simply not having that crap bundled with other things in same reader/timeline/stream/etc will quickly make brain filter-out and "forget" sources that become full of ads, <emotion>bait, political propaganda and various other garbage, and brain will do such filtering all in the background on its own, without wasting any time or conscious cycles.

    There's usually nothing of importance to miss with such sources really, as it's more like taking a read on the current weather, only occasionally interesting/useful, and only for current/recent stuff.

It's obviously not a guide to something "objectively best", and maybe only works well for me this way, but as I've kinda-explained it (poorly) in chats before, thought to write it down here too - hopefully somewhat more coherently - and maybe just link to later from somewhere.

Nov 18, 2022

AWK script to convert long integers to human-readable number format and back

Haven't found anything good for this on the internet before, and it's often useful in various shell scripts where metrics or tunable numbers can get rather long:

3400000000 -> 3_400M
500000 -> 500K
8123455009 -> 8_123_455_009

That is, to replace long stretches of zeroes with short Kilo/Mega/Giga/etc suffixes and separate the usual 3-digit groups by something (underscore being programming-language-friendly and hard to mistake for field separator), and also convert those back to pure integers from cli arguments and such.

There's numfmt in GNU coreutils, but that'd be missing on Alpine, typical network devices, other busybox Linux distros, *BSD, MacOS, etc, and it doesn't have "match/convert all numbers you want" mode anyway.

So alternative is using something that is available everywhere, like generic AWK, with a reasonably short scripts to implement that number-mangling logic.

  • Human-format all space-separated long numbers in stdin, like in example above:

    awk '{n=0; while (n++ <= NR) { m=0
      while (match($n,/^[0-9]+[0-9]{3}$/) && m < 5) {
        if (substr($n,k+1)=="000") { $n=substr($n,1,k); m++ }
        else while (match($n,/[0-9]{4}(_|$)/))
          $n = substr($n,1,RSTART) "_" substr($n,RSTART+1) }
      $n = $n substr("KMGTP", m, m ? 1 : 0) }; print}'
  • Find all human-formatted numbers in stdin and convert them back to long integers:

    awk '{while (n++ <= NR) {if (match($n,/^([0-9]+_)*[0-9]+[KMGTP]?$/)) {
      sub("_","",$n); if (m=index("KMGTP", substr($n,length($n),1))) {
        $n=substr($n,1,length($n)-1); while (m-- > 0) $n=$n "000" } }}; print}'

    I.e. reverse of the operation above.

Code is somewhat compressed for brevity within scripts where it's not the point. It should work with any existing AWK implementations afaik (gawk, nawk, busybox awk, etc), and not touch any fields that don't need such conversion (as filtered by the first regexp there).

Line-match pattern can be added at the start to limit conversion to lines with specific fields (e.g. match($1,/(Count|Threshold|Limit|Total):/) {...}), "n" bounds and $n regexps adjusted to filter-out some inline values.

Numbers here will use SI prefixes, not 2^10 binary-ish increments, like in IEC units (kibi-, mebi-, gibi-, etc), as is more useful in general, but substr() and "000"-extension can be replaced by /1024 (and extra -iB unit suffix) when working with byte values - AWK can do basic arithmetic just fine too.

Took me couple times misreading and mistyping overly long integers from/into scripts to realize that this is important enough and should be done better than counting zeroes with arrow keys like some tech-barbarian, even in simple bash scripts, and hopefully this hack might eventually pop-up in search for someone else coming to that conclusion as well.

Oct 21, 2022

Useful git hook - prepare-commit-msg with repo path, branch and last commits

I've been using this as a prepare-commit-msg hook everywhere for couple years now:

msg_file=$1 commit_src=$2 hash=$3
[[ -z "$msg_file" || "$GIT_EDITOR" = : ]] && exit 0
[[ -z "$commit_src" ]] || exit 0 # cherry-picks, merges, etc

echo >>"$msg_file" '#'
echo >>"$msg_file" "# Commit dir: ${GIT_PREFIX%/}"
echo >>"$msg_file" "#   Repo dir: $(realpath "$PWD")"
echo >>"$msg_file" '#'
git log -10 --format='# %s' >>"$msg_file"

Which saves a lot of effort of coming up with commit-messages, helps in monorepos/collections and to avoid whole bunch of mistakes.

Idea is that instead of just "what is to be comitted", comment below commit-msg in $EDITOR will now include something like this:

# On branch master
# Changes to be committed:
# modified:   PKGBUILD
# new file:   9005.do-some-other-thing.patch
# Commit dir: waterfox
#   Repo dir: /home/myuser/archlinux-pkgbuilds
# waterfox: +9004.rebind_screenshot_key_to_ctrl_alt_s.patch
# waterfox: fix more build/install issues
# waterfox: +fix_build_with_newer_cbindgen.patch
# waterfox: update to G5.0.1
# +re2g-git
# waterfox: bump to G4.1.4
# +mount-idmapped-git
# telegram-tdlib-purple-*: makedepends=telegram-tdlib - static linking
# waterfox: update for G4.1.1
# +b2tag-git

It helps as a great sanity-check and reminder of the following things:

  • Which subdir within the repo you are working in, e.g. "waterfox" pkg above, so that it's easy to identify and/or use that as a part of commit message.

    With a lot of repositories I work with, there are multiple subdirs and components in there, not to mention collection-repos, and it's useful to have that in the commit msg - I always try to add them as a prefix, unless repo uses entirely different commit message style (and has one).

  • What is the repository directory that you're running "git commit" in.

    There can be a local dev repo, a submodule of it in a larger project, sshfs-mounted clone of it somewhere, more local clones for diff branches or different git-worktree dirs, etc.

    This easily tells that you're in the right place (or where you think you are), usually hard to miss by having repo under local simple dev dir, and not some weird submodule path or mountpoint in there.

  • List of last commits on this branch - incredibly useful for a ton of reasons.

    For one, it easily keeps commit-msgs consistent - you don't use different language, mood and capitalization in there by forgetting what is the style used in this particular repo, see any relevant issue/PR numbers, prefixes/suffixes.

    But also it immediately shows if you're on the wrong branch, making a duplicate commit by mistake, forgot to make commit/merge for something else important before this, undo some reset or other recent shenanigans - all at a glance.

    It was a usual practice for me to check git-log before every commit, and this completely eliminated the need for it.

Now when I don't see this info in the commit-msg comment, first thing to do is copy the hook script to whatever repo/host/place I'm working with, as it's almost as bad as not seeing which files you commit in there without it. Can highly recommend this tiny time-saver when working with any git repos from the command line.

Implementation has couple caveats, which I've added there over time:

[[ -z "$msg_file" || "$GIT_EDITOR" = : ]] && exit 0
[[ -z "$commit_src" ]] || exit 0 # cherry-picks, merges, etc

These lines are to skip running this hook for various non-interactive git operations, where anything you put into commit-msg will get appended to it verbatim, without skipping comment-lines, as it is done with interactive "git commit" ops.

Canonical version of the hook is in the usual mk-fg/fgtk dumping ground:


Which might get more caveats like above fixed in the future, should I bump into any, so might be better than current version in this post.

Oct 19, 2022

Make cursor stand-out more in Emacs by having it blink through different colors

When playing some game (Starsector, probably) on the primary display recently, having aux emacs frame (an extra X11 window) on the second one with some ERC chat in there (all chats translate to irc), I've had a minor bug somewhere and noticed that cursor in that frame/window wasn't of the usual translucent-green-foreground-text color (on a dark bg), but rather stayed simply #fff-white (likely because of some screwup in my theme-application func within that frame).

And an interesting observation is that it's actually really good when cursor is of a different color from both foreground-text and the background colors, because then it easily stands out against both, and doesn't get lost in a wall of text either - neat!

Quickly googling around for a hack to make this fluke permanent, stumbled upon this reply on Stack Overflow, which, in addition to easy (set-cursor-color ...) answer, went on to give an example of changing color on every cusor blink.

Which seems like a really simple way to make the thing stand out not just against bg/fg colors, but also against any syntax-highlighting color anywhere, and draw even more attention to itself, which is even better.

With only a couple lines replacing a timer-func that normally turns cursor on-and-off (aka blinking), now it blinks with a glorious rainbow of colors:


And at least I can confirm that it totally works for even greater/more-immediate visibility, especially in multiple emacs windows (e.g. usual vertical split showing two code buffers), where all these cursors now change colors and impossible to miss at a glance, so you always know which point in code you're jumping to upon switching there - can recommend to at least try it out.

Bit weird that most code editor windows seem to have narrow-line cursor, even if of a separate color, given how important that detail is within that window, compared to e.g. any other arbitrary glyph in there, which would be notably larger and often more distinct/visible, aside from blinking.

One interesting part with a set of colors, as usual, is to generate or pick these somehow, which can be done on a color wheel arbitrarily, with results often blending-in with other colors and each other, more-or-less.

But it's also not hard to remember about L*a*b* colorspace and pick colors that will be almost certainly distinct according to ranges there, as you'd do with a palette for dynamic lines on a graph or something more serious like that.

These days I'm using i want hue generator-page to get a long-ish list of colors within specified ranges for Lightness (L component, to stand-out against light/dark bg) and Chroma parameters, and then pass css list of these into a color-b64sort script, with -b/--bg-color and -c/--avoid-color settings/thresholds.

Output is a filtered list with only colors that are far enough from the specified ones, to stand-out against them in the window, but is also sorted to have every next color picked to be most visually-distinct against preceding ones (with some decay-weight coefficient to make more-recent diff[s] most relevant), so that color doesn't blink through similar hues in a row, and you don't have to pick/filter/order and test these out manually.

Which is the same idea as with the usual palette-picks for a line on a chart or other multi-value visualizations (have high contrast against previous/other values), that seem to pop-up like this in surprisingly many places, which is why at some point I just had to write that dedicated color-b64sort thingie to do it.

Tweaking parameters to only get as many farthest-apart colors as needed for blinks until cursor goes static (to avoid wasting CPU cycles blinking in an idle window), ended up with a kind of rainbow-cursor, counting bright non-repeating color hues in a fixed order, which is really nice for this purpose and also just kinda cool and fun to look at.

Oct 18, 2022

Revisiting POSIX ACLs and Capabilities in python some 15 years later

A while ago as I've discovered for myself using xattrs, ACLs and capabilities for various system tasks, and have been using those in python2-based tools via C wrappers for libacl and libcap (in old mk-fg/fgc repo) pretty much everywhere since then.

Tools worked without any issues for many years now, but as these are one of the last scripts left still in python2, time has come to update those, and revisit how to best access same things in python3.

Somewhat surprisingly, despite being supported on linux since forever, and imo very useful, support for neither ACLs nor capabilities haven't made it into python 3.10's stdlib, but there is now at least built-in support for reading/writing extended attributes (without ctypes, that is), and both of these are simple structs stored in them.

So, disregarding any really old legacy formats, parsing ACL from a file in a modern python can be packed into something like this:

import os, io, enum, struct, pwd, grp

class ACLTag(enum.IntEnum):
  uo = 0x01; u = 0x02; go = 0x04; g = 0x08
  mask = 0x10; other = 0x20
  str = property(lambda s: s._name_)

class ACLPerm(enum.IntFlag):
  r = 4; w = 2; x = 1
  str = property(lambda s: ''.join(
    (v._name_ if v in s else '-') for v in s.__class__ ))

def parse_acl(acl, prefix=''):
  acl, lines = io.BytesIO(acl), list()
  if (v := acl.read(4)) != b'\2\0\0\0':
    raise ValueError(f'ACL version mismatch [ {v} ]')
  while True:
    if not (entry := acl.read(8)): break
    elif len(entry) != 8: raise ValueError('ACL length mismatch')
    tag, perm, n = struct.unpack('HHI', entry)
    tag, perm = ACLTag(tag), ACLPerm(perm)
    match tag:
      case ACLTag.uo | ACLTag.go:
      case ACLTag.u | ACLTag.g:
          name = ( pwd.getpwuid(n).pw_name
            if tag is tag.u else grp.getgrgid(n).gr_name )
        except KeyError: name = str(n)
      case ACLTag.other: lines.append(f'o::{perm.str}')
      case ACLTag.mask: lines.append(f'm::{perm.str}')
      case _: raise ValueError(tag)
  lines.sort(key=lambda s: ('ugmo'.index(s[0]), s))
  return '\n'.join(f'{prefix}{s}' for s in lines)

p = 'myfile.bin'
xattrs = dict((k, os.getxattr(p, k)) for k in os.listxattr(p))
if acl := xattrs.get('system.posix_acl_access'):
  print('Access ACLs:\n' + parse_acl(acl, '  '))
if acl := xattrs.pop('system.posix_acl_default', ''):
  print('Default ACLs:\n' + parse_acl(acl, '  d:'))

Where it's just a bunch of 8B entries with uids/gids and permission bits in them, and capabilities are even simplier, except for ever-growing enum of them:

import os, io, enum, struct, dataclasses as dcs

CapSet = enum.IntFlag('CapSet', dict((cap, 1 << n) for n, cap in enumerate((
  ' chown dac_override dac_read_search fowner fsetid kill setgid setuid setpcap'
  ' linux_immutable net_bind_service net_broadcast net_admin net_raw ipc_lock'
  ' ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct sys_admin'
  ' sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease'
  ' audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm'
  ' block_suspend audit_read perfmon bpf checkpoint_restore' ).split())))

class Caps: effective:bool; permitted:CapSet; inheritable:CapSet

def parse_caps(cap):
  cap = io.BytesIO(cap)
  ver, eff = ((v := struct.unpack('I', cap.read(4))[0]) >> 3*8) & 0xff, v & 1
  if ver not in [2, 3]: raise ValueError(f'Unsupported capability v{ver}')
  perm1, inh1, perm2, inh2 = struct.unpack('IIII', cap.read(16))
  if (n := len(cap.read())) != (n_tail := {2:0, 3:4}[ver]):
    raise ValueError(f'Cap length mismatch [ {n} != {n_tail} ]')
  perm_bits, inh_bits = perm2 << 32 | perm1, inh2 << 32 | inh1
  perm, inh = CapSet(0), CapSet(0)
  for c in CapSet:
    if perm_bits & c.value: perm |= c; perm_bits -= c.value
    if inh_bits & c.value: inh |= c; inh_bits -= c.value
  if perm_bits or inh_bits:
    raise ValueError(f'Unrecognized cap-bits: P={perm_bits:x} I={inh_bits:x}')
  return Caps(eff, perm, inh)

p = 'myfile.bin'
try: print(parse_caps(os.getxattr(p, 'security.capability')))
except OSError: pass

Bit weird that wrappers along these lines can't be found in today's python 3.10, but maybe most people sadly still stick to suid and more crude hacks where more complex access permissions are needed.

One interesting thing I found here is how silly my old py2 stracl.c and strcaps.c look in comparison - it's screenfuls of lines of more complicated C code, tied into python's c-api, and have to be compiled wherever these tools are used, with an extra python wrappers on top - all for parsing a couple of trivial structs, which under linux ABI compatibility promises, can be relied upon to be stable enough anyway.

Somehow it's been the obvious solution back then, to have compiler check all headers and link these libs as compatibility wrappers, but I'd never bother these days - it'll be either ctypes wrapper, or parsing simple stuff in python, to avoid having extra jank and hassle of dependencies where possible.

Makes me wonder if that's also the dynamic behind relatively new js/rust devs dragging in a bunch of crap (like the infamous left-pad) into their apps, still thinking that it'd make life simplier or due to some "good practice" dogmas.

May 30, 2022

LESSOUTPUT filter workaround for broken unicode en-dash characters in manpages

Using man <something> in the terminal as usual, I've been noticing more and more manpages being broken by tools that produce them over the years in this one specific way - long command-line options with double-dash are being mangled into having unicode en-dash "–" prefix instead of "--".

Most recent example that irked me was yt-dlp(1) manpage, which looks like this atm (excerpt as of 2022-05-30):

-h, –help
  Print this help text and exit
  Print program version and exit
-i, –ignore-errors
  Ignore download and postprocessing errors. The download
  will be considered successful even if the postprocessing fails
  Continue with next video on download errors;
  e.g. to skip unavailable videos in a playlist (default)
  Abort downloading of further videos if an error occurs
  (Alias: –no-ignore-errors)

If you habitually copy-paste any of the long opts there into a script (or yt-dlp config, as it happens), to avoid retyping these things, it won't work, because e.g –help cli option should of course actually be --help, i.e. have two ascii hyphens and not any kind of unicode dash characters.

From a brief look, this seem to happen because of conversion from markdown and probably not enough folks complaining about it, which is a pattern that I too chose to follow, and make a workaround instead of reporting a proper bug :)

(tbf, youtube-dl forks have like 1000s of these in tracker, and I'd rather not add to that, unless I have a patch for some .md tooling it uses, which I'm too lazy to look into)

"man" command (from man-db on linux) uses a pager tool to display its stuff in a terminal cli (controlled by PAGER= or MANPAGER= env vars), which is typically set to use less tool on desktop linuxes (with the exception of minimal distros like Alpine, where it comes from busybox).

"less" somewhat-infamously supports filtering of its output (which is occasionally abused in infosec contexts to make a file that installs rootkit if you run "less" on it), which can be used here for selective filtering and fixes when manpage is being displayed through it.

Relevant variable to set in ~/.zshrc or ~/.bashrc env for running a filter-script is:

LESSOPEN='|-man-less-filter %s'

With |- magic before man-less-filter %s command template indicating that command should be also used for pipes and when less is displaying stdin.

"man-less-filter" helper script should be in PATH, and can look something like this to fix issues in yt-dlp manpage excerpt above:

## Script to fix/replace bogus en-dash unicode chars in broken manpages
## Intended to be used with LESSOPEN='|-man-less-filter %s'

[ -n "$MAN_PN" ] || exit 0 # no output = use original input

# \x08 is backspace-overprint syntax that "man" uses for bold chars
# Bold chars are used in option-headers, while opts in text aren't bold
  `' s/\([ [:punct:]]\)–\([a-z0-9]\)/\1--\2/'
[ "$1" != - ] || exec sed "$seds"
exec sed "$seds" "$1"

It looks really funky for a reason - simply using s/–/--/ doesn't work, as manpages use old typewriter/teletype backspace-overtype convention for highlighting options in manpages.

So, for example, -h, –help line in manpage above is actually this sequence of utf-8 - -\x08-h\x08h,\x08, –\x08–h\x08he\x08el\x08lp\x08p\n - with en-dash still in there, but \x08 backspaces being used to erase and retype each character twice, which makes "less" display them in bold font in the terminal (using its own different set of code-bytes for that from ncurses/terminfo).

Simply replacing all dashes with double-hyphens will break that overtyping convention, as each backspace erases a single char before it, and double-dash is two of those.

Which is why the idea in the script above is to "exec sed" with two substitution regexps, first one replacing all overtyped en-dash chars with correctly-overtyped hyphens, and second one replacing all remaining dashes in the rest of the text which look like options (i.e. immediately followed by letter/number instead of space), like "Alias: –no-ignore-errors" in manpage example above, where text isn't in bold.

MAN_PN env-var check is to skip all non-manpage files, where "less" understands empty script output as "use original text without filtering". /bin/dash can be used instead of /bin/sh on some distros (e.g. Arch, where sh is usually symlinked to bash) to speed up this tiny script startup/runs.

Not sure whether "sed" might have to be a GNU sed to work with unicode char like that, but any other implementation can probably use \xe2\x80\x93 escape-soup instead of in regexps, which will sadly make them even less readable than usual.

Such manpage bugs should almost certainly be reported to projects/distros and fixed, instead of using this hack, but thought to post it here anyway, since google didn't help me find an exising workaround, and fixing stuff properly is more work.

Next → Page 1 of 16