Sep 25, 2016

nftables re-injected IPSec matching without xt_policy

As of linux-4.8, something like xt_policy is still - unfortunately - on the nftables TODO list, so to match traffic pre-authenticated via IPSec, some workaround is needed.

Obvious one is to keep using iptables/ip6tables to mark IPSec packets with the old xt_policy module, as these rules interoperate with nftables just fine, the only important bit being the ordering of iptables hooks vs nft chain priorities, which are rather easy to find in "netfilter_ipv{4,6}.h" files, e.g.:

enum nf_ip_hook_priorities {
  NF_IP_PRI_FIRST = INT_MIN,
  NF_IP_PRI_CONNTRACK_DEFRAG = -400,
  NF_IP_PRI_RAW = -300,
  NF_IP_PRI_SELINUX_FIRST = -225,
  NF_IP_PRI_CONNTRACK = -200,
  NF_IP_PRI_MANGLE = -150,
  NF_IP_PRI_NAT_DST = -100,
  NF_IP_PRI_FILTER = 0,
  NF_IP_PRI_SECURITY = 50,
  NF_IP_PRI_NAT_SRC = 100,
  NF_IP_PRI_SELINUX_LAST = 225,
  NF_IP_PRI_CONNTRACK_HELPER = 300,
  NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
  NF_IP_PRI_LAST = INT_MAX,
};

(see also Netfilter-packet-flow.svg by Jan Engelhardt for a general overview of the iptables hook positions - nftables allows defining any number of chains before/after these)

So marks from iptables/ip6tables rules like these:

*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -m policy --dir in --pol ipsec --mode transport -j MARK --or-mark 0x101
-A OUTPUT -m policy --dir out --pol ipsec --mode transport -j MARK --or-mark 0x101
COMMIT

Will be easy to match in priority=0 input/output hooks (as NF_IP_PRI_RAW=-300) of nft ip/ip6/inet tables (e.g. mark and 0x101 == 0x101 accept).
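E.g. a minimal example of such a table/chain to pick these marks up (table/chain names are arbitrary here):

table inet filter {
  chain input {
    type filter hook input priority 0;
    mark and 0x101 == 0x101 accept
  }
}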

But that'd split firewall configuration between iptables/nftables, adding the hassle of keeping the whole "iptables" thing initialized just for one or two rules.

xfrm transformation (like ipsec esp decryption in this case) seems to preserve all information about the packet intact, including packet marks (but not conntrack states, which track the esp connection), which - as suggested by Florian Westphal in #netfilter - can be utilized to match post-xfrm packets in nftables by this preserved mark field.

E.g. having this (strictly before ct state {established, related} accept for stateful firewalls, as each packet has to be marked):

define cm.ipsec = 0x101
add rule inet filter input ip protocol esp mark set mark or $cm.ipsec
add rule inet filter input ip6 nexthdr esp mark set mark or $cm.ipsec
add rule inet filter input mark and $cm.ipsec == $cm.ipsec accept

Will mark and accept both still-encrypted esp packets (IPv4/IPv6) and their decrypted payload.

Note that this assumes that all IPSec connections are properly authenticated and trusted, so be sure not to use anything like that if e.g. opportunistic encryption is enabled.

Much simpler nft-only solution, though still not a full substitute for what xt_policy does, of course.

Aug 31, 2016

Handy tool to wait for remote TCP port to open - TCP "ping"

Lack of some basic tool to "wait for connection" in the linux toolkit has always annoyed me to no end.

root@alarm~:~# reboot
Shared connection to 10.0.1.75 closed.

% ssh root@10.0.1.75

...time passes, ssh doesn't do anything...

ssh: connect to host 10.0.1.75 port 22: No route to host

% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused

...[mashing Up/Enter] start it up already!...

% ssh root@10.0.1.75
ssh: connect to host 10.0.1.75 port 22: Connection refused
% ssh root@10.0.1.75

root@alarm~:~#

...finally!
Working a lot with ARM boards, this thing can repeat a few dozen times a day for me.
Same happens on every power-up, after fiddling with sd cards, etc.

And usually know for a fact that I'll want to reconnect to the thing in question asap and continue what I was doing there, but trying my luck a few times with unresponsive or insta-failing ssh is rather counter-productive and just annoying.

Instead:

% tping 10.0.1.75 && ssh root@10.0.1.75
root@alarm~:~#

That's it, no ssh timing-out or not retrying fast enough, no "Connection refused" nonsense.

tping (code link, name is ping + fping + tcp ping) is a trivial ad-hoc script that opens new TCP connection to specified host/port every second (default for -r/--retry-delay) and polls connections for success/error/timeout (configurable) in-between, exiting as soon as first connection succeeds, which in example above means that sshd is now ready for sure.

Doesn't need extra privileges like icmp pingers do - it's a simple no-deps python3 script.
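Core of it can be sketched in a dozen lines of python3 like this (actual tping juggles multiple pending connections via polling instead of blocking on each attempt):

import socket, sys, time

host = sys.argv[1]
port = int(sys.argv[2]) if len(sys.argv) > 2 else 22

while True:
  try:
    # First successful TCP handshake ends the wait
    with socket.create_connection((host, port), timeout=3): break
  except OSError: # refused, unreachable, timeout, etc
    time.sleep(1) # same as -r/--retry-delay default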

Used fping as fping -qr20 10.0.1.75 && ssh root@10.0.1.75 before finally taking the time to write this thing, but it does what it says on the tin - icmp ping - which usually results in a "Connection refused" error from ssh, as there's a gap between network and sshd starting.

One of these "why the hell it's not in coreutils or util-linux" tools for me now.

May 15, 2016

Debounce bogus repeated mouse clicks in Xorg with xbindkeys

My current Razer E-Blue mouse has had this issue since I got it - Mouse-2 / BTN_MIDDLE / middle-click (useful mostly as "open new tab" in browsers and "paste" in X) sometimes produces two click events (in rapid succession) instead of one.

It was more rare before, but lately it feels like it's harder to make it click once than twice.

Seems to be either a hardware problem with debouncing circuitry or logic in the controller, or maybe the button itself not mashing switch contacts against each other hard enough... or soft enough (i.e. non-elastic), actually, given that they shouldn't "bounce" against each other.

Since there's no need to ever double-click that wheel-button, it looks rather easy to debounce the click at the Xorg input level, by ignoring repeated button up/down events after the first full "click" is produced.

Easiest solution of that kind that I've found was to use a guile (scheme) script with the xbindkeys tool to keep that click-state data and perform clicks selectively, using xdotool:

(define razer-delay-min 0.2)
(define razer-wait-max 0.5)
(define razer-ts-start #f)
(define razer-ts-done #f)
(define razer-debug #f)

(define (mono-time)
  "Return monotonic timestamp in seconds as real."
  (+ 0.0 (/ (get-internal-real-time) internal-time-units-per-second)))

(xbindkey-function '("b:8") (lambda ()
  (let ((ts (mono-time)))
    (when
      ;; Enforce min ts diff between "done" and "start" of the next one
      (or (not razer-ts-done) (>= (- ts razer-ts-done) razer-delay-min))
      (set! razer-ts-start ts)))))

(xbindkey-function '(Release "b:8") (lambda ()
  (let ((ts (mono-time)))
    (when razer-debug
      (format #t "razer: ~a/~a delay=~a[~a] wait=~a[~a]\n"
        razer-ts-start razer-ts-done
        (and razer-ts-done (- ts razer-ts-done)) razer-delay-min
        (and razer-ts-start (- ts razer-ts-start)) razer-wait-max))
    (when
      (and
        ;; Enforce min ts diff between previous "done" and this one
        (or (not razer-ts-done) (>= (- ts razer-ts-done) razer-delay-min))
        ;; Enforce max "click" wait time
        (and razer-ts-start (<= (- ts razer-ts-start) razer-wait-max)))
      (set! razer-ts-done ts)
      (when razer-debug (format #t "razer: --- click!\n"))
      (run-command "xdotool click 2")))))

Note that xbindkeys actually grabs "b:8" here, which is "mouse button 8" - if it was "b:2", the "xdotool click 2" command would recurse into the same code - so the wheel-clicker should be bound to button 8 in X for this to work.

Rebinding buttons in X is trivial to do on-the-fly, using standard "xinput" tool - e.g. xinput set-button-map "My Mouse" 1 8 3 (xinitrc.input script can be used as an extended example).

Running "xdotool" to do actual clicks at the end seem a bit wasteful, as xbindkeys already hooks into similar functionality, but unfortunately there's no "send input event" calls exported to guile scripts (as of 1.8.6, at least).

Still, works well enough as it is, fixing that rather annoying issue.

[xbindkeysrc.scm on github]

Dec 29, 2015

Tool to interleave and colorize lines from multiple log (or any other) files

There's the multitail thing to tail multiple logs, potentially interleaved, in one curses window, which is painful-to-impossible to browse through the way you would with a simple "less".

There's lnav for parsing and normalizing a bunch of logs, and continuously monitoring these, also interactive.

There's rainbow to color specific lines based on regexp, which can't really do any interleaving.

And this has been bugging me for a while - there seems to be no easy way to get this:

[interleaved and colorized output image]

This is an interleaved output from several timestamped log files, for events happening at nearly the same time (which can be used to establish the sequence between these and correlate output of multiple tools/instances/etc), browsable via the usual "less" (or whatever other $PAGER) in an xterm window.

In this case, logfiles are from "btmon" (bluetooth sniffer tool), "bluetoothd" (bluez) debug output and an output from gdb attached to that bluetoothd pid (showing stuff described in previous entry about gdb).

Neither of these tools' output has timestamps by default, but this is easy to fix by piping it through any tool that adds them to every line - svlogd, for example.

To be concrete (and to show one important thing about such a log-from-output approach), here's how I got these particular logs:

# mkdir -p debug_logs/{gdb,bluetoothd,btmon}

# gdb -ex 'source gdb_device_c_ftrace.txt' -ex q --args\
        /usr/lib/bluetooth/bluetoothd --nodetach --debug\
        1> >(svlogd -r _ -ttt debug_logs/gdb)\
        2> >(svlogd -r _ -ttt debug_logs/bluetoothd)

# stdbuf -oL btmon |\
        svlogd -r _ -ttt debug_logs/btmon

Note that "btmon" runs via coreutils stdbuf tool, which can be critical for anything that writes to its stdout via libc's fwrite(), i.e. can have buffering enabled there, which causes stuff to be output delayed and in batches, not how it'd appear in the terminal (where line buffering is used), resulting in incorrect timestamps, unless stdbuf or any other option disabling such buffering is used.

With the three separate logs from the above snippet, the natural thing you'd want is to see them all at the same time, so that for each logical "event" there'd be output from btmon (network packet), bluetoothd (debug logging output) and gdb's function call traces.

It's easy to concatenate all three logs and sort them to get these interleaved, but then it can be visually hard to tell which line belongs to which file, especially if they are from several instances of the same app (not really the case here though).

A simple fix is to add a distinct per-file color to each line of each log, but then you can't sort these, as color sequences get in the way; it's non-trivial to do even that, and it all adds up to a script.

Seems to be hard to find any existing tools for the job, so wrote a script to do it - liac (in the usual mk-fg/fgtk github repo), which was used to produce the output in the image above - that is, interleave lines (using any field for sorting, btw), add tags for distinct ANSI colors to lines belonging to different files, plus optional prefixes.
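Core of that interleaving can be sketched in a few lines of python3 - e.g. this minimal version of the idea (assuming each line starts with a sortable timestamp field, as with svlogd -ttt above):

import heapq, sys

def tagged_lines(path, color):
  with open(path) as src:
    for line in src:
      # Leading timestamp field is the sort key
      yield line.split(None, 1)[0], color, line.rstrip('\n')

colors = [31, 32, 33, 34, 35, 36] # ANSI foreground color codes
streams = [ tagged_lines(path, colors[n % len(colors)])
  for n, path in enumerate(sys.argv[1:]) ]

# heapq.merge lazily interleaves already-sorted inputs
for ts, color, line in heapq.merge(*streams):
  print('\033[%dm%s\033[0m' % (color, line))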

Thought it might be useful to leave a note for anyone looking for something similar.

[script source link]

Dec 29, 2015

Getting log of all function calls from specific source file using gdb

Maybe I'm doing debugging wrong, but messing with code written by other people, the first question for me is usually not "what happens in function X" (done by setting a breakpoint on it), but rather "which file/func do I look into".

I.e. having an observable effect - like "GET_REPORT messages get sent on HID level to bluetooth device, and replies are ignored", it's easy to guess that it's either linux kernel or bluetoothd - part of BlueZ.

The question then becomes "which calls in the app happen at the time of this observable effect", and luckily there's an easy, but not very well-documented (unless my google is that bad) way to see that via gdb for C apps.

For scripts it's way easier, of course - e.g. in python you can do python -m trace ... and it can dump every single line of code it runs.
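For example, something like this (script name is arbitrary here) dumps each executed line:

% python3 -m trace --trace myscript.py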

First of all, the app in question has to be compiled with the "-g" option and not "stripped", of course, which should be easy to set via CFLAGS, defining these in distro-specific ways if rebuilding a package to include that (e.g. for Arch - have debug !strip in the OPTIONS line of /etc/makepkg.conf).

Then running the app under gdb can be done via something like gdb --args someapp arg1 arg2 (and typing "r" there to start it), but if the goal is to get a log of all function calls from a specific file (and not just a "call graph" the way profilers like gprof do it), then first - interactivity has to go, and second - breakpoints have to be set for all these funcs and logged when the app runs.

Alas, there seems to be no way to add a breakpoint to every func in a file.

One common suggestion (does NOT work, don't copy-paste!) I've seen is doing rbreak device\.c: ("rbreak" is a regexp version of "break") to match e.g. profiles/input/device.c:extract_hid_record (as well as all other funcs there) with a "filename:funcname" pattern, but it doesn't work and shouldn't work, as "rbreak" only matches the "funcname" part.

So a trivial script is needed to a) get the list of funcs in a source file (just names are enough, as C has only one namespace), and b) put a breakpoint on all of them.

This is luckily quite easy to do via ctags, with this one-liner:

% ctags -x --c-kinds=fp profiles/input/device.c |
  awk 'BEGIN {print "set pagination off\nset height 0\nset logging on\n\n"}\
    {print "break", $1 "\ncommands\nbt 5\necho ------------------\\n\\n\nc\nend\n"}\
    END {print "\n\nrun"}' > gdb_device_c_ftrace.txt

Should generate a script for gdb, starting with "set pagination off" and whatever else is useful for logging, with a "commands" block after every "break", running "bt 5" (displays a backtrace) and echoing a nice-ish separator (a bunch of hyphens), ending with the "run" command to start the app.
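I.e. the resulting gdb_device_c_ftrace.txt should look something like this, with one such break-block per function (extract_hid_record here being just one of them):

set pagination off
set height 0
set logging on

break extract_hid_record
commands
bt 5
echo ------------------\n\n
c
end

run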

Resulting script can/should be fed into gdb with something like this:

% gdb -ex 'source gdb_device_c_ftrace.txt' -ex q --args\
  /usr/lib/bluetooth/bluetoothd --nodetach --debug

This will produce the needed log of all the calls to functions from that "device.c" file into "gdb.txt", have output of the app interleaved with these in stdout/stderr (which can be redirected, or maybe closed with more gdb commands in the txt file or before it with "-ex"), and is non-interactive.

From here, seeing where exactly the issue seems to occur, one'd probably want to look thru the code of the funcs in question, run gdb interactively and inspect what exactly is happening there.

Definitely nowhere near the magic some people script gdb with, but haven't found similar snippets neatly organized anywhere else, so here they go, in case someone might want to do the exact same thing.

Can also be used to log a bunch of calls from multiple files, of course, by giving "ctags" more files to parse.

Dec 09, 2015

Transparent buffer/file processing in emacs on load/save/whatever-io ops

Following up on my gpg replacement endeavor, also needed to add transparent decryption for buffers loaded from *.ghg files, and encryption when writing stuff back to these.

git filters (defined via a gitattributes file) do the same thing when interacting with the repo.

Such thing is already done by a few existing elisp modules, such as jka-compr.el for auto-compression-mode (opening/saving .gz and similar files as if they were plaintext), and epa.el for transparent gpg encryption.

While these modules do this The Right Way, by adding a "file-name-handler-alist" entry, googling for a small ad-hoc boilerplate turned up quite a few examples that do it via hooks instead, which seem rather unreliable, with esp. bad failure modes wrt transparent encryption.

So, in the interest of providing a right-er boilerplate for the task (and because I tend to like elisp) - here's the fg_sec.el example (from mk-fg/emacs-setup) of how it can be implemented cleaner, in a similar fashion to epa and jka-compr.

Code calls ghg -do when visiting/reading files (with contents piped to stdin) and ghg -eo (with stdin/stdout buffers) when writing stuff back.

The entry-point/hook there is "file-name-handler-alist", where a regexp matching *.ghg gets added to call "ghg-io-handler" for every i/o operation (including path ops like "expand-file-name" or "file-exists-p" btw), with only "insert-file-contents" (read) and "write-region" (write) being overridden.

Unlike jka-compr though, no temporary files are used in this implementation, only temp buffers, and "insert-file-contents" doesn't put unauthenticated data into the target buffer as it arrives, patiently waiting for the subprocess to exit with a success code first.

Fairly sure that this bit of elisp can be used for any kind of processing, by replacing the "ghg" binary with anything else that can work as a pipe (stdin -> processing -> stdout), which opens up quite a lot of possibilities.

For example, all JSON files can be edited as a pretty YAML version, without strict syntax and all the brackets of JSON, or the need to process/convert them purely in elisp's json-mode or something - just plug python -m pyaml and python -m json commands for these two i/o ops and it should work.

Suspect there's gotta be something that'd make such filters easier in MELPA already, but haven't been able to spot anything right away, maybe should put up a package there myself.

[fg_sec.el code link]

Dec 08, 2015

GHG - simpler GnuPG (gpg) replacement for file encryption

Have been using gpg for many years now, many times a day, as I keep a lot of stuff in .gpg files, but still can't seem to get used to its quirky interface and practices.

Most notably, its "trust" thing, keyrings and arcane key editing, expiration dates, gpg-agent interaction and encrypted keys are all sources of dread and stress for me.

The last drop, following the tradition of many disastrous interactions with the tool, was me losing my master signing key password, despite it being written down on paper and working before. #fail ;(

Certainly my fault, but as I'll be replacing the damn key anyway, why not throw out the rest of that incomprehensible tangle of pointless and counter-productive practices and features I never use?

Took ~6 hours to write a replacement ghg tool - same thing as gpg, except with simple and sane key management (which never asks you to enter anything, ever!!!), none of that web-of-trust or signing crap, good (and non-swappable) djb crypto, and only for file encryption.

Does everything I've used gpg for from the command-line, and has one flat file for all the keys, so no more hassle with --edit-key nonsense.
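E.g. with the same stdin/stdout modes that the emacs entry above hooks into (file names here are arbitrary):

% ghg -eo <app.conf >app.conf.ghg
% ghg -do <app.conf.ghg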

Highly suggest to anyone who ever had trouble and frustration with gpg to check ghg project out or write their own (trivial!) tool, and ditch the old thing - life's too short to deal with that constant headache.

Sep 04, 2015

Parsing OpenSSH Ed25519 keys for fun and profit

Adding key derivation from OpenSSH keys to git-nerps, needed to get the actual "secret" from such a key, or something deterministically (and in an obvious and stable way) derived from it (to then feed into some pbkdf2 and get the symmetric key).

The idea is for lightweight ad-hoc vms/containers to have a single "master secret", from which all others (e.g. one for git-nerps' encryption) can be easily derived or decrypted, and the omnipresent, secure, useful and easy-to-generate ssh key in ~/.ssh/id_ed25519 seems to be the best candidate.

Unfortunately, the standard set of ssh tools from openssh doesn't seem to have anything that can get key material or its hash - the next best thing is to get a "fingerprint" or such, but these are derived from public keys, so not what I wanted at all (as anyone can derive those, having the public key, which isn't secret).

And I didn't want to hash the full openssh key blob, because stuff there isn't guaranteed to stay the same when/if you encrypt/decrypt it or do whatever ssh-keygen does.

What definitely stays the same are the values that openssh plugs into crypto algos, so wrote a full parser for the key format (as specified in the PROTOCOL.key file in openssh sources) to get those.

While doing so, stumbled upon a fairly obvious and interesting application for such a parser - getting a really short string that's easy to backup, read or transcribe, and which is the actual secret for Ed25519.

I.e. that's what OpenSSH private key looks like:

-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZW
QyNTUxOQAAACDaKUyc/3dnDL+FS4/32JFsF88oQoYb2lU0QYtLgOx+yAAAAJi1Bt0atQbd
GgAAAAtzc2gtZWQyNTUxOQAAACDaKUyc/3dnDL+FS4/32JFsF88oQoYb2lU0QYtLgOx+yA
AAAEAc5IRaYYm2Ss4E65MYY4VewwiwyqWdBNYAZxEhZe9GpNopTJz/d2cMv4VLj/fYkWwX
zyhChhvaVTRBi0uA7H7IAAAAE2ZyYWdnb2RAbWFsZWRpY3Rpb24BAg==
-----END OPENSSH PRIVATE KEY-----

And here's the only useful info in there, enough to restore the whole blob above from, in the same base64 encoding:

HOSEWmGJtkrOBOuTGGOFXsMIsMqlnQTWAGcRIWXvRqQ=

The latter, of course, being way more suitable for the tried-and-true "write on a sticker and glue to the desk" approach. Or one can just have a file with one host key per line - also cool.

That's the 32-byte "seed" value, which can be used to derive the "ed25519_sk" field ("seed || pubkey") in that openssh blob, with all other fields being either "none", "ssh-ed25519", "magic numbers" baked into the format, or just padding.
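For reference, here's a minimal python3 sketch of pulling that seed out of an unencrypted ed25519 key file, following the PROTOCOL.key layout (actual ssh-keyparse also handles encrypted keys and other details):

import base64, struct, sys

def sshstr(buf, off):
  # Read length-prefixed (uint32 big-endian) string at offset
  n, = struct.unpack_from('>I', buf, off)
  return buf[off+4:off+4+n], off + 4 + n

with open(sys.argv[1]) as src:
  b64 = ''.join( line.strip()
    for line in src if not line.startswith('-----') )
blob = base64.b64decode(b64)

magic = b'openssh-key-v1\0'
assert blob.startswith(magic)
off = len(magic)
cipher, off = sshstr(blob, off)
assert cipher == b'none', 'encrypted keys not handled in this sketch'
kdf, off = sshstr(blob, off) # "none"
kdf_opts, off = sshstr(blob, off) # empty
off += 4 # uint32 number of keys, always 1
pub, off = sshstr(blob, off) # public key blob
priv, off = sshstr(blob, off) # private key section

off = 8 # skip two check-int values in that section
ktype, off = sshstr(priv, off) # "ssh-ed25519"
pk, off = sshstr(priv, off) # 32B public key
sk, off = sshstr(priv, off) # 64B ed25519_sk = "seed || pubkey"
print(base64.b64encode(sk[:32]).decode()) # the seed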

So rolled the parser from git-nerps into its own tool - ssh-keyparse, which one can run to get that string above for the key in ~/.ssh/id_ed25519, or do some simple crypto (as implemented by djb in ed25519.py, not me) to recover the full key from the seed.

Of the output serialization formats that the tool supports, especially liked the idea of Douglas Crockford's Base32 - a human-readable one, where all confusing l-and-1 or O-and-0 chars are interchangeable, and there's an optional checksum (one letter) at the end:

% ssh-keyparse test-key --base32
3KJ8-8PK1-H6V4-NKG4-XE9H-GRW5-BV1G-HC6A-MPEG-9NG0-CW8J-2SFF-8TJ0-e

% ssh-keyparse test-key --base32-nodashes
3KJ88PK1H6V4NKG4XE9HGRW5BV1GHC6AMPEG9NG0CW8J2SFF8TJ0e

base64 (default) is still probably the most efficient for non-binary backup (there's --raw otherwise) though.

[ssh-keyparse code link]

Sep 01, 2015

Transparent and easy encryption for files in git repositories

Have been installing things to OS containers (/var/lib/machines) lately, and looking for proper configuration management in these.

Large-scale container setups use some hard-to-integrate things like etcd, where you have to template configuration from values stored there, which is not very convenient and has a very low effort-to-results ratio (maintaining that system itself) for a "10 service containers on 3 hosts" case.

Besides, such a centralized value store is a bit backwards for a one-container-per-service case, where most values in such a "central db" are specific to one container, and it's much easier to edit end-result configs than db values, then templates, and then checking how it all gets rendered on every trivial tweak.

The usual solution I have for these setups is simply putting all confs under git control, but leaving all the secrets (e.g. keys, passwords, auth data) out of the repo, in case it might be pulled from on other hosts, by different people and for purposes which don't need these sensitive bits and might leak them (e.g. giving access to contracted app devs).

For more transient container setups, something should definitely keep track of these "secrets" however, as "rm -rf /var/lib/machines/..." is much more realistic possibility and has its uses.


So my (non-original) idea here was to have one "master key" per host - just one short string - with which to encrypt all secrets for that host, which can then be shared between hosts and specific people (making these public might still be a bad idea), if necessary.

This key should then be simply stored in whatever key-management repo, written on a sticker and glued to a display, or something.

Git can be (ab)used for such encryption, with its "filter" facilities, which are generally used for the opposite thing (normalization to one style), but are easy to adapt for this case too.

Git filters work by running a "clean" operation on selected paths (can be wildcard patterns like "*.c") every time git itself uses these, and "smudge" when showing them to the user and checking them out to a local copy (where they are edited).

In case of encryption, "clean" would not be normalizing CR/LF in line endings, but rather wrapping contents (or parts of them) into a binary blob, "smudge" should do the opposite, and gitattributes patterns would match the files to be encrypted.
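The general mechanism looks like this, with both commands receiving file contents on stdin and writing the result to stdout (filter name and commands below are placeholders, not how git-nerps registers itself):

# .gitattributes
*.secret filter=crypt

# .git/config
[filter "crypt"]
  clean = encrypt-pipe-command
  smudge = decrypt-pipe-command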


Looking for projects that already do that, found quite a few, but still decided to write my own tool, because none seem to have all the things I wanted:

  • Use sane encryption.

It's AES-CTR in the absolute best case, and AES-ECB (wtf!?) in some; sometimes openssl is called with a "password" on the command line (trivial to spot in /proc).

    OpenSSL itself is a red flag - hard to believe that someone who knows how bad its API and primitives are still uses it willingly, for non-TLS, at least.

    Expected to find at least one project using AEAD through NaCl or something, but no such luck.

  • Have tool manage gitattributes.

You don't add a file to a git repo by typing /path/to/myfile managed=version-control some-other-flags into some config, so why should you do it here?

  • Be easy to deploy.

    Ideally it'd be a script, not some c++/autotools project to install build tools for or package to every setup.

    Though bash script is maybe taking it a bit too far, given how messy it is for anything non-trivial, secure and reliable in diff environments.

  • Have "configuration repository" as intended use-case.

So wrote git-nerps python script to address all these.

Crypto there is trivial yet solid PyNaCl stuff, marking files for encryption is as easy as git-nerps taint /what/ever/path, and bootstrapping the thing requires nothing more than python, git, PyNaCl (which are the norm in any of my setups) and git-nerps key-gen in the repo.

README for the project has info on every aspect of how the thing works and more on the ideas behind it.

I expect it'll have a few more use-case-specific features and convenience-wrapper commands once I get to use it in more realistic cases than it has now (initially).


[project link]

Aug 22, 2015

Quick lzma2 compression showcase

On cue from irc, recently ran this experiment:

% a=(); for n in {1..100}; do f=ls_$n; cp /usr/bin/ls $f; echo $n >> $f; a+=( $f ); done
% 7z a test.7z "${a[@]}" >/dev/null
% tar -cf test.tar "${a[@]}"
% gzip < test.tar > test.tar.gz
% xz < test.tar > test.tar.xz
% rm -f "${a[@]}"

% ls -lahS test.*
-rw-r--r-- 1 fraggod fraggod  12M Aug 22 19:03 test.tar
-rw-r--r-- 1 fraggod fraggod 5.1M Aug 22 19:03 test.tar.gz
-rw-r--r-- 1 fraggod fraggod 465K Aug 22 19:03 test.7z
-rw-r--r-- 1 fraggod fraggod  48K Aug 22 19:03 test.tar.xz

Didn't realize that gz was that bad at such a deduplication task.

Also somehow thought (and never really bothered to look it up) that 7z was compressing each file individually by default, which clearly is not the case, as the overall size should then be ~10x of what 7z produced.

Docs agree on "solid" mode being the default of course, meaning no easy "pull one file out of the archive" unless explicitly changed - useful to know.
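E.g. something like this should produce a non-solid archive, if per-file access matters more than the compression ratio:

% 7z a -ms=off test-nonsolid.7z "${a[@]}"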

The further 10x difference between 7z and xz is kinda impressive, even for such a degenerate case.
