Aug 22, 2015

Quick lzma2 compression showcase

On cue from irc, recently ran this experiment:

% a=(); for n in {1..100}; do f=ls_$n; cp /usr/bin/ls $f; echo $n >> $f; a+=( $f ); done
% 7z a test.7z "${a[@]}" >/dev/null
% tar -cf test.tar "${a[@]}"
% gzip < test.tar > test.tar.gz
% xz < test.tar > test.tar.xz
% rm -f "${a[@]}"

% ls -lahS test.*
-rw-r--r-- 1 fraggod fraggod  12M Aug 22 19:03 test.tar
-rw-r--r-- 1 fraggod fraggod 5.1M Aug 22 19:03 test.tar.gz
-rw-r--r-- 1 fraggod fraggod 465K Aug 22 19:03 test.7z
-rw-r--r-- 1 fraggod fraggod  48K Aug 22 19:03 test.tar.xz

Didn't realize that gz was that bad at such a deduplication task.
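One likely reason, as far as I can tell: DEFLATE (what gzip uses) only looks for repeats within the last 32 KiB, so a second copy of anything larger than that gets compressed from scratch. A rough sketch that shows the effect with gzip alone, using made-up file names and random data instead of ls binaries:

```shell
# Two copies of a 16 KiB random chunk: the repeat fits in DEFLATE's 32 KiB window.
head -c 16384 /dev/urandom > small.bin
small_gz=$(cat small.bin small.bin | gzip -c | wc -c)

# Two copies of a 1 MiB random chunk: the repeat is far outside the window.
head -c 1048576 /dev/urandom > big.bin
big_gz=$(cat big.bin big.bin | gzip -c | wc -c)

echo "2 x 16 KiB -> $small_gz bytes, 2 x 1 MiB -> $big_gz bytes"
rm -f small.bin big.bin
```

The first number should come out near 16 KiB (second copy is almost free), the second near 2 MiB (second copy is not deduplicated at all).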

Also somehow thought (and never really bothered to look it up) that 7z was compressing each file individually by default, which clearly is not the case, as the archive would otherwise have been about 10x larger than what 7z actually produced.

Docs agree on "solid" mode being the default of course, meaning no easy "pull one file out of the archive" unless explicitly changed - useful to know.
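For reference, a minimal sketch of switching solid mode off, assuming p7zip's 7z is installed. The -ms=off switch is the documented way to do it; the file and archive names are just placeholders, and /bin/sh stands in for the ls copies above:

```shell
# Skip quietly if 7z is not available on this machine.
if command -v 7z >/dev/null; then
  # Three near-identical files, same trick as in the experiment above.
  for n in 1 2 3; do cp /bin/sh "sh_$n"; echo "$n" >> "sh_$n"; done

  7z a solid.7z sh_1 sh_2 sh_3 >/dev/null             # default: solid
  7z a -ms=off nonsolid.7z sh_1 sh_2 sh_3 >/dev/null  # each file compressed separately

  ls -l solid.7z nonsolid.7z
  rm -f sh_1 sh_2 sh_3
fi
```

The non-solid archive should be noticeably larger, since each copy is compressed without seeing the others.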

The further 10x difference between 7z and xz is kinda impressive, even for such a degenerate case.

Dec 11, 2010

zcat, bzcat, lzcat, xzcat... Arrrgh! Autodetection rocks

Playing with dracut today, noticed that it can create lzma-compressed initrd's without problem, but its "lsinitrd" script uses zcat to access initrd data, thus failing for lzma or bzip2 compression.

Of course the "problem" is nothing new, and I've bumped against it a zillion times in the past, although it looks like today I was a bit less (or more?) lazy than usual and tried to seek a solution - some *cat tool, which would be able to read any compressed format without the need to specify it explicitly.

Finding nothing of the /usr/bin persuasion, I noticed that there's a fine libarchive project, which can do all sorts of magic just for this purpose. Alas, there seems to be no cli client for it to utilize this magic, so I got around to writing my own.

These few minutes of happy-hacking probably saved me a lot of time in the long run, guess the result may as well be useful to someone else:

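#include <archive.h>
#include <archive_entry.h>
#include <stdio.h>
#include <stdlib.h>

const int BS = 16384; /* read block size, in bytes */

int main(int argc, const char **argv) {
    if (argc > 2) {
        fprintf(stderr, "Usage: %s [file]\n", argv[0]);
        exit(1); }

    struct archive *a = archive_read_new();
    /* Autodetect any compression filter and treat the payload as raw data.
       libarchive 2.x spells the first call archive_read_support_compression_all(). */
    archive_read_support_filter_all(a);
    archive_read_support_format_raw(a);

    /* Read the named file, or stdin (fd 0) if no argument was given. */
    int err;
    if (argc == 2) err = archive_read_open_filename(a, argv[1], BS);
    else err = archive_read_open_fd(a, 0, BS);
    if (err != ARCHIVE_OK) {
        fprintf(stderr, "Broken archive (1)\n");
        exit(1); }

    /* The raw format exposes the decompressed stream as a single entry. */
    struct archive_entry *ae;
    err = archive_read_next_header(a, &ae);
    if (err != ARCHIVE_OK) {
        fprintf(stderr, "Broken archive (2)\n");
        exit(1); }

    /* Dump the decompressed data to stdout (fd 1). */
    (void) archive_read_data_into_fd(a, 1);

    archive_read_free(a); /* archive_read_finish() in libarchive 2.x */
    return 0;
}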

Build it with "gcc excat.c -o excat -larchive" (the library goes after the source file, or some linkers will drop it) and use as "excat /path/to/something.{xz,gz,bz2,...}".
The list of formats supported by libarchive can be found here; note that it can also unpack something like file.gz.xz, although I have no idea why someone would want to create such a thing.
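For boxes without libarchive headers around, the same idea can be roughly approximated in shell by peeking at the magic bytes and dispatching to the right decompressor. This is only a sketch, not how libarchive does its detection: "anycat" is a made-up name, it only knows gzip, bzip2 and xz, only takes a file path (no stdin), and does not handle stacked compression like file.gz.xz.

```shell
# anycat FILE: decompress FILE to stdout based on its first bytes.
anycat() {
  # First 6 bytes as a lowercase hex string, e.g. "1f8b08000000".
  magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    1f8b*)        gzip -dc "$1" ;;   # gzip: 1f 8b
    425a68*)      bzip2 -dc "$1" ;;  # bzip2: "BZh"
    fd377a585a00) xz -dc "$1" ;;     # xz: fd "7zXZ" 00
    *)            cat "$1" ;;        # unknown: pass through as-is
  esac
}
```

Usage is the same as with excat above, e.g. "anycat /path/to/something.gz".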

I've also created a project on sourceforge for it, in hopes that it'd save someone like me a bit of time with google-fu, but I doubt I'll add any new features here.
