May 19, 2015

twitch.tv VoDs (video-on-demand) downloading issues and fixes

Quite often recently, VoDs on twitch are just unplayable for me through the flash player - no idea what happens on the backend there, but it buffers endlessly at any quality level and that's it.

I also need to skip to some arbitrary part of a 7-hour stream (the last wcs sc2 ro32), as I've watched half of it live, which turns out to complicate things a bit.

So the solution is to download the thing, which goes something like this:

  • It's just a video, right? Let's grab whatever stream flash is playing (with e.g. the FlashGot FF addon).

    Doesn't work easily, since the video is heavily chunked.
    It used to be 30-min flv chunks, which are kinda ok, but these days it's forced 4s chunks - apparently the backend doesn't even allow downloading more than 4s per request.
  • Fine, youtube-dl it is.

    Nope, it doesn't allow seeking to a time offset in the stream.
    There's an open "Download range" issue for that.

    The livestreamer wrapper around it doesn't allow that either.

  • Try the ?t=3h30m URL parameter - doesn't work, sadly.

  • mpv supports both youtube-dl and seeking, so use that.

    Kinda works, but only for super-short seeks.
    Seeking beyond e.g. 1 hour takes AGES, and every seek after that (even skipping a few seconds ahead) takes longer and longer.
  • youtube-dl --get-url gets the m3u8 playlist link, so use ffmpeg -ss <pos> with it (a sketch of that command follows this list).

    Apparently works exactly the same as mpv above - takes like 20-30min to seek to 3:30:00 (a 3.5-hour offset).

    Dunno if it downloads and checks every chunk in the playlist for length sequentially... sounds dumb, but no clue why it's that slow otherwise - apparently it's just not good with these playlists.

  • Grab the m3u8 playlist, change all relative links there into full urls, remove a bunch of them from the start to emulate seeking, play that with ffmpeg | mpv.

    Works at first, but gets totally stuck a few seconds/minutes into the video, with ffmpeg download rates dropping to ~10 KiB/s.

    youtube-dl apparently gets stuck in a similar fashion, as it does the same ffmpeg-on-a-playlist trick (but without changing the playlist).

  • Fine! Just download all the damn links with curl.

    grep '^http:' pls.m3u8 | xargs -n50 curl -s | pv -rb -i5 > video.mp4

    Makes it painfully obvious why the flash player and ffmpeg/youtube-dl get stuck - eventually curl stumbles upon a chunk that downloads at a few KiB/s.

    This "stumbling chunk" appears to be a random one, unrelated to local bandwidth limitations, and just retrying it fixes the issue.

  • Assemble a list of links and use some more advanced downloader that can do parallel downloads, plus detect and retry super-low speeds.

    Naturally, it's aria2, but with all the parallelism it appears to be impossible to tell which output file corresponds to which chunk using just the cli.

    Mostly due to links having the same url-path, e.g. index-0000000014-O7tq.ts?start_offset=955228&end_offset=2822819 with different offsets (pity that the backend doesn't seem to allow grabbing more than a 4s range of such *.ts files) - aria2 just does file.ts, file.ts.1, file.ts.2, etc, which end up in non-playlist order due to all the parallel downloads.

  • Finally, as acceptance dawns, go and write your own youtube-dl/aria2 wrapper to properly seek to the necessary offset (according to playlist tags) and download/resume files from there, in a parallel yet ordered and controlled fashion.

    This is done by using the --on-download-complete hook and passing an ordered "gid" number for each chunk url, which is then passed to the hook along with the resulting path (and the hook renames the file to prefix + sequence number).
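For reference, the ffmpeg-seek attempt from the list above boils down to something like this (a sketch - channel/vod-id and the seek position are placeholders):

ffmpeg -ss 3:30:00 -i "$(youtube-dl --get-url 'http://www.twitch.tv/<chan>/v/<id>')" \
  -c copy video.mp4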

Ended up with just the chunk of the stream I wanted, stored locally (online playback lag never goes away!), downloaded super-fast and fully seekable.

The resulting script is twitch_vod_fetch (script source link).

Update 2017-06-01: the script has since been rewritten using python3/asyncio, so anything related to specific implementation details here is only relevant for the old py2 version (which can be pulled from git history, if necessary).


aria2c magic bits in the script:

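# port/key (rpc port and secret), ua (user-agent) and
# hook (hook script path) are all set elsewhere in the script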
aria2c = subprocess.Popen([
  'aria2c',

  '--stop-with-process={}'.format(os.getpid()),
  '--enable-rpc=true',
  '--rpc-listen-port={}'.format(port),
  '--rpc-secret={}'.format(key),

  '--no-netrc', '--no-proxy',
  '--max-concurrent-downloads=5',
  '--max-connection-per-server=5',
  '--max-file-not-found=5',
  '--max-tries=8',
  '--timeout=15',
  '--connect-timeout=10',
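  # aborts transfers slower than this, so the stuck "few KiB/s"
  # chunks error out and get retried (within --max-tries)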
  '--lowest-speed-limit=100K',
  '--user-agent={}'.format(ua),

  '--on-download-complete={}'.format(hook),
], close_fds=True)

Didn't bother adding extra options for tweaking these via the cli, but it might be a good idea to adjust timeouts and limits for a particular use-case (see also the massive "man aria2c").

Seeking in the playlist is easy, as it's essentially a VoD playlist where every ~4s chunk is preceded by a tag like #EXTINF:3.240, stating its exact duration, so the script just skips chunks as necessary to satisfy the --start-pos / --length parameters.
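For illustration, roughly the same skip can be done over a playlist file with e.g. awk (a sketch - assumes a pls.m3u8 file and a 3h30m = 12600s start position):

awk -v start=12600 '
  /^#EXTINF:/ { t += substr($0, 9) + 0; if (t <= start) { skip=1; next } }
  skip && !/^#/ { skip=0; next }  # drop the chunk url after a skipped tag
  { print }
' pls.m3u8 > pls-cut.m3u8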

Queueing all downloads, each with its own particular gid, is done via JSON-RPC, as it seems to be impossible to:

  • Specify both link and gid in the --input-file for aria2c.
  • Pass an actual download URL or any sequential number to --on-download-complete hook (except for gid).

So each gid is just generated as "000001", "000002", etc, and the hook script is a one-liner "mv" command.
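For example, queueing one chunk over JSON-RPC looks roughly like this (a sketch - port, key and filenames are made up here; aria2 wants gids as 16-char hex strings, hence the zero-padding, and a unique "out" name per download also avoids the file-reuse issue mentioned in the updates below):

curl -s http://localhost:6800/jsonrpc -d '{
  "jsonrpc": "2.0", "id": "q", "method": "aria2.addUri",
  "params": ["token:somekey",
    ["http://example.com/index-0000000014-O7tq.ts?start_offset=955228&end_offset=2822819"],
    {"gid": "0000000000000001", "out": "chunk.000001.ts"}]}'

And the hook itself can be as simple as (hypothetical output prefix):

#!/bin/bash
# aria2c invokes this as: hook <gid> <file-count> <file-path>
mv "$3" "my_output_prefix.$1.ts"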


Since everything in the script is kinda lengthy time-wise - e.g. youtube-dl --get-url takes a while, then the actual downloads, then concatenation, ... - it's designed to be Ctrl+C'able at any point.

Every step just generates a state-file like "my_output_prefix.m3u8", and the next one goes on from there.
Restarting the script doesn't repeat completed steps, and these files can be freely mangled or removed to force re-doing a step (or to adjust its behavior in whatever way).
An example of a useful restart is removing the *.m3u8.url and *.m3u8 files if twitch starts giving 404's due to expired links in there.
That won't force re-downloading any chunks - only the still-missing ones get grabbed before assembling the resulting file.

The end result is one my_output_prefix.mp4 file with the specified video chunk (or the full video, if no range is specified), plus all the intermediate litter (kept to be able to restart the process from any point).


One issue I've spotted with the initial version:

05/19 22:38:28 [ERROR] CUID#77 - Download aborted. URI=...
Exception: [AbstractCommand.cc:398] errorCode=1 URI=...
  -> [RequestGroup.cc:714] errorCode=1 Download aborted.
  -> [DefaultBtProgressInfoFile.cc:277]
    errorCode=1 total length mismatch. expected: 1924180, actual: 1789572
05/19 22:38:28 [NOTICE] Download GID#0035090000000000 not complete: ...

There seem to be a few of these mismatches (like 5 out of 10k chunks), which don't get retried, as aria2 doesn't seem to consider them transient errors (which is probably fair).

Probably a twitch bug, as it clearly breaks http there, and browsers shouldn't accept such responses either.

Can be fixed by one more hook, I guess - either --on-download-error (to make the script retry the url with that gid), or one using a websocket connection and getting the json error notification there.

In any case, just running the same command again to download the few still-missing chunks and finish the process works around the issue.

Update 2015-05-22: the issue clearly persists for vods from different chans, so fixed it via a simple "retry all failed chunks a few times" loop at the end.

Update 2015-05-23: apparently it's due to aria2 reusing the same files for different urls and trying to resume downloads; fixed by passing --out for each download queued over the api.


[script source link]

Jun 15, 2014

Running isolated Steam instance with its own UID and session

Finally got around to installing the Steam platform on a desktop linux machine.
Been using a Win7 instance here for games before, but as another fan in my laptop died, I've been too lazy to reboot into the dedicated games-os here.

Given that Steam is a closed-source proprietary DRM platform for mass software distribution, it seems to be either an ideal malware-spread vector or just a recipe for disaster, so of course I'm not keen on giving it any access in a non-dedicated os.

I also feel a bit guilty about giving the thing any extra PR, as it's the worst kind of always-on DRM crap in principle, and has already pretty much monopolized the PC gaming market.
These days even many game critics push for filtering and essentially abuse of that immense leverage - not a good sign at all.
To its credit, of course, Steam is nice and convenient to use, as such things (e.g. google, fb, droids, apple, etc) tend to be.

So, isolation:

  • To avoid having Steam and any games anywhere near $HOME, giving it a separate UID is a good way to go.

  • That should also allow it to run in a separate desktop session - i.e. have its own cgroup, to easily contain, control and set limits for games:

    % loginctl user-status steam
    steam (1001)
      Since: Sun 2014-06-15 18:40:34 YEKT; 31s ago
      State: active
      Sessions: *7
        Unit: user-1001.slice
              └─session-7.scope
                ├─7821 sshd: steam [priv]
                ├─7829 sshd: steam@notty
                ├─7830 -zsh
                ├─7831 bash /usr/bin/steam
                ├─7841 bash /home/steam/.local/share/Steam/steam.sh
                ├─7842 tee /tmp/dumps/steam_stdout.txt
                ├─7917 /home/steam/.local/share/Steam/ubuntu12_32/steam
                ├─7942 dbus-launch --autolaunch=e52019f6d7b9427697a152348e9f84ad ...
                └─7943 /usr/bin/dbus-daemon --fork --print-pid 5 ...
    
  • AppArmor should allow further isolating processes from any access beyond what's absolutely necessary for them to run, warning when they try to do strange things, and restricting them from doing outright stupid things.

  • Given the separate UID and cgroup, network access for all Steam apps can be easily controlled via e.g. iptables, to keep Steam and games from scanning and abusing other things on the LAN, for example.


Creating the steam user should be as simple as useradd steam, but switching to that UID from within a running DE should still allow it to access the same X server and start a systemd session for it, plus not inherit any extra env, permissions, dbus access, fd's and such from the main session.

By far the easiest way to do that I've found is to just ssh steam@localhost, putting a proper pubkey into ~steam/.ssh/authorized_keys first, of course.
That should ensure that nothing leaks from the DE but whatever ssh passes, and ssh is a rather paranoid security-oriented tool, so it can be trusted with that.
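A minimal sketch of that user setup (the pubkey path here is just whatever key you want to use for this):

% sudo useradd -m steam
% sudo -u steam mkdir -m700 ~steam/.ssh
% cat ~/.ssh/id_rsa.pub | sudo -u steam tee -a ~steam/.ssh/authorized_keys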
Steam comes with a bootstrap script (e.g. /usr/bin/steam) to install itself, which also starts the thing once it's installed, so the Steam AppArmor profile (github link) is for that.
It should allow both bootstrapping and installing stuff as well as running it, yet not let steam poke too much into other shared dirs or processes.

To allow access to X, either xhost or the ~/.Xauthority cookie can be used, along with some extra env in e.g. ~/.zshrc:

export DISPLAY=':1.0'
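With xhost, that can be a simple one-off grant for the dedicated local uid:

% xhost +si:localuser:steam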

In a fashion similar to ssh, I've used pulseaudio network streaming to the main DE sound daemon on localhost for sound (also in ~/.zshrc):

export PULSE_SERVER='{e52019f6d7b9427697a152348e9f84ad}tcp6:malediction:4713'
export PULSE_COOKIE="$HOME"/.pulse-cookie

(I have pulse network streaming setup anyway, for sharing sound from desktop to laptop - to e.g. play videos on a big screen there yet hear sound from laptop's headphones)
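If such streaming isn't enabled on the DE side already, setting it up should be a matter of loading the tcp module and sharing the auth cookie with the steam user (a sketch - the cookie path varies between pulse versions):

% pactl load-module module-native-protocol-tcp
% sudo install -m600 -o steam ~/.config/pulse/cookie ~steam/.pulse-cookie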

Running Steam will also start its own dbus session (maybe it's the pulse client lib doing that, didn't check), but it doesn't seem to be used for anything, so there seems to be no need to share it with the main DE.


That should allow starting Steam after ssh'ing to steam@localhost, but the process can be made much easier (and more foolproof) with e.g. ~/bin/steam as:

#!/bin/bash

cmd=$1
shift

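# polls for ~1s until the main steam process exits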
steam_wait_exit() {
  for n in {0..10}; do
    pgrep -U steam -x steam >/dev/null || return 0
    sleep 0.1
  done
  return 1
}

case "$cmd" in
  '')
    ssh steam@localhost <<EOF
source .zshrc
exec steam "$@"
EOF
    loginctl user-status steam ;;

  s*) loginctl user-status steam ;;

  k*)
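    # ask Steam to shut down cleanly first, then kill whatever's left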
    steam_exited=
    pgrep -U steam -x steam >/dev/null
    [[ $? -ne 0 ]] && steam_exited=t
    [[ -z "$steam_exited" ]] && {
      ssh steam@localhost <<EOF
source .zshrc
exec steam -shutdown
EOF
      steam_wait_exit
      [[ $? -eq 0 ]] && steam_exited=t
    }
    sudo loginctl kill-user steam
    [[ -z "$steam_exited" ]] && {
      steam_wait_exit || sudo loginctl -s KILL kill-user steam
    } ;;

  *) echo >&2 "Usage: $(basename "$0") [ status | kill ]"
esac

Now just running steam in the main DE will run the thing in its own $HOME.

For further convenience, there's steam status and steam kill to easily monitor or shut down a running Steam session from the terminal.

Note the complicated shutdown logic - Steam doesn't react to INT or TERM signals cleanly, passing these to running games instead, so it should be terminated via its own cli option (and whatever remains can then be killed off too).


With this setup, iptables rules for outgoing connections can use the user-slice cgroup match (in kernel 3.14+, at least) or -m owner --uid-owner steam matches on the socket owner uid.

The only non-WAN things Steam connects to here are DNS servers and the aforementioned pulseaudio socket on localhost; the rest can be safely firewalled.
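For example, a hypothetical ruleset along these lines (the LAN subnet here is an assumption):

iptables -A OUTPUT -m owner --uid-owner steam -p udp --dport 53 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner steam -o lo -p tcp --dport 4713 -j ACCEPT
iptables -A OUTPUT -m owner --uid-owner steam -d 192.168.0.0/16 -j REJECT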


Finally, running KSP there on Exherbo, I quickly discovered that the sound libs and plugins - alsa and pulse - in the ubuntu "runtime" that the steam bootstrap sets up don't work well - either there's no sound or the game fails to load at all.

The easy fix is to copy the runtime it uses (the 32-bit one for me) and clean the alien stuff out of it where it's already present in the system, i.e.:

% cp -R .steam/bin32/steam-runtime my-runtime
% find my-runtime -type f \
  \( -path '*asound*' -o -path '*alsa*' -o -path '*pulse*' \) -delete

And then add something like this to ~steam/.zshrc:

steam() { STEAM_RUNTIME="$HOME"/my-runtime command steam "$@"; }

That should keep all of the known-working Ubuntu libs that the steam bootstrap fetches away from the rest of the system (where stuff like Mono just isn't needed, and others will cause trouble), while still allowing to remove any of them from the runtime so that the system version gets used instead.

And yay - Kerbal Space Program seems to work here way faster than on Win7.


Mar 19, 2014

Shadowrun: Dragonfall

Dragonfall is probably the best CRPG (and game, given that it's my favorite genre) I've played in a few years.
The right mix of setting, world, characters, tone, choices and mechanics, and it only gets more amazing towards the end.
I think all the praise it gets is well deserved - for me it's up there with early Bioware games (BGs, PST) and Fallouts.
Didn't quite expect that after the okay'ish Dead Man's Switch as a first SRR experience, and hopefully there's more to come from HBS - it would even justify another kickstarter imo.