Jun 02, 2017
There are way more tools that happily forward TCP ports than ones for UDP.
Case in point - it's usually easy to forward ssh port through a bunch of hosts
and NATs, with direct and reverse ssh tunnels, ProxyCommand stuff, tools like
pwnat, etc, but for mosh's UDP connection it's not that trivial.
Which sucks, because its performance and input prediction stuff is exactly
what's lacking in super-laggy multi-hop ssh connections forwarded back-and-forth
between continents through such tunnels.
There are quite a few long-standing discussions on how to solve it properly in
mosh, which unfortunately didn't get any traction so far.
One obvious way to make it work, is to make some tunnel (like OpenVPN or
wireguard) from destination host (server) to a client, and use mosh over that.
But that's some extra tools and configuration to keep around on both sides, and
there is a much easier way that works perfectly for most cases - knowing both
server and client IPs, pre-pick ports for mosh-server and mosh-client, then
punch a hole in the NAT for these before starting both.
How it works:
- Pick some UDP ports that server and client will be using, e.g. 34700 for
server and 34701 for client.
- Send UDP packet from server:34700 to client:34701.
- Start mosh-server, listening on server:34700.
- Connect to that with mosh-client, using client:34701 as a UDP source port.
NAT on the router(s) in-between the two will see this exchange as a server
establishing "udp connection" to a client, and will allow packets in both
directions to flow through between these two ports.
Once mosh-client establishes the connection and keepalive packets start
bouncing between these ports all the time, it will stay up indefinitely.
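The hole-punching step itself is trivial - rough python sketch of the idea
(not the actual mosh-nat script, IPs/ports are from the examples here):

  # Sketch of the hole-punch packet - run on the server before mosh-server.
  import socket

  client_ip, server_port, client_port = '74.59.38.152', 34700, 34701

  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.bind(('0.0.0.0', server_port))  # same port mosh-server will listen on
  s.sendto(b'mosh-nat-hole-punch', (client_ip, client_port))  # creates NAT mapping
  s.close()  # release the port right before starting mosh-server on it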
mosh is generally well-suited for running manually from an existing console,
so all that's needed to connect in a simple case is:
server% mosh-server new
MOSH CONNECT 60001 NN07GbGqQya1bqM+ZNY+eA
client% MOSH_KEY=NN07GbGqQya1bqM+ZNY+eA mosh-client <server-ip> 60001
With hole-punching, two additional wrappers are required with the current mosh
version (1.3.0):
- One for mosh-server to send a UDP packet to the client IP, using the same
port on which the server will then be started: mosh-nat
- And a wrapper for mosh-client to force its socket to bind to specified local
UDP port, which was used as a dst by mosh-server wrapper above: mosh-nat-bind.c
Making connection using these two is as easy as with stock mosh above:
server% ./mosh-nat 74.59.38.152
mosh-client command:
MNB_PORT=34730 LD_PRELOAD=./mnb.so
MOSH_KEY=rYt2QFJapgKN5GUqKJH2NQ mosh-client <server-addr> 34730
client% MNB_PORT=34730 LD_PRELOAD=./mnb.so \
MOSH_KEY=rYt2QFJapgKN5GUqKJH2NQ mosh-client 84.217.173.225 34730
(with server at 84.217.173.225, client at 74.59.38.152 and using port 34730 on
both ends in this example)
Extra notes:
- "mnb.so" used with LD_PRELOAD is that mosh-nat-bind.c wrapper, which can be
compiled using: gcc -nostartfiles -fpic -shared -ldl -D_GNU_SOURCE
mosh-nat-bind.c -o mnb.so
- Both mnb.so and mosh-nat only work with IPv4, IPv6 shouldn't use NAT anyway.
- 34730 is the default port for -c/--client-port and -s/--server-port opts in
mosh-nat script.
- Started mosh-server waits for 60s (default) for mosh-client to connect.
- Continuous operation relies on mosh keepalive packets flowing without
interruption, as mentioned, and should break on (long enough) net hiccups,
unlike direct mosh connections to a server that has no NAT in front of it
(or has dedicated port forwarding).
- No roaming of any kind is possible here, again, unlike with original mosh - if
src IP/port changes, connection will break.
- New MOSH_KEY is generated by mosh-server on every run, and is only good for
one connection, as server rotates it after the connection gets established,
so it's pretty safe/easy to use.
- If client is behind NAT as well, its visible IP should be used, not internal one.
- Should only work when NAT on either side doesn't rewrite source ports.
Last point can be a bummer with some "Carrier-grade" NATs, which do rewrite src
ports out of necessity, but can be still worked around if it's only on the
server side by checking src port of the hole-punching packet in tcpdump and
using that instead of whatever it was supposed to be originally.
Requires just python to run wrapper script on the server and no additional
configuration of any kind.
Both wrappers above are from the mk-fg/fgtk repository.
May 15, 2017
Mostly use unorthodox variable-width font for coding, but do need monospace
sometimes, e.g. for jagged YAML files or .rst.
Had a weird issue with my emacs for a while, where switching to monospace font
would slow window/frame rendering significantly, to a noticeable degree -
stuff blinking and lagging, making e.g. holding a key to move the cursor
impossible, etc.
Usual profiling showed that it's an actual rendering via C code, so kinda hoped
that it'd go away in one of minor releases, but nope - turned out to be the
dumbest thing in ~/.emacs:
(set-face-font 'fixed-pitch "DejaVu Sans Mono-7.5")
That one line is what slows stuff down to a crawl in monospace ("fixed-pitch")
configuration, just due to non-integer font size, apparently.
Probably not emacs' fault either, just xft or some other lower-level rendering
lib, and a surprising little quirk that can affect high-level app experience a lot.
Changing font size there to 8 or 9 gets rid of the issue. Oh well...
May 14, 2017
"ssh -R" a is kinda obvious way to setup reverse access tunnel from something
remote that one'd need to access, e.g. raspberry pi booted from supplied img
file somewhere behind the router on the other side of the world.
Being part of OpenSSH, it's available on any base linux system, and trivial to
automate on startup via loop in a shell script, crontab or a systemd unit, e.g.:
[Unit]
Wants=network.service
After=network.service
[Service]
Type=simple
User=ssh-reverse-access-tunnel
Restart=always
RestartSec=10
ExecStart=/usr/bin/ssh -oControlPath=none -oControlMaster=no \
-oServerAliveInterval=6 -oServerAliveCountMax=10 -oConnectTimeout=180 \
-oPasswordAuthentication=no -oNumberOfPasswordPrompts=0 \
-oExitOnForwardFailure=yes -NnT -R "1234:localhost:22" tun-user@tun-host
[Install]
WantedBy=multi-user.target
On the other side, ideally in a dedicated container or VM, there'll be an sshd
"tun-user" with an authorized_keys access line like this (as a single line):
command="echo >&2 'No shell access!'; exit 1",
no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 ...
Or have an sshd_config section with the same restrictions and only keys in
authorized_keys, e.g.:
Match User tun-*
  # GatewayPorts yes
  PasswordAuthentication no
  X11Forwarding no
  AllowAgentForwarding no
  PermitTTY no
  PermitTunnel no
  AllowStreamLocalForwarding no
  AllowTcpForwarding remote
  ForceCommand echo 'no shell access!'; exit 1
And that's it, right?
No additional stuff needed, "ssh -R" will connect reliably on boot, keep
restarting and reconnecting in case of any errors, even with keepalives to
detect dead connections and restart asap.
Nope!
There's a bunch of common pitfalls listed below.
Problem 1:
When the device with a tunnel suddenly dies for whatever reason - power or
network issues, kernel panic, stray "kill -9" or what have you - the connection
on the sshd machine will hang around for a while, as keepalive options are
only used by the client.
Along with that (dead) connection, its listening port will stay open as well,
so "ssh -R" from e.g. a power-cycled device will not be able to bind it, and
the client won't treat that as a fatal error either!
Result: reverse-tunnels don't survive any kind of non-clean reconnects.
Fix:
- TCPKeepAlive in sshd_config - to detect dead connections there faster,
though probably still not fast enough for e.g. emergency reboot.
- Detect and kill sshd pids without a listening socket, forcing "ssh -R" to
reconnect until it can actually bind one (rough sketch of this check below).
- If TCPKeepAlive is not good or reliable enough, kill all sshd pids
associated with listening sockets that don't produce the usual
"SSH-2.0-OpenSSH_7.4" greeting line.
Problem 2:
Running sshd on any reasonably modern linux, you get a systemd (logind)
session for each connection, and killing sshd pids as suggested above will
leave stale logind sessions behind, potentially creating hundreds or
thousands of them over time.
Solution:
- "UsePAM no" to disable pam_systemd.so along with the rest of the PAM.
- Dedicated PAM setup for ssh tunnel logins on this dedicated system, not
using pam_systemd.
- Occasional cleanup via loginctl list-sessions/terminate-session for ones
that are stuck in "closing"/"abandoned" state (sketch below).
Killing sshd pids might be hard to avoid on fast non-clean reconnect, since
reconnected "ssh -R" will hang around without a listening port forever,
as mentioned.
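That loginctl cleanup bit can be e.g. a loop like this (a sketch, assumes
recent-enough systemd with --value support in loginctl):

  # Terminate logind sessions stuck in closing/abandoned state.
  import subprocess

  sessions = subprocess.run( ['loginctl', 'list-sessions', '--no-legend'],
      capture_output=True, text=True ).stdout
  for line in sessions.splitlines():
      sid = line.split()[0]
      state = subprocess.run(
          ['loginctl', 'show-session', sid, '-p', 'State', '--value'],
          capture_output=True, text=True ).stdout.strip()
      if state in ('closing', 'abandoned'):
          subprocess.run(['loginctl', 'terminate-session', sid])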
Problem 3:
If these tunnels are not configured on a per-system basis, but shipped in some
img file used with multiple devices, they'll all try to bind the same listening
port for reverse-tunnels, so only one of them will work.
Fixes:
More complex script to generate listening port for "ssh -R" based on some
machine id - serial, MAC, local IP address, etc (sketch below).
Get free port to bind to out-of-band from the server somehow.
Can be through same ssh connection, by checking ss/netstat output or
/proc/net/tcp there, if commands are allowed there (probably a bad idea for
random remote devices).
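Python sketch of the machine-id option (eth0 MAC as the id and the port range
here are arbitrary assumptions, and hash collisions between devices are still
possible, of course):

  # Derive stable per-device listening port from NIC MAC address.
  import hashlib, pathlib

  mac = pathlib.Path('/sys/class/net/eth0/address').read_text().strip()
  digest = hashlib.sha256(mac.encode()).digest()
  port = 32768 + int.from_bytes(digest[:4], 'big') % (61000 - 32768)
  print(f'-R {port}:localhost:22')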
Problem 4:
Device identification in the same "multiple devices" scenario.
I.e. when someone sets up 5 RPi boards on the other end, how to tell which
tunnel leads to which specific board?
Can usually be solved by:
- Knowing/checking quirks specific to each board, like dhcp hostname,
IP address, connected hardware, stored files, power-on/off timing, etc.
- Blinking LEDs via /sys/class/leds (tiny sketch below), ethtool --identify
or GPIO pins.
- Output on connected display - just "wall" some unique number
(e.g. reverse-tunnel port) or put it to whatever graphical desktop.
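For that LED option, a tiny sketch (led name/path is board-specific, "led0" is
e.g. the green ACT led on an RPi):

  # Blink a number out on a sysfs led - e.g. last digit of the tunnel port.
  import time

  led = '/sys/class/leds/led0'
  open(f'{led}/trigger', 'w').write('none')  # disable default trigger first

  def blink(n, interval=0.3):
      for _ in range(n):
          open(f'{led}/brightness', 'w').write('1'); time.sleep(interval)
          open(f'{led}/brightness', 'w').write('0'); time.sleep(interval)

  blink(5)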
Problem 5:
Round-trip through some third-party VPS can add significant console lag,
making it rather hard to use.
A more general problem than with just "ssh -R", but when doing e.g. an
"EU -> US -> RU" trip and back, console becomes quite unusable without
something like mosh, which can't be used over that forwarded tcp port anyway!
Kinda defeats the purpose of the whole thing, though laggy console (with an
option to upgrade it, once connected) is still better than nothing.
Not an exhaustive or universally applicable list, of course, but hopefully shows
that it's actually more hassle than "just run ssh -R on boot" to have something
robust here.
So the choice of ubiquitous / out-of-the-box "ssh -R" over installing some
dedicated tunneling thing like OpenVPN (or wireguard - a much better choice on
linux) is not as clear-cut in favor of the former as it would seem, taking all
such quirks (handled well by dedicated tunneling apps) into consideration.
Having bumped into all of these by now, I've addressed them with:
ssh-tunnels-cleanup script to (optionally) do three things, in order:
- Find/kill sshd pids without associated listening socket
("ssh -R" that re-connected quickly and couldn't bind one).
- Probe all sshd listening sockets with ncat (nc that comes with nmap) and
make sure there's an "SSH-2.0-..." banner there, otherwise kill.
- Cleanup all dead loginctl sessions, if any.
Only affects/checks sshd pids for specific user prefix (e.g. "tun-"), to avoid
touching anything but dedicated tunnels.
ssh-reverse-mux-server / ssh-reverse-mux-client scripts.
For listening port negotiation with the ssh server,
using a bunch of (authenticated) UDP packets.
Essentially a wrapper for "ssh -R" on the client, to also pass all the
required options, replacing ExecStart= line in above systemd example
with e.g.:
ExecStart=/usr/local/bin/ssh-reverse-mux-client \
--mux-port=2200 --ident-rpi -s pkt-mac-key.aGPwhpya tun-user@tun-host
ssh-reverse-mux-server on the other side will keep .db file of --ident strings
(--ident-rpi uses hash of RPi board serial from /proc/cpuinfo) and allocate
persistent port number (from specified range) to each one, which client will
use with actual ssh command.
Simple symmetric key (not very secret) is used to put a MAC into packets, and
ignore any noise traffic on either side that way (sketch of the idea below).
https://github.com/mk-fg/fgtk#ssh-reverse-mux
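Gist of that MAC bit as a python sketch (not the actual mux protocol -
key/ident values here are made-up examples):

  # Append MAC to ident packet, check/strip it on the receiving side.
  import hashlib, hmac

  key, ident = b'pkt-mac-key.aGPwhpya', b'rpi-serial-hash'

  pkt = ident + hmac.new(key, ident, hashlib.sha256).digest()

  ident_rx, mac_rx = pkt[:-32], pkt[-32:]
  mac_chk = hmac.new(key, ident_rx, hashlib.sha256).digest()
  assert hmac.compare_digest(mac_rx, mac_chk)  # else - drop as noise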
Hook in ssh-reverse-mux-client above to blink bits of allocated port on some
available LED.
Sample script to do the morse-code-like bit-blinking:
And additional hook option for command above:
... -c 'sudo -n led-blink-arg -f -l led0 -n 2/4-2200'
(with last arg-num / bits - decrement spec there to blink only last 4 bits
of the second passed argument, which is listening port, e.g. "1011" for "2213")
Given how much OpenSSH does already, having all this functionality there
(or even some hooks for that) would probably be too much to ask.
...at least until it gets rewritten as some systemd-accessd component :P
Apr 27, 2017
Running Wireless AP on linux is pretty much always done through handy hostapd
tool, which sets the necessary driver parameters and handles authentication
and key management aspects of an infrastructure mode access point operation.
Its configuration file has plenty of options, which get initialized to rather
conservative defaults, resulting in suboptimal bandwidth with anything from
this decade, e.g. 802.11n or 802.11ac cards/dongles.
Furthermore, it seems to assume a decent amount of familiarity with IEEE
standards on WiFi protocols, which are mostly paywalled (though can easily be
pirated ofc, just use google).
Specifically, channel selection for VHT (802.11ac) there is a bit of a
nightmare, as hostapd code not only has an (undocumented, afaict) whitelist
for these, but also needs more than one parameter to set them.
I'm not an expert on wireless links and wifi specifically, just had to setup one
recently (and even then, without going into STBC, Beamforming and such), so
don't take this info as some kind of authoritative "how it must be done" guide -
just my 2c and nothing more.
Anyway, first of all, to get VHT ("Very High Throughput") aka 802.11ac mode at
all, following hostapd config can be used as a baseline:
# https://w1.fi/cgit/hostap/plain/hostapd/hostapd.conf
ssid=my-test-ap
wpa_passphrase=set-ap-password
country_code=US
# ieee80211d=1
# ieee80211h=1
interface=wlan0
driver=nl80211
wpa=2
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
logger_syslog=0
logger_syslog_level=4
logger_stdout=-1
logger_stdout_level=0
hw_mode=a
ieee80211n=1
require_ht=1
ieee80211ac=1
require_vht=1
vht_oper_chwidth=1
channel=36
vht_oper_centr_freq_seg0_idx=42
There, important bits are obviously stuff at the top - ssid and wpa_passphrase.
But also country_code, as it will apply all kinds of restrictions on 5G channels
that one can use.
ieee80211d/ieee80211h are related to these country_code restrictions, and are
probably necessary for some places and when/if DFS (dynamic frequency selection)
is used, but more on that later.
If that config doesn't work (started with e.g. hostapd myap.conf), and not
just due to some channel conflict or regulatory domain (i.e. country_code) error,
probably worth running hostapd command with -d option and seeing where it fails
exactly, though most likely after nl80211: Set freq ... (ht_enabled=1,
vht_enabled=1, bandwidth=..., cf1=..., cf2=...) log line (and list of options
following it), with some "Failed to set X: Invalid argument" error from kernel
driver.
When that's the case, if it's not just a bogus channel (see below), it's
probably worth stopping right here and figuring out why the driver rejects
this basic stuff - could be that it doesn't actually support running AP and/or
VHT mode (esp. for proprietary drivers) or something, which should obviously
be addressed first.
VHT (Very High Throughput mode, aka 802.11ac, page 214 in 802.11ac-2013.pdf) is
extension of HT (High Throughput aka 802.11n) mode and can use 20 MHz, 40 MHz,
80 MHz, 160 MHz and 80+80 MHz channel widths, which basically set following caps
on bandwidth:
- 20 MHz - 54 Mbits/s
- 40 MHz - 150-300 Mbits/s
- 80 MHz - 300+ Mbits/s
- 160 MHz or 80+80 MHz (two non-contiguous 80MHz chans) - moar!!!
Most notably, 802.11ac only requires supporting channels up to 80MHz-wide,
with 160 and 80+80 being optional, so the latter are pretty much guaranteed to
not be supported by 95% of cheap-ish dongles, even if they advertise "full
802.11ac support!", "USB 3.0!!!" or whatever - forget it.
"vht_oper_chwidth" parameter sets channel width to use, so "vht_oper_chwidth=1"
(80 MHz) is probably safe choice for ac here.
Unless ACS - Automatic Channel Selection - is being used (which is maybe a good
idea, but not described here at all), both "channel" and
"vht_oper_centr_freq_seg0_idx" parameters must be set (and also
"vht_oper_centr_freq_seg1_idx" for 80+80 vht_oper_chwidth=3 mode).
"vht_oper_centr_freq_seg0_idx" is "dot11CurrentChannelCenterFrequencyIndex0"
from 802.11ac-2013.pdf (22.3.7.3 on page 248 and 22.3.14 on page 296),
while "channel" option is "dot11CurrentPrimaryChannel".
Relation between these for 80MHz channels is the following one:
vht_oper_centr_freq_seg0_idx = channel + 6
Where "channel" can only be picked from the following list (see
hw_features_common.c in hostapd sources):
36 44 52 60 100 108 116 124 132 140 149 157 184 192
And vht_oper_centr_freq_seg0_idx can only be one of:
42 58 106 122 138 155
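Quick python sanity-check for that relation and both whitelists:

  # For each allowed 80 MHz center index, primary channel = seg0 - 6.
  channels = {36, 44, 52, 60, 100, 108, 116, 124, 132, 140, 149, 157, 184, 192}

  for seg0 in 42, 58, 106, 122, 138, 155:
      chan = seg0 - 6
      assert chan in channels, chan
      print(f'channel={chan} vht_oper_centr_freq_seg0_idx={seg0}')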
Furthermore, picking anything but 36/42 and 149/155 is probably restricted by
DFS and/or driver, and if you have any other 5G APs around, can also be
restricted by conflicts with these, as detected/reported by hostapd on start.
Which is kinda crazy - you've got your fancy 802.11ac hardware and maybe can't
even use it because hostapd refuses to use any channels if there's other 5G AP
or two around.
BSS conflicts (with other APs) are detected on start only and are easy to
patch-out with hostapd-2.6-no-bss-conflicts.patch - just 4 lines to
hw_features.c and hw_features_common.c there, should be trivial to adapt for
any newer hostapd version.
But that still leaves all the DFS/no-IR and whatever regdb-special channels locked,
which is safe for legal reasons, but also easy to patch-out in crda (loader tool
for regdb) and wireless-regdb (info on regulatory domains, e.g. US and such)
packages, e.g.:
The crda patch is needed to disable the signature check on the loaded db.txt
file - alternatively a different public key can be used there, but it's less
hassle this way.
Note that using DFS/no-IR-marked frequencies with these patches is probably
breaking the law, though no idea if and where these are actually enforced.
Also, if crda/regdb is not installed or country_code is not set, the "00"
regulatory domain is used by the kernel, which is the most restrictive subset
(to be ok to use anywhere), and is probably never a good idea.
All these tweaks combined should already provide ~300 Mbits/s (half-duplex) on
a single 80 MHz channel (any from the lists above).
Beyond that, I think the "vht_capab" set should be tweaked to enable STBC
(space-time block coding - i.e. using multiple RX/TX antennas) and LDPC
(low-density parity check) capabilities, which are all disabled by default,
plus beamforming stuff.
These are all documented in hostapd.conf, but dongles and/or rtl8812au driver
I've been using didn't have support for any of that, so didn't go there myself.
There's also bunch of wmm_* and tx_queue_* parameters, which seem to be for QoS
(prioritizing some packets over others when at 100% capacity).
Tinkering with these didn't affect iperf3 results in any obvious way, and it
should maybe be done in the linux QoS subsystem ("tc" tool) instead anyway.
Plenty of snippets for tweaking them are available on mailing lists and such,
but should probably be adjusted for specific traffic/setup.
One last important bandwidth optimization for both AP and any clients (stations)
is disabling all the power saving stuff with iw dev wlan0 set power_save off.
Failing to do that can completely wreck performance, and can usually be done
via kernel module parameter in /etc/modprobe.d/ instead of running "iw".
No patches or extra configuration for wpa_supplicant (tool for infra-mode
"station" client) are necessary - it will connect just fine to anything and pick
whatever is advertised, if hw supports all that stuff.
Mar 21, 2017
Traditionally glusterd (glusterfs storage node) runs as root without any kind of
namespacing, and that's suboptimal for two main reasons:
- Grossly-elevated privileges (it's root) for just using net and storing files.
- Inconvenient to manage in the root fs/namespace.
Apart from being a historical thing, glusterd uses privileges for three things
that I know of:
- Set appropriate uid/gid on stored files.
- setxattr() with "trusted" namespace for all kinds of glusterfs xattrs.
- Maybe running nfsd? Not sure about this one, didn't use its nfs access.
For my purposes, only the first two are useful, and both can be easily
satisfied in a non-uid-mapped container, e.g. systemd-nspawn without -U.
With user_namespaces(7), the first requirement is also satisfied, as chown
works for the pseudo-root user inside the namespace, but the second one will
never work without some kind of namespace-private fs or xattr-mapping namespace.
"user" xattr namespace works fine there though, so rather obvious fix is to make
glusterd use those instead, and it has no obvious downsides, at least if backing
fs is used only by glusterd.
xattr names are unfortunately used quite liberally in the gluster codebase, and
don't have any macro for prefix, but finding all "trusted" outside of tests/docs
with grep is rather easy, seem to be no caveats there either.
Would be cool to see something like that upstream eventually.
It won't work unless all nodes are using patched glusterfs version though,
as non-patched nodes will be sending SETXATTR/XATTROP for trusted.* xattrs.
Two extra scripts that can be useful with this patch and existing setups:
First one is to copy trusted.* xattrs to user.* (rough sketch of the idea
below), and second one to set upper 16 bits of uid/gid to systemd-nspawn
container id value.
Both allow to pass fs from an old root glusterd to a user-xattr-patched
glusterd inside uid-mapped container (i.e. bind-mount it there), without
losing anything.
Both operations are also reversible - can just nuke user.* stuff or upper part
of uid/gid values to revert everything back.
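Core of the first one is trivial - rough python sketch (not the actual script,
needs root to read trusted.* xattrs, and the brick path is just an example):

  # Copy all trusted.* xattrs to user.* namespace, recursively.
  import os

  brick = '/srv/glusterfs-brick'

  def copy_xattrs(p):
      for name in os.listxattr(p):
          if not name.startswith('trusted.'): continue
          os.setxattr(p, 'user.' + name[len('trusted.'):], os.getxattr(p, name))

  copy_xattrs(brick)
  for root, dirs, files in os.walk(brick):
      for fn in dirs + files:
          p = os.path.join(root, fn)
          if not os.path.islink(p): copy_xattrs(p)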
One more random bit of ad-hoc trivia: use getfattr -Rd -m '.*'
/srv/glusterfs-stuff to dump all xattrs there (getfattr without that
-m '.*' hack hides trusted.* ones).
Note that I didn't test this trick extensively (yet?), and only use simple
distribute-replicate configuration here anyway, so probably a bad idea to run
something like this blindly in an important and complicated production setup.
Also wow, it's been 7 years since I've written here about glusterfs last,
time (is made of) flies :)
Feb 13, 2017
Got to reading short stories in Column Reader from laptop screen before sleep
recently, and for extra lazy points, don't want to drag my hand to the keyboard
to flip pages (or columns, as the case might be).
Easy fix - get any input device and bind stuff there to keys you'd normally use.
As it happens, had Xbox 360 controller around for that.
Hard part is figuring out how to properly do it all in Xorg - need to build
xf86-input-joystick first (somehow not in Arch core), then figure out how to
make it act like a dumb event source, not some mouse emulator, and then stuff
like xev and xbindkeys will probably help.
This is way more complicated than it needs to be, and gets even more so when you
factor-in all the Xorg driver quirks, xev's somewhat cryptic nature (modifier
maps, keysyms, etc), fact that xbindkeys can't actually do "press key" actions
(have to use stuff like xdotool for that), etc.
All the while reading these events from linux itself is as trivial as evtest
/dev/input/event11 (or for event in dev.read_loop(): ...) and sending them
back is just ui.write(e.EV_KEY, e.BTN_RIGHT, 1) via uinput device.
Hence the whole binding thing can be done by a tiny python loop that'd read
events from whatever specified evdev and write corresponding (desired) keys
to uinput.
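Bare-bones version of such loop, using python-evdev module (just the concept -
the actual evdev-to-xev script described below has configurable mappings and
timings):

  # Read gamepad ABS events, simulate cursor keys on right-stick X extremes.
  from evdev import InputDevice, UInput, ecodes as e

  dev, ui = InputDevice('/dev/input/event11'), UInput()
  keymap = {(e.ABS_RX, -1): e.KEY_LEFT, (e.ABS_RX, 1): e.KEY_RIGHT}

  for ev in dev.read_loop():
      if ev.type != e.EV_ABS: continue
      pos = (ev.value > 30_000) - (ev.value < -30_000)  # -1, 0 or 1
      key = keymap.get((ev.code, pos))
      if key is None: continue
      ui.write(e.EV_KEY, key, 1)  # press
      ui.write(e.EV_KEY, key, 0)  # release
      ui.syn()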
So instead of +1 pre-naptime story, hacked together a script to do just that -
evdev-to-xev (python3/asyncio) - which reads mappings from simple YAML and runs
the loop.
For example, to bind right joystick's (on the same XBox 360 controller) extreme
positions to cursor keys, plus triggers, d-pad and bumper buttons there:
map:

  ## Right stick
  # Extreme positions are ~32_768
  ABS_RX <-30_000: left
  ABS_RX >30_000: right
  ABS_RY <-30_000: up
  ABS_RY >30_000: down

  ## Triggers
  # 0 (idle) to 255 (fully pressed)
  ABS_Z >200: left
  ABS_RZ >200: right

  ## D-pad
  ABS_HAT0Y -1: leftctrl leftshift equal
  ABS_HAT0Y 1: leftctrl minus
  ABS_HAT0X -1: pageup
  ABS_HAT0X 1: pagedown

  ## Bumpers
  BTN_TL 1: [h,e,l,l,o,space,w,o,r,l,d,enter]
  BTN_TR 1: right

timings:
  hold: 0.02
  delay: 0.02
  repeat: 0.5
Run with e.g.: evdev-to-xev -c xbox-scroller.yaml /dev/input/event11
(see also less /proc/bus/input/devices and evtest /dev/input/event11).
Running the thing with no config will print example one with comments/descriptions.
Given how all iterations of X had to work with whatever input they had at the
time, and not just on linux, even when evdev was around, it's hard to blame it
for having a bit of complexity on top of a way simpler input layer underneath.
In linux, aforementioned Xbox 360 gamepad is supported by "xpad" module (so that
you'd get evdev node for it), and /dev/uinput for simulating arbitrary evdev
stuff is "uinput" module.
No need for any extra Xorg drivers beyond standard evdev.
The most similar tool to this script seems to be actkbd, though afaict, one'd
still need to run xdotool from it to simulate input :O
Github link: evdev-to-xev script (in the usual mk-fg/fgtk scrap-heap)
Feb 06, 2017
Honestly didn't think NAT'ing traffic from "lo" interface was even possible,
because traffic to host's own IP doesn't go through *ROUTING chains with iptables,
and never used "-j DNAT" with OUTPUT, which apparently works there as well.
And then also, according to e.g. Netfilter-packet-flow.svg, unlike with
nat-prerouting, nat-output goes after routing decision was made, so no point
mangling IPs there, right?
Wrong, totally possible to redirect "OUT=lo" stuff to go out of e.g. "eth0" with
the usual dnat/snat, with something like this:
table ip nat {
  chain in { type nat hook input priority -160; }
  chain out { type nat hook output priority -160; }
  chain pre { type nat hook prerouting priority -90; }
  chain post { type nat hook postrouting priority 110; }
}
add rule ip nat out oifname lo \
ip saddr $own-ip ip daddr $own-ip \
tcp dport {80, 443} dnat $somehost
add rule ip nat post oifname eth0 \
ip saddr $own-ip ip daddr $somehost \
tcp dport {80, 443} masquerade
Note the bizarre oifname lo ip saddr $own-ip ip daddr $own-ip thing.
One weird quirk - if the "in" chain (arbitrary name, nat+input hook is the
important bit) isn't defined, dnat will only work one-way, not rewriting IPs
in response packets.
One explanation wrt the routing decision here might be the arbitrary priorities
that nftables allows to set for hooks (and -160 is before iptables mangle stuff).
So, from-loopback-and-back forwarding, huh.
To think of all the redundant socats and haproxies I've seen and used for this purpose earlier...
Jan 29, 2017
Recently bumped into apparently not well-supported scenario of accessing
gitolite instance transparently on a host that is only accessible through
some other gateway (often called "bastion" in ssh context) host.
Something like this:
+---------------+
|               |  git@myhost.net:myrepo
| dev-machine ---------------------------+
|               |                        |
+---------------+                        |
                         +---------------v---+
    git@gitolite:myrepo  |                   |
  +-----------------------  myhost.net (gw)  |
  |                      |                   |
+-v-------------------+  +-------------------+
|                     |
| gitolite (gl)       |
| host/container/vm   |
|                     |
+---------------------+
Here gitolite instance might be running on a separate machine, or on the same
"myhost.net", but inside a container or vm with separate sshd daemon.
From any dev-machine you want to simply use git@myhost.net:myrepo to access
repositories, but naturally that won't work because in normal configuration
you'd hit sshd on gw host (myhost.net) and not on gl host.
There are quite a few common options to work around this:
Use separate public host/IP for gitolite, e.g. git.myhost.net (!= myhost.net).
TCP port forwarding or similar tricks.
E.g. simply forward ssh port connections in a "gw:22 -> gl:22" fashion,
and have gw-specific sshd listen on some other port, if necessary.
This can be fairly easy to use with something like this for odd-port sshd
in ~/.ssh/config:
Host myhost.net
  Port 1234
Host git.myhost.net
  Port 1235
Can also be configured in git via remote urls like
ssh://git@myhost.net:1235/myrepo.
Use ssh port forwarding to essentially do same thing as above, but with
resulting git port accessible on localhost.
Configure ssh to use ProxyCommand, which will login to gw host and setup
forwarding through it.
All of these, while suitable for some scenarios, are still nowhere near what
I'd call "transparent", and require some additional configuration for each git
client beyond just git remote add origin git@myhost.net:myrepo.
One advantage of such lower-level forwarding is that ssh authentication to
gitolite is only handled on gitolite host, gw host has no clue about that.
If dropping this is not a big deal (e.g. because gw host has root access to
everything in gl container anyway), there is a rather easy way to forward only
git@myhost.net connections from gw to gl host, authenticating them only on gw
instead, described below.
Gitolite works by building ~/.ssh/authorized_keys file with essentially
command="gitolite-shell gl-key-id" <gl-key> for each public key pushed to
gitolite-admin repository.
Hence to proxy connections from gw, a similar key-list should be available
there, with key-commands ssh'ing into the gitolite user/host and running the
above commands there (with original git commands also passed through the
SSH_ORIGINAL_COMMAND env-var).
To keep such list up-to-date, post-update trigger/hook for gitolite-admin repo
is needed, which can use same git@gw login (with special "gl auth admin"
key) to update key-list on gw host.
Steps to implement/deploy whole thing:
useradd -m git on gw and run ssh-keygen -t ed25519 on both gw and gl
hosts for git/gitolite user.
Setup all connections for git@gw to be processed via single "gitolite
proxy" command, disallowing anything else, exactly like gitolite does for its
users on gl host.
gitolite-proxy.py script (python3) that I came up with for this purpose can be
found here: https://github.com/mk-fg/gitolite-ssh-user-proxy/
It's rather simple and does two things:
When run with --auth-update argument, receives gitolite authorized_keys list,
and builds local ~/.ssh/authorized_keys from it and authorized_keys.base file.
Similar to gitolite-shell, when run as gitolite-proxy key-id, ssh'es
into gl host, passing key-id and git command to it.
This is done in a straightforward os.execlp('ssh', 'ssh', '-qT', ...)
manner, no extra processing or any error-prone stuff like that.
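I.e. something like this (simplified guess, not verbatim from the script -
assumes that the forced command on gl host runs gitolite-shell with these args):

  # Replace this process with ssh to gl host, forwarding key-id + git command.
  import os, shlex, sys

  gl_host_login = 'git@gl'  # same "gl_host_login = ..." line as mentioned below
  key_id = sys.argv[1]
  git_cmd = os.environ.get('SSH_ORIGINAL_COMMAND', '')
  os.execlp( 'ssh', 'ssh', '-qT', gl_host_login,
      '{} {}'.format(shlex.quote(key_id), shlex.quote(git_cmd)) )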
When installing it (to e.g. /usr/local/bin/gitolite-proxy as used below),
be sure to set/update "gl_host_login = ..." line at the top there.
For --auth-update, ~/.ssh/authorized_keys.base (note .base) file on gw should
have this single line (split over two lines for readability, must be all on
one line for ssh!):
command="/usr/local/bin/gitolite-proxy --auth-update",no-port-forwarding
,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAA...4u3FI git@gl
Here ssh-ed25519 AAA...4u3FI git@gl is the key from ~git/.ssh/id_ed25519.pub
on gitolite host.
Also run:
# install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,}
# install -m0600 -o git -g git ~git/.ssh/authorized_keys{.base,.old}
To have initial auth-file, not yet populated with gitolite-specific keys/commands.
Note that only these two files need to be writable for git user on gw host.
From gitolite (gl) host and user, run: ssh -qT git@gw < ~/.ssh/authorized_keys
This is to test gitolite-proxy setup above - should populate
~git/.ssh/authorized_keys on gw host and print back gw host key and proxy
script to run as command="..." for it (ignore them, will be installed by trigger).
Add trigger that'd run after gitolite-admin repository updates on gl host.
On gl host, put this to ~git/.gitolite.rc right before ENABLE line:
LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",
POST_COMPILE => ['push-authkeys'],
Commit/push push-authkeys trigger script (also from gitolite-ssh-user-proxy repo)
to gitolite-admin repo as local/triggers/push-authkeys,
updating gw_proxy_login line in there.
gitolite docs on adding triggers: http://gitolite.com/gitolite/gitolite.html#triggers
Once proxy-command is in place on gw and the gitolite-admin hook runs at least
once (to setup gw->gl access and proxy-command), git@gw (git@myhost.net) ssh
login spec can be used in exactly the same way as git@gl.
That is, fully transparent access to gitolite on a different host through that
one user, while otherwise allowing to use sshd on a gw host, without any
forwarding tricks necessary for git clients.
Whole project, with maybe a bit more refined process description and/or whatever fixes
can be found on github here: https://github.com/mk-fg/gitolite-ssh-user-proxy/
Huge thanks to sitaramc (gitolite author) for suggesting how to best setup gitolite triggers
for this purpose on the ML.
Oct 16, 2016
My problem was this: how do you do essentially a split-horizon DNS for different
apps in the same desktop session.
E.g. have claws-mail mail client go to localhost for someserver.com (because it
has port forwarded thru "ssh -L"), while the rest of them (e.g. browser and
such) keep using normal public IP.
Usually one'd use /etc/hosts for something like that, but it applies to all apps
on the machine, of course.
Next obvious option (mostly because it's been around forever) is to LD_PRELOAD
something that'd override either getaddrinfo() or open() for /etc/hosts, but
that sounds like work, and such a shim is not included in util-linux (yet?).
Easiest and newest (well, new-ish, CLONE_NEWNS has been around since linux-3.8
and 2013) way to do that is to run the thing in its own "mount namespace", which
sounds weird until you combine that with the fact that you can bind-mount files
(like that /etc/hosts one).
So, the magic line is:
# unshare -m sh -c \
  'mount -o bind /etc/hosts.forwarding /etc/hosts \
    && exec sudo -EHin -u myuser -- exec claws-mail'
Needs /etc/hosts.forwarding replacement-file for this app, which it will see as
a proper /etc/hosts, along with root privileges (or CAP_SYS_ADMIN) for CLONE_NEWNS.
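For the claws-mail example above, /etc/hosts.forwarding can be just a copy of
/etc/hosts with one override line added, e.g.:

  127.0.0.1 someserver.com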
Crazy "sudo -EHin" shebang is to tell sudo not to drop much env, but still
behave kinda as if on login, run zshrc and all that.
"su - myuser" or "machinectl shell myuser@ -- ..." can also be used there.
Replacing files like /etc/nsswitch.conf or /etc/{passwd,group} that way, one can
also essentially do any kind of per-app id-mapping - cool stuff.
Of course, these days sufficiently paranoid or advanced people might as well run
every app in its own set of namespaces anyway, and have pretty much everything
per-app that way, why the hell not.
Sep 25, 2016
As of linux-4.8, something like xt_policy is still - unfortunately - on the
nftables TODO list, so to match traffic pre-authenticated via IPSec, some
workaround is needed.
Obvious one is to keep using iptables/ip6tables to mark IPSec packets with old
xt_policy module, as these rules interoperate with nftables just fine, with
only important bit being ordering of iptables hooks vs nft chain priorities,
which are rather easy to find in "netfilter_ipv{4,6}.h" files, e.g.:
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK_DEFRAG = -400,
NF_IP_PRI_RAW = -300,
NF_IP_PRI_SELINUX_FIRST = -225,
NF_IP_PRI_CONNTRACK = -200,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_SECURITY = 50,
NF_IP_PRI_NAT_SRC = 100,
NF_IP_PRI_SELINUX_LAST = 225,
NF_IP_PRI_CONNTRACK_HELPER = 300,
NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
NF_IP_PRI_LAST = INT_MAX,
};
(see also Netfilter-packet-flow.svg by Jan Engelhardt for general overview of
the iptables hook positions, nftables allows to define any number of chains
before/after these)
So marks from iptables/ip6tables rules like these:
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -m policy --dir in --pol ipsec --mode transport -j MARK --or-mark 0x101
-A OUTPUT -m policy --dir out --pol ipsec --mode transport -j MARK --or-mark 0x101
COMMIT
Will be easy to match in priority=0 input/output hooks (as NF_IP_PRI_RAW=-300)
of nft ip/ip6/inet tables (e.g. mark and 0x101 == 0x101 accept).
But that'd split firewall configuration between iptables/nftables, adding more
hassle to keep whole "iptables" thing initialized just for one or two rules.
xfrm transformation (like ipsec esp decryption in this case) seems to preserve
all information about the packet intact, including packet marks (but not
conntrack states, which track the esp connection), which - as suggested by
Florian Westphal in #netfilter - can be utilized to match post-xfrm packets in
nftables by this preserved mark field.
E.g. having this (strictly before ct state {established, related} accept for
stateful firewalls, as each packet has to be marked):
define cm.ipsec = 0x101
add rule inet filter input ip protocol esp mark set mark or $cm.ipsec
add rule inet filter input ip6 nexthdr esp mark set mark or $cm.ipsec
add rule inet filter input mark and $cm.ipsec == $cm.ipsec accept
Will mark and accept both still-encrypted esp packets (IPv4/IPv6) and their
decrypted payload.
Note that this assumes that all IPSec connections are properly authenticated and
trusted, so be sure not to use anything like that if e.g. opportunistic
encryption is enabled.
Much simpler nft-only solution, though still not a full substitute for what
xt_policy does, of course.