Distributed fault-tolerant fs take 2: MooseFS
Ok, almost one month of glusterfs was too much for me to handle. That was an epic fail ;)
Well, maybe I'm just out of luck or too brain-dead for it, whatever.
So, moving on, I've tried (although briefly) ceph.
It's in the mainline kernel, and not just in the staging part, so I'd expected it to be much less of a work in progress, but as it is, it's very raw, to the point that the x86_64 monitor daemon just crashes upon receiving data from plain x86. The interface is a bunch of simple shell scripts, operation is fairly opaque, and the whole thing is built on such crap as boost.
Managed to get it running with two nodes, but it feels like the end of the world - one more kick and it all falls to pieces. Confirmed by the reports all over the mailing list and #ceph.
In-kernel and seemingly fast is a good mix though, so I may get back to it eventually, but for now I'd rather settle on something that actually works.
Next thing in my sight was tahoe-lafs, but it still lacks a normal posix-fs interface layer, and the sftp interface is totally unusable on 1.8.0c3: no permissions, cp -R fails with an I/O error, displayed data is inconsistent even with locally-made changes, and so on. A pity, since the whole system design looks very cool, with raid5-like "parity" instead of plain chunk replication, and it's python!
Thus I ended up with MooseFS.
Replication? Piece of cake, and it's configured on a per-tree basis, so important or compact stuff can have one replication "goal" while some heavy trash in the neighboring path has no replication at all. No chance of anything like this with gluster, and it's not even documented for ceph.
Performance is totally I/O and network bound (which is totally not the case with tahoe, for instance), so no complaints here either.
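To make the per-tree part a bit more concrete, here's a minimal sketch of how such goals could be laid out. It only builds and prints the mfssetgoal command lines (recursive, via -r) and doesn't run anything. The mount point, the paths and the goal numbers are all made up for illustration, and I'm assuming that "no replication" simply means a goal of 1.

```python
import shlex

# Hypothetical layout: tree path -> desired number of copies ("goal").
# A goal of 1 means a single copy, i.e. no replication at all.
GOALS = {
    "/mnt/mfs/important": 3,
    "/mnt/mfs/scratch": 1,
}


def goal_commands(goals):
    """Build one recursive mfssetgoal invocation per tree, sorted by path."""
    return [
        ["mfssetgoal", "-r", str(goal), path]
        for path, goal in sorted(goals.items())
    ]


if __name__ == "__main__":
    # Dry run: just print the commands, properly shell-quoted.
    for cmd in goal_commands(GOALS):
        print(shlex.join(cmd))
```

Printing instead of executing keeps it safe to try anywhere; the resulting lines can be pasted into a shell on a box that has the mfs tools and the mount.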
One more amazing thing is how simple and transparent it is:
fraggod@anathema:~% mfsgetgoal tmp/db/softCore/_nix/os/systemrescuecd-x86-1.5.8.iso
tmp/db/softCore/_nix/os/systemrescuecd-x86-1.5.8.iso: 2
fraggod@anathema:~% mfsfileinfo tmp/db/softCore/_nix/os/systemrescuecd-x86-1.5.8.iso
tmp/db/softCore/_nix/os/systemrescuecd-x86-1.5.8.iso:
    chunk 0: 000000000000CE78_00000001 / (id:52856 ver:1)
        copy 1: 192.168.0.8:9422
        copy 2: 192.168.0.11:9422
    chunk 1: 000000000000CE79_00000001 / (id:52857 ver:1)
        copy 1: 192.168.0.10:9422
        copy 2: 192.168.0.11:9422
    chunk 2: 000000000000CE7A_00000001 / (id:52858 ver:1)
        copy 1: 192.168.0.10:9422
        copy 2: 192.168.0.11:9422
    chunk 3: 000000000000CE7B_00000001 / (id:52859 ver:1)
        copy 1: 192.168.0.8:9422
        copy 2: 192.168.0.10:9422
    chunk 4: 000000000000CE7C_00000001 / (id:52860 ver:1)
        copy 1: 192.168.0.10:9422
        copy 2: 192.168.0.11:9422
fraggod@anathema:~% mfsdirinfo tmp/db/softCore/_nix/os
tmp/db/softCore/_nix/os:
    inodes:      12
    directories: 1
    files:       11
    chunks:      175
    length:      11532174263
    size:        11533462528
    realsize:    23066925056
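Since the output is that plain, it's also trivial to check mechanically. Below is a rough sketch that parses the mfsfileinfo text from above, counts copies per chunk and per chunkserver, and compares realsize to size from mfsdirinfo. The parsing is based only on the output shown here (I'm not claiming it's a stable format), and the function names are my own.

```python
import re
from collections import Counter

# mfsfileinfo output copied from the session above.
SAMPLE = """\
chunk 0: 000000000000CE78_00000001 / (id:52856 ver:1)
copy 1: 192.168.0.8:9422
copy 2: 192.168.0.11:9422
chunk 1: 000000000000CE79_00000001 / (id:52857 ver:1)
copy 1: 192.168.0.10:9422
copy 2: 192.168.0.11:9422
chunk 2: 000000000000CE7A_00000001 / (id:52858 ver:1)
copy 1: 192.168.0.10:9422
copy 2: 192.168.0.11:9422
chunk 3: 000000000000CE7B_00000001 / (id:52859 ver:1)
copy 1: 192.168.0.8:9422
copy 2: 192.168.0.10:9422
chunk 4: 000000000000CE7C_00000001 / (id:52860 ver:1)
copy 1: 192.168.0.10:9422
copy 2: 192.168.0.11:9422
"""

# mfsdirinfo numbers from the same session.
SIZE = 11533462528
REALSIZE = 23066925056


def copies_per_chunk(text):
    """Map chunk index -> list of chunkserver addresses holding a copy."""
    chunks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # tolerate any leading indentation
        m = re.match(r"chunk (\d+):", line)
        if m:
            current = int(m.group(1))
            chunks[current] = []
            continue
        m = re.match(r"copy \d+: (\S+):(\d+)$", line)
        if m and current is not None:
            chunks[current].append(m.group(1))
    return chunks


def copies_per_server(chunks):
    """Count how many chunk copies each chunkserver holds."""
    return Counter(host for hosts in chunks.values() for host in hosts)


if __name__ == "__main__":
    chunks = copies_per_chunk(SAMPLE)
    for idx, hosts in sorted(chunks.items()):
        print(f"chunk {idx}: {len(hosts)} copies on {', '.join(hosts)}")
    print(dict(copies_per_server(chunks)))
    print("realsize / size =", REALSIZE / SIZE)
```

For this file every chunk has exactly two copies, matching the goal of 2 reported by mfsgetgoal, and realsize for the directory is exactly twice its size, which is what you'd expect if everything in there is stored with two copies.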
MooseFS has yet to pass the trial of time on my makeshift "cluster", but none of the other setups went (even remotely) as smoothly as this one so far, so I feel pretty optimistic about it.