The moment of epic fail hilarity with hashes
Just had an epic moment of failing at kinda-basic math, which seems quite representative of how people fail with homebrew crypto code (the thing everyone and their mom warns against).
So, anyhow, on a d3 vis, I wanted to get pseudorandom colors for text blobs, but with roughly the same lightness on the HSL scale (Hue, Saturation, Luminosity/Lightness/Level), so that changing opacity on a light/dark bg is clearly visible as a change of L in the resulting color.
There are text items like (literally, in this example) "thing1", "thing2", "thing3", and ideally these should all get distinct and constant colors.
So how do you pick the H and S components in HSL from a text tag? Just use a hash, right?
Didn't want much bias in which range of colors gets picked, so I looked at these test results (link1, link2) on how various hash functions do, e.g. performance and distribution of output values over the uint32 range.
Picked a random "ok" one, the Ly hash, which has a fairly even output distribution, and implemented it like this:
hashLy_max = 4294967296 # 2^32, i.e. the uint32 range
hashLy = (str, seed=0) ->
  for n in [0...str.length]
    c = str.charCodeAt(n) # UTF-16 code unit, 0..0xffff
    # feed it in byte by byte, low byte first
    while c > 0
      seed = ((seed * 1664525) + (c & 0xff) + 1013904223) % hashLy_max
      c >>= 8
  seed
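The "c >>= 8" line and the inner loop are there because JS strings are UTF-16, so charCodeAt() can return values up to 0xffff. Splitting each code unit into bytes like this is a trivial (non-standard) encoding.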
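For anyone more at home in plain JS/TS, here is a rough TypeScript port of hashLy. It is a sketch, not the reference implementation from the linked tests. The port makes the structure obvious: every byte gets folded in by one step of a linear congruential generator, using the well-known Numerical Recipes constants 1664525 and 1013904223 modulo 2^32.

```typescript
// Rough TypeScript port of the CoffeeScript hashLy (a sketch, not a reference impl).
const HASH_LY_MAX = 4294967296; // 2^32, i.e. the uint32 range

function hashLy(str: string, seed: number = 0): number {
  for (let n = 0; n < str.length; n++) {
    let c = str.charCodeAt(n); // UTF-16 code unit, 0..0xffff
    // Feed the code unit in byte-sized pieces, low byte first, until nothing is left.
    while (c > 0) {
      // One LCG step: seed * 1664525 + byte + 1013904223 (mod 2^32).
      // seed * 1664525 stays below 2^53, so plain double math is exact here.
      seed = (seed * 1664525 + (c & 0xff) + 1013904223) % HASH_LY_MAX;
      c >>= 8;
    }
  }
  return seed;
}

console.log(["thing1", "thing2", "thing3"].map((s) => hashLy(s)));
```

For plain ASCII input, this is exactly one LCG step per character. Nothing else mixes the bits.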
But given any "thing1" string, I need two 0-255 values (H and S), not one 0-(2^32-1) value. So let's map the output to the 0-255 range and just call it twice, seeding the second call with the first result:
hashLy_chain = (str, count=2, max=255) ->
  [hash, hashes] = [0, []]
  scale = d3.scale.linear()
    .range([0, max]).domain([0, hashLy_max])
  for n in [1..count]
    hash = hashLy(str, hash) # re-hash the same string, seeded with the previous hash
    hashes.push(scale(hash))
  hashes
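Here is the same chain as a hedged TypeScript sketch. d3.scale.linear() (the d3 v3 API used above) with domain [0, 2^32] and range [0, max] boils down to multiplying by max / 2^32. That step is inlined as a hypothetical scaleTo helper so the block has no d3 dependency. The key detail is that each round re-hashes the same string, seeded with the previous round's output.

```typescript
const HASH_LY_MAX = 4294967296; // 2^32

// Ly hash, ported from the CoffeeScript version (repeated so this block runs standalone).
function hashLy(str: string, seed: number = 0): number {
  for (let n = 0; n < str.length; n++) {
    let c = str.charCodeAt(n);
    while (c > 0) {
      seed = (seed * 1664525 + (c & 0xff) + 1013904223) % HASH_LY_MAX;
      c >>= 8;
    }
  }
  return seed;
}

// Hypothetical stand-in for the d3 v3 linear scale: domain [0, 2^32] -> range [0, max].
function scaleTo(x: number, max: number): number {
  return (x / HASH_LY_MAX) * max;
}

// Each round re-hashes the same string, seeded with the previous round's output.
function hashLyChain(str: string, count: number = 2, max: number = 255): number[] {
  let hash = 0;
  const hashes: number[] = [];
  for (let n = 1; n <= count; n++) {
    hash = hashLy(str, hash);
    hashes.push(scaleTo(hash, max));
  }
  return hashes;
}

console.log(hashLyChain("thing1", 3));
```

The outputs are floats. Rounding them to ints, as you would for color components, collapses any values that differ by less than one step on the 0-255 scale.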
Looking at the values from this func, I noticed they didn't look random at all. That's not what I've come to expect from hash functions, being quite used to dealing with crypto hashes (which are really easy to get in any lang but JS).
So, sure, given that I'm playing around with d3 anyway, let's just plot the outputs:
"Wat?... Oh, right, makes sense."
It's meant for hash tables, where seq-in/seq-out is fine. The results of the "chain(3)[0]" and "chain(3)[1]" calls are so close on the 0-255 scale that they map to the same int value.
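One way to see the seq-in/seq-out part concretely is the sketch below, which uses a TypeScript port of hashLy. "thing1" and "thing2" share the prefix "thing", so both reach the last character with the same state. The final LCG step then only adds the char code. Since '2' is exactly one more than '1', the two hashes differ by exactly 1 (mod 2^32), which is roughly 6e-8 after scaling to 0-255.

```typescript
const HASH_LY_MAX = 4294967296; // 2^32

// Ly hash, ported from the CoffeeScript version (repeated so this block runs standalone).
function hashLy(str: string, seed: number = 0): number {
  for (let n = 0; n < str.length; n++) {
    let c = str.charCodeAt(n);
    while (c > 0) {
      seed = (seed * 1664525 + (c & 0xff) + 1013904223) % HASH_LY_MAX;
      c >>= 8;
    }
  }
  return seed;
}

const h1 = hashLy("thing1");
const h2 = hashLy("thing2");
const h3 = hashLy("thing3");

// Same state after "thing", and the last step is state * a + charCode + b,
// so consecutive last chars give consecutive hashes (mod 2^32).
const diff12 = (h2 - h1 + HASH_LY_MAX) % HASH_LY_MAX;
const diff23 = (h3 - h2 + HASH_LY_MAX) % HASH_LY_MAX;
console.log(diff12, diff23); // 1 1

// On the 0-255 scale, that gap is 255 / 2^32, far below one integer step.
console.log((diff12 / HASH_LY_MAX) * 255);
```

So for the "thing1", "thing2", "thing3" tags, the first chained value (H) comes out practically identical.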
Plus, of course, the results are anything but "random-looking", even for non-sequential strings like the ones in the d3.scale.category20() range.
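If a cheap non-crypto hash is still what you want here, one common remedy is to run its output through an avalanche step. This technique is swapped in, not taken from the post or the linked tests. One example of such a step is the MurmurHash3 32-bit finalizer (fmix32). You then take H and S from different bytes of that single mixed value, instead of chaining calls. The colorFor helper, the 40-100% saturation floor, and the fixed 50% lightness are all hypothetical, illustrative choices.

```typescript
const HASH_LY_MAX = 4294967296; // 2^32

// Ly hash, ported from the CoffeeScript version (repeated so this block runs standalone).
function hashLy(str: string, seed: number = 0): number {
  for (let n = 0; n < str.length; n++) {
    let c = str.charCodeAt(n);
    while (c > 0) {
      seed = (seed * 1664525 + (c & 0xff) + 1013904223) % HASH_LY_MAX;
      c >>= 8;
    }
  }
  return seed;
}

// MurmurHash3 32-bit finalizer: a bijection on uint32 with good avalanche
// (flipping one input bit flips about half the output bits on average),
// so hashes that differ by 1 end up far apart.
function fmix32(h: number): number {
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // back to uint32
}

// Hypothetical helper: H from the top byte, S from the next byte, L fixed.
function colorFor(str: string, lightness: number = 50): string {
  const mixed = fmix32(hashLy(str));
  const hue = ((mixed >>> 24) / 256) * 360; // 0 to just under 360 degrees
  const sat = 40 + (((mixed >>> 16) & 0xff) / 255) * 60; // assumed 40-100% floor, avoids grays
  return `hsl(${hue.toFixed(1)}, ${sat.toFixed(1)}%, ${lightness}%)`;
}

for (const s of ["thing1", "thing2", "thing3"]) {
  console.log(s, colorFor(s));
}
```

Since fmix32 is a bijection, distinct hashes stay distinct. Two tags can still land on similar colors, because only 16 bits end up in H and S.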
Lesson learned: know what you're dealing with, be super-careful rolling your own math from primitives you don't really understand, and stop and think about wth you're doing for a second instead of just relying on "intuition" (e.g. the one attached to the word "hash").
Now I totally get how people start with AES and SHA1 funcs, mix them into their own crypto protocol and somehow get something analogous to ROT13 (or even double-ROT13, for extra hilarity) as a result.