Aug 05, 2016

D3 chart for common temperature/rh time-series data

More D3 tomfoolery!

It's been a while since I touched the thing, but recently been asked to make a simple replacement for processing common-case time-series from temperature + relative-humidity (calling these "t" and "rh" here) sensors (DHT22, sht1x, or what have you), that's been painstakingly done in MS Excel (from tsv data) until now.

So here's the plot:

d3 t/rh chart image
Interactive version can be run directly from mk-fg/fgtk: d3-temp-rh-sensor-tsv-series-chart.html
Bunch of real-world data samples for the script: d3-temp-rh-sensor-tsv-series-chart.zip

Misc feats of the thing, in no particular order:

  • Single-html d3.v4.js + ES6 webapp (assembed by html-embed script) that can be opened from localhost or any static httpd on the net.
  • Drag-and-drop or multi-file browse/pick box, for uploading any number of tsv files (in whatever order, possibly with gaps in data) instantly to JS on the page.
  • Line chart with two Y axes (one for t, one for rh).
  • Smaller "overview" chart below that, where one can "brush" needed timespan (i.e. subset of uploaded data) for all the other charts and readouts.
  • Mouseover "vertical line" display snapping to specific datapoints.
  • List of basic stats for picked range - min/max, timespan, value count.
  • Histograms for value distribution, to easily see typical values for picked timespan, one for t and rh.

Kinda love this sort of interactive vis stuff, and it only takes a bunch of hours to put it all together with d3, as opposed to something like rrdtool, its dead images and quirky mini-language.

Also, surprisingly common use-case for this particular chart, as having such sensors connected to some RPi is pretty much first thing people usually want to do (or maybe close second after LEDs and switches).

Will probably look a bit further to make it into an offline-first Service Worker app, just for the heck of it, see how well this stuff works these days.

No point to this post, other than forgetting to write stuff for months is bad ;)

Mar 25, 2015

gnuplot for live "last 30 seconds" sliding window of "free" (memory) data

Was looking at a weird what-looks-like-a-memleak issue somewhere in the system on changing desktop background (somewhat surprisingly complex operation btw) and wanted to get a nice graph of "last 30s of free -m output", with some labels and easy access to data.

A simple enough task for gnuplot, but resulting in a somewhat complicated solution, as neither "free" nor gnuplot are perfect tools for the job.

First thing is that free -m -s1 doesn't actually give a machine-readable data, and I was too lazy to find something better (should've used sysstat and sar!) and thought "let's just parse that with awk":

free -m -s $interval |
  awk '
    BEGIN {
      exports="total used free shared available"
      agg="free_plot.datz"
      dst="free_plot.dat"}
    $1=="total" {
      for (n=1;n<=NF;n++)
        if (index(exports,$n)) headers[n+1]=$n }
    $1=="Mem:" {
      first=1
      printf "" >dst
      for (n in headers) {
        if (!first) {
          printf " " >>agg
          printf " " >>dst }
        printf "%d", $n >>agg
        printf "%s", headers[n] >>dst
        first=0 }
      printf "\n" >>agg
      printf "\n" >>dst
      fflush(agg)
      close(dst)
      system("tail -n '$points' " agg " >>" dst) }'

That might be more awk than one ever wants to see, but I imagine there'd be not too much space to wiggle around it, as gnuplot is also somewhat picky in its input (either that or you can write same scripts there).

I thought that visualizing "live" stream of data/measurements would be kinda typical task for any graphing/visualization solution, but meh, apparently not so much for gnuplot, as I haven't found better way to do it than "reread" command.

To be fair, that command seem to do what I want, just not in a much obvious way, seamlessly updating output in the same single window.

Next surprising quirk was "how to plot only last 30 points from big file", as it seem be all-or-nothing with gnuplot, and googling around, only found that people do it via the usual "tail" before the plotting.

Whatever, added that "tail" hack right to the awk script (as seen above), need column headers there anyway.

Then I also want nice labels - i.e.:

  • How much available memory was there at the start of the graph.
  • How much of it is at the end.
  • Min for that parameter on the graph.
  • Same, but max.
stats won't give first/last values apparently, unless I missed those in the PDF (only available format for up-to-date docs, le sigh), so one solution I came up with is to do a dry-run plot command with set terminal unknown and "grab first value" / "grab last value" functions to "plot".
Which is not really a huge deal, as it's just a preprocessed batch of 30 points, not a huge array of data.

Ok, so without further ado...

src='free_plot.dat'
y0=100; y1=2000;
set xrange [1:30]
set yrange [y0:y1]

# --------------------
set terminal unknown
stats src using 5 name 'y' nooutput

is_NaN(v) = v+0 != v
y_first=0
grab_first_y(y) = y_first = y_first!=0 && !is_NaN(y_first) ? y_first : y
grab_last_y(y) = y_last = y

plot src u (grab_first_y(grab_last_y($5)))
x_first=GPVAL_DATA_X_MIN
x_last=GPVAL_DATA_X_MAX

# --------------------
set label 1 sprintf('first: %d', y_first) at x_first,y_first left offset 5,-1
set label 2 sprintf('last: %d', y_last) at x_last,y_last right offset 0,1
set label 3 sprintf('min: %d', y_min) at 0,y0-(y1-y0)/15 left offset 5,0
set label 4 sprintf('max: %d', y_max) at 0,y0-(y1-y0)/15 left offset 5,1

# --------------------
set terminal x11 nopersist noraise enhanced
set xlabel 'n'
set ylabel 'megs'

set style line 1 lt 1 lw 1 pt 2 pi -1 ps 1.5
set pointintervalbox 2

plot\
  src u 5 w linespoints linestyle 1 t columnheader,\
  src u 1 w lines title columnheader,\
  src u 2 w lines title columnheader,\
  src u 3 w lines title columnheader,\
  src u 4 w lines title columnheader,\

# --------------------
pause 1
reread

Probably the most complex gnuplot script I composed to date.

Yeah, maybe I should've just googled around for an app that does same thing, though I like how this lore potentially gives ability to plot whatever other stuff in a similar fashion.

That, and I love all the weird stuff gnuplot can do.

For instance, xterm apparently has some weird "plotter" interface hardware terminals had in the past:

gnuplot and Xterm Tektronix 4014 Mode

And there's also the famous "dumb" terminal for pseudographics too.

Regular x11 output looks nice and clean enough though:

gnuplot x11 output

It updates smoothly, with line crawling left-to-right from the start and then neatly flowing through. There's a lot of styling one can do to make it prettier, but I think I've spent enough time on such a trivial thing.

Didn't really help much with debugging though. Oh well...

Full "free | awk | gnuplot" script is here on github.

May 18, 2014

The moment of epic fail hilarity with hashes

Just had an epic moment wrt how to fail at kinda-basic math, which seem to be quite representative of how people fail wrt homebrew crypto code (and what everyone and their mom warn against).

So, anyhow, on a d3 vis, I wanted to get a pseudorandom colors for text blobs, but with reasonably same luminosity on HSL scale (Hue - Saturation - Luminosity/Lightness/Level), so that changing opacity on light/dark bg can be seen clearly as a change of L in the resulting color.

There are text items like (literally, in this example) "thing1", "thing2", "thing3" - these should have all distinct and constant colors, ideally.

So how do you pick H and S components in HSL from a text tag? Just use hash, right?

As JS doesn't have any hashes yet (WebCryptoAPI is in the works) and I don't really need "crypto" here, just some str-to-num shuffler for color, decided that I might as well just roll out simple one-liner non-crypto hash func implementation.
There are plenty of those around, e.g. this list.

Didn't want much bias wrt which range of colors get picked, so there are these test results - link1, link2 - wrt how these functions work, e.g. performance and distribution of output values over uint32 range.

Picked random "ok" one - Ly hash, with fairly even output distribution, implemented as this:

hashLy_max = 4294967296 # uint32
hashLy = (str, seed=0) ->
  for n in [0..(str.length-1)]
    c = str.charCodeAt(n)
    while c > 0
      seed = ((seed * 1664525) + (c & 0xff) + 1013904223) % hashLy_max
      c >>= 8
  seed

c >>= 8 line and internal loop here because JS has unicode strings, so it's a trivial (non-standard) encoding.

But given any "thing1" string, I need two 0-255 values: H and S, not one 0-(2^32-1). So let's map output to a 0-255 range and just call it twice:

hashLy_chain = (str, count=2, max=255) ->
  [hash, hashes] = [0, []]
  scale = d3.scale.linear()
    .range([0, max]).domain([0, hashLy_max])
  for n in [1..count]
    hash = hashLy(str, hash)
    hashes.push(scale(hash))
  hashes
Note how to produce second hash output "hashLy" just gets called with "seed" value equal to the first hash - essentially hash(hash(value) || value).
People do that with md5, sha*, and their ilk all the time, right?

Getting the values from this func, noticed that they look kinda non-random at all, which is not what I came to expect from hash functions, quite used to dealing crypto hashes, which are really easy to get in any lang but JS.

So, sure, given that I'm playing around with d3 anyway, let's just plot the outputs:

Ly hash outputs

"Wat?... Oh, right, makes sense."

Especially with sequential items, it's hilarious how non-random, and even constant the output there is.
And it totally makes sense, of course - it's just a "k1*x + x + k2" function.

It's meant for hash tables, where seq-in/seq-out is fine, and the results in "chain(3)[0]" and "chain(3)[1]" calls are so close on 0-255 that they map to the same int value.

Plus, of course, the results are anything but "random-looking", even for non-sequential strings of d3.scale.category20() range.

Lession learned - know what you're dealing with, be super-careful rolling your own math from primitives you don't really understand, stop and think about wth you're doing for a second - don't just rely on "intuition" (associated with e.g. "hash" word).

Now I totally get how people start with AES and SHA1 funcs, mix them into their own crypto protocol and somehow get something analogous to ROT13 (or even double-ROT13, for extra hilarity) as a result.

May 12, 2014

X-Y plots of d3 scales and counter-intuitive domain/range effect

As I was working on a small d3-heavy project (my weird firefox homepage), I did use d3 scales for things like opacity of the item, depending on its relevance, and found these a bit counter-intuitive, but with no readily-available demo (i.e. X-Y graphs of scales with same fixed domain/range) on how they actually work.

Basically, I needed this:

d3 scales graph

I'll be first to admit that I'm no data scientist and not particulary good at math, but from what memories on the subject I have, intuition tells me that e.g. "d3.scale.pow().exponent(4)" should rise waaaay faster from the very start than "d3.scale.log()", but with fixed domain + range values, that's exactly the opposite of truth!

So, a bit confused about weird results I was getting, just wrote a simple script to plot these charts for all basic d3 scales.
And, of course, once I saw a graph, it's fairly clear how that works.

Here, it's obvious that if you want to pick something that mildly favors higher X values, you'd pick pow(2), and not sqrt.

Feel like such chart should be in the docs, but don't feel qualified enough to add it, and maybe it's counter-intuitive just for me, as I don't dabble with data visualizations much and/or might be too much of a visually inclined person.

In case someone needs the script to do the plotting (it's really trivial though): scales_graph.zip

May 12, 2014

My Firefox Homepage

Wanted to have some sort of "homepage with my fav stuff, arranged as I want to" in firefox for a while, and finally got resolve to do something about it - just finished a (first version of) script to generate the thing - firefox-homepage-generator.

Default "grid of page screenshots" never worked for me, and while there are other projects that do other layouts for different stuff, they just aren't flexible enough to do whatever horrible thing I want.

In this particular case, I wanted to experiment with chaotic tag cloud of bookmarks (so they won't ever be in the same place), relations graph for these tags and random picks for "links to read" from backlog.

Result is a dynamic d3 + d3.layout.cloud (interactive example of this layout) page without much style:

homepage screenshot
"Mark of Chaos" button in the corner can fly/re-pack tags around.
Clicking tag shows bookmarks tagged as such and fades all other tags out in proportion to how they're related to the clicked one (i.e. how many links share the tag with others).

Started using FF bookmarks again in a meaningful way only recently, so not much stuff there yet, but it does seem to help a lot, especially with these handy awesome bar tricks.

Not entirely sure how useful the cloud visualization or actually having a homepage would be, but it's a fun experiment and a nice place to collect any useful web-surfing-related stuff I might think of in the future.

Repo link: firefox-homepage-generator

Member of The Internet Defense League