Sunday, February 20, 2011

Haskell vs Busybox (for an embedded soft-realtime control system)

I'm building an embedded soft-real-time control system. It will handle sensor events and provide feedback to the user using voice synthesis.

I really want to use Haskell for this CFT project, but I can get something running so much quicker by shell scripting.  There won't be a lot of sophisticated algorithms and I don't see scalability as a concern.

When it comes down to it, I am finding it harder and harder to do systems programming in a "programming language" vs something in a shell (with support from awk and friends).  It doesn't matter if it is C or Haskell, it starts to feel like (once again) reinventing the wheel.

As an example (and it has nothing to do with this current CFT project), consider this problem: I want to transform 1024 byte chunks of a file and write the results as a compressed file. The transformation doesn't matter, but let's say the transformation is written in C (or Haskell for that matter) and takes 50-100 ms per 1024 byte chunk.

I want to do this task as fast as possible. I have (at least) 2 CPU cores to work with.  Let's look at two approaches:

Approach A:   Write a Haskell/C program to read 1024 bytes at a time, perform the transformation, compress the result, and write the compressed data to an output file.

Okay, so I need to link in a decent gzip compression library and use an appropriate "opt" parser to grab the input and output file names.  Done.

Approach B:  dd if="$1" bs=1024 | translator | gzip > "$2"

This assumes that I write the same core "translator" code as above, so we can ignore that and focus on reading, compression and writing.
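For illustration, Approach B can be fleshed out into a small shell function. This is only a sketch under a few assumptions: the function name transform_and_compress and the TRANSLATOR variable are made up for this example, and TRANSLATOR defaults to plain "cat" (no transformation at all) as a stand-in for the real translator program.

```shell
#!/bin/sh
# Sketch of Approach B as a reusable function.
# TRANSLATOR is a hypothetical stand-in for the real transformation program.
# It defaults to `cat` (no transformation) so the sketch runs anywhere.
TRANSLATOR=${TRANSLATOR:-cat}

transform_and_compress() {
    # $1 = input file, $2 = output file (gzip-compressed)
    if [ "$#" -ne 2 ]; then
        echo "usage: transform_and_compress INPUT OUTPUT" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # dd reads the input in 1024-byte blocks; 2>/dev/null hides the
    # record-count summary that dd prints on stderr.
    # $TRANSLATOR is left unquoted on purpose so it may carry arguments.
    dd if="$1" bs=1024 2>/dev/null | $TRANSLATOR | gzip > "$2"
}
```

With the default TRANSLATOR, running transform_and_compress in.dat out.gz followed by gzip -dc out.gz should give back the original bytes. Swapping in the real translator is a matter of setting TRANSLATOR before the call.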

You can guess which will take less time to implement, but which is more efficient?

My wild guess would be Approach B. Why? I already have a couple of things going for me. One is automatic concurrency: while "dd" is sitting there reading the disk, translator is running and gzip is also doing its thing. If I have 3 cores, there is a good chance that each process can run in parallel (for at least a little while before they block). There is some cost in the piping, but that is something Linux/Unix is optimized to do.  Another is buffering: "dd" has a good chance of getting more efficient file input buffering than my single-threaded app in Approach A. The dd process has disk buffering plus pipe buffering working for it, so it may fetch (and dispatch) several 1024-byte chunks before it blocks on a full pipe.  Similar buffering, in reverse, is happening on the gzip side too.
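The "automatic concurrency" claim is easy to poke at from the shell. Here is a rough sketch, not a benchmark: slow_stage and time_pipeline are hypothetical names, and each stage just sleeps for one second in place of real work by dd, translator, or gzip. It only shows that pipeline stages overlap in time. It says nothing about CPU-bound throughput.

```shell
#!/bin/sh
# slow_stage is a hypothetical stand-in for one pipeline stage:
# it waits one second, then copies its input to its output.
slow_stage() {
    sleep 1
    cat
}

# Run three slow stages as a pipeline and print the elapsed
# wall-clock time in whole seconds.
time_pipeline() {
    start=$(date +%s)
    echo data | slow_stage | slow_stage | slow_stage > /dev/null
    end=$(date +%s)
    echo $((end - start))
}
```

If the stages ran one after another, time_pipeline would report about 3 seconds. Because the shell starts all stages of a pipeline at once, it should report closer to 1 (possibly 2, given the one-second clock resolution).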

So, you then consider rewriting Approach A but using a concurrency module/library.  Ugh. Let's not go there.

So, if I take a scripting approach, the "controlling" part of my system can be written in the shell, and I can move pieces to Haskell (or C) as they need optimizing.
