Sunday, November 20, 2011

arrayForth notes Part 1

I haven't had a lot of time to work on my EVB001 eval board.  The time between sessions can go weeks and I tend to forget a lot of stuff.  These are notes to myself... sort of dead simple exercises to use as restart points.  So...

Here is an example of just attaching to a node and getting it to do some computation. What you type into arrayForth is displayed in Courier font.

First, you need to make sure a-com is set to the attached COM port.
a-com (this should display the com port number)

If the com port is incorrect, you must change it.  Do this:
def a-com (enter into the editor)

Navigate to the value, make the change, type save and exit/re-enter arrayForth.
(Hint:  Press ';'  to position the cursor just after the number; press 'n' to delete it; use 'u' to insert the new number; press ESC to leave that mode; press SPACE to exit editor and then type save )

Check the value again (a-com).

Now, let's go ahead and hook into node 600:
host load panel (load host code and display the panel)
talk 0 600 hook upd (talk to the chip, wire up to node 600 and update the stack view)

You should now see a bunch of numbers on the stack, starting with line 3.

Now, let's throw a couple of numbers onto the node's stack:
7 lit 2 lit (lit pushes the numbers off of the x86 arrayForth stack and onto the node's stack)

You should now see 7 and 2 as the last two values on the 4th line.
Remember, the stack display is in hex and the values you are entering are in decimal.
(If you wish to enter values in Hex mode, press F1 and precede hex numbers with a zero (0).)

Now, let's add the two numbers:
r+ (the "r" distinguishes the F18 addition word from the x86 "+" word)

You should now see 9 on top of the stack.


Now, let's try one more contrived exercise.  Let's load some data into the node's RAM:
55 0 r! (take 2 values off of the x86 stack:  55 goes into location 0)

You won't see the changed memory until you type:

So, at last, we have something working "hands on".  The example in the arrayForth user guide is great, but sometimes a good start is just to be able to interactively talk to the chip.


Another GreenArrays blog

I've stumbled upon:

Oh, and also, look to the right of this entry (under Links) for the permanent home of my arrayForth cheat sheet (just revised today).

Saturday, October 01, 2011

GreenArrays arrayForth Keyboard cheat sheet

I'm having trouble wrapping my head around the arrayForth editor keyboard layout diagram in section 3.1 of the arrayForth user guide, so I am trying to put together a slightly modified version that more tightly associates the keyboard keys (and position) to function.  I am also dropping the grouping-via-color since I don't have a color printer at hand.  Here is a link to the PDF.

Tuesday, September 27, 2011

My GreenArrays EVB001 Eval Board Adventure

Okay, so I broke down and purchased an eval board last week. I got my shipment notice last Friday (which included a nice personal note from Greg Bailey mentioning that he saw my last blog post -- thanks Greg) and the board arrived Monday.

Now, to answer my own question (from that post): What to do with 144 cores?  I guess I'm going to have to figure that one out...

I've got a big learning curve ahead of me, and although I'm not the type to post daily updates on "learning experiences", I'll probably post now and then how it is going. If I get an overwhelming burst of energy, then I may even fork my EVB001 adventures to a new blog dedicated to just that.

Anyway, what are my current plans?

  1. Learn enough arrayForth (ColorForth) to be dangerous.
  2. Work my way around the board (nodes and I/O).
  3. Begin world dominating project.
Regarding #1, I have followed ColorForth for years, but I never really used it.  That being said, I am using Charley Shattuck's MyForth on my day job (shhh.. don't tell them)  and that is different enough from ANS Forths that the arrayForth "culture-shock" is low. 

Working around the board (#2) is critical as I have to figure out what my peripheral hook up options are.  I figure that I would try and get the board talking to an accelerometer (or other sensor). This would be a good goal.

Now, world domination (#3) is a bit vague.

Now, here is what I am thinking.... My usual approach of building tiny/simple things that can be replicated (low volume production runs) won't work here.  I simply can't afford to dedicate a $450 eval board to a single task.  Then again, I hate the idea of just using it as a "prototyping" board for various ideas.  I need a more singular goal.  

So, I am viewing the eval board as a "platform".  But, a platform for what?

When someone (for passion) designs and builds their own car, plane or boat, they are creating something unique. They are not making something with the end goal of mass production. They are building a "system" that satisfies their own needs. Now, if that "system" later results in replication due to demand, then that is great. But, it is all about building something unique -- something unlike the other guy's car, plane or boat.

You may see where I am going... the usual place: Robotics.

But, here I use the word "Robot" in loose terms. I am thinking about building a platform to support the integration of sensors and actuators.  I want to load up the EVB001 with as many sensors as possible and have it collect, correlate and react through the manipulation of actuators.  However, I want to do this within the tightest time constraint possible: I want a tight coupling between sensors and actuators. I want a feedback mechanism. I want... my flocking Goslings (or at least one of them at this point).

Integrating lots of sensors with a single fast running ARM is certainly possible. But this would be interrupt hell (or polling hell or linux process/thread management hell). This is why I (and other sensible people) incorporate tiny 8051s, AVRs and MSP430s into dumb sensors -- to make them independently smarter.  Unfortunately, when you have a bunch of microcontroller enhanced sensors (and actuators) you have a communication nightmare. And you need a separate master CPU to integrate all of the "smart" data and manipulate the actuators.

None of this is new. None of this is rocket science. However, the robot I design would be my bot. It would be unique. 

More deep thoughts later... For now, I just need to figure out how to talk to my new toy ;-)

Monday, September 19, 2011

GreenArrays G144 - What to do with 144 cores?

I've been following the GreenArrays G144 since its inception.  Now a kit is available... programmable in colorforth (and eforth).  Forth chips aren't new to me. I remember devouring the Novix NC4000 back in the mid-80s (I couldn't afford one...).

So, the kit costs $450. I don't really have that kind of money to drop on a dev kit, but... if I did manage to scrape up the cash, what would I do with 144 computing cores?

Seriously, that is a good question for deep thinking. From an embedded computing perspective, what could one do with 144 computing cores?

If I can come up with some good ideas, this kit may be on my birthday wishlist.... ;-)

Friday, September 09, 2011

Smart Things

I want to design Smart Things.

Smart Things are small devices that do specific things to augment our own intelligence and abilities.

A Smart Thing communicates with the outside world (and other external Smart Things) via protocols like NFC, Bluetooth or  TCP/IP.

Co-located Smart Things may use SPI, I2C, UART or even bit banging GPIO.

A Smart Thing is never too smart, although it should require very little human intervention.  It should be just smart enough to justify its own existence (as a gadget or parasitic module).

A Smart Thing is usually energy efficient. It should run off of a battery or a host's power.

Some examples of Smart Things:

  • A UV monitor that samples sunlight, calculates UV Index and can be queried by an NFC reader (like the Google Nexus S phone).
  • A motion detector that sends alerts through Bluetooth or Wi-Fi.
  • A light detector that keeps a log of when a room light has been turned on and for how long. It too could be queried through NFC.
  • Sensor modules for gardens. The Smart Things could measure soil moisture and temperature.  They could pass the information on through ANT or ZigBee to another Smart Thing that collects and correlates the data (for collection via smart phone).
A Smart Thing should not be too expensive: The more, the merrier.

A Smart Thing should last for at least 10 years (with at most 1 battery change per year) -- it should be embedded and "forgotten".

A really small Smart Thing could be built around a low power 8051 running MyForth or maybe an MSP430 running uForth/fil.  The point here being: You don't care what the platform is.  It just needs to work... and be smart.

Thursday, September 08, 2011

Tethered Forths

In my last post I talked about writing my new language fil in uForth via 3 stage metacompilation.
I want to note here that there is an alternate approach I am considering:  Tethering.

I don't need to metacompile fil in order to get it running on a small MCU. If I retain the C code that does interpreting/compiling (bootstrap.exe) and write a new stripped down VM that doesn't understand interpreting/compiling (fil.exe), I only have to port the stripped down VM to the MCU.  I would do all of my development/compiling on the PC (with bootstrap.exe -- renamed pc-fil.exe).  The resulting byte code image (from compilation) would be moved to the MCU and run by the ported fil.exe.

This approach is a subset of what uForth already does.  However, uForth also allows for the C based interpreter/compiler to run on the MCU.  This is a bit weighty, but essentially gives me the ability to interactively develop directly on the MCU (interacting through a serial interface).

A better approach is tethering, and fil will prefer this approach even if I do re-implement the interpreter/compiler in uForth.  In tethering, you do your development on the PC (using fil.exe/bootstrap.exe) and craft a MCU VM that listens on a serial port for special "commands".  These commands can be as simple as "store byte-code" and "execute byte-code". You maintain a mirror of the dictionary between the PC and MCU. All the MCU VM needs to do (from an interaction perspective) is write PC compiled byte-code to its dictionary and to be able to execute it. No interpreter is needed.
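To make the tethering idea concrete, here is a minimal sketch in Python that simulates both ends of the link in one process. Everything in it is an assumption made for illustration: the two command bytes (STORE, EXECUTE), the three byte-codes, and the class names are hypothetical and are not taken from fil, MyForth or Riscy Pygness. The point is only that the target needs nothing more than "store byte-code" and "execute byte-code", while the host keeps the mirror of the dictionary.

```python
# Hypothetical tether protocol: the command bytes and byte-codes below are
# invented for illustration, not taken from fil, MyForth or Riscy Pygness.
STORE, EXECUTE = 0x01, 0x02          # tether commands sent by the PC
OP_LIT, OP_ADD, OP_RET = 0, 1, 2     # byte-codes understood by the target


class TargetVM:
    """Stands in for the MCU side: no interpreter, no compiler."""

    def __init__(self, size=256):
        self.dictionary = bytearray(size)
        self.stack = []

    def handle(self, packet):
        """Process one command packet from the (simulated) serial link."""
        command, addr = packet[0], packet[1]
        if command == STORE:
            body = packet[2:]
            self.dictionary[addr:addr + len(body)] = body
        elif command == EXECUTE:
            self.run(addr)
        else:
            raise ValueError("unknown tether command")

    def run(self, pc):
        """Execute byte-code starting at pc until OP_RET."""
        while True:
            op = self.dictionary[pc]
            pc += 1
            if op == OP_LIT:                 # next byte is an inline literal
                self.stack.append(self.dictionary[pc])
                pc += 1
            elif op == OP_ADD:
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == OP_RET:
                return
            else:
                raise ValueError("unknown byte-code")


class Host:
    """Stands in for the PC side: compiles words and mirrors the dictionary."""

    def __init__(self, target):
        self.target = target
        self.here = 0        # next free dictionary address on the target
        self.words = {}      # mirror: word name -> target address

    def define(self, name, code):
        addr = self.here
        self.target.handle(bytes([STORE, addr]) + bytes(code))
        self.words[name] = addr
        self.here += len(code)

    def execute(self, name):
        self.target.handle(bytes([EXECUTE, self.words[name]]))


target = TargetVM()
host = Host(target)
host.define("nine", [OP_LIT, 7, OP_LIT, 2, OP_ADD, OP_RET])
host.execute("nine")
print(target.stack)
```

On real hardware the call to handle would be replaced by writing the packet to a serial port, and the target loop would live in the MCU firmware.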

This is the brilliant method used by various Forths including MyForth and Riscy Pygness.

If I run into too many walls doing my 3 stages, I may take a break and just go tethered. I do have MCU CFT projects I want to get done!

My new language: fil (Forth Inspired Language)

My last Forth was uForth. I wrote it to run on PCs (Linux, Cygwin, etc) and MCUs (TI MSP430 and any other Harvard architecture).  The implementation was a subset of ANSI and most of the Forth words were coded in Forth.  The interpreter, compiler and VM were coded in C.

Since then, I've become (re)fascinated by more minimalistic Forths like ColorForth and MyForth.  uForth isn't a good playground for minimalistic experimentation, so I am writing a new Forth inspired language to be called fil.

Like uForth, fil will work on small MCUs as well as big PCs. It will work on Harvard based memory architectures (separate flash/ROM and RAM address spaces) as well as the more familiar linear address space.  It will have a 16 bit instruction space (limited currently to a 64KB dictionary -- quite large in Forth terms) and a 32 bit stack/variable cell size.  Using a 32 bit instruction space will force a trade off of code bloat (double the size of code) or speed/complexity (right now I use a switch based code interpreter that assumes each token is 16 bits). In the future I may silently upgrade to a 32 bit dictionary.  This shouldn't require a rewrite ;-)
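As a rough illustration of the 16 bit token / 32 bit cell split, here is a small Python sketch of a switch-style interpreter loop. It is not fil's actual VM: the token numbers, the little-endian layout and the inline-literal encoding are all assumptions picked for the example. It fetches one 16 bit token at a time and masks every arithmetic result to 32 bits.

```python
import struct

MASK32 = 0xFFFFFFFF                  # stack cells are 32 bits wide
LIT, ADD, MUL, HALT = 0, 1, 2, 3     # hypothetical token numbers


def assemble(tokens):
    """Pack a list of token values into little-endian 16-bit slots."""
    return struct.pack("<%dH" % len(tokens), *tokens)


def run(image):
    """Interpret a byte image of 16-bit tokens; return the final stack."""
    stack, pc = [], 0
    while True:
        (token,) = struct.unpack_from("<H", image, pc)   # fetch one token
        pc += 2
        if token == LIT:             # next 16-bit slot is an inline literal
            (value,) = struct.unpack_from("<H", image, pc)
            pc += 2
            stack.append(value)
        elif token == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif token == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        elif token == HALT:
            return stack
        else:
            raise ValueError("unknown token %d" % token)


image = assemble([LIT, 60000, LIT, 60000, MUL, HALT])
print(len(image), run(image))
```

The six tokens take 12 bytes here; the same program with 32 bit tokens would take 24, which is the code bloat side of the trade off. The product does not fit in 16 bits but does fit in a 32 bit cell.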

But, where do I start?  Well, uForth is a good place.  I figured I would bootstrap fil off of uForth.  In an ideal world (and ideal implementation of Forth), I would metacompile fil straight from uForth.  Unfortunately, there are some limitations/assumptions in the uForth C-based core.  So, instead, I am taking a hybrid approach of modifying the core (C) code and the uForth (Forth) code.  In essence, I am rewriting uForth to support metacompiling my new language (fil).

Metacompiling is not new. It is a time honored Forth technique of building Forth from Forth.  However, while traditional metacompilers target machine code, I am targeting a stripped-down version of uForth's VM (bytecode interpreter).

My approach has three stages:

 1. I implement as much of uForth in Forth as possible so that I can remove any underlying "C" assumptions and basically simplify the VM.  What I'll have left is a uForth/fil with the interpreter/compiler/VM written in C.  Let's call that C based executable "bootstrap.exe".

 2. I rewrite the interpreter/compiler in uForth/fil.

 3. I submit the uForth/fil (Forth) source code to itself (the new interpreter/compiler) and produce a new byte code image.  I can then strip the interpreter/compiler out of the C code and produce a simple C VM that doesn't know squat about interpreting or compiling.  This new VM executable (fil.exe) and byte code image will be fil.  I no longer use "bootstrap.exe".

After this, I can port the new VM to various MCUs.

I have already finished Stage 1, but I reserve the option to spiral back in order to remove further C assumptions that prevent progress on Stage 2.  I am also not being very careful to retain full uForth backward compatibility. At the end of Stage 1 I already have a "hybrid" fil/uForth language.

Once fil is complete, I will probably revisit the 16 bit dictionary and consider extending it to 32 bit.  If I do this, I don't want to break the idea of fil running on small (8/16 bit) MCUs efficiently.  I may consider a bank switched approach instead (multiple 16 bit dictionaries).  Don't forget: You can pack a lot of code into a 16 bit Forth!

Monday, August 29, 2011

Software defined... Radio, GPS, ... etc?

Most comms modules (e.g. Bluetooth, GPS, etc),  memory peripherals (e.g. SD cards, USB sticks, etc) and other sophisticated "chips" have embedded processor cores. These cores may be based on stock 8051 or specialized ARM designs.  They are smart devices that save system designers a lot of integration time by being "drop ins" (i.e. you talk with them via simple protocols over UART, I2C or SPI) and they do all of the hard work.

Recently, reading about Software-defined Radios (replacing hardware based tuning/filtering with software) and this article (dumber GPS modules where satellite correlation/fusion is done by back end computers), makes me wonder if the future will present dumber peripherals in trade for more processing on our main CPUs.

How many processor cores are there in an average smart phone? You've got the primary CPU running the OS, but have you considered what is powering your Bluetooth, Wi-fi, GPS, cellular modem, display and touch interface?  Having sophisticated software in these peripheral chips certainly aids time to market (less programming for the integrator).  But, I rely on the craftiness of the chip designer to meet my needs.

Sure, it's all software (even when on individual hardware modules), but rarely are these things upgradeable. They have a limited product life (even if the analog part of it is still relevant).  This is good for hardware companies, but not good for us (the end user/consumer).

I remember playing with early MEMS accelerometer chips. They usually just output a voltage for an axis. There were no "interrupts" or SPI or I2C protocols. They were analog devices. It was up to me to figure out what they were spitting out and deal appropriately.

Now I use smart digital accelerometers that notify me when an event (e.g. tilt, acceleration exceeding a threshold, free fall, tap, etc) occurs. Sometimes they can be frustrating if they don't quite provide what I need -- lots of register based tuning usually takes care of this, but still...

Imagine a smart phone where all of the processing was done on the main CPU. Sure, that would bog it down significantly, but imagine a much faster (and power efficient) main CPU (maybe even with 4 or 5 cores).  Now, your GPS/Bluetooth/Cellular-modem are just analog transceivers that stream bits or analog signals.  Your smartphone would just have a bunch of antennas, transceivers and sensor hardware. All of the software resides somewhere on the main CPU. Imagine having access to that software.

Wednesday, August 03, 2011

UV Index monitor prototype #1

I don't know if I mentioned it here before, but I've been working on a personal UV (Index) monitor.
Folks who have skin cancer (or those at high risk) need to make sure they limit their sun exposure.

The general approach is to just lather up with sunscreen every time you leave the house, but this is impractical (plus you have to re-apply every couple of hours).  This becomes more of an annoyance when you consider spending hours riding in a car: Are the windows UV protected? How well? Do you have to lather up every time you drive?

You can get UV index forecasts on your smartphone, but these are just forecasts (for your area and for the whole day). When you are out in the sun, you'll need to know how much UV intensity is hitting you  "right now".

Another solution is to carry a UV monitor.

The only ones I've seen on the market are overkill (too large and complex) or vague (how does this work and is it reliable -- where is the sensor?) .
I am aiming at something so small that you'll always carry it with you, but also clear and as accurate as possible.  My target form factor is a key fob:

My target UI is based on colored LEDs. There are official colors for the UV index scale and I have an LED for each level.  I would like to have (at most) 2 buttons -- one for "instant read" (point at the sun and an LED will light up for 2 seconds indicating UV index level) and one for setting a countdown timer (for sunscreen re-application).

My current prototype has 1 button, 5 high-intensity LEDs (green, yellow, orange, red and blue/violet) and is a little bulkier than a key fob. Amazingly, the LEDs are quite readable in bright sunlight! If you are colorblind you can always read index based on which LED lights up (right?). The current layout ramps "upwards" depending on UV intensity.
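For reference, the mapping from index to LED can be sketched in a few lines of Python. The thresholds below follow the commonly published UV Index exposure categories (low 0-2, moderate 3-5, high 6-7, very high 8-10, extreme 11+); the function name and the choice to treat fractional readings by threshold rather than rounding are assumptions for the example, not a description of the prototype's firmware.

```python
from bisect import bisect_right

# Lower bound of each category above "low": moderate, high, very high, extreme.
THRESHOLDS = [3, 6, 8, 11]
LEDS = ["green", "yellow", "orange", "red", "violet"]


def led_for_uv_index(uv_index):
    """Return the LED color for a (possibly fractional) UV index reading."""
    if uv_index < 0:
        raise ValueError("UV index cannot be negative")
    return LEDS[bisect_right(THRESHOLDS, uv_index)]


for reading in (0, 2.9, 3, 7, 10.5, 11):
    print(reading, led_for_uv_index(reading))
```

On an 8051 this would more likely be a short chain of compares against a calibrated ADC value, but the table is the same.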

It takes a single coin cell battery and is based on a very low power 8051 from SiLabs. It should get 3-5 years off the battery with casual usage.

I need to do lots of tuning/calibration and I know it won't be "demo worthy" for the rest of this summer, but I am making progress.  Apparently, the calculations done for UV Index forecasting aren't very practical for small single UV sensors. Somehow, the personal UV monitors make do though.  I think I'll use one of the better ones to aid in my calibration.

Maybe I'll have case design and a formal board spin ready for next summer?

Friday, July 15, 2011

Tiny computers that fit on your fingernail...

Here is a thought:

Pick up a  microSD card.  Place it on a fingernail. Look at how small it is. How much does it hold? 1GB? 2GB? 8GB? More?  Amazing. That is a lot of storage. These things are examples of how storage keeps shrinking while maintaining incredible capacity. You could fit a whole library on a microSD, right?

But consider this: Inside every microSD card lies an MCU core. (It may be an 8051.  The 8051 MCU is still a popular flash memory controller that you'll find in a majority of your USB thumb drives, SD and even (as a naked die) microSD cards.)  Each MCU contains some small amount of RAM too.

So, on your fingernail you have an 8 or 16 bit computer (typically running > 50 MHz) with high speed I/O, RAM, gigabytes of persistent storage, and firmware that was probably written in C.

Mind blown.

Thursday, June 09, 2011

Android: Bluetooth Low Energy vs USB

With all the hype about adding devices/peripherals to Android via USB, I desperately want a low energy wireless means of adding devices. A number of my (yet-to-be-started) CFT projects involve collecting sensor data for correlation/display on smart phones.  ANT has always looked appealing, but with next to nothing in way of smartphone support, the new Bluetooth 4.0 BLE support looks like it may capture the market.

This year promises new Android devices with BLE.  On the peripheral/sensor front, we seem to have 2 major vendor choices: Nordic and TI.

I'm not ready to drop money on a kit just yet, but my "body worn" sensor projects may get a kickstart knowing that a suitable means of data display is coming soon.

Sunday, May 01, 2011

Ultrasonic goslings: Sensors and software

I'm starting to get back into low level embedded systems. I'm back to see what 8-bits can do in a 64-bit world.

Part of this reboot is to cast a fresh eye towards some of the sensor enhanced systems I've been mulling around for the past couple of years.

In particular, I am re-investigating some ultrasonic tracking stuff.  In a nutshell, I want to build a flock of robots (does 3 constitute a flock?) that will follow me around. Think: Mother goose and goslings.

Imagine that you have an ultrasonic transmitter, attached to your belt, that transmits a short "beep" every second.  If your robots have 3 ultrasonic sensors each, then they can use hyperbolic positioning (Multilateration) to figure out where you are.  (The time difference between the 3 received beeps gives you direction; the receive time between each transmitted beep gives you distance).
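The direction half of that can be sketched for a single pair of sensors. This is not full multilateration: it is the far-field approximation, which assumes the transmitter is much farther away than the spacing between the two sensors, so that sin(bearing) = c * dt / spacing. The function name, the sign convention and the 343 m/s figure (air at roughly 20 C) are assumptions for the example.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 C (assumed)


def bearing_from_tdoa(dt, spacing):
    """Bearing in degrees from broadside for one sensor pair.

    dt      -- arrival time at the left sensor minus arrival time at the
               right sensor, in seconds (positive: source is to the right)
    spacing -- distance between the two sensors, in meters
    """
    ratio = SPEED_OF_SOUND * dt / spacing
    ratio = max(-1.0, min(1.0, ratio))   # clamp away measurement noise
    return math.degrees(math.asin(ratio))


# A beep arriving 100 microseconds earlier at the right sensor, 10 cm apart.
print(round(bearing_from_tdoa(100e-6, 0.10), 1))
```

One pair cannot tell front from back; a third sensor gives a second pair whose bearing resolves that ambiguity.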

Now, every decent circuit I've seen for ultrasonic transducers tends to be fairly complex to build (mostly for clean amplification and rectification of the received signal).  Just throwing a transducer onto a (relatively) clean MCU with a sensitive ADC won't cut it. Or will it?

We tend to want to put the cleanest, most linear signal into the ADC, but nature doesn't work that way. Nature uses a ton of error correction (software).  Even without perfectly working ears or eyes, the brain adapts to form a "picture".

Given a noisy, weak, poorly rectified signal from an ultrasonic receiver, can software make sense of it?

Monday, April 11, 2011

Forth for ARM Cortex M3...

This news makes me happy :-)
I should break out my old STM eval boards and give it a try.

The last Forth (ignoring mine) that I've used was Charley Shattuck's MyForth. Well it looks like he has created a new MyForth for the Arduino crowd. Slide here and sources here.

The nice thing about minimalism within the microcontroller world is that your end result is a "device". You don't have a lot of extra stuff (software standards, etc) to deal with.... so as long as your device interfaces correctly with outside world, the question is: Does it do something useful/interesting? Not: Did you use CouchDB, MongoDB or SQL?

Ah, the simple life.

Also, a shout out to GreenArrays for releasing initial measurements in their G144A12 spec sheet.

Ugh. I really need to find the time (and money) to play with the dev kit.


Sunday, April 10, 2011

File under "Elegant": Factorial in Plan 9 rc (under Linux)

I've posted before that I find Plan 9's rc shell elegant.  I've been using a "slightly" modified (I've made read and echo builtins) version for a few months now and have been doing extensive scripting. I hope never to go back to bash.

Here is a small script to compute factorials. Since "bc" deals with arbitrary precision, we can go much higher than the 32 or 64 bits.

Chew on this:

fn fac {
    num=0 factorial=1 frombc=$2 tobc=$3 {
        for (num in `{seq $1}) {
            echo $factorial '*' $num >$tobc
            factorial=`{read <$frombc}
        }
        echo $factorial
    }
}
fn fixlinebreaks {
    awk -F '\\' '{printf("%s",$1)}
    $0 !~ /\\$/ {printf("\n"); fflush("");}'
}
fac $1 <>{bc | fixlinebreaks}

There are several interesting things here:
  1. Concurrent processing (co-processes actually).
  2. Messaging through unix pipes.
  3. Lazy computation (generator).
The last line launches bc as a coprocessing service (filtering output through fixlinebreaks which is needed due to a "feature" of bc where it breaks up long numbers into multiple lines -- blech).  We treat bc as a service and send it requests (steps in the factorial computation) and it responds with our intermediate results.
This factorial algorithm is iterative rather than recursive, but rather than using an incrementing counter loop, we generate all numbers using the 'seq' program and loop through that lazily generated list!
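For readers who don't have rc handy, the shape of the loop can be sketched in Python. This is an analogy rather than a port: a generator plays the role of seq, and Python's built-in arbitrary precision integers stand in for the bc coprocess, so there are no pipes involved.

```python
def seq(n):
    """Lazily yield 1..n, one number at a time."""
    i = 1
    while i <= n:
        yield i
        i += 1


def fac(n):
    """Iterative factorial driven by the lazily generated sequence."""
    factorial = 1
    for num in seq(n):
        # In the rc script this multiplication is a request sent to bc.
        factorial = factorial * num
    return factorial


print(fac(20))
```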

How slow do you think this script will run?  Well on my Toshiba Portege r705 notebook with a Core i3, factorial of 1024 takes 2.4 seconds. Is that slow?

Earlier I said that I had enhanced rc with "echo" and "read" as builtins (normally they are external). Using the non-builtin "echo" and "read" increases the run time to 5.1 seconds.

Of course this isn't production code, but here is the take-away: "bc" gives you a bignum calculator for free.  Use it.

Monday, March 14, 2011

Tackling the Simple Problems: The domain of the minimalist

The hard problems are more interesting by nature and the world is full of hard problems.  This blog post isn't about them. Instead, I want to talk about simple problems.

Simple problems are still problems, they just don't have world shaking impact (or so you would think).

To be honest: most simple problems are only simple on the surface.  Underneath, complexity is always lurking.

Take, for instance, my desire to (re)build a very simple blogging system (for my own personal use).  Blog software isn't all that hard to build. If you don't care about performance and scalability, then it is pretty straightforward. That is, until you get down to building one. As soon as you start thinking about security, feeds, multimedia, etc.  you start to expose the underlying complexity of "working" software.

Now, as I said earlier, this is still something of a simple problem. Developing blogger software isn't rocket science.  But, in some ways, that makes it harder.

When something is so simple (conceptually), it can be quite difficult to "get it right".  Getting it right is about hitting that sweet spot. Blogging software needs to do its simple job correctly and intuitively. If it is hard to install, or has "hard to grok" idiosyncrasies, then it doesn't solve the "simple problem" of blogging.

Consider another "simple problem".  I have around 40GB of music (mostly in MP3 format) that I want to play on my living room stereo (away from a computer). There are solutions I can buy, but none quite fit. I don't need streaming (although I would like to listen to online radio sometimes) and I don't need a "total entertainment solution".  I tend to listen to whole albums, not mixes or "randomized" selections based on genre.

All I need is a single MP3 storage device, the ability to add/delete queued albums  from any of my household PCs (web browser NOT a hard requirement), and a simple "remote" (pause, play, next song, previous song).  What I want is a music "server" and it only has to serve one sound system. (Wi-fi streaming of music is broken in my house -- too much sporadic interference).

There are server based (free!) software solutions out there, but they usually solve (only) 90% of my "simple problem".  They then throw UPnP, webservers, GUIs and all sorts of networking into the mix.  This is more than I want (after all I am a minimalist).

Note: Before computers, my problem was solved 100% by a CD player w/ 200+ CDs and before that it was solved by vinyl LPs.  Now I have a bunch of MP3s and less capability to enjoy music than when I had CDs.

Simple problems are harder than you think.

uForth Dump...and run

uForth was mentioned here several times last year. It was my attempt at a very, very portable Forth (no dynamic memory allocation, ANSI C, bytecode generator for portable images, etc). It has been run successfully on MSP430s as well as Windows/Linux.  No MSP430 code here unfortunately. I did most of the MSP430 code as part of my day job in 2010. It isn't mine to give away.

However, you can get a dump of the generic ANSI code here.  I haven't touched it in months and it needs documentation (and some general lovin').  Unfortunately, I don't have access to MSP430s anymore and so that is left as an exercise for the reader :-(

Monday, March 07, 2011

Notes on Mail header (and MIME) parsers...

I'm trying to resurrect my old gawk based blogging system BLOGnBOX. It (ab)uses gawk to do everything from POP3 mail retrieval (you email your blog entry...) to FTP based posting of the blog (it is a static html blog).

I intend on cleaning it up by doing away with the gawk abuses. I am either going to make it (Plan 9) rc based (with Plan 9 awk and some C for the networking) or perhaps Haskell.  That is quite a choice, eh?

I've done a bit of Haskell over the past few months and feel strong enough to do the next generation BLOGnBOX, but the main problem is actually getting the thing going. (This is a nighttime CFT and, well, I have to get into a Haskell frame of thinking).

The first task up is a parser for mime encoded email. I plan on using regular expressions (yes, I know -- use Parsec or something more Haskell-ish).  Awk is somewhat of a natural for this, but Gawk has a little more "oomph".  I can visualize how I would do it in Awk, but the Haskell is not coming naturally.

Well, it isn't all that difficult to get started in Haskell:

module MailParser where

-- Text.Regex comes from the regex-compat package.
import Text.Regex
import qualified Data.Map as Map

type Header = Map.Map String [String]

header_regex :: Regex
header_regex = mkRegex "^(From|To|Subject)[ ]*:[ ]*(.+)"

-- Add a matching header line to the map; leave the map alone otherwise.
parseHeader :: String -> Header -> Header
parseHeader s h =
  case matchRegex header_regex s of
    Just (k:v) -> Map.insert k v h
    _          -> h

Well, that is a beginning. Of course, I should be using ByteStrings for efficiency...  and, yes... I know... I know... I should be using Parsec.


Thursday, February 24, 2011

Rc - Making shell scripting suck less

There are some tremendous ideas behind the ubiquitous Unix shell (um, that would be Bourne, bash, (d)ash or maybe ksh?).  The problem is that a lot of these ideas are very, very dated.  Bash is probably the best example of how to keep a Unix Bourne dialect alive. Ksh was beastly (tons of features), but I think bash has finally passed it.  But is this a good thing?

As I start writing more complex scripts I begin to feel the age of Bourne.  I have been using (d)ash (due to it being the Busybox shell and much smaller than bash -- GNU seems set on adding the kitchen sink to every tool). You can pretty much do general purpose scripting with Bash, but still with the legacy syntax of Bourne. You might as well go with Perl or Python (and their associated huge installation footprints).

Then there is rc (the Plan 9 shell). It starts with Bourne and "fixes" things rather than tack on stuff around the edges. It is very minimalistic and has a certain elegance I haven't seen since Awk.  Plan 9's toolbox minimalism was an attempt to get back to the origins of Unix (lots of small single purpose tools). The famous anti-example of this is probably GNU ls.  Look at the options, the many, many options.

Rc isn't actively supported much (Plan 9 has since faded -- if it ever shone brightly to begin with), but it has the feel of something well thought out.

You'll hear more from me about that in upcoming posts.
Time to shut up and code.

Monday, February 21, 2011

Exceptions and Errors in embedded systems

These past few posts have been ramblings to myself at the cusp of starting a new CFT (Copious Free Time) project.  I am weighing an "elegant" path (Haskell) vs an "Old Unix hacker" path (Shell scripts).

While the Haskell approach is alluring, there is a lot of learning to do there and I am an "Old Unix hacker".  I am very familiar with the benefits of functional programming and have found the past 3 months doing Haskell (some on my day job) a lot of fun.

But, I know I can get more accomplished sooner if I take a "Unix hacker" approach.

Now, for the meat of this post (and an oft-argued point against using shell scripts in critical environments): Safety.

Or, more specifically, what about all of the points of unchecked failure in a shell script?
Doesn't this betray the notion of an embedded system?

Well, there is the dangerous situation of uncaught typos, but let's say we are real careful. How do we handle problems like:
1. A process in the pipeline dies unexpectedly.
2. The filesystem becomes 100% full.

Interestingly, while something like "dd if=$1 | transform | gzip >$2" looks like it can be full of the above problems, I could argue that you have this problem using any programming language/approach.
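To make the first failure mode concrete: in a plain Bourne-style shell the exit status of a pipeline is that of its last command, so `false | cat; echo $?` prints 0 and a stage dying in the middle goes unnoticed. Below is a minimal POSIX sh sketch of one way around that. The `run_checked` name and the scratch status file are my own illustrative conventions, not a standard mechanism, and `mktemp` is assumed to be available (it is in Busybox and coreutils).

```shell
#!/bin/sh
# run_checked PRODUCER CONSUMER
# Runs "PRODUCER | CONSUMER" and reports the exit status of BOTH stages.
# Assumes the consumer reads until end of input; otherwise the status
# file may not be written yet when it is read back.
run_checked() {
    producer=$1
    consumer=$2
    status_file=$(mktemp) || return 1
    consumer_status=0

    # The subshell records the producer's own exit status before it exits,
    # because the shell only reports the status of the last stage.
    (
        status=0
        $producer || status=$?
        echo "$status" > "$status_file"
    ) | $consumer || consumer_status=$?

    producer_status=$(cat "$status_file")
    rm -f "$status_file"

    echo "producer=$producer_status consumer=$consumer_status"
    [ "$producer_status" -eq 0 ] && [ "$consumer_status" -eq 0 ]
}
```

Calling `run_checked false cat` returns non-zero, where the bare pipeline `false | cat` would have reported success.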

However, because it is so difficult to catch "exceptional" errors in the shell, it starts to make me wonder how I would handle this in a language that supports "exceptions".

This is where things start to unravel (for me).  What do you do in that exception? How do you recover?
Let's look at some approaches:

1. Unix approach:   Wrap the "dd" line in a script and have a monitor start it, capture and log stderr and restart it if necessary (but not too aggressively -- maybe at some point give up and shut down the system).
2. Erlang approach:  Interestingly similar to above.
3. Language w/ exceptions: Catch the error, close the files and.... um, restart?

In the Unix approach, the cleanup is mostly done for you. Good fault tolerance practice (as suggested by Erlang) is pretty much handled by variants of init (I believe that daemontools' supervise has been doing this well for years).
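The "monitor" in the Unix approach can be sketched in a few lines of POSIX sh. This is only an illustration of the idea, not how init or daemontools actually do it: the `supervise_cmd` name, the `SUPERVISE_DELAY` variable and the doubling back-off are all assumptions of mine.

```shell
#!/bin/sh
# supervise_cmd MAX_RESTARTS LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its stderr to LOGFILE, and restarts it whenever
# it exits non-zero. Gives up (returns 1) after MAX_RESTARTS restarts.
supervise_cmd() {
    max_restarts=$1
    logfile=$2
    shift 2
    restarts=0
    delay=${SUPERVISE_DELAY:-1}    # seconds before the first restart (hypothetical knob)

    while :; do
        status=0
        "$@" 2>>"$logfile" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0               # clean exit: nothing more to do
        fi
        echo "$(date): '$*' exited with status $status" >>"$logfile"

        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "$(date): giving up after $restarts restarts" >>"$logfile"
            return 1
        fi
        restarts=$((restarts + 1))
        sleep "$delay"
        delay=$((delay * 2))       # back off so restarts are not too aggressive
    done
}
```

A caller could wrap the pipeline script as `supervise_cmd 5 /var/log/transform.log ./transform.sh in out` (paths hypothetical) and decide what "give up" means for the system when it returns 1.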

I am sure there are holes in my argument, but for my CFT, I am persisting all important data on disk (an event queue is central to my system). Every change (addition, execution, removal) of an event is an atomic disk transaction. If any process dies, it can be relaunched and pick up where it left off.
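As a rough sketch of what such an on-disk queue could look like in POSIX sh (the directory layout and the `queue_*` function names are hypothetical, not my actual CFT code): each event is one file, additions are made visible with a rename, and an event is removed only after its handler succeeds. It relies on rename being atomic within a single filesystem; it does not address power loss, since nothing here forces data to disk.

```shell
#!/bin/sh
# Hypothetical on-disk event queue: one file per event under $QUEUE_DIR.
QUEUE_DIR=${QUEUE_DIR:-./queue}

queue_init() {
    mkdir -p "$QUEUE_DIR/tmp" "$QUEUE_DIR/pending"
}

# queue_add NAME PAYLOAD
# Write the event to a scratch file, then rename it into pending/.
# Because the rename is atomic (same filesystem), a reader sees either
# the whole event or nothing at all.
queue_add() {
    printf '%s\n' "$2" > "$QUEUE_DIR/tmp/$1" &&
        mv "$QUEUE_DIR/tmp/$1" "$QUEUE_DIR/pending/$1"
}

# queue_run HANDLER [ARGS...]
# Feed each pending event (in name order) to HANDLER on stdin and remove
# it only after HANDLER succeeds. If the process dies mid-event the file
# stays in place, so the event is retried on the next run.
queue_run() {
    for event in "$QUEUE_DIR"/pending/*; do
        [ -f "$event" ] || continue    # the glob matched nothing
        "$@" < "$event" || return 1
        rm -f "$event"
    done
}
```

The trade-off is at-least-once execution: a handler that crashes after doing its work but before the `rm` will see the same event again, so handlers need to tolerate repeats.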

For fault tolerant (embedded) systems I am not sure what I would do in an "exception" handler... outside of clean up and die.


Haskell scripting for robust embedded systems...

A convincing (unrelated) counter view to my prior posts here: Practical Haskell: scripting with types.

Sunday, February 20, 2011

Unix Shell scripting for robust embedded systems

The summary/ramification of my previous post:

Shell scripting (in this case Busybox) is a viable approach to developing robust, long running embedded systems. 

If you can afford to run a (multi-tasking, memory managed) Linux kernel in your embedded system and Busybox is there, then the shell (ash in this case) becomes a potential basis for a system architecture.

Of course, this is not breaking news. But I think it gets lost when we start taking a "single programming language" view of system development (as advocated by almost every modern programming language).  If you are trying hard to figure out how to get your favorite programming language to do what the shell does, then maybe it isn't the right tool for the job. 

Sure, the "shell" isn't elegant and is full of pitfalls and gotchas when you use it beyond a couple of lines, but when your shell script starts to grow, you too should consider looking elsewhere for help (i.e. commands beyond what is built into the shell).

An example: Don't get caught up in gawk/bash's ability to read from TCP sockets, leverage netcat (nc).


Haskell vs Busybox (for an embedded soft-realtime control system)

I'm building an embedded soft-real-time control system. It will handle sensor events and provide feedback to the user using voice synthesis.

I really want to use Haskell for this CFT project, but I can get something running so much quicker by shell scripting.  There won't be a lot of sophisticated algorithms and I don't see scalability as a concern.

When it comes down to it, I am finding it harder and harder to do system programming in a "programming language" vs something in a shell (with support from awk and friends).  It doesn't matter if it is C or Haskell, it starts to feel like (once again) re-inventing a wheel.

As an example (and it has nothing to do with this current CFT project), consider this problem: I want to transform 1024 byte chunks of a file and write the results as a compressed file. The transformation doesn't matter, but let's say the transformation is written in C (or Haskell for that matter) and takes 50-100 ms per 1024 byte chunk.

I want to do this task as fast as possible. I have (at least) 2 CPU cores to work with.  Let's look at two approaches:

Approach A:   Write a Haskell/C program to read 1024 bytes at a time, perform the translation, then the compression, and write the result to an output file.

Okay, so I need to link in a decent gzip compression library and I use an appropriate "opt" parser to grab the input and output file.  Done.

Approach B:  dd if=$1 bs=1024 | translator | gzip > $2

This assumes that I write the same core "translator" code as above, so we can ignore that and focus on reading, compression and writing.

You can guess which will take less time to implement, but which is the more efficient?

Well, my wild guess would be Approach B. Why? Well I already have a couple of things going for me. One is that I have automatic concurrency! While "dd" is just sitting there reading the disk, translator is running and gzip is also doing its thing. If I have 3 cores, then I have a good chance that each process can run in parallel (for at least a little while before they block). There is some cost in the piping, but that is something that Linux/Unix is optimized to perform.  Given that, "dd" has a good chance of buffering file input more efficiently than my single threaded app in Approach A. The dd process has disk buffering + pipe buffering working for it so it may fetch (and dispatch) several 1024 byte chunks before it blocks on a full pipe.  A similar (but reverse) caching is happening with gzip too.
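For reference, here is Approach B fleshed out as a small POSIX sh sketch. The `translator` below is only a stand-in (it upper-cases its input via `tr`) so the pipeline can be run as-is; the real one would be the compiled C or Haskell filter reading stdin and writing stdout. The `transform_file` name is also just for illustration.

```shell
#!/bin/sh
# Stand-in for the real translator: upper-cases whatever arrives on stdin.
translator() {
    tr 'a-z' 'A-Z'
}

# transform_file INPUT OUTPUT
# dd, translator and gzip run as three concurrent processes connected by
# pipes, so reading, transforming and compressing can overlap.
transform_file() {
    dd if="$1" bs=1024 2>/dev/null | translator | gzip > "$2"
}
```

One caveat: a pipe is a byte stream and does not preserve dd's 1024 byte block boundaries, so a real translator that cares about chunks has to do its own 1024 byte framing on stdin.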

So, you then consider rewriting Approach A but using a concurrency module/library.  Ugh. Let's not go there.

So, if I take a scripting approach, my "controlling" part of the system can be written using the Shell and I can optimize to Haskell (or C) as needed.

The Wisdom of Unix shell scripting for systems

Prototype with Perl, Ruby, Python, Tcl, etc and to optimize you have to dive into using an FFI (foreign function interface).

Prototype with a Unix/Linux shell (bash, ash, ksh, etc) and to optimize you rewrite proc/commands in your favorite (compiled?) language and use stdin/stdout as the interface.

When piping makes sense (or concurrent processes with lightweight I/O requirements), 30+ years of shell wisdom is your friend.

If shell scripts can be trusted to boot your Unix/Linux distribution, can the shell be trusted as the controller/glue for your application?


Monday, February 07, 2011

Haskell Revelation -- Optimizing via refactoring

I have some code that lazily transforms a fairly large list of data (from an IO source) that must be searched sequentially. Since it is lazy, the list isn't fully transformed until the first search. Since, I presume, a rather large stack of thunks is constructed instead, this first search takes a really, really long time. (It would be faster, I surmised, to strictly transform the list as it was being built, rather than lazily upon the first search).

I started playing with `seq` but couldn't quite get the strictness right -- the code represented some of my first attempts at Haskell.  So, I decided to refactor the code (replace naive tail recursion with maps, filters, folds, etc). I figured at this point I would be able to see more clearly how to avoid the lazy list.

Surprisingly, this refactoring was enough for the compiler to "do the right thing" and sped my application up significantly. What was the compiler doing here? Did it remove the laziness? Or did it just optimize the hell out of what I thought was a lazy vs strict problem?


Monday, January 24, 2011

How low can Haskell go? (Or what should I do with Haskell?)

Is Haskell a suitable language for programming robots?  Would a robot with a strictly functional brain be safer?  I have run a fairly hairy 2K line Haskell app on a target as small as an Atom based netbook.  It ran reasonably.

I suppose, for a linux target, that a Haskell executable is pretty much like any other executable.

What is the benefit to taking a purely functional approach to robotics?

For some pioneers (think Greenblatt, Stallman, etc) Lisp was a "systems programming" language. Is Haskell a reasonable successor?

These are questions I am pondering.  I am doing some Haskell (here and there -- prototyping stuff at work, etc), but I keep wondering what the killer application would be *for me*.