Thursday, September 08, 2011

My new language: fil (Forth Inspired Language)

My last Forth was uForth. I wrote it to run on PCs (Linux, Cygwin, etc.) and MCUs (TI MSP430 and any other Harvard architecture). The implementation was a subset of ANSI Forth, and most of the Forth words were coded in Forth. The interpreter, compiler and VM were coded in C.

Since then, I've become (re)fascinated by more minimalistic Forths like ColorForth and MyForth.  uForth isn't a good playground for minimalistic experimentation, so I am writing a new Forth inspired language to be called fil.

Like uForth, fil will work on small MCUs as well as big PCs. It will work on Harvard based memory architectures (separate flash/ROM and RAM address spaces) as well as the more familiar linear address space. It will have a 16 bit instruction space (limited currently to a 64KB dictionary -- quite large in Forth terms) and a 32 bit stack/variable cell size. Moving to a 32 bit instruction space would force a trade-off between code bloat (doubling the size of code) and speed/complexity (right now I use a switch based interpreter that assumes each token is 16 bits). In the future I may silently upgrade to a 32 bit dictionary. This shouldn't require a rewrite ;-)
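To make the "switch based interpreter with 16 bit tokens and 32 bit cells" idea concrete, here is a minimal sketch in C. It is not fil's actual VM: the opcode names, the stack depth, the token-indexed addressing, and the way a 32 bit literal is split across two 16 bit tokens are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes; fil's real token set is not described in the post. */
enum { OP_HALT, OP_LIT, OP_ADD, OP_MUL, OP_DUP };

#define STACK_DEPTH 32  /* assumed depth, chosen only for the sketch */

typedef struct {
    int32_t stack[STACK_DEPTH];  /* 32 bit data cells */
    int sp;                      /* number of cells currently on the stack */
} vm_t;

/* Execute 16 bit tokens starting at dict[ip] until OP_HALT (or an unknown
 * token). Returns the top of the stack, or 0 if the stack is empty. */
int32_t vm_run(vm_t *vm, const uint16_t *dict, uint16_t ip)
{
    for (;;) {
        uint16_t tok = dict[ip++];
        switch (tok) {
        case OP_LIT: {
            /* Assumed encoding: a 32 bit literal follows as two tokens,
             * low half first. */
            uint32_t lo = dict[ip++];
            uint32_t hi = dict[ip++];
            assert(vm->sp < STACK_DEPTH);
            vm->stack[vm->sp++] = (int32_t)(lo | (hi << 16));
            break;
        }
        case OP_ADD: {
            assert(vm->sp >= 2);
            uint32_t b = (uint32_t)vm->stack[--vm->sp];
            uint32_t a = (uint32_t)vm->stack[vm->sp - 1];
            vm->stack[vm->sp - 1] = (int32_t)(a + b);  /* wraps at 32 bits */
            break;
        }
        case OP_MUL: {
            assert(vm->sp >= 2);
            uint32_t b = (uint32_t)vm->stack[--vm->sp];
            uint32_t a = (uint32_t)vm->stack[vm->sp - 1];
            vm->stack[vm->sp - 1] = (int32_t)(a * b);  /* wraps at 32 bits */
            break;
        }
        case OP_DUP:
            assert(vm->sp >= 1 && vm->sp < STACK_DEPTH);
            vm->stack[vm->sp] = vm->stack[vm->sp - 1];
            vm->sp++;
            break;
        case OP_HALT:
        default:  /* unknown tokens stop the VM, same as OP_HALT */
            return vm->sp > 0 ? vm->stack[vm->sp - 1] : 0;
        }
    }
}
```

Feeding it the tokens LIT 2, LIT 3, ADD, DUP, MUL, HALT leaves 25 on top of the stack. The sketch also shows where the 16 bit assumption lives: every `dict[ip++]` fetch and the split literal would have to change if tokens grew to 32 bits.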

But where do I start? Well, uForth is a good place. I figured I would bootstrap fil off of uForth. In an ideal world (and an ideal implementation of Forth), I would metacompile fil straight from uForth. Unfortunately, there are some limitations/assumptions in the uForth C-based core. So, instead, I am taking a hybrid approach of modifying both the core (C) code and the uForth (Forth) code. In essence, I am rewriting uForth to support metacompiling my new language (fil).

Metacompiling is not new. It is a time honored Forth technique of building Forth from Forth. However, while traditional metacompilers target machine code, I am targeting a stripped down version of uForth's VM (bytecode interpreter).

My approach has three stages:

1. I implement as much of uForth as possible in Forth so that I can remove any underlying "C" assumptions and basically simplify the VM. What I'll have left is a uForth/fil with the interpreter/compiler/VM still written in C. Let's call that C based executable "bootstrap.exe".

2. I rewrite the interpreter/compiler in uForth/fil.

3. I submit the uForth/fil (Forth) source code to itself (the new interpreter/compiler) and produce a new byte code image. I can then strip the interpreter/compiler out of the C code and produce a simple C VM that doesn't know squat about interpreting or compiling. This new VM executable (fil.exe) plus the byte code image will be fil. At that point I no longer need "bootstrap.exe".

After this, I can port the new VM to various MCUs.

I have already finished Stage 1, but I reserve the option to spiral back in order to remove further C assumptions that prevent progress on Stage 2. I am also not being very careful to retain full uForth backward compatibility. At the end of Stage 1 I already have a "hybrid" fil/uForth language.

Once fil is complete, I will probably revisit the 16 bit dictionary and consider extending it to 32 bit.  If I do this, I don't want to break the idea of fil running on small (8/16 bit) MCUs efficiently.  I may consider a bank switched approach instead (multiple 16 bit dictionaries).  Don't forget: You can pack a lot of code into a 16 bit Forth!
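One way to picture the bank switched idea is a small bank register sitting in front of several 16 bit dictionaries, so that tokens and addresses stay 16 bits wide. The sketch below is purely hypothetical: the post does not say how banking would work, and the struct layout, `MAX_BANKS`, and both function names are invented for illustration.

```c
#include <stdint.h>

#define MAX_BANKS 4  /* assumed limit, chosen only for the sketch */

typedef struct {
    const uint16_t *banks[MAX_BANKS];  /* each bank is its own 16 bit dictionary */
    uint8_t count;                     /* number of banks actually present */
    uint8_t current;                   /* currently selected bank */
} banked_dict_t;

/* Fetch a token from the current bank; the address stays 16 bits wide. */
uint16_t dict_fetch(const banked_dict_t *d, uint16_t addr)
{
    return d->banks[d->current][addr];
}

/* Switch banks. Returns 0 on success, or -1 (leaving the current bank
 * unchanged) if the requested bank does not exist. */
int dict_select(banked_dict_t *d, uint8_t bank)
{
    if (bank >= d->count)
        return -1;
    d->current = bank;
    return 0;
}
```

The appeal for small MCUs is that the inner interpreter still fetches 16 bit tokens; only a call that crosses into another bank pays the extra cost of a `dict_select`.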
