I'm looking to build a VM into a game and was wondering if anyone knew of any really simple VMs (I was thinking RISC/PIC was close to what I wanted) that are usually used for embedded projects such as controlling robots, motors, sensors, etc. My main concern is having to write a compiler/assembler if I roll my own. It'd be nice to use the tools that are already out there, or in the simplest case just a C compiler that can target it :-p.
I really don't want to re-invent the wheel here, but I also need thousands of these running around a virtual world, so they have to be as simple and as fast as possible. As one person has already mentioned, I also don't care about real-world issues such as timing and buses and all that fun stuff. I think their virtual clocks will be limited to something quite slow; eventually I'll probably have to look into native compilation to make them run even faster, but for now I'm just putting together prototypes for a general proof of concept.
As input, I'm planning on distance, light, material and touch sensors mounted around the cylindrical body (16, maybe 32 of them), and for output simply 2 motors driving a sort of wheel on each side. Essentially the processing won't be too strenuous, and the world will be simple enough that the machines don't have to throw lots of processing power at simple tasks.
In terms of memory, I'd like them to be able to store enough data to be left alone for a couple of days without intervention, for building maps and gathering stats. I don't think 8-bit would cut it for processing or memory, but 16-bit would definitely be a contender. 32- and 64-bit would just be pushing it, and there's no way they'll have any more than 1 MB each of memory; probably closer to 256-512 KB. (Bill once said 640K would be enough, so why can't I!!)
if you want something rooted in the real world, one of the most-used embedded RISC microcontrollers is the PIC family. google gives several emulators, but i don't think the source is available for most.
another possibility is QEMU, which already emulates several ARM varieties.
and, of course, if you're not interested in emulating a real-world device, it's far easier (and gives better performance) to roll your own: you implement only what you need, without getting into the mess of state flags, overflow bits, limited bus widths, RAM timings, etc.
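A rolled-your-own machine of that kind can be sketched in a few dozen lines. Here's a toy register machine in Python; the opcode names and the `(op, a, b)` encoding are invented purely for illustration:

```python
# A toy register machine: a handful of opcodes, a fetch-decode-execute
# loop, and none of the flags/bus/timing baggage. Opcode names and the
# (op, a, b) encoding are invented for illustration.

HALT, LOADI, ADD, JNZ = range(4)

def run(program):
    regs = [0] * 4
    pc = 0
    while True:
        op, a, b = program[pc]
        pc += 1
        if op == HALT:
            return regs
        elif op == LOADI:        # regs[a] = immediate value b
            regs[a] = b
        elif op == ADD:          # regs[a] += regs[b]
            regs[a] += regs[b]
        elif op == JNZ:          # jump to address b if regs[a] != 0
            if regs[a] != 0:
                pc = b

# sum 5 + 4 + 3 + 2 + 1 into r0, counting r1 down with constant -1 in r2
prog = [
    (LOADI, 0, 0),
    (LOADI, 1, 5),
    (LOADI, 2, -1),
    (ADD,   0, 1),   # r0 += r1
    (ADD,   1, 2),   # r1 -= 1
    (JNZ,   1, 3),   # loop while r1 != 0
    (HALT,  0, 0),
]
```

`run(prog)[0]` comes out to 15; everything else (word width, memory limits, extra opcodes) is yours to decide.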
I wrote Wren for a friend who wanted a VM language running on an embedded controller with around 16K of RAM. (But it allows up to 64k per process in the code as written.) It includes a compiler for a dumb little programming language. It's all, uh, pretty basic and hasn't seen much use, but it is just what you described in your first paragraph.
Many people writing game programs and other applications embed a language into the application to allow users to write small programs.
As far as I can tell, the most popular embedded languages in very roughly most-popular-first order (although "more popular" doesn't necessarily mean "better") seem to be:
You may want to check out the Gamedev StackExchange, in particular questions like "How do you add a scripting language to a game?".
You may want to check out some of the questions here on StackOverflow tagged "embedded language"; such as "Selecting An Embedded Language", "What is a good embeddable language I can use for scripting inside my software?" "Alternatives to Lua as an embedded language?" "Which game scripting language is better to use: Lua or Python?", etc.
Many implementations of these languages use some sort of bytecode internally. Often two different implementations of the same high-level programming language, such as JavaScript, use two completely different bytecode languages internally. Often several high-level programming languages compile to the same underlying bytecode language -- for example, the Jython implementation of Python, the Rhino implementation of JavaScript, the Jacl implementation of Tcl, JScheme and several other implementations of Scheme, and several implementations of Pascal all compile to the same JVM bytecode.
details
Why use a scripting language rather than interpreting some hardware machine language?
Why "Alternate Hard And Soft Layers"? To gain simplicity and faster development.
faster development
People generally get stuff working faster with scripting languages rather than compiled languages.
Getting the initial prototype working is generally much quicker -- the interpreter handles a bunch of stuff behind-the-scenes that machine language forces you to explicitly write out: setting the initial values of variables to zero, subroutine-prolog and subroutine-epilog code, malloc and realloc and free and related memory-management stuff, increasing the size of containers when they get full, etc.
Once you have an initial prototype, adding new features is faster: Scripting languages have rapid edit-execute-debug cycles, since they avoid the "compile" stage of edit-compile-execute-debug cycles of compiled languages.
simplicity
We want the embedded language to be "simple" in two ways:
If a user wants to write a little code that does some conceptually trivial task, we don't want to scare this person off with a complex language that takes 20 pounds of books and months of study in order to write a "Hello, $USER" without buffer overflows.
Since we're implementing the language, we want something easy to implement. Perhaps a few simple underlying instructions we can knock out a simple interpreter for in a weekend, and perhaps some sort of pre-existing compiler we can reuse with minimal tweaking.
When people build CPUs, hardware restrictions always end up limiting the instruction set. Many conceptually "simple" operations -- things people use all the time -- end up requiring lots of machine-language instructions to implement.
Embedded languages don't have these hardware restrictions, allowing us to implement more complicated "instructions" that do things that (to a human) seem conceptually simple. This often makes the system simpler in both ways mentioned above:
People writing directly in the language (or people writing compilers for the language) end up writing much less code, spending less time single-stepping through the code debugging it, etc.
For each such higher-level operation, we shift complexity from the compiler to the instruction's implementation inside the interpreter. Rather than the compiler breaking a higher-level operation into a short loop in the intermediate language (and your interpreter repeatedly stepping through that loop at runtime), the compiler emits one instruction in the intermediate language, and you write the same series of operations in your interpreter's implementation of that intermediate "instruction". With all the CPU-intensive stuff implemented in your compiled language ("inside" complex instructions), extremely simple interpreters are often more than fast enough -- i.e., you avoid spending a lot of time building a JIT or trying to speed things up in other ways.
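To make the shift concrete, here's a toy sketch (instruction names and encoding invented for illustration): zeroing an 8-cell block costs eight dispatches with a primitive STORE, but only one dispatch with a "complex" FILL whose loop runs in host code.

```python
# Two ways to zero an 8-cell block: eight primitive STOREs (one
# interpreter dispatch each) versus one FILL "instruction" whose loop
# runs in host code. Names and encoding are invented for illustration.

def run(program, mem):
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "STORE":            # primitive: mem[a] = b
            mem[a] = b
        elif op == "FILL":           # complex: mem[a..a+b-1] = c
            for i in range(a, a + b):
                mem[i] = c
    return mem

slow = [("STORE", i, 0, 0) for i in range(8)]   # 8 dispatches
fast = [("FILL", 0, 8, 0)]                      # 1 dispatch, same result
```

Both programs leave memory in the same state; the FILL version just pays the per-instruction dispatch overhead once instead of eight times.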
For these reasons and others, many game programmers use a "scripting" language as their "embedded language".
(I see now that Javier already recommended "use an embedded scripting language", so this has turned into a long rant on why that's a good alternative to interpreting a hardware machine language, and pointing out alternatives when one particular scripting language doesn't seem suitable).
If you want simple, consider the Manchester Mark I. See page 15 of this PDF. The machine has 7 instructions. It takes about an hour to write an interpreter for it. Unfortunately, the instructions are pretty dang limited (which is why pretty much a full spec of the machine can fit on one page).
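For a sense of scale, here's a rough Python sketch of those 7 instructions. Two caveats: the opcode numbering is my own, and the real machine mixed code and data in a single 32-word store, where this sketch splits them for brevity.

```python
# A toy interpreter for the Manchester Baby's 7 instructions,
# simplified: code and data are split (the real SSEM mixed both in one
# 32-word store), and opcode numbering is invented here.

JMP, JRP, LDN, STO, SUB, CMP, STP = range(7)

def run(prog, mem):
    acc, ci = 0, 0
    while True:
        op, s = prog[ci]
        ci += 1
        if op == JMP:    ci = mem[s]       # indirect absolute jump
        elif op == JRP:  ci += mem[s]      # indirect relative jump
        elif op == LDN:  acc = -mem[s]     # load negated
        elif op == STO:  mem[s] = acc      # store accumulator
        elif op == SUB:  acc -= mem[s]     # subtract
        elif op == CMP:                    # skip next if acc negative
            if acc < 0:
                ci += 1
        elif op == STP:                    # stop
            return acc, mem

# add 7 and 5 with only negated loads: a + b = -((-a) - b)
prog = [(LDN, 0), (SUB, 1), (STO, 2), (LDN, 2), (STP, 0)]
mem = {0: 7, 1: 5, 2: 0}
```

The addition-by-double-negation trick in the example is exactly the kind of contortion that "7 instructions" forces on you -- simple to interpret, painful to program.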
Javier's approach of rolling your own is very pragmatic. Designing and creating a tiny machine is a two day task, if that. I built a tiny VM for a project a few years ago and it took me two days to write the VM with a simple visual debugger.
Also - does it have to be RISC? You could choose, say, 68K for which there are open source emulators, and 68K was a well-understood target for gcc.
The FORTH "virtual machine" is about as simple as they come. 16-bit address space (typically), 16-bit data words, two stacks, memory. Loeliger's "Threaded Interpretive Languages" tells you a lot about how to build a FORTH interpreter on a Z80.
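A two-stack machine of that shape is a short exercise in Python; in this sketch the word names are the classic FORTH ones, but the dispatch is a plain loop over lists rather than real threaded code:

```python
# A FORTH-style two-stack machine: a data stack for operands and a
# return stack for word calls. Word names are classic FORTH; the
# dispatch is a simplified Python loop, not real threaded code.

def run(code, dictionary):
    data, ret = [], []
    pc = 0
    while pc < len(code):
        w = code[pc]
        pc += 1
        if isinstance(w, int):           # literals push themselves
            data.append(w)
        elif w == "dup":
            data.append(data[-1])
        elif w == "+":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif w == "exit":                # return from a word definition
            code, pc = ret.pop()
        else:                            # call a word from the dictionary
            ret.append((code, pc))
            code, pc = dictionary[w], 0
    return data

# the FORTH ": double dup + ;" followed by "21 double"
stack = run([21, "double"], {"double": ["dup", "+", "exit"]})
```

Adding a word costs one `elif` (or one dictionary entry), which is much of why FORTH-like machines have stayed popular as minimal embedded targets.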