Inverting Programming Levels

The history of programming languages has been primarily one of rising to ever higher levels of abstraction. First there was Assembly, which was a human-readable version of machine code, which abstracted out code addresses into labels. Then came the languages that were called "high level languages" when they were created, the most famous of which is C, which abstracted out types, functions, and arithmetic. Later came object-oriented, scripting, and functional languages, which were mostly implemented in C. Although languages did not necessarily follow this timeline literally (Lisp preceded C by 14 years), this was the trend.

These high level languages are excellent in their ability to create and manipulate abstractions, but this comes with a cost: these languages require more resources of various sorts than lower-level languages, making them unable to run on limited hardware. Time and memory constraints are the most common barriers to adopting high level languages, but hardware constraints can also be a barrier, such as the lack of virtual memory, a filesystem, or even an operating system. Such limited systems are still used and programmed for, because of their cheap cost or their novelty. Programmers for these systems generally resort to a lower level language, such as C or Assembly, using the traditional tried-and-true tools of the trade.

Is there a way for these programmers to take advantage of modern high-level abstractions and yet target their programs for limited environments? I believe this is possible, though it requires a way of generating programs not commonly encountered. Rather than using a compiler for a low level language to generate a low level program, it is possible to use a high level language as the compiler itself, creating a high level program that outputs the low level program when it runs. Instead of using the high level language to run programs, you use it to build programs. This sounds like quite a complex system, doesn't it? Well, in truth, it is not really much more complex than using a compiled high level language like C++. Take for instance this fragment of C++.

std::vector<int> numbers;

There are actually two layers of language abstraction here! First, C++'s template system translates the metatype std::vector<int> into a lower level type, yielding a statement like the following.

St6vectorIiSaIiEE numbers;

See that C++'s template system can be thought of as a high level language that outputs a lower level program. In fact, C++ template metaprogramming is actually Turing-complete! After the template system produces a C-like language, a more C-like compiler phase translates that into a reservation for 24 bytes of memory. If you ask the C++ compiler to output ASM, you will probably find two lines like this.

numbers: .zero 24

Wait, couldn't this mean that even the C-like portion of C++ (and C itself) is a language that describes the creation of a program rather than the running of it? Indeed, any compiled language can be thought of as a language for describing programs rather than running them. From there it is not a stretch to consider a high level language that outputs a low level language to be a compiler targeting the low level language. In fact a high level language is much more powerful than traditional low level language compilers.

Can one find instances of high level languages being used as compilers in the wild? In my limited experience in the programming world, I have not seen many such systems. I have seen code-generating programs, but almost all of those simply use string manipulation to build the source code of another high level language, making them effectively equivalent to the C preprocessor--and just as difficult to debug. I would rather be able to manipulate well-typed data structures that represent the program I wish to compile. In many ways, creating a library that allows you to do this is necessarily as difficult as creating a compiler, though it at least saves you from having to write a parser. So the only such system I have been able to write so far has been the equivalent of assembly language, although since it is in a higher level language it allows higher level abstractions than traditional assembly language would allow.

The system I wrote is in Haskell; it defines a monad that mirrors 6502 Assembly, which I have used to create programs for the NES. For some illustration, I shall compare some code in both Assembly and Haskell. If it'd take too much effort to understand these things, feel free to stop reading.

Here is a function in 6502 Assembly for multiplication, with running commentary in case you're not familiar with this brand of ASM.

multiply: ; Multiplies A by Y; scratches Y and $00 sta $00 ; Store A into location 0 lda #$00 ; Set A to 0 cpy #$00 ; Compare Y to 0, branch if equal beq multiply_ret multiply_loop: clc ; Clear carry flag adc $00 ; Add with carry dey ; Decrement Y, branch if not 0 bne multiply_loop multiply_ret: rts ; Return from subroutine

And here is a literal translation to the ASM6502 monad in Haskell.

multiply <- here -- Multiplies A by Y; scratches Y and 0x00 mdo sta 0x00 ldai 0x00 cpyi 0x00 beq multiply_ret multiply_loop <- here clc adc 0x00 dey bne multiply_loop multiply_ret <- here rts

The word mdo is an extension to Haskell which is like do syntax for monads, except it allows me to use the name multiply_ret before binding to a value to it, by feeding the monad cyclically into itself. This is only possible because of Haskell's laziness, and is the reason why I picked Haskell for this task. Just as importantly, Haskell is good at abstracting things into functions, and we can take advantage of this to make the code easier to understand. Here are some examples of these useful functions.

skip :: (Label -> ASM6502 a) -> ASM6502 () -> ASM6502 a skip brancher body = mdo brancher end retval <- body end <- here return retval rep :: (Label -> ASM6502 a) -> ASM6502 () -> ASM6502 a rep brancher body = mdo begin <- here retval <- body brancher begin return retval section :: ASM6502 a -> ASM6502 Label section body = mdo loc <- here body return loc add = clc >> adc -- This isn't a function, but it's still useful (pre >>. branch) label = pre >> branch label

And here is the same multiplier code using these functions.

multiply <- section $ mdo -- Multiplies A by Y; scratches Y and 0x00 sta 0x00 ldai 0x00 skip (cpyi 0x00 >>. beq) $ mdo rep (dey >>. bne) $ mdo add 0x00 rts

Which is shorter, better structured, and easier to reason about, and imposes no performance cost in the compiled program. These kinds of functional abstractions are type-safe and cannot be done with traditional assembly language, unless you use preprocessor macros, which are unclean and unsafe. There are other advantages that come using a high level language instead of assembly language too, which I do not have time to go into, such as modularity, static assertions, rudimentary memory management, and bulk data manipulation. I hope that I will be able to create some interesting NES software with this system.

If you are interested in seeing this kind of programming actually used, you can find it on Github.