; Mu: An exploration on making the global structure of programs more accessible. ; ; "Is it a language, or an operating system, or a virtual machine? Mu." ; (with apologies to Robert Pirsig: http://en.wikipedia.org/wiki/Mu_%28negative%29#In_popular_culture) ; ;; Motivation ; ; I want to live in a world where I can have an itch to tweak a program, clone ; its open-source repository, orient myself on how it's organized, and make ; the simple change I envisioned, all in an afternoon. This codebase tries to ; make this possible for its readers. (More details: http://akkartik.name/about) ; ; What helps comprehend the global structure of programs? For starters, let's ; enumerate what doesn't: idiomatic code, adherence to a style guide or naming ; convention, consistent indentation, API documentation for each class, etc. ; These conventional considerations improve matters in the small, but don't ; help understand global organization. They help existing programmers manage ; day-to-day operations, but they can't turn outsider programmers into ; insiders. (Elaboration: http://akkartik.name/post/readable-bad) ; ; In my experience, two things have improved matters so far: version control ; and automated tests. Version control lets me rewind back to earlier, simpler ; times when the codebase was simpler, when its core skeleton was easier to ; ascertain. Indeed, arguably what came first is by definition the skeleton of ; a program, modulo major rewrites. Once you understand the skeleton, it ; becomes tractable to 'play back' later major features one by one. (Previous ; project that fleshed out this idea: http://akkartik.name/post/wart-layers) ; ; The second and biggest boost to comprehension comes from tests. Tests are ; good for writers for well-understood reasons: they avoid regressions, and ; they can influence code to be more decoupled and easier to change. In ; addition, tests are also good for the outsider reader because they permit ; active reading. If you can't build a program and run its tests it can't help ; you understand it. It hangs limp at best, and might even be actively ; misleading. If you can run its tests, however, it comes alive. You can step ; through scenarios in a debugger. You can add logging and scan logs to make ; sense of them. You can run what-if scenarios: "why is this line not written ; like this?" Make a change, rerun tests: "Oh, that's why." (Elaboration: ; http://akkartik.name/post/literate-programming) ; ; However, tests are only useful to the extent that they exist. Think back to ; your most recent codebase. Do you feel comfortable releasing a new version ; just because the tests pass? I'm not aware of any such project. There's just ; too many situations envisaged by the authors that were never encoded in a ; test. Even disciplined authors can't test for performance or race conditions ; or fault tolerance. If a line is phrased just so because of some subtle ; performance consideration, it's hard to communicate to newcomers. ; ; This isn't an arcane problem, and it isn't just a matter of altruism. As ; more and more such implicit considerations proliferate, and as the original ; authors are replaced by latecomers for day-to-day operations, knowledge is ; actively forgotten and lost. The once-pristine codebase turns into legacy ; code that is hard to modify without expensive and stress-inducing ; regressions. ; ; How to write tests for performance, fault tolerance, race conditions, etc.? ; How can we state and verify that a codepath doesn't ever perform memory ; allocation, or write to disk? It requires better, more observable primitives ; than we currently have. Modern operating systems have their roots in the ; 70s. Their interfaces were not designed to be testable. They provide no way ; to simulate a full disk, or a specific sequence of writes from different ; threads. We need something better. ; ; This project tries to move, groping, towards that 'something better', a ; platform that is both thoroughly tested and allows programs written for it ; to be thoroughly tested. It tries to answer the question: ; ; If Denis Ritchie and Ken Thompson were to set out today to co-design unix ; and C, knowing what we know about automated tests, what would they do ; differently? ; ; To try to impose *some* constraints on this gigantic yak-shave, we'll try to ; keep both language and OS as simple as possible, focused entirely on ; permitting more kinds of tests, on first *collecting* all the information ; about implicit considerations in some form so that readers and tools can ; have at least some hope of making sense of it. ; ; The initial language will be just assembly. We'll try to make it convenient ; to program in with some simple localized rewrite rules inspired by lisp ; macros and literate programming. Programmers will have to do their own ; memory management and register allocation, but we'll provide libraries to ; help with them. ; ; The initial OS will provide just memory management and concurrency ; primitives. No users or permissions (we don't live on mainframes anymore), ; no kernel- vs user-mode, no virtual memory or process abstraction, all ; threads sharing a single address space (use VMs for security and ; sandboxing). The only use case we care about is getting a test harness to ; run some code, feed it data through blocking channels, stop it and observe ; its internals. The code under test is expected to cooperate in such testing, ; by logging important events for the test harness to observe. (More info: ; http://akkartik.name/post/tracing-tests) ; ; The common thread here is elimination of abstractions, and it's not an ; accident. Abstractions help insiders manage the evolution of a codebase, but ; they actively hinder outsiders in understanding it from scratch. This ; matters, because the funnel to turn outsiders into insiders is critical to ; the long-term life of a codebase. Perhaps authors should raise their ; estimation of the costs of abstraction, and go against their instincts for ; introducing it. That's what I'll be trying to do: question every abstraction ; before I introduce it. We'll see how it goes. ; --- ;; Getting started ; ; Mu is currently built atop Racket and Arc, but this is temporary and ; contingent. We want to keep our options open, whether to port to a different ; host language, and easy to rewrite to native code for any platform. So we'll ; try to avoid 'cheating': relying on the host platform for advanced ; functionality. ; ; Other than that, we'll say no more about the code, and focus in the rest of ; this file on the scenarios the code cares about. (selective-load "mu.arc" section-level) (ero "running tests in mu.ar.c.t (takes ~30s)") ;? (quit) (set allow-raw-addresses*) (section 20 ; Our language is assembly-like in that functions consist of series of ; statements, and statements consist of an operation and its arguments (input ; and output). ; ; oarg1, oarg2, ... <- op arg1, arg2, ... ; ; Args must be atomic, like an integer or a memory address, they can't be ; expressions doing arithmetic or function calls. But we can have any number ; of them. ; ; Since we're building on lisp, our code samples won't look quite like the ; idealized syntax above. For now they will look like this: ; ; (function f [ ; (oarg1 oarg2 ... <- op arg1 arg2 ...) ; ... ; ... ; ]) ; ; Each arg/oarg can contain metadata separated by slashes and colons. In this ; first example below, the only metadata is types: 'integer' for a memory ; location containing an integer, and 'literal' for a value included directly ; in code. (Assembly languages traditionally call them 'immediate' operands.) ; In the future a simple tool will check that the types line up as expected in ; each op. A different tool might add types where they aren't provided. ; Instead of a monolithic compiler I want to build simple, lightweight tools ; that can be combined in various ways, say for using different typecheckers ; in different subsystems. ; ; In our tests we'll define such mu functions using a call to 'add-code', so ; look for it when reading the code examples. Everything outside 'add-code' is ; just test-harness details that can be skipped at first. (reset) ;? (set dump-trace*) (new-trace "literal") (add-code '((function main [ (1:integer <- copy 23:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~is memory*.1 23) (prn "F - 'copy' writes its lone 'arg' after the instruction name to its lone 'oarg' or output arg before the arrow. After this test, the value 23 is stored in memory address 1.")) ;? (reset) ;? 2 ;? (quit) ;? 2 ; Our basic arithmetic ops can operate on memory locations or literals. ; (Ignore hardware details like registers for now.) (reset) (new-trace "add") (add-code '((function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (3:integer <- add 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4)) (prn "F - 'add' operates on two addresses")) ;? (reset) ;? 1 ;? (quit) ;? 1 (reset) (new-trace "add-literal") (add-code '((function main [ (1:integer <- add 2:literal 3:literal) ]))) (run 'main) (when (~is memory*.1 5) (prn "F - ops can take 'literal' operands (but not return them)")) (reset) (new-trace "sub-literal") (add-code '((function main [ (1:integer <- subtract 1:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 -2) (prn "F - 'subtract'")) (reset) (new-trace "mul-literal") (add-code '((function main [ (1:integer <- multiply 2:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 6) (prn "F - 'multiply'")) (reset) (new-trace "div-literal") (add-code '((function main [ (1:integer <- divide 8:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 (/ real.8 3)) (prn "F - 'divide'")) (reset) (new-trace "idiv-literal") (add-code '((function main [ (1:integer 2:integer <- divide-with-remainder 23:literal 6:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 3 2 5)) (prn "F - 'divide-with-remainder' performs integer division")) (reset) (new-trace "dummy-oarg") ;? (set dump-trace*) (add-code '((function main [ (_ 2:integer <- divide-with-remainder 23:literal 6:literal) ]))) (run 'main) (when (~iso memory* (obj 2 5)) (prn "F - '_' oarg can ignore some results")) ;? (quit) ; Basic boolean operations: and, or, not ; There are easy ways to encode booleans in binary, but we'll skip past those ; details for now. (reset) (new-trace "and-literal") (add-code '((function main [ (1:boolean <- and t:literal nil:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~is memory*.1 nil) (prn "F - logical 'and' for booleans")) ; Basic comparison operations (reset) (new-trace "lt-literal") (add-code '((function main [ (1:boolean <- less-than 4:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 nil) (prn "F - 'less-than' inequality operator")) (reset) (new-trace "le-literal-false") (add-code '((function main [ (1:boolean <- lesser-or-equal 4:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 nil) (prn "F - 'lesser-or-equal'")) (reset) (new-trace "le-literal-true") (add-code '((function main [ (1:boolean <- lesser-or-equal 4:literal 4:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 t) (prn "F - 'lesser-or-equal' returns true for equal operands")) (reset) (new-trace "le-literal-true-2") (add-code '((function main [ (1:boolean <- lesser-or-equal 4:literal 5:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 t) (prn "F - 'lesser-or-equal' - 2")) ; Control flow operations: jump, jump-if, jump-unless ; These introduce a new type -- 'offset' -- for literals that refer to memory ; locations relative to the current location. (reset) (new-trace "jump-skip") (add-code '((function main [ (1:integer <- copy 8:literal) (jump 1:offset) (2:integer <- copy 3:literal) ; should be skipped (reply) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 8)) (prn "F - 'jump' skips some instructions")) ;? (quit) (reset) (new-trace "jump-target") (add-code '((function main [ (1:integer <- copy 8:literal) (jump 1:offset) (2:integer <- copy 3:literal) ; should be skipped (reply) (3:integer <- copy 34:literal) ]))) ; never reached (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 8)) (prn "F - 'jump' doesn't skip too many instructions")) ;? (quit) (reset) (new-trace "jump-if-skip") (add-code '((function main [ (2:integer <- copy 1:literal) (1:boolean <- equal 1:literal 2:integer) (jump-if 1:boolean 1:offset) (2:integer <- copy 3:literal) (reply) (3:integer <- copy 34:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 t 2 1)) (prn "F - 'jump-if' is a conditional 'jump'")) (reset) (new-trace "jump-if-fallthrough") (add-code '((function main [ (1:boolean <- equal 1:literal 2:literal) (jump-if 3:boolean 1:offset) (2:integer <- copy 3:literal) (reply) (3:integer <- copy 34:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 nil 2 3)) (prn "F - if 'jump-if's first arg is false, it doesn't skip any instructions")) (reset) (new-trace "jump-if-backward") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 1:literal) ; loop (2:integer <- add 2:integer 2:integer) (3:boolean <- equal 1:integer 2:integer) (jump-if 3:boolean -3:offset) ; to loop (4:integer <- copy 3:literal) (reply) (3:integer <- copy 34:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 4 3 nil 4 3)) (prn "F - 'jump-if' can take a negative offset to make backward jumps")) (reset) (new-trace "jump-label") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 1:literal) loop (2:integer <- add 2:integer 2:integer) (3:boolean <- equal 1:integer 2:integer) (jump-if 3:boolean loop:offset) (4:integer <- copy 3:literal) (reply) (3:integer <- copy 34:literal) ]))) ;? (set dump-trace*) ;? (= dump-trace* (obj whitelist '("-"))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 4 3 nil 4 3)) (prn "F - 'jump-if' can take a negative offset to make backward jumps")) ;? (quit) ; Data movement relies on addressing modes: ; 'direct' - refers to a memory location; default for most types. ; 'literal' - directly encoded in the code; implicit for some types like 'offset'. (reset) (new-trace "direct-addressing") (add-code '((function main [ (1:integer <- copy 34:literal) (2:integer <- copy 1:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 34)) (prn "F - 'copy' performs direct addressing")) ; 'Indirect' addressing refers to an address stored in a memory location. ; Indicated by the metadata '/deref'. Usually requires an address type. ; In the test below, the memory location 1 contains '2', so an indirect read ; of location 1 returns the value of location 2. (reset) (new-trace "indirect-addressing") (add-code '((function main [ (1:integer-address <- copy 2:literal) ; unsafe; can't do this in general (2:integer <- copy 34:literal) (3:integer <- copy 1:integer-address/deref) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 34 3 34)) (prn "F - 'copy' performs indirect addressing")) ; Output args can use indirect addressing. In the test below the value is ; stored at the location stored in location 1 (i.e. location 2). (reset) (new-trace "indirect-addressing-oarg") (add-code '((function main [ (1:integer-address <- copy 2:literal) (2:integer <- copy 34:literal) (1:integer-address/deref <- add 2:integer 2:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 36)) (prn "F - instructions can perform indirect addressing on output arg")) ;; Compound data types ; ; Until now we've dealt with scalar types like integers and booleans and ; addresses, where mu looks like other assembly languages. In addition, mu ; provides first-class support for compound types: arrays and and-records. ; ; 'get' accesses fields in and-records ; 'index' accesses indices in arrays ; ; Both operations require knowledge about the types being worked on, so all ; types used in mu programs are defined in a single global system-wide table ; (see type* in mu.arc for the complete list of types; we'll add to it over ; time). ; first a sanity check that the table of types is consistent (reset) (each (typ typeinfo) type* (when typeinfo!and-record (assert (is typeinfo!size (len typeinfo!elems))) (when typeinfo!fields (assert (is typeinfo!size (len typeinfo!fields)))))) (reset) (new-trace "get-record") (add-code '((function main [ (1:integer <- copy 34:literal) (2:boolean <- copy nil:literal) (3:boolean <- get 1:integer-boolean-pair 1:offset) (4:integer <- get 1:integer-boolean-pair 0:offset) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 nil 3 nil 4 34)) (prn "F - 'get' accesses fields of and-records")) ;? (quit) (reset) (new-trace "get-indirect") (add-code '((function main [ (1:integer <- copy 34:literal) (2:boolean <- copy nil:literal) (3:integer-boolean-pair-address <- copy 1:literal) (4:boolean <- get 3:integer-boolean-pair-address/deref 1:offset) (5:integer <- get 3:integer-boolean-pair-address/deref 0:offset) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 nil 3 1 4 nil 5 34)) (prn "F - 'get' accesses fields of and-record address")) (reset) (new-trace "get-indirect-repeated") (add-code '((function main [ (1:integer <- copy 34:literal) (2:integer <- copy 35:literal) (3:integer <- copy 36:literal) (4:integer-point-pair-address <- copy 1:literal) ; unsafe (5:integer-point-pair-address-address <- copy 4:literal) ; unsafe (6:integer-integer-pair <- get 5:integer-point-pair-address-address/deref/deref 1:offset) (8:integer <- get 5:integer-point-pair-address-address/deref/deref 0:offset) ]))) (run 'main) (when (~memory-contains 6 '(35 36 34)) (prn "F - 'get' can deref multiple times")) ;? (quit) (reset) (new-trace "get-compound-field") (add-code '((function main [ (1:integer <- copy 34:literal) (2:integer <- copy 35:literal) (3:integer <- copy 36:literal) (4:integer-integer-pair <- get 1:integer-point-pair 1:offset) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 35 3 36 4 35 5 36)) (prn "F - 'get' accesses fields spanning multiple locations")) (reset) (new-trace "get-address") (add-code '((function main [ (1:integer <- copy 34:literal) (2:boolean <- copy t:literal) (3:boolean-address <- get-address 1:integer-boolean-pair 1:offset) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 t 3 2)) (prn "F - 'get-address' returns address of fields of and-records")) (reset) (new-trace "get-address-indirect") (add-code '((function main [ (1:integer <- copy 34:literal) (2:boolean <- copy t:literal) (3:integer-boolean-pair-address <- copy 1:literal) (4:boolean-address <- get-address 3:integer-boolean-pair-address/deref 1:offset) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 t 3 1 4 2)) (prn "F - 'get-address' accesses fields of and-record address")) (reset) (new-trace "index-literal") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer-boolean-pair <- index 1:integer-boolean-pair-array 1:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 23 3 nil 4 24 5 t 6 24 7 t)) (prn "F - 'index' accesses indices of arrays")) ;? (quit) (reset) (new-trace "index-direct") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer <- copy 1:literal) (7:integer-boolean-pair <- index 1:integer-boolean-pair-array 6:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 23 3 nil 4 24 5 t 6 1 7 24 8 t)) (prn "F - 'index' accesses indices of arrays")) ;? (quit) (reset) (new-trace "index-indirect") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer <- copy 1:literal) (7:integer-boolean-pair-array-address <- copy 1:literal) (8:integer-boolean-pair <- index 7:integer-boolean-pair-array-address/deref 6:integer) ]))) ;? (= dump-trace* (obj blacklist '("sz" "mem" "addr" "cvt0" "cvt1"))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 23 3 nil 4 24 5 t 6 1 7 1 8 24 9 t)) (prn "F - 'index' accesses indices of array address")) ;? (quit) (reset) (new-trace "index-indirect-multiple") (add-code '((function main [ (1:integer <- copy 4:literal) (2:integer <- copy 23:literal) (3:integer <- copy 24:literal) (4:integer <- copy 25:literal) (5:integer <- copy 26:literal) (6:integer-array-address <- copy 1:literal) ; unsafe (7:integer-array-address-address <- copy 6:literal) ; unsafe (8:integer <- index 7:integer-array-address-address/deref/deref 1:literal) ]))) (run 'main) (when (~is memory*.8 24) (prn "F - 'index' can deref multiple times")) (reset) (new-trace "index-address") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer <- copy 1:literal) (7:integer-boolean-pair-address <- index-address 1:integer-boolean-pair-array 6:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 23 3 nil 4 24 5 t 6 1 7 4)) (prn "F - 'index-address' returns addresses of indices of arrays")) (reset) (new-trace "index-address-indirect") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer <- copy 1:literal) (7:integer-boolean-pair-array-address <- copy 1:literal) (8:integer-boolean-pair-address <- index-address 7:integer-boolean-pair-array-address/deref 6:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 2 2 23 3 nil 4 24 5 t 6 1 7 1 8 4)) (prn "F - 'index-address' returns addresses of indices of array addresses")) ; Array values know their length. Record lengths are saved in the types table. (reset) (new-trace "len-array") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer <- length 1:integer-boolean-pair-array) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.6 2) (prn "F - 'length' of array")) (reset) (new-trace "len-array-indirect") (add-code '((function main [ (1:integer <- copy 2:literal) (2:integer <- copy 23:literal) (3:boolean <- copy nil:literal) (4:integer <- copy 24:literal) (5:boolean <- copy t:literal) (6:integer-address <- copy 1:literal) (7:integer <- length 6:integer-boolean-pair-array-address/deref) ]))) ;? (set dump-trace*) ;? (= dump-trace* (obj blacklist '("sz" "mem" "addr" "cvt0" "cvt1"))) (run 'main) ;? (prn memory*) (when (~is memory*.7 2) (prn "F - 'length' of array address")) ; 'sizeof' is a helper to determine the amount of memory required by a type. ; Only for non-arrays. (reset) (new-trace "sizeof-record") (add-code '((function main [ (1:integer <- sizeof integer-boolean-pair:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 2) (prn "F - 'sizeof' returns space required by arg")) (reset) (new-trace "sizeof-record-not-len") (add-code '((function main [ (1:integer <- sizeof integer-point-pair:literal) ]))) (run 'main) ;? (prn memory*) (when (is memory*.1 2) (prn "F - 'sizeof' is different from number of elems")) ; Regardless of a type's length, you can move it around just like a primitive. (reset) (new-trace "copy-record") (add-code '((function main [ (1:integer <- copy 34:literal) (2:boolean <- copy nil:literal) (4:boolean <- copy t:literal) (3:integer-boolean-pair <- copy 1:integer-boolean-pair) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 nil 3 34 4 nil)) (prn "F - ops can operate on records spanning multiple locations")) (reset) (new-trace "copy-record2") (add-code '((function main [ (1:integer <- copy 34:literal) (2:integer <- copy 35:literal) (3:integer <- copy 36:literal) (4:integer <- copy 0:literal) (5:integer <- copy 0:literal) (6:integer <- copy 0:literal) (4:integer-point-pair <- copy 1:integer-point-pair) ]))) ;? (= dump-trace* (obj whitelist '("run" "sizeof"))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 35 3 36 ; result 4 34 5 35 6 36)) (prn "F - ops can operate on records with fields spanning multiple locations")) ) ; section 20 (section 100 ; A special kind of record is the 'tagged type'. It lets us represent ; dynamically typed values, which save type information in memory rather than ; in the code to use them. This will let us do things like create heterogenous ; lists containing both integers and strings. Tagged values admit two ; operations: ; ; 'save-type' - turns a regular value into a tagged-value of the appropriate type ; 'maybe-coerce' - turns a tagged value into a regular value if the type matches ; ; The payload of a tagged value must occupy just one location. Save pointers ; to records. (reset) (new-trace "tagged-value") ;? (= dump-trace* (obj blacklist '("sz" "mem" "addr" "cvt0" "cvt1"))) (add-code '((function main [ (1:type <- copy integer:literal) (2:integer <- copy 34:literal) (3:integer 4:boolean <- maybe-coerce 1:tagged-value integer:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn completed-routines*) (each routine completed-routines* (aif rep.routine!error (prn "error - " it))) ;? (prn memory*) (when (or (~is memory*.3 34) (~is memory*.4 t)) (prn "F - 'maybe-coerce' copies value only if type tag matches")) ;? (quit) (reset) (new-trace "tagged-value-2") ;? (set dump-trace*) (add-code '((function main [ (1:type <- copy integer-address:literal) (2:integer <- copy 34:literal) (3:boolean 4:boolean <- maybe-coerce 1:tagged-value boolean:literal) ]))) (run 'main) ;? (prn memory*) (when (or (~is memory*.3 0) (~is memory*.4 nil)) (prn "F - 'maybe-coerce' doesn't copy value when type tag doesn't match")) (reset) (new-trace "save-type") (add-code '((function main [ (1:integer <- copy 34:literal) (2:tagged-value <- save-type 1:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 'integer 3 34)) (prn "F - 'save-type' saves the type of a value at runtime, turning it into a tagged-value")) (reset) (new-trace "init-tagged-value") (add-code '((function main [ (1:integer <- copy 34:literal) (2:tagged-value-address <- init-tagged-value integer:literal 1:integer) (3:integer 4:boolean <- maybe-coerce 2:tagged-value-address/deref integer:literal) ]))) ;? (= dump-trace* (obj blacklist '("sz" "mem" "addr" "cvt0" "cvt1" "sizeof"))) (run 'main) ;? (prn memory*) (when (or (~is memory*.3 34) (~is memory*.4 t)) (prn "F - 'init-tagged-value' is the converse of 'maybe-coerce'")) ;? (quit) ; Now that we can package values together with their types, we can construct a ; dynamically typed list. (reset) (new-trace "list") ;? (set dump-trace*) (add-code '((function main [ ; 1 points at first node: tagged-value (int 34) (1:list-address <- new list:literal) (2:tagged-value-address <- list-value-address 1:list-address) (3:type-address <- get-address 2:tagged-value-address/deref type:offset) (3:type-address/deref <- copy integer:literal) (4:location <- get-address 2:tagged-value-address/deref payload:offset) (4:location/deref <- copy 34:literal) (5:list-address-address <- get-address 1:list-address/deref cdr:offset) (5:list-address-address/deref <- new list:literal) ; 6 points at second node: tagged-value (boolean t) (6:list-address <- copy 5:list-address-address/deref) (7:tagged-value-address <- list-value-address 6:list-address) (8:type-address <- get-address 7:tagged-value-address/deref type:offset) (8:type-address/deref <- copy boolean:literal) (9:location <- get-address 7:tagged-value-address/deref payload:offset) (9:location/deref <- copy t:literal) (10:list-address <- get 6:list-address/deref 1:offset) ]))) (let routine make-routine!main (enq routine running-routines*) (let first rep.routine!alloc ;? (= dump-trace* (obj whitelist '("run"))) ;? (set dump-trace*) (run) ;? (prn memory*) (each routine completed-routines* (aif rep.routine!error (prn "error - " it))) (when (or (~all first (map memory* '(1 2 3))) (~is memory*.first 'integer) (~is memory*.4 (+ first 1)) (~is (memory* (+ first 1)) 34) (~is memory*.5 (+ first 2)) (let second memory*.6 (or (~is (memory* (+ first 2)) second) (~all second (map memory* '(6 7 8))) (~is memory*.second 'boolean) (~is memory*.9 (+ second 1)) (~is (memory* (+ second 1)) t) (~is memory*.10 nil)))) (prn "F - lists can contain elements of different types")))) (run-code test2 (10:list-address <- list-next 1:list-address)) (each routine completed-routines* (aif rep.routine!error (prn "error - " it))) (when (~is memory*.10 memory*.6) (prn "F - 'list-next can move a list pointer to the next node")) ;? (quit) ; 'init-list' takes a variable number of args and constructs a list containing ; them. Just integers for now. (reset) (new-trace "init-list") (add-code '((function main [ (1:integer <- init-list 3:literal 4:literal 5:literal) ]))) ;? (= dump-trace* (obj blacklist '("sz" "mem" "addr" "cvt0" "cvt1" "sizeof"))) (run 'main) ;? (prn memory*) (let first memory*.1 ;? (prn first) (when (or (~is memory*.first 'integer) (~is (memory* (+ first 1)) 3) (let second (memory* (+ first 2)) ;? (prn second) (or (~is memory*.second 'integer) (~is (memory* (+ second 1)) 4) (let third (memory* (+ second 2)) ;? (prn third) (or (~is memory*.third 'integer) (~is (memory* (+ third 1)) 5) (~is (memory* (+ third 2) nil))))))) (prn "F - 'init-list' can construct a list of integers"))) ) ; section 100 (section 20 ;; Functions ; ; Just like the table of types is centralized, functions are conceptualized as ; a centralized table of operations just like the "primitives" we've seen so ; far. If you create a function you can call it like any other op. (reset) (new-trace "new-fn") (add-code '((function test1 [ (3:integer <- add 1:integer 2:integer) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (test1) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4)) (prn "F - calling a user-defined function runs its instructions")) ;? (quit) (reset) (new-trace "new-fn-once") (add-code '((function test1 [ (1:integer <- copy 1:literal) ]) (function main [ (test1) ]))) ;? (= dump-trace* (obj whitelist '("run"))) (run 'main) (when (~is 2 curr-cycle*) (prn "F - calling a user-defined function runs its instructions exactly once " curr-cycle*)) ;? (quit) ; User-defined functions communicate with their callers through two ; primitives: ; ; 'arg' - to access inputs ; 'reply' - to return outputs (reset) (new-trace "new-fn-reply") (add-code '((function test1 [ (3:integer <- add 1:integer 2:integer) (reply) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (test1) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4)) (prn "F - 'reply' stops executing the current function")) ;? (quit) (reset) (new-trace "new-fn-reply-nested") (add-code '((function test1 [ (3:integer <- test2) ]) (function test2 [ (reply 2:integer) ]) (function main [ (2:integer <- copy 34:literal) (test1) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 2 34 3 34)) (prn "F - 'reply' stops executing any callers as necessary")) ;? (quit) (reset) (new-trace "new-fn-reply-once") (add-code '((function test1 [ (3:integer <- add 1:integer 2:integer) (reply) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (test1) ]))) ;? (= dump-trace* (obj whitelist '("run"))) (run 'main) (when (~is 5 curr-cycle*) (prn "F - 'reply' executes instructions exactly once " curr-cycle*)) ;? (quit) (reset) (new-trace "reply-increments-caller-pc") (add-code '((function callee [ (reply) ]) (function caller [ (1:integer <- copy 0:literal) (2:integer <- copy 0:literal) ]))) (freeze function*) (= routine* (make-routine 'caller)) (assert (is 0 pc.routine*)) (push-stack routine* 'callee) ; pretend call was at first instruction of caller (run-for-time-slice 1) (when (~is 1 pc.routine*) (prn "F - 'reply' increments pc in caller (to move past calling instruction)")) (reset) (new-trace "new-fn-arg-sequential") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- next-input) (3:integer <- add 4:integer 5:integer) (reply) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (test1 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4 ; test1's temporaries 4 1 5 3)) (prn "F - 'arg' accesses in order the operands of the most recent function call (the caller)")) ;? (quit) (reset) (new-trace "new-fn-arg-random-access") ;? (set dump-trace*) (add-code '((function test1 [ (5:integer <- input 1:literal) (4:integer <- input 0:literal) (3:integer <- add 4:integer 5:integer) (reply) (4:integer <- copy 34:literal) ; should never run ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (test1 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4 ; test's temporaries 4 1 5 3)) (prn "F - 'arg' with index can access function call arguments out of order")) ;? (quit) (reset) (new-trace "new-fn-arg-random-then-sequential") ;? (set dump-trace*) (add-code '((function test1 [ (_ <- input 1:literal) (1:integer <- next-input) ; takes next arg after index 1 ]) ; should never run (function main [ (test1 1:literal 2:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 3)) (prn "F - 'arg' with index resets index for later calls")) ;? (quit) (reset) (new-trace "new-fn-arg-status") (add-code '((function test1 [ (4:integer 5:boolean <- next-input) ]) (function main [ (test1 1:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 4 1 5 t)) (prn "F - 'arg' sets a second oarg when arg exists")) ;? (quit) (reset) (new-trace "new-fn-arg-missing") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- next-input) ]) (function main [ (test1 1:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 4 1)) (prn "F - missing 'arg' doesn't cause error")) ;? (quit) (reset) (new-trace "new-fn-arg-missing-2") (add-code '((function test1 [ (4:integer <- next-input) (5:integer 6:boolean <- next-input) ]) (function main [ (test1 1:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 4 1 6 nil)) (prn "F - missing 'arg' wipes second oarg when provided")) ;? (quit) (reset) (new-trace "new-fn-arg-missing-3") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- copy 34:literal) (5:integer 6:boolean <- next-input) ]) (function main [ (test1 1:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 4 1 6 nil)) (prn "F - missing 'arg' consistently wipes its oarg")) ;? (quit) (reset) (new-trace "new-fn-arg-missing-4") (add-code '((function test1 [ ; if given two args, adds them; if given one arg, increments (4:integer <- next-input) (5:integer 6:boolean <- next-input) { begin (break-if 6:boolean) (5:integer <- copy 1:literal) } (7:integer <- add 4:integer 5:integer) ]) (function main [ (test1 34:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 4 34 5 1 6 nil 7 35)) (prn "F - function with optional second arg")) ;? (quit) (reset) (new-trace "new-fn-arg-by-value") (add-code '((function test1 [ (1:integer <- copy 0:literal) ; overwrite caller memory (2:integer <- next-input) ]) ; arg not clobbered (function main [ (1:integer <- copy 34:literal) (test1 1:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 0 2 34)) (prn "F - 'arg' passes by value")) (reset) (new-trace "arg-record") (add-code '((function test1 [ (4:integer-boolean-pair <- next-input) ]) (function main [ (1:integer <- copy 34:literal) (2:boolean <- copy nil:literal) (test1 1:integer-boolean-pair) ]))) (run 'main) (when (~iso memory* (obj 1 34 2 nil 4 34 5 nil)) (prn "F - 'arg' can copy records spanning multiple locations")) (reset) (new-trace "arg-record-indirect") ;? (set dump-trace*) (add-code '((function test1 [ (4:integer-boolean-pair <- next-input) ]) (function main [ (1:integer <- copy 34:literal) (2:boolean <- copy nil:literal) (3:integer-boolean-pair-address <- copy 1:literal) (test1 3:integer-boolean-pair-address/deref) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 34 2 nil 3 1 4 34 5 nil)) (prn "F - 'arg' can copy records spanning multiple locations in indirect mode")) (reset) (new-trace "new-fn-reply-oarg") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- next-input) (6:integer <- add 4:integer 5:integer) (reply 6:integer) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (3:integer <- test1 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4 ; test1's temporaries 4 1 5 3 6 4)) (prn "F - 'reply' can take aguments that are returned, or written back into output args of caller")) (reset) (new-trace "new-fn-reply-oarg-multiple") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- next-input) (6:integer <- add 4:integer 5:integer) (reply 6:integer 5:integer) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (3:integer 7:integer <- test1 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4 7 3 ; test1's temporaries 4 1 5 3 6 4)) (prn "F - 'reply' permits a function to return multiple values at once")) ; 'prepare-reply' is useful for doing cleanup before exiting a function (reset) (new-trace "new-fn-prepare-reply") (add-code '((function test1 [ (4:integer <- next-input) (5:integer <- next-input) (6:integer <- add 4:integer 5:integer) (prepare-reply 6:integer 5:integer) (reply) (4:integer <- copy 34:literal) ]) (function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (3:integer 7:integer <- test1 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4 7 3 ; test1's temporaries 4 1 5 3 6 4)) (prn "F - without args, 'reply' returns values from previous 'prepare-reply'.")) ; When you have arguments that are both read from and written to, include them ; redundantly in both ingredients and results. That'll help tools track what ; changed. ; To enforce that the result and ingredient must always match, use the ; 'same-as-arg' property. Results with 'same-as-arg' properties should only be ; copied to a caller output arg identical to the specified caller arg. (reset) (new-trace "new-fn-same-as-arg") (add-code '((function test1 [ ; increment the contents of an address (default-space:space-address <- new space:literal 2:literal) (x:integer-address <- next-input) (x:integer-address/deref <- add x:integer-address/deref 1:literal) (reply x:integer-address/same-as-arg:0) ]) (function main [ (2:integer-address <- new integer:literal) (2:integer-address/deref <- copy 0:literal) (3:integer-address <- test1 2:integer-address) ]))) (run 'main) (let routine (car completed-routines*) ;? (prn rep.routine!error) ;? 1 (when (no rep.routine!error) (prn "F - 'same-as-arg' results must be identical to a given input"))) ;? (quit) ;? 2 ) ; section 20 (section 11 ;; Structured programming ; ; Our jump operators are quite inconvenient to use, so mu provides a ; lightweight tool called 'convert-braces' to work in a slightly more ; convenient format with nested braces: ; ; { ; some instructions ; { ; more instructions ; } ; } ; ; Braces are like labels in assembly language, they require no special ; parsing. The operations 'loop' and 'break' jump to just after the enclosing ; '{' and '}' respectively. ; ; Conditional and unconditional 'loop' and 'break' should give us 80% of the ; benefits of the control-flow primitives we're used to in other languages, ; like 'if', 'while', 'for', etc. ; ; Compare 'unquoted blocks' using {} with 'quoted blocks' using [] that we've ; gotten used to seeing. Quoted blocks are used by top-level instructions to ; provide code without running it. (reset) (new-trace "convert-braces") (= traces* (queue)) ;? (= dump-trace* (obj whitelist '("c{0" "c{1"))) (when (~iso (convert-braces '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) { begin ; 'begin' is just a hack because racket turns braces into parens (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) (((break-if)) ((4 boolean))) (((5 integer)) <- ((copy)) ((0 literal))) } (((reply))))) '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) (((jump-if)) ((4 boolean)) ((1 offset))) (((5 integer)) <- ((copy)) ((0 literal))) (((reply))))) (prn "F - convert-braces replaces break-if with a jump-if to after the next close-brace")) ;? (quit) (reset) (new-trace "convert-braces-empty-block") (= traces* (queue)) ;? (= dump-trace* (obj whitelist '("c{0" "c{1"))) (when (~iso (convert-braces '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) { begin (((break))) } (((reply))))) '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) (((jump)) ((0 offset))) (((reply))))) (prn "F - convert-braces works for degenerate blocks")) ;? (quit) (reset) (new-trace "convert-braces-nested-break") (= traces* (queue)) (when (~iso (convert-braces '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) { begin (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) (((break-if)) ((4 boolean))) { begin (((5 integer)) <- ((copy)) ((0 literal))) } } (((reply))))) '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) (((jump-if)) ((4 boolean)) ((1 offset))) (((5 integer)) <- ((copy)) ((0 literal))) (((reply))))) (prn "F - convert-braces balances braces when converting break")) (reset) (new-trace "convert-braces-repeated-jump") (= traces* (queue)) ;? (= dump-trace* (obj whitelist '("c{0" "c{1"))) (when (~iso (convert-braces '((((1 integer)) <- ((copy)) ((0 literal))) { begin (((break))) (((2 integer)) <- ((copy)) ((0 literal))) } { begin (((break))) (((3 integer)) <- ((copy)) ((0 literal))) } (((4 integer)) <- ((copy)) ((0 literal))))) '((((1 integer)) <- ((copy)) ((0 literal))) (((jump)) ((1 offset))) (((2 integer)) <- ((copy)) ((0 literal))) (((jump)) ((1 offset))) (((3 integer)) <- ((copy)) ((0 literal))) (((4 integer)) <- ((copy)) ((0 literal))))) (prn "F - convert-braces handles jumps on jumps")) ;? (quit) (reset) (new-trace "convert-braces-nested-loop") (= traces* (queue)) (when (~iso (convert-braces '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) { begin (((3 integer)) <- ((copy)) ((0 literal))) { begin (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) } (((loop-if)) ((4 boolean))) (((5 integer)) <- ((copy)) ((0 literal))) } (((reply))))) '((((1 integer)) <- ((copy)) ((0 literal))) (((2 integer)) <- ((copy)) ((0 literal))) (((3 integer)) <- ((copy)) ((0 literal))) (((4 boolean)) <- ((not-equal)) ((1 integer)) ((3 integer))) (((jump-if)) ((4 boolean)) ((-3 offset))) (((5 integer)) <- ((copyspan class="o">:bool [ local-scope load-inputs s:text, is-atom?:bool <- maybe-convert *x, atom:variant return-unless is-atom?, false result <- equal pat, s ] scenario atom-match [ local-scope x:&:cell <- new-atom [abc] 10:bool/raw <- atom-match? x, [abc] memory-should-contain [ 10 <- 1 ] ] def first x:&:cell -> result:&:cell [ local-scope load-inputs pair:pair, pair?:bool <- maybe-convert *x, pair:variant return-unless pair?, null result <- get pair, first:offset ] def rest x:&:cell -> result:&:cell [ local-scope load-inputs pair:pair, pair?:bool <- maybe-convert *x, pair:variant return-unless pair?, null result <- get pair, rest:offset ] def set-first base:&:cell, new-first:&:cell -> base:&:cell [ local-scope load-inputs pair:pair, is-pair?:bool <- maybe-convert *base, pair:variant return-unless is-pair? pair <- put pair, first:offset, new-first *base <- merge 1/pair, pair ] def set-rest base:&:cell, new-rest:&:cell -> base:&:cell [ local-scope load-inputs pair:pair, is-pair?:bool <- maybe-convert *base, pair:variant return-unless is-pair? pair <- put pair, rest:offset, new-rest *base <- merge 1/pair, pair ] scenario cell-operations-on-atom [ local-scope s:text <- new [a] x:&:cell <- new-atom s 10:&:cell/raw <- first x 11:&:cell/raw <- rest x memory-should-contain [ 10 <- 0 # first is nil 11 <- 0 # rest is nil ] ] scenario cell-operations-on-pair [ local-scope # construct (a . nil) s:text <- new [a] x:&:cell <- new-atom s y:&:cell <- new-pair x, null x2:&:cell <- first y 10:bool/raw <- equal x, x2 11:&:cell/raw <- rest y memory-should-contain [ 10 <- 1 # first is correct 11 <- 0 # rest is nil ] ] ## convert lambda text to a tree of cells def parse in:text -> out:&:cell [ local-scope load-inputs s:&:stream:char <- new-stream in out, s <- parse s trace 2, [app/parse], out ] def parse in:&:stream:char -> out:&:cell, in:&:stream:char [ local-scope load-inputs # skip whitespace in <- skip-whitespace in c:char, eof?:bool <- peek in return-if eof?, null pair?:bool <- equal c, 40/open-paren { break-if pair? # atom buf:&:buffer:char <- new-buffer 30 { done?:bool <- end-of-stream? in break-if done? # stop before close paren or space c:char <- peek in done? <- equal c, 41/close-paren break-if done? done? <- space? c break-if done? c <- read in buf <- append buf, c loop } s:text <- buffer-to-array buf out <- new-atom s } { break-unless pair? # pair read in # skip the open-paren out <- new cell:type # start out with nil # read in first element of pair { end?:bool <- end-of-stream? in not-end?:bool <- not end? assert not-end?, [unbalanced '(' in expression] c <- peek in close-paren?:bool <- equal c, 41/close-paren break-if close-paren? first:&:cell, in <- parse in *out <- merge 1/pair, first, null } # read in any remaining elements curr:&:cell <- copy out { in <- skip-whitespace in end?:bool <- end-of-stream? in not-end?:bool <- not end? assert not-end?, [unbalanced '(' in expression] # termination check: ')' c <- peek in { close-paren?:bool <- equal c, 41/close-paren break-unless close-paren? read in # skip ')' break +end-pair } # still here? read next element of pair next:&:cell, in <- parse in is-dot?:bool <- atom-match? next, [.] { break-if is-dot? next-curr:&:cell <- new-pair next, null curr <- set-rest curr, next-curr curr <- rest curr } { break-unless is-dot? # deal with dotted pair in <- skip-whitespace in c <- peek in not-close-paren?:bool <- not-equal c, 41/close-paren assert not-close-paren?, [')' cannot immediately follow '.'] final:&:cell <- parse in curr <- set-rest curr, final # we're not gonna update curr, so better make sure the next iteration # is going to end the pair in <- skip-whitespace in c <- peek in close-paren?:bool <- equal c, 41/close-paren assert close-paren?, ['.' must be followed by exactly one expression before ')'] } loop } +end-pair } ] def skip-whitespace in:&:stream:char -> in:&:stream:char [ local-scope load-inputs { done?:bool <- end-of-stream? in return-if done?, null c:char <- peek in space?:bool <- space? c break-unless space? read in # skip loop } ] def to-text x:&:cell -> out:text [ local-scope load-inputs buf:&:buffer:char <- new-buffer 30 buf <- to-buffer x, buf out <- buffer-to-array buf ] def to-buffer x:&:cell, buf:&:buffer:char -> buf:&:buffer:char [ local-scope load-inputs # base case: empty cell { break-if x buf <- append buf, [<>] return } # base case: atom { s:text, atom?:bool <- maybe-convert *x, atom:variant break-unless atom? buf <- append buf, s return } # recursive case: pair buf <- append buf, [< ] first:&:cell <- first x buf <- to-buffer first, buf buf <- append buf, [ | ] rest:&:cell <- rest x buf <- to-buffer rest, buf buf <- append buf, [ >] ] scenario parse-single-letter-atom [ local-scope s:text <- new [a] x:&:cell <- parse s s2:text, 10:bool/raw <- maybe-convert *x, atom:variant 11:@:char/raw <- copy *s2 memory-should-contain [ 10 <- 1 # parse result is an atom 11:array:character <- [a] ] ] scenario parse-atom [ local-scope s:text <- new [abc] x:&:cell <- parse s s2:text, 10:bool/raw <- maybe-convert *x, atom:variant 11:@:char/raw <- copy *s2 memory-should-contain [ 10 <- 1 # parse result is an atom 11:array:character <- [abc] ] ] scenario parse-list-of-two-atoms [ local-scope s:text <- new [(abc def)] x:&:cell <- parse s trace-should-contain [ app/parse: < abc | < def | <> > > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x x2:&:cell <- rest x s1:text, 11:bool/raw <- maybe-convert *x1, atom:variant 12:bool/raw <- is-pair? x2 x3:&:cell <- first x2 s2:text, 13:bool/raw <- maybe-convert *x3, atom:variant 14:&:cell/raw <- rest x2 20:@:char/raw <- copy *s1 30:@:char/raw <- copy *s2 memory-should-contain [ 10 <- 1 # parse result is a pair 11 <- 1 # result.first is an atom 12 <- 1 # result.rest is a pair 13 <- 1 # result.rest.first is an atom 14 <- 0 # result.rest.rest is nil 20:array:character <- [abc] # result.first 30:array:character <- [def] # result.rest.first ] ] scenario parse-list-with-extra-spaces [ local-scope s:text <- new [ ( abc def ) ] # extra spaces x:&:cell <- parse s trace-should-contain [ app/parse: < abc | < def | <> > > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x x2:&:cell <- rest x s1:text, 11:bool/raw <- maybe-convert *x1, atom:variant 12:bool/raw <- is-pair? x2 x3:&:cell <- first x2 s2:text, 13:bool/raw <- maybe-convert *x3, atom:variant 14:&:cell/raw <- rest x2 20:@:char/raw <- copy *s1 30:@:char/raw <- copy *s2 memory-should-contain [ 10 <- 1 # parse result is a pair 11 <- 1 # result.first is an atom 12 <- 1 # result.rest is a pair 13 <- 1 # result.rest.first is an atom 14 <- 0 # result.rest.rest is nil 20:array:character <- [abc] # result.first 30:array:character <- [def] # result.rest.first ] ] scenario parse-list-of-more-than-two-atoms [ local-scope s:text <- new [(abc def ghi)] x:&:cell <- parse s trace-should-contain [ app/parse: < abc | < def | < ghi | <> > > > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x x2:&:cell <- rest x s1:text, 11:bool/raw <- maybe-convert *x1, atom:variant 12:bool/raw <- is-pair? x2 x3:&:cell <- first x2 s2:text, 13:bool/raw <- maybe-convert *x3, atom:variant x4:&:cell <- rest x2 14:bool/raw <- is-pair? x4 x5:&:cell <- first x4 s3:text, 15:bool/raw <- maybe-convert *x5, atom:variant 16:&:cell/raw <- rest x4 20:@:char/raw <- copy *s1 30:@:char/raw <- copy *s2 40:@:char/raw <- copy *s3 memory-should-contain [ 10 <- 1 # parse result is a pair 11 <- 1 # result.first is an atom 12 <- 1 # result.rest is a pair 13 <- 1 # result.rest.first is an atom 14 <- 1 # result.rest.rest is a pair 15 <- 1 # result.rest.rest.first is an atom 16 <- 0 # result.rest.rest.rest is nil 20:array:character <- [abc] # result.first 30:array:character <- [def] # result.rest.first 40:array:character <- [ghi] # result.rest.rest ] ] scenario parse-nested-list [ local-scope s:text <- new [((abc))] x:&:cell <- parse s trace-should-contain [ app/parse: < < abc | <> > | <> > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x 11:bool/raw <- is-pair? x x2:&:cell <- first x1 s1:text, 12:bool/raw <- maybe-convert *x2, atom:variant 13:&:cell/raw <- rest x1 14:&:cell/raw <- rest x 20:@:char/raw <- copy *s1 memory-should-contain [ 10 <- 1 # parse result is a pair 11 <- 1 # result.first is a pair 12 <- 1 # result.first.first is an atom 13 <- 0 # result.first.rest is nil 14 <- 0 # result.rest is nil 20:array:character <- [abc] # result.first.first ] ] scenario parse-nested-list-2 [ local-scope s:text <- new [((abc) def)] x:&:cell <- parse s trace-should-contain [ app/parse: < < abc | <> > | < def | <> > > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x 11:bool/raw <- is-pair? x x2:&:cell <- first x1 s1:text, 12:bool/raw <- maybe-convert *x2, atom:variant 13:&:cell/raw <- rest x1 x3:&:cell <- rest x x4:&:cell <- first x3 s2:text, 14:bool/raw <- maybe-convert *x4, atom:variant 15:&:cell/raw <- rest x3 20:@:char/raw <- copy *s1 30:@:char/raw <- copy *s2 memory-should-contain [ 10 <- 1 # parse result is a pair 11 <- 1 # result.first is a pair 12 <- 1 # result.first.first is an atom 13 <- 0 # result.first.rest is nil 14 <- 1 # result.rest.first is an atom 15 <- 0 # result.rest.rest is nil 20:array:character <- [abc] # result.first.first 30:array:character <- [def] # result.rest.first ] ] # todo: uncomment these tests after we figure out how to continue tests after # assertion failures #? scenario parse-error [ #? local-scope #? s:text <- new [(] #? #? hide-errors #? x:&:cell <- parse s #? #? show-errors #? trace-should-contain [ #? error: unbalanced '(' in expression #? ] #? ] #? #? scenario parse-error-after-element [ #? local-scope #? s:text <- new [(abc] #? #? hide-errors #? x:&:cell <- parse s #? #? show-errors #? trace-should-contain [ #? error: unbalanced '(' in expression #? ] #? ] scenario parse-dotted-list-of-two-atoms [ local-scope s:text <- new [(abc . def)] x:&:cell <- parse s trace-should-contain [ app/parse: < abc | def > ] 10:bool/raw <- is-pair? x x1:&:cell <- first x x2:&:cell <- rest x s1:text, 11:bool/raw <- maybe-convert *x1, atom:variant s2:text, 12:bool/raw <- maybe-convert *x2, atom:variant 20:@:char/raw <- copy *s1 30:@:char/raw <- copy *s2 memory-should-contain [ # parses to < abc | def >