; Mu: An exploration on making the global structure of programs more accessible. ; ; "Is it a language, or an operating system, or a virtual machine? Mu." ; (with apologies to Robert Pirsig: http://en.wikipedia.org/wiki/Mu_%28negative%29#In_popular_culture) ; ;; Motivation ; ; I want to live in a world where I can have an itch to tweak a program, clone ; its open-source repository, orient myself on how it's organized, and make ; the simple change I envisioned, all in an afternoon. This codebase tries to ; make this possible for its readers. (More details: http://akkartik.name/about) ; ; What helps comprehend the global structure of programs? For starters, let's ; enumerate what doesn't: idiomatic code, adherence to a style guide or naming ; convention, consistent indentation, API documentation for each class, etc. ; These conventional considerations improve matters in the small, but don't ; help understand global organization. They help existing programmers manage ; day-to-day operations, but they can't turn outsider programmers into ; insiders. (Elaboration: http://akkartik.name/post/readable-bad) ; ; In my experience, two things have improved matters so far: version control ; and automated tests. Version control lets me rewind back to earlier, simpler ; times when the codebase was simpler, when its core skeleton was easier to ; ascertain. Indeed, arguably what came first is by definition the skeleton of ; a program, modulo major rewrites. Once you understand the skeleton, it ; becomes tractable to 'play back' later major features one by one. (Previous ; project that fleshed out this idea: http://akkartik.name/post/wart-layers) ; ; The second and biggest boost to comprehension comes from tests. Tests are ; good for writers for well-understood reasons: they avoid regressions, and ; they can influence code to be more decoupled and easier to change. In ; addition, tests are also good for the outsider reader because they permit ; active reading. If you can't build a program and run its tests it can't help ; you understand it. It hangs limp at best, and might even be actively ; misleading. If you can run its tests, however, it comes alive. You can step ; through scenarios in a debugger. You can add logging and scan logs to make ; sense of them. You can run what-if scenarios: "why is this line not written ; like this?" Make a change, rerun tests: "Oh, that's why." (Elaboration: ; http://akkartik.name/post/literate-programming) ; ; However, tests are only useful to the extent that they exist. Think back to ; your most recent codebase. Do you feel comfortable releasing a new version ; just because the tests pass? I'm not aware of any such project. There's just ; too many situations envisaged by the authors that were never encoded in a ; test. Even disciplined authors can't test for performance or race conditions ; or fault tolerance. If a line is phrased just so because of some subtle ; performance consideration, it's hard to communicate to newcomers. ; ; This isn't an arcane problem, and it isn't just a matter of altruism. As ; more and more such implicit considerations proliferate, and as the original ; authors are replaced by latecomers for day-to-day operations, knowledge is ; actively forgotten and lost. The once-pristine codebase turns into legacy ; code that is hard to modify without expensive and stress-inducing ; regressions. ; ; How to write tests for performance, fault tolerance, race conditions, etc.? ; How can we state and verify that a codepath doesn't ever perform memory ; allocation, or write to disk? It requires better, more observable primitives ; than we currently have. Modern operating systems have their roots in the ; 70s. Their interfaces were not designed to be testable. They provide no way ; to simulate a full disk, or a specific sequence of writes from different ; threads. We need something better. ; ; This project tries to move, groping, towards that 'something better', a ; platform that is both thoroughly tested and allows programs written for it ; to be thoroughly tested. It tries to answer the question: ; ; If Denis Ritchie and Ken Thompson were to set out today to co-design unix ; and C, knowing what we know about automated tests, what would they do ; differently? ; ; To try to impose *some* constraints on this gigantic yak-shave, we'll try to ; keep both language and OS as simple as possible, focused entirely on ; permitting more kinds of tests, on first *collecting* all the information ; about implicit considerations in some form so that readers and tools can ; have at least some hope of making sense of it. ; ; The initial language will be just assembly. We'll try to make it convenient ; to program in with some simple localized rewrite rules inspired by lisp ; macros and literate programming. Programmers will have to do their own ; memory management and register allocation, but we'll provide libraries to ; help with them. ; ; The initial OS will provide just memory management and concurrency ; primitives. No users or permissions (we don't live on mainframes anymore), ; no kernel- vs user-mode, no virtual memory or process abstraction, all ; threads sharing a single address space (use VMs for security and ; sandboxing). The only use case we care about is getting a test harness to ; run some code, feed it data through blocking channels, stop it and observe ; its internals. The code under test is expected to cooperate in such testing, ; by logging important events for the test harness to observe. (More info: ; http://akkartik.name/post/tracing-tests) ; ; The common thread here is elimination of abstractions, and it's not an ; accident. Abstractions help insiders manage the evolution of a codebase, but ; they actively hinder outsiders in understanding it from scratch. This ; matters, because the funnel to turn outsiders into insiders is critical to ; the long-term life of a codebase. Perhaps authors should raise their ; estimation of the costs of abstraction, and go against their instincts for ; introducing it. That's what I'll be trying to do: question every abstraction ; before I introduce it. We'll see how it goes. ; --- ;; Getting started ; ; Mu is currently built atop Racket and Arc, but this is temporary and ; contingent. We want to keep our options open, whether to port to a different ; host language, and easy to rewrite to native code for any platform. So we'll ; try to avoid 'cheating': relying on the host platform for advanced ; functionality. ; ; Other than that, we'll say no more about the code, and focus in the rest of ; this file on the scenarios the code cares about. (selective-load "mu.arc" section-level) (ero "running tests in mu.ar.c.t (takes ~30s)") ;? (quit) (set allow-raw-addresses*) (section 20 ; Our language is assembly-like in that functions consist of series of ; statements, and statements consist of an operation and its arguments (input ; and output). ; ; oarg1, oarg2, ... <- op arg1, arg2, ... ; ; Args must be atomic, like an integer or a memory address, they can't be ; expressions doing arithmetic or function calls. But we can have any number ; of them. ; ; Since we're building on lisp, our code samples won't look quite like the ; idealized syntax above. For now they will look like this: ; ; (function f [ ; (oarg1 oarg2 ... <- op arg1 arg2 ...) ; ... ; ... ; ]) ; ; Each arg/oarg can contain metadata separated by slashes and colons. In this ; first example below, the only metadata is types: 'integer' for a memory ; location containing an integer, and 'literal' for a value included directly ; in code. (Assembly languages traditionally call them 'immediate' operands.) ; In the future a simple tool will check that the types line up as expected in ; each op. A different tool might add types where they aren't provided. ; Instead of a monolithic compiler I want to build simple, lightweight tools ; that can be combined in various ways, say for using different typecheckers ; in different subsystems. ; ; In our tests we'll define such mu functions using a call to 'add-code', so ; look for it when reading the code examples. Everything outside 'add-code' is ; just test-harness details that can be skipped at first. (reset) ;? (set dump-trace*) (new-trace "literal") (add-code '((function main [ (1:integer <- copy 23:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~is memory*.1 23) (prn "F - 'copy' writes its lone 'arg' after the instruction name to its lone 'oarg' or output arg before the arrow. After this test, the value 23 is stored in memory address 1.")) ;? (reset) ;? 2 ;? (quit) ;? 2 ; Our basic arithmetic ops can operate on memory locations or literals. ; (Ignore hardware details like registers for now.) (reset) (new-trace "add") (add-code '((function main [ (1:integer <- copy 1:literal) (2:integer <- copy 3:literal) (3:integer <- add 1:integer 2:integer) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 1 2 3 3 4)) (prn "F - 'add' operates on two addresses")) ;? (reset) ;? 1 ;? (quit) ;? 1 (reset) (new-trace "add-literal") (add-code '((function main [ (1:integer <- add 2:literal 3:literal) ]))) (run 'main) (when (~is memory*.1 5) (prn "F - ops can take 'literal' operands (but not return them)")) (reset) (new-trace "sub-literal") (add-code '((function main [ (1:integer <- subtract 1:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 -2) (prn "F - 'subtract'")) (reset) (new-trace "mul-literal") (add-code '((function main [ (1:integer <- multiply 2:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 6) (prn "F - 'multiply'")) (reset) (new-trace "div-literal") (add-code '((function main [ (1:integer <- divide 8:literal 3:literal) ]))) (run 'main) ;? (prn memory*) (when (~is memory*.1 (/ real.8 3)) (prn "F - 'divide'")) (reset) (new-trace "idiv-literal") (add-code '((function main [ (1:integer 2:integer <- divide-with-remainder 23:literal 6:literal) ]))) (run 'main) ;? (prn memory*) (when (~iso memory* (obj 1 3 2 5)) (prn "F - 'divide-with-remainder' performs integer division")) (reset) (new-trace "dummy-oarg") ;? (set dump-trace*) (add-code '((function main [ (_ 2:integer <- divide-with-remainder 23:literal 6:literal) ]))) (run 'main) (when (~iso memory* (obj 2 5)) (prn "F - '_' oarg can ignore some results")) ;? (quit) ; Basic boolean operations: and, or, not ; There are easy ways to encode booleans in binary, but we'll skip past those ; details for now. (reset) (new-trace "and-literal") (add-code '((function main [ (1:boolean <- and t:literal nil:literal) ]))) ;? (set dump-trace*) (run 'main) ;? (prn memory*) (when (~is
parse/0: instruction: 30
parse/0: ingredient: {name: "location", value: 0, type: 0, properties: ["location": "type"]}
parse/0: ingredient: {name: "30", value: 0, type: 0, properties: ["30": "literal"]}
parse/0: product: {name: "default-space", value: 0, type: 2-0, properties: ["default-space": "address":"space"]}
parse/0: instruction: 30
parse/0: ingredient: {name: "abc", value: 0, type: 0, properties: ["abc": "literal-string"]}
parse/0: product: {name: "x", value: 0, type: 2-5-8, properties: ["x": "address":"array":"character"]}
parse/0: instruction: 32
parse/0: ingredient: {name: "x", value: 0, type: 2-5-8, properties: ["x": "address":"array":"character"]}
parse/0: ingredient: {name: "x", value: 0, type: 2-5-8, properties: ["x": "address":"array":"character"]}
parse/0: product: {name: "3", value: 0, type: 3, properties: ["3": "boolean", "raw": ]}
new/0: location -> 1
new/0: abc -> 0
name/0: assign x 1
after-brace/0: recipe test-string-equal-reflexive
after-brace/0: new ...
after-brace/0: new ...
after-brace/0: string-equal ...
run/0: instruction test-string-equal-reflexive/0
mem/0: array size is 30
run/0: instruction test-string-equal-reflexive/1
mem/0: storing 1030 in location 1002
run/0: instruction test-string-equal-reflexive/2
mem/0: location 1002 is 1030
mem/0: location 1002 is 1030
run/0: instruction string-equal/0
mem/0: array size is 30
run/0: instruction string-equal/1
run/0: product 0 is 1030
mem/0: storing 1030 in location 1036
run/0: instruction string-equal/2
mem/0: location 1036 is 1030
mem/0: storing 3 in location 1037
run/0: instruction string-equal/3
run/0: product 0 is 1030
mem/0: storing 1030 in location 1038
run/0: instruction string-equal/4
mem/0: location 1038 is 1030
mem/0: storing 3 in location 1039
run/0: instruction string-equal/6
run/0: ingredient 0 is a-len
mem/0: location 1037 is 3
run/0: ingredient 1 is b-len
mem/0: location 1039 is 3
run/0: product 0 is 1
mem/0: storing 1 in location 1040
run/0: instruction string-equal/7
mem/0: location 1040 is 1
run/0: ingredient 0 is 1
run/0: ingredient 1 is
run/0: jumping to instruction 9
run/0: instruction string-equal/10
run/0: ingredient 0 is 0
mem/0: storing 0 in location 1041
run/0: instruction string-equal/12
run/0: ingredient 0 is i
mem/0: location 1041 is 0
run/0: ingredient 1 is a-len
mem/0: location 1037 is 3
run/0: product 0 is 0
mem/0: storing 0 in location 1042
run/0: instruction string-equal/13
mem/0: location 1042 is 0
run/0: ingredient 0 is 0
run/0: jump-if fell through
run/0: instruction string-equal/14
run/0: ingredient 0 is {name: "a", value: 1, type: 2-5-8, properties: ["a": "address":"array":"character", "deref": ]}
mem/0: location 1036 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 0
run/0: address to copy is 1031
run/0: its type is 8
mem/0: location 1031 is 97
run/0: product 0 is 97
mem/0: storing 97 in location 1043
run/0: instruction string-equal/15
run/0: ingredient 0 is {name: "b", value: 3, type: 2-5-8, properties: ["b": "address":"array":"character", "deref": ]}
mem/0: location 1038 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 0
run/0: address to copy is 1031
run/0: its type is 8
mem/0: location 1031 is 97
run/0: product 0 is 97
mem/0: storing 97 in location 1044
run/0: instruction string-equal/17
run/0: ingredient 0 is a2
mem/0: location 1043 is 97
run/0: ingredient 1 is b2
mem/0: location 1044 is 97
run/0: product 0 is 1
mem/0: storing 1 in location 1045
run/0: instruction string-equal/18
mem/0: location 1045 is 1
run/0: ingredient 0 is 1
run/0: ingredient 1 is
run/0: jumping to instruction 20
run/0: instruction string-equal/21
run/0: ingredient 0 is i
mem/0: location 1041 is 0
run/0: ingredient 1 is 1
run/0: product 0 is 1
mem/0: storing 1 in location 1041
run/0: instruction string-equal/22
run/0: ingredient 0 is -11
run/0: pc now 11
run/0: instruction string-equal/12
run/0: ingredient 0 is i
mem/0: location 1041 is 1
run/0: ingredient 1 is a-len
mem/0: location 1037 is 3
run/0: product 0 is 0
mem/0: storing 0 in location 1042
run/0: instruction string-equal/13
mem/0: location 1042 is 0
run/0: ingredient 0 is 0
run/0: jump-if fell through
run/0: instruction string-equal/14
run/0: ingredient 0 is {name: "a", value: 1, type: 2-5-8, properties: ["a": "address":"array":"character", "deref": ]}
mem/0: location 1036 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 1
run/0: address to copy is 1032
run/0: its type is 8
mem/0: location 1032 is 98
run/0: product 0 is 98
mem/0: storing 98 in location 1043
run/0: instruction string-equal/15
run/0: ingredient 0 is {name: "b", value: 3, type: 2-5-8, properties: ["b": "address":"array":"character", "deref": ]}
mem/0: location 1038 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 1
run/0: address to copy is 1032
run/0: its type is 8
mem/0: location 1032 is 98
run/0: product 0 is 98
mem/0: storing 98 in location 1044
run/0: instruction string-equal/17
run/0: ingredient 0 is a2
mem/0: location 1043 is 98
run/0: ingredient 1 is b2
mem/0: location 1044 is 98
run/0: product 0 is 1
mem/0: storing 1 in location 1045
run/0: instruction string-equal/18
mem/0: location 1045 is 1
run/0: ingredient 0 is 1
run/0: ingredient 1 is
run/0: jumping to instruction 20
run/0: instruction string-equal/21
run/0: ingredient 0 is i
mem/0: location 1041 is 1
run/0: ingredient 1 is 1
run/0: product 0 is 2
mem/0: storing 2 in location 1041
run/0: instruction string-equal/22
run/0: ingredient 0 is -11
run/0: pc now 11
run/0: instruction string-equal/12
run/0: ingredient 0 is i
mem/0: location 1041 is 2
run/0: ingredient 1 is a-len
mem/0: location 1037 is 3
run/0: product 0 is 0
mem/0: storing 0 in location 1042
run/0: instruction string-equal/13
mem/0: location 1042 is 0
run/0: ingredient 0 is 0
run/0: jump-if fell through
run/0: instruction string-equal/14
run/0: ingredient 0 is {name: "a", value: 1, type: 2-5-8, properties: ["a": "address":"array":"character", "deref": ]}
mem/0: location 1036 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 2
run/0: address to copy is 1033
run/0: its type is 8
mem/0: location 1033 is 99
run/0: product 0 is 99
mem/0: storing 99 in location 1043
run/0: instruction string-equal/15
run/0: ingredient 0 is {name: "b", value: 3, type: 2-5-8, properties: ["b": "address":"array":"character", "deref": ]}
mem/0: location 1038 is 1030
run/0: ingredient 1 is {name: "i", value: 6, type: 1, properties: ["i": "integer"]}
mem/0: location 1041 is 2
run/0: address to copy is 1033
run/0: its type is 8
mem/0: location 1033 is 99
run/0: product 0 is 99
mem/0: storing 99 in location 1044
run/0: instruction string-equal/17
run/0: ingredient 0 is a2
mem/0: location 1043 is 99
run/0: ingredient 1 is b2
mem/0: location 1044 is 99
run/0: product 0 is 1
mem/0: storing 1 in location 1045
run/0: instruction string-equal/18
mem/0: location 1045 is 1
run/0: ingredient 0 is 1
run/0: ingredient 1 is
run/0: jumping to instruction 20
run/0: instruction string-equal/21
run/0: ingredient 0 is i
mem/0: location 1041 is 2
run/0: ingredient 1 is 1
run/0: product 0 is 3
mem/0: storing 3 in location 1041
run/0: instruction string-equal/22
run/0: ingredient 0 is -11
run/0: pc now 11
run/0: instruction string-equal/12
run/0: ingredient 0 is i
mem/0: location 1041 is 3
run/0: ingredient 1 is a-len
mem/0: location 1037 is 3
run/0: product 0 is 1
mem/0: storing 1 in location 1042
run/0: instruction string-equal/13
mem/0: location 1042 is 1
run/0: ingredient 0 is 1
run/0: ingredient 1 is
run/0: jumping to instruction 23
run/0: instruction string-equal/24
run/0: result 0 is 1
mem/0: storing 1 in location 3