Mu - 034address.cc

  1 //: Addresses help us spend less time copying data around.
  2 
  3 //: So far we've been operating on primitives like numbers and characters, and
  4 //: we've started combining these primitives together into larger logical
  5 //: units (containers or arrays) that may contain many different primitives at
  6 //: once. Containers and arrays can grow quite large in complex programs, and
  7 //: we'd like some way to efficiently share them between recipes without
  8 //: constantly having to make copies. Right now 'next-ingredient' and 'return'
  9 //: copy data across recipe boundaries. To avoid copying large quantities of
 10 //: data around, we'll use *addresses*. An address is a bookmark to some
 11 //: arbitrary quantity of data (the *payload*). It's a primitive, so it's as
 12 //: efficient to copy as a number. To read or modify the payload 'pointed to'
 13 //: by an address, we'll perform a *lookup*.
 14 //:
 15 //: The notion of 'lookup' isn't an instruction like 'add' or 'subtract'.
 16 //: Instead it's an operation that can be performed when reading any of the
 17 //: ingredients of an instruction, and when writing to any of the products. To
 18 //: write to the payload of an ingredient rather than its value, simply add
 19 //: the /lookup property to it. Modern computers provide efficient support for
 20 //: addresses and lookups, making this a realistic feature.
 21 //:
 22 //: To recap: an address is a bookmark to some potentially large payload, and
 23 //: you can replace any ingredient or product with a lookup to an address of
 24 //: the appropriate type. But how do we get addresses to begin with? That
 25 //: requires a little more explanation. Once we introduce the notion of
 26 //: bookmarks to data, we have to think about the life cycle of a piece of
 27 //: data and its bookmarks (because remember, bookmarks can be copied around
 28 //: just like anything else). Otherwise several bad outcomes can result (and
 29 //: indeed *have* resulted in past languages like C):
 30 //:
 31 //:   a) You can run out of memory if you don't have a way to reclaim
 32 //:   data.
 33 //:   b) If you allow data to be reclaimed, you have to be careful not to
 34 //:   leave any stale addresses pointing at it. Otherwise your program might
 35 //:   try to lookup such an address and find something unexpected. Such
 36 //:   problems can be very hard to track down, and they can also be exploited
 37 //:   to break into your computer over the network, etc.
 38 //:
 39 //: To avoid these problems, we introduce the notion of a *reference count* or
 40 //: refcount. The life cycle of a bit of data accessed through addresses looks
 41 //: like this.
 42 //:
 43 //:    We create space in computer memory for it using the 'new' instruction.
 44 //:    The 'new' instruction takes a type as an ingredient, allocates
 45 //:    sufficient space to hold that type, and returns an address (bookmark)
 46 //:    to the allocated space.
 47 //:
 48 //:      x:address:num <- new number:type
 49 //:
 50 //:                     +------------+
 51 //:          x -------> |  number    |
 52 //:                     +------------+
 53 //:
 54 //:    That isn't entirely accurate. Under the hood, 'new' allocates an extra
 55 //:    number -- the refcount:
 56 //:
 57 //:                     +------------+------------+
 58 //:          x -------> | refcount   |  number    |
 59 //:                     +------------+------------+
 60 //:
 61 //:    This probably seems like a waste of space. In practice it isn't worth
 62 //:    allocating individual numbers and our payload will tend to be larger,
 63 //:    so the picture would look more like this (zooming out a bit):
 64 //:
 65 //:                         +-------------------------+
 66 //:                     +---+                         |
 67 //:          x -------> | r |                         |
 68 //:                     +---+        DATA             |
 69 //:                         |                         |
 70 //:                         |                         |
 71 //:                         +-------------------------+
 72 //:
 73 //:    (Here 'r' denotes the refcount. It occupies a tiny amount of space
 74 //:    compared to the payload.)
 75 //:
 76 //:    Anyways, back to our example where the data is just a single number.
 77 //:    After the call to 'new', Mu's map of memory looks like this:
 78 //:
 79 //:                     +---+------------+
 80 //:          x -------> | 1 |  number    |
 81 //:                     +---+------------+
 82 //:
 83 //:    The refcount of 1 here indicates that this number has one bookmark
 84 //:    outstanding. If you then make a copy of x, the refcount increments:
 85 //:
 86 //:      y:address:num <- copy x
 87 //:
 88 //:          x ---+     +---+------------+
 89 //:               +---> | 2 |  number    |
 90 //:          y ---+     +---+------------+
 91 //:
 92 //:    Whether you access the payload through x or y, Mu knows how many
 93 //:    bookmarks are outstanding to it. When you change x or y, the refcount
 94 //:    transparently decrements:
 95 //:
 96 //:      x <- copy 0  # an address is just a number, you can always write 0 to it
 97 //:
 98 //:                     +---+------------+
 99 //:          y -------> | 1 |  number    |
100 //:                     +---+------------+
101 //:
102 //:    The final flourish is what happens when the refcount goes down to 0: Mu
103 //:    reclaims the space occupied by both refcount and payload in memory, and
104 //:    they're ready to be reused by later calls to 'new'.
105 //:
106 //:      y <- copy 0
107 //:
108 //:                     +---+------------+
109 //:                     | 0 |  XXXXXXX   |
110 //:                     +---+------------+
111 //:
112 //: Using refcounts fixes both our problems a) and b) above: you can use
113 //: memory for many different purposes as many times as you want without
114 //: running out of memory, and you don't have to worry about ever leaving a
115 //: dangling bookmark when you reclaim memory.
116 //:
117 //: This layer implements creating addresses using 'new'. The next few layers
118 //: will flesh out the rest of the life cycle.
119 
120 //: todo: give 'new' a custodian ingredient. Following malloc/free is a temporary hack.
121 
122 :(scenario new)
123 # call 'new' two times with identical types without modifying the results; you
124 # should get back different results
125 def main [
126   1:address:num/raw <- new number:type
127   2:address:num/raw <- new number:type
128   3:bool/raw <- equal 1:address:num/raw, 2:address:num/raw
129 ]
130 +mem: storing 0 in location 3
131 
132 :(scenario new_array)
133 # call 'new' with a second ingredient to allocate an array of some type rather than a single copy
134 def main [
135   1:address:array:num/raw <- new number:type, 5
136   2:address:num/raw <- new number:type
137   3:num/raw <- subtract 2:address:num/raw, 1:address:array:num/raw
138 ]
139 +run: {1: ("address" "array" "number"), "raw": ()} <- new {number: "type"}, {5: "literal"}
140 +mem: array length is 5
141 # don't forget the extra location for array length, and the second extra location for the refcount
142 +mem: storing 7 in location 3
143 
144 :(scenario dilated_reagent_with_new)
145 def main [
146   1:address:address:num <- new {(address number): type}
147 ]
148 +new: size of '(address number)' is 1
149 
150 //: 'new' takes a weird 'type' as its first ingredient; don't error on it
151 :(before "End Mu Types Initialization")
152 put(Type_ordinal, "type", 0);
153 :(code)
154 bool is_mu_type_literal(const reagent& r) {
155   return is_literal(r) && r.type && r.type->name == "type";
156 }
157 
158 :(before "End Primitive Recipe Declarations")
159 NEW,
160 :(before "End Primitive Recipe Numbers")
161 put(Recipe_ordinal, "new", NEW);
162 :(before "End Primitive Recipe Checks")
163 case NEW: {
164   const recipe& caller = get(Recipe, r);
165   if (inst.ingredients.empty() || SIZE(inst.ingredients) > 2) {
166   ¦ raise << maybe(caller.name) << "'new' requires one or two ingredients, but got '" << to_original_string(inst) << "'\n" << end();
167   ¦ break;
168   }
169   // End NEW Check Special-cases
170   const reagent& type = inst.ingredients.at(0);
171   if (!is_mu_type_literal(type)) {
172   ¦ raise << maybe(caller.name) << "first ingredient of 'new' should be a type, but got '" << type.original_string << "'\n" << end();
173   ¦ break;
174   }
175   if (SIZE(inst.ingredients) > 1 && !is_mu_number(inst.ingredients.at(1))) {
176   ¦ raise << maybe(caller.name) << "second ingredient of 'new' should be a number (array length), but got '" << type.original_string << "'\n" << end();
177   ¦ break;
178   }
179   if (inst.products.empty()) {
180   ¦ raise << maybe(caller.name) << "result of 'new' should never be ignored\n" << end();
181   ¦ break;
182   }
183   if (!product_of_new_is_valid(inst)) {
184   ¦ raise << maybe(caller.name) << "product of 'new' has incorrect type: '" << to_original_string(inst) << "'\n" << end();
185   ¦ break;
186   }
187   break;
188 }
189 :(code)
190 bool product_of_new_is_valid(const instruction& inst) {
191   reagent/*copy*/ product = inst.products.at(0);
192   // Update NEW product in Check
193   if (!product.type || product.type->atom || product.type->left->value != get(Type_ordinal, "address"))
194   ¦ return false;
195   drop_from_type(product, "address");
196   if (SIZE(inst.ingredients) > 1) {
197   ¦ // array allocation
198   ¦ if (!product.type || product.type->atom || product.type->left->value != get(Type_ordinal, "array"))
199   ¦ ¦ return false;
200   ¦ drop_from_type(product, "array");
201   }
202   reagent/*local*/ expected_product;
203   expected_product.type = new_type_tree(inst.ingredients.at(0).name);
204   return types_strictly_match(product, expected_product);
205 }
206 
207 void drop_from_type(reagent& r, string expected_type) {
208   assert(!r.type->atom);
209   if (r.type->left->name != expected_type) {
210   ¦ raise << "can't drop2 " << expected_type << " from '" << to_string(r) << "'\n" << end();
211   ¦ return;
212   }
213   // r.type = r.type->right
214   type_tree* tmp = r.type;
215   r.type = tmp->right;
216   tmp->right = NULL;
217   delete tmp;
218   // if (!r.type->right) r.type = r.type->left
219   assert(!r.type->atom);
220   if (r.type->right) return;
221   tmp = r.type;
222   r.type = tmp->left;
223   tmp->left = NULL;
224   delete tmp;
225 }
226 
227 :(scenario new_returns_incorrect_type)
228 % Hide_errors = true;
229 def main [
230   1:bool <- new num:type
231 ]
232 +error: main: product of 'new' has incorrect type: '1:bool <- new num:type'
233 
234 :(scenario new_discerns_singleton_list_from_atom_container)
235 % Hide_errors = true;
236 def main [
237   1:address:num/raw <- new {(num): type}  # should be '{num: type}'
238 ]
239 +error: main: product of 'new' has incorrect type: '1:address:num/raw <- new {(num): type}'
240 
241 :(scenario new_with_type_abbreviation)
242 def main [
243   1:address:num/raw <- new num:type
244 ]
245 $error: 0
246 
247 :(scenario new_with_type_abbreviation_inside_compound)
248 def main [
249   {1: (address address number), raw: ()} <- new {(& num): type}
250 ]
251 $error: 0
252 
253 //: To implement 'new', a Mu transform turns all 'new' instructions into
254 //: 'allocate' instructions that precompute the amount of memory they want to
255 //: allocate.
256 
257 //: Ensure that we never call 'allocate' directly, and that there's no 'new'
258 //: instructions left after the transforms have run.
259 :(before "End Primitive Recipe Checks")
260 case ALLOCATE: {
261   raise << "never call 'allocate' directly'; always use 'new'\n" << end();
262   break;
263 }
264 :(before "End Primitive Recipe Implementations")
265 case NEW: {
266   raise << "no implementation for 'new'; why wasn't it translated to 'allocate'? Please save a copy of your program and send it to Kartik.\n" << end();
267   break;
268 }
269 
270 :(after "Transform.push_back(check_instruction)")  // check_instruction will guard against direct 'allocate' instructions below
271 Transform.push_back(transform_new_to_allocate);  // idempotent
272 
273 :(code)
274 void transform_new_to_allocate(const recipe_ordinal r) {
275   trace(9991, "transform") << "--- convert 'new' to 'allocate' for recipe " << get(Recipe, r).name << end();
276   for (int i = 0;  i < SIZE(get(Recipe, r).steps);  ++i) {
277   ¦ instruction& inst = get(Recipe, r).steps.at(i);
278   ¦ // Convert 'new' To 'allocate'
279   ¦ if (inst.name == "new") {
280   ¦ ¦ if (inst.ingredients.empty()) return;  // error raised elsewhere
281   ¦ ¦ inst.operation = ALLOCATE;
282   ¦ ¦ type_tree* type = new_type_tree(inst.ingredients.at(0).name);
283   ¦ ¦ inst.ingredients.at(0).set_value(size_of(type));
284   ¦ ¦ trace(9992, "new") << "size of '" << inst.ingredients.at(0).name << "' is " << inst.ingredients.at(0).value << end();
285   ¦ ¦ delete type;
286   ¦ }
287   }
288 }
289 
290 //: implement 'allocate' based on size
291 
292 :(before "End Globals")
293 extern const int Reserved_for_tests = 1000;
294 int Memory_allocated_until = Reserved_for_tests;
295 int Initial_memory_per_routine = 100000;
296 :(before "End Reset")
297 Memory_allocated_until = Reserved_for_tests;
298 Initial_memory_per_routine = 100000;
299 :(before "End routine Fields")
300 int alloc, alloc_max;
301 :(before "End routine Constructor")
302 alloc = Memory_allocated_until;
303 Memory_allocated_until += Initial_memory_per_routine;
304 alloc_max = Memory_allocated_until;
305 trace(9999, "new") << "routine allocated memory from " << alloc << " to " << alloc_max << end();
306 
307 :(before "End Primitive Recipe Declarations")
308 ALLOCATE,
309 :(before "End Primitive Recipe Numbers")
310 put(Recipe_ordinal, "allocate", ALLOCATE);
311 :(before "End Primitive Recipe Implementations")
312 case ALLOCATE: {
313   // compute the space we need
314   int size = ingredients.at(0).at(0);
315   if (SIZE(ingredients) > 1) {
316   ¦ // array allocation
317   ¦ trace(9999, "mem") << "array length is " << ingredients.at(1).at(0) << end();
318   ¦ size = /*space for length*/1 + size*ingredients.at(1).at(0);
319   }
320   int result = allocate(size);
321   if (SIZE(current_instruction().ingredients) > 1) {
322   ¦ // initialize array length
323   ¦ trace(9999, "mem") << "storing " << ingredients.at(1).at(0) << " in location " << result+/*skip refcount*/1 << end();
324   ¦ put(Memory, result+/*skip refcount*/1, ingredients.at(1).at(0));
325   }
326   products.resize(1);
327   products.at(0).push_back(result);
328   break;
329 }
330 :(code)
331 int allocate(int size) {
332   // include space for refcount
333   ++size;
334   trace(9999, "mem") << "allocating size " << size << end();
335 //?   Total_alloc += size;
336 //?   ++Num_alloc;
337   // Allocate Special-cases
338   // compute the region of memory to return
339   // really crappy at the moment
340   ensure_space(size);
341   const int result = Current_routine->alloc;
342   trace(9999, "mem") << "new alloc: " << result << end();
343   // initialize allocated space
344   for (int address = result;  address < result+size;  ++address) {
345   ¦ trace(9999, "mem") << "storing 0 in location " << address << end();
346   ¦ put(Memory, address, 0);
347   }
348   Current_routine->alloc += size;
349   // no support yet for reclaiming memory between routines
350   assert(Current_routine->alloc <= Current_routine->alloc_max);
351   return result;
352 }
353 
354 //: statistics for debugging
355 //? :(before "End Globals")
356 //? int Total_alloc = 0;
357 //? int Num_alloc = 0;
358 //? int Total_free = 0;
359 //? int Num_free = 0;
360 //? :(before "End Reset")
361 //? if (!Memory.empty()) {
362 //?   cerr << Total_alloc << "/" << Num_alloc
363 //?        << " vs " << Total_free << "/" << Num_free << '\n';
364 //?   cerr << SIZE(Memory) << '\n';
365 //? }
366 //? Total_alloc = Num_alloc = Total_free = Num_free = 0;
367 
368 :(code)
369 void ensure_space(int size) {
370   if (size > Initial_memory_per_routine) {
371   ¦ cerr << "can't allocate " << size << " locations, that's too much compared to " << Initial_memory_per_routine << ".\n";
372   ¦ exit(1);
373   }
374   if (Current_routine->alloc + size > Current_routine->alloc_max) {
375   ¦ // waste the remaining space and create a new chunk
376   ¦ Current_routine->alloc = Memory_allocated_until;
377   ¦ Memory_allocated_until += Initial_memory_per_routine;
378   ¦ Current_routine->alloc_max = Memory_allocated_until;
379   ¦ trace(9999, "new") << "routine allocated memory from " << Current_routine->alloc << " to " << Current_routine->alloc_max << end();
380   }
381 }
382 
383 :(scenario new_initializes)
384 % Memory_allocated_until = 10;
385 % put(Memory, Memory_allocated_until, 1);
386 def main [
387   1:address:num <- new number:type
388 ]
389 +mem: storing 0 in location 10
390 
391 :(scenario new_size)
392 def main [
393   11:address:num/raw <- new number:type
394   12:address:num/raw <- new number:type
395   13:num/raw <- subtract 12:address:num/raw, 11:address:num/raw
396 ]
397 # size of number + refcount
398 +mem: storing 2 in location 13
399 
400 :(scenario new_array_size)
401 def main [
402   1:address:array:num/raw <- new number:type, 5
403   2:address:num/raw <- new number:type
404   3:num/raw <- subtract 2:address:num/raw, 1:address:array:num/raw
405 ]
406 # 5 locations for array contents + array length + refcount
407 +mem: storing 7 in location 3
408 
409 :(scenario new_empty_array)
410 def main [
411   1:address:array:num/raw <- new number:type, 0
412   2:address:num/raw <- new number:type
413   3:num/raw <- subtract 2:address:num/raw, 1:address:array:num/raw
414 ]
415 +run: {1: ("address" "array" "number"), "raw": ()} <- new {number: "type"}, {0: "literal"}
416 +mem: array length is 0
417 # one location for array length, and one for the refcount
418 +mem: storing 2 in location 3
419 
420 //: If a routine runs out of its initial allocation, it should allocate more.
421 :(scenario new_overflow)
422 % Initial_memory_per_routine = 3;  // barely enough room for point allocation below
423 def main [
424   1:address:num/raw <- new number:type
425   2:address:point/raw <- new point:type  # not enough room in initial page
426 ]
427 +new: routine allocated memory from 1000 to 1003
428 +new: routine allocated memory from 1003 to 1006
429 
430 :(scenario new_without_ingredient)
431 % Hide_errors = true;
432 def main [
433   1:address:number <- new  # missing ingredient
434 ]
435 +error: main: 'new' requires one or two ingredients, but got '1:address:number <- new'