summary refs log tree commit diff stats
path: root/tests
Commit message (Expand)AuthorAgeFilesLines
* Refactor the bookmark validation codeWojciech Siewierski2019-01-271-2/+7
* Make it easier to test bookmarks by optionally disabling validationWojciech Siewierski2019-01-271-2/+2
* updated original filenameJon Erling Hustadnes2018-12-181-1/+1
* corrected for python 2.7Jon Erling Hustadnes2018-12-171-3/+3
* Added check if bookmark file is a symlink.Jon Erling Hustadnes2018-12-171-0/+17
* Implemented unit test for recent fixtau32018-02-152-0/+18
* attempt to fix CIhut2018-01-281-0/+2
* fix testshut2018-01-281-4/+5
* added test script that checks for man page completenesshut2018-01-261-0/+54
* container.fsobject: Fix natural sortnfnty2017-01-241-4/+23
* Python 3 division: Import `division` from `__future__`nfnty2017-01-213-3/+3
* linting: Python 2 compat: Import from `__future__`nfnty2017-01-173-0/+6
* linting: pylint and flake8nfnty2017-01-173-54/+54
* Fix misspellingsstepshal2016-06-262-2/+2
* Remove whitespace in blank linesstepshal2016-06-191-2/+2
* Merge branch 'E301' of https://github.com/stepshal/rangerhut2016-06-181-0/+2
|\
| * Add one blank line where is expectedstepshal2016-06-161-0/+2
* | Add two blank lines where is expectedstepshal2016-06-162-0/+2
|/
* Make exactly one space after commastepshal2016-06-061-1/+1
* test_fsobject: fix syntax errorhut2016-03-251-2/+2
* Remove accidental prints for debugginglverweijen2016-03-221-4/+0
* Make natural_sort's behaviour better definedlverweijen2016-03-081-0/+38
* container.history: fixed rebase() unit testhut2016-02-281-3/+4
* bookmarks: add testLaurent Charignon2016-01-311-0/+52
* history: fix logic error and add test for all the methodsLaurent Charignon2016-01-311-2/+96
* tests: add a dummy pytest test and add it to the make test stepLaurent Charignon2016-01-314-0/+5
='#n281'>281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297
<HTML>
<HEAD>
<TITLE>Computer Science Logo Style vol 3 ch 5: Programming Language Implementation</TITLE>
</HEAD>
<BODY>
<CITE>Computer Science Logo Style</CITE> volume 3:
<CITE>Beyond Programming</CITE> 2/e Copyright (C) 1997 MIT
<H1>Programming Language Implementation</H1>

<TABLE width="100%"><TR><TD>
<IMG SRC="../csls3.jpg" ALT="cover photo">
<TD><TABLE>
<TR><TD align="right"><CITE><A HREF="http://www.cs.berkeley.edu/~bh/">Brian
Harvey</A><BR>University of California, Berkeley</CITE>
<TR><TD align="right"><BR>
<TR><TD align="right"><A HREF="../pdf/v3ch05.pdf">Download PDF version</A>
<TR><TD align="right"><A HREF="../v3-toc2.html">Back to Table of Contents</A>
<TR><TD align="right"><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>
<TR><TD align="right"><A HREF="https://mitpress.mit.edu/books/computer-science-logo-style-second-edition-volume-3">MIT
Press web page for <CITE>Computer Science Logo Style</CITE></A>
</TABLE></TABLE>

<HR><P>Program file for this chapter: <A HREF="pascal.lg"><CODE>pascal</CODE></A>

<P>We are now ready to turn from the questions of language design to those of
compiler implementation.  A Pascal compiler is a much larger programming
project than most of the ones we've explored so far.  You might well ask,
&quot;where do we <EM>begin</EM> in writing a compiler?&quot; My goal in this chapter
is to show some of the parts that go into a compiler design.

<P>A compiler translates programs from a language like Pascal into the
machine language of some particular computer model.  My compiler
translates into a simplified, simulated machine language; the compiled
programs are actually carried out by another Logo program, the simulator,
rather than directly by the computer hardware.  The advantage of using this
simulated machine language is that this compiler will work no matter what
kind of computer you have; also, the simplifications in this simulated
machine allow me to leave out many confusing details of a practical compiler.
Our machine language is, however, realistic enough to give you a good sense
of what compiling into a real machine language would be like; it's loosely
based on the MIPS microprocessor design.  You'll see in a moment that most
of the structure of the compiler is independent of the target language,
anyway.

<P>

<P>Here is a short, uninteresting Pascal program:

<P><PRE>program test;

procedure doit(n:integer);
   begin
      writeln(n,n*n)
   end;

begin
   doit(3)
end.
</PRE>

<P>If you type this program into a disk file and then compile it
using <CODE>compile</CODE> as described in Chapter 4, the compiler will translate
the program into this sequence of instructions, contained in a list in the
variable named <CODE>%test</CODE>:

<P>

<P><PRE>[       [add 3 0 0]
        [add 4 0 0]
        [addi 2 0 36]
        [jump &quot;g1]
%doit   [store 1 0(4)]
        [jump &quot;g2]
g2      [rload 7 36(4)]
        [putint 10 7]
        [rload 7 36(4)]
        [rload 8 36(4)]
        [mul 7 7 8]
        [putint 10 7]
        [newline]
        [rload 1 0(4)]
        [add 2 4 0]
        [rload 4 3(2)]
        [jr 1]
g1      [store 5 1(2)]
        [add 5 2 0]
        [addi 2 2 37]
        [store 4 3(5)]
        [store 4 2(5)]
        [addi 7 0 3]
        [store 7 36(5)]
        [add 4 5 0]
        [rload 5 1(4)]
        [jal 1 &quot;%doit]
        [exit]
]
</PRE>

<P>I've displayed this list of instructions with some extra spacing
thrown in to make it look somewhat like a typical <EM>assembler</EM> listing.
(An assembler is a program that translates a notation like <CODE>add 3 0 0</CODE>
into a binary number, the form in which the machine hardware actually
recognizes these instructions.)  A real assembler listing wouldn't have the
square brackets that Logo uses to mark each sublist, but would instead
depend on the convention that each instruction occupies one line.

<P>The first three instructions carry out initialization that would be the same
for any compiled Pascal program; the fourth is a <CODE>jump</CODE> instruction that
tells the (simulated) computer to skip to the instruction following the
<EM>label</EM> <CODE>g1</CODE> that appears later in the program.  (A word that
isn't part of a sublist is a label.)  In Pascal, the body of the main
program comes after the declarations of procedures; this <CODE>jump</CODE>
instruction allows the compiler to translate the parts of the program in the
order in which they appear.

<P>(Two instructions later, you'll notice a <CODE>jump</CODE> to a label that comes
right after the jump instruction!  The compiler issues this useless
instruction just in case some internal procedures were declared within
the procedure <CODE>doit</CODE>.  A better compiler would include an <EM>
optimizer</EM> that would go through the compiled program looking for
ways to eliminate unnecessary instructions such as this one.  The optimizer
is the most important thing that I've left out of my compiler.)

<P>We're not ready yet to talk in detail about how the compiled instructions
represent the Pascal program, but you might be able to guess certain things.
For example, the variable <CODE>n</CODE> in procedure <CODE>doit</CODE> seems to be
represented as <CODE>36(4)</CODE> in the compiled program; you can see where <CODE>
36(4)</CODE> is printed and then multiplied by itself, although it may not yet be
clear to you what the numbers <CODE>7</CODE> and <CODE>8</CODE> have to do with anything.
Before we get into those details, I want to give a broader overview of
the organization of the compiler.

<P>The compilation process is divided into three main pieces.  First and
simplest is <EM>tokenization.</EM> The compiler initially sees the
source program as a string of characters: <CODE>p</CODE>, then <CODE>r</CODE>, and so on,
including spaces and line separators.  The first step in compilation is to
turn these characters into symbols, so that the later stages of compilation
can deal with the word <CODE>program</CODE> as a unit.  The second piece of the
compiler is the <EM>parser,</EM> the part that recognizes certain
patterns of symbols as representing meaningful units.  &quot;Oh,&quot; says the
parser, &quot;I've just seen the word <CODE>procedure</CODE> so what comes next must be
a procedure header and then a <CODE>begin</CODE>-<CODE>end</CODE> block for the body of
the procedure.&quot; Finally, there is the process of <EM>
code generation,</EM> in which each unit that was recognized by the
parser is actually translated into the equivalent machine language
instructions.

<P>(I don't mean that parsing and code generation happen separately, one after
the other, in the compiler's algorithm.  In fact each meaningful unit is
translated as it's encountered, and the translation of a large unit like a
procedure includes, recursively, the translation of smaller units like
statements.  But parsing and code generation are conceptually two different
tasks, and we'll talk about them separately.)

<P><H2>Formal Syntax Definition</H2>

<P>One common starting place is to develop a formal definition for the language
we're trying to compile.  The regular expressions of Chapter 1 are an
example of what I mean by a formal definition.  A
regular expression tells
us unambiguously that certain strings of characters are accepted as members
of the category defined by the expression, while other strings aren't.  A
language like Pascal is too complicated to be described by a regular
expression, but other kinds of formal definition can be used.

<P>The formal systems of Chapter 1 just gave a yes-or-no decision for any input
string:  Is it, or is it not, accepted in the language under discussion?
That's not quite good enough for a compiler.  We don't just want to know
whether a Pascal program is syntactically correct; we want a translation of
the program into some executable form.  Nevertheless, it turns out to be
worthwhile to begin by designing a formal acceptor for Pascal.  That part of
the compiler--the part that determines the syntactic structure of the
source program--is called the <EM>parser.</EM>  Later we'll add provisions for
<EM>code generation:</EM> the translation of each syntactic unit of the
source program into a piece of <EM>object</EM> (executable) program that
carries out the meaning (the <EM>semantics</EM>) of that unit.

<P>One common form in which programming languages are described is the
<EM>production rule</EM> notation mentioned briefly in Chapter 1.  For
example, here is part of a specification for Pascal:

<P><PRE>program:  program <EM>identifier</EM> <EM>filenames</EM> ; <EM>block</EM> .

filenames:  | ( <EM>idlist</EM> )

idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>identifier</EM>

block:  <EM>varpart</EM> <EM>procpart</EM> <EM>compound</EM>

varpart:  | var <EM>varlist</EM>

procpart:  | <EM>procpart</EM> <EM>procedure</EM> | <EM>procpart</EM> <EM>function</EM>

compound:  begin <EM>statements</EM> end

statements:  <EM>statement</EM> | <EM>statements</EM> ; <EM>statement</EM>

procedure:  procedure <EM>identifier</EM> <EM>args</EM> ; <EM>block</EM> ;

function:  function <EM>identifier</EM> <EM>args</EM> : <EM>type</EM> ; <EM>block</EM> ;
</PRE>

<P>A program consists of six components.  Some of these components
are particular words (like <CODE>program</CODE>) or punctuation marks; other
components are defined in terms of even smaller units by other
rules.<SUP>*</SUP>

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>The <EM>filenames</EM> component is an optional list of
names of files, part of Pascal's input/output capability; my compiler doesn't
handle file input or output, so it ignores this list if there is one.</SMALL></BLOCKQUOTE></SMALL><P>A vertical bar (<CODE>|</CODE>) in a rule separates alternatives; an idlist
(identifier list) is either a single identifier or a smaller idlist followed
by a comma and another identifier.  Sometimes one of the alternatives in a
rule is empty; for example, a varpart can be empty because a block need not
declare any local variables.

<P>The goal in designing a formal specification is to capture the
syntactic hierarchy
of the language you're describing.  For example, you could define
a Pascal type as

<P><PRE>type:  integer | real | char | boolean | array <EM>range</EM> of integer |
               packed array <EM>range</EM> of integer | array <EM>range</EM> of real |
               ...
</PRE>

<P>but it's better to say

<P><PRE>type:  <EM>scalar</EM> | array <EM>range</EM> of <EM>scalar</EM> |
               packed array <EM>range</EM> of <EM>scalar</EM>

scalar:  integer | real | char | boolean
</PRE>

<P>Try completing the syntactic description of my subset of Pascal along these
lines.  You might also try a similar syntactic description of Logo.  Which
is easier?

<P>Another kind of formal description is

the <EM>recursive transition network</EM> (RTN).
An RTN is like a finite-state machine except that instead
of each arrow representing a single symbol in the machine's alphabet, an
arrow can be labeled with the name of another RTN; such an arrow represents
any string of symbols accepted by that RTN.

<P>Below I show two RTNs, one for a program and one for a
sequence of statements (the body of a compound statement).
In the former, the transition from state 5 to state 6 is followed if what
comes next in the Pascal program is a string of symbols accepted by the RTN
named &quot;block.&quot; In these diagrams, a word in <CODE>typewriter</CODE> style like
<CODE>program</CODE> represents a single symbol, as in a finite-state machine diagram,
while a word in <EM>italics</EM> like <EM>block</EM> represents any
string accepted by the RTN of that name.  The <EM>statements</EM> RTN
is recursive; one path through the network involves a transition that
requires parsing a smaller <EM>statements</EM> unit.

<P><CENTER><IMG SRC="rtn1.gif" ALT="figure: rtn1"></CENTER>
<P><CENTER><IMG SRC="rtn2.gif" ALT="figure: rtn2"></CENTER>

<P><H2>Tokenization</H2>


<P>In both the production rules and the RTNs I've treated words like <CODE>
program</CODE> as a single symbol of the &quot;alphabet&quot; of the language.  It would
be possible, of course, to use single characters as the alphabetic symbols
and describe the language in this form:

<P><CENTER><IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/tokenrtn.gif" ALT="figure: tokenrtn"></CENTER>

<P>Extending the formal description down to that level, though, makes
it hard to see the forest for the trees; the important structural patterns
get lost in details about, for instance, where spaces are required between
words (as in <CODE>program tower</CODE>), where they're optional (as in <CODE>
2 + 3</CODE>), and where they're not allowed at all (<CODE>prog
ram</CODE>).  A similar complication is that a comment in braces can
be inserted anywhere in the program; it would be enormously complicated if
every state of every RTN had to have a transition for a left brace beginning
a comment.

<P>Most language processors therefore group the characters of the source
program into <EM>tokens</EM> used as the alphabet for the formal
grammar.  A token may be a single character, such as a punctuation mark, or
a group of characters, such as a word or a number.  Spaces do not ordinarily
form part of tokens, although in the Pascal compiler one kind of token is a
quoted character string that can include spaces.  Comments are also removed
during tokenization.  Here's what the <CODE>tower</CODE> program from Chapter 4
looks like in token form:

<P><CENTER><IMG SRC="tokenized.gif"></CENTER>

<P>Tokenization is what the Logo <CODE>readlist</CODE> operation does when
it uses spaces and brackets to turn the string of characters you type into
a sequence of words and lists.

<P>Tokenization is also called <EM>lexical analysis.</EM> This term has
nothing to do with lexical scope; the word &quot;lexical&quot; is used not to remind
us of a dictionary but because the root &quot;lex&quot; means <EM>word</EM> and
lexical analysis divides the source program into words.

<P>I've been talking as if the Pascal compiler first went through the entire
source file tokenizing it and then went back and parsed the result.  That's
not actually how it works; instead, the parser just calls a procedure named
<CODE>token</CODE> whenever it wants to see the next token in the source file.
I've already mentioned that Pascal was designed to allow the compiler to
read straight through the source program without jumping around and
re-reading parts of it.

<P><H2>Lookahead</H2>


<P>Consider the situation when the parser has recognized the first token
(<CODE>program</CODE>) as the beginning of a program and it invokes <CODE>token</CODE> to
read the second token, the program name.  In the <CODE>tower</CODE> program, the
desired token is <CODE>tower</CODE>.  <CODE>Token</CODE> reads the letter <CODE>t</CODE>; since
it's a letter, it must be the beginning of an identifier.  Any number of
letters or digits following the <CODE>t</CODE> will be part of the identifier, but
the first non-alphanumeric character ends the token.  (In this case, the
character that ends the token will be a semicolon.)

<P>What this means is that <CODE>token</CODE> has to read one character too many in
order to find the end of the word <CODE>tower</CODE>.  The semicolon isn't
part of that token; it's part of the <EM>following</EM> token.  (In fact it's
the entire following token, but in other situations that need not be true.)
Ordinarily <CODE>token</CODE> begins its work by reading a character from the
source file, but the next time we call <CODE>token</CODE> it has to deal with the
character it's already read.  It would simplify things enormously if <CODE>
token</CODE> could &quot;un-read&quot; the semicolon that ends the token <CODE>
tower</CODE>.  It's possible to allow something like un-reading by using a
technique called <EM>lookahead.</EM>

<P><PRE>to getchar
local &quot;char
if namep &quot;peekchar
    [make &quot;char :peekchar
     ern &quot;peekchar
     output :char]
output readchar
end
</PRE>

<P><CODE>Getchar</CODE> is the procedure that <CODE>token</CODE> calls to read the next
character from the source file.  Ordinarily <CODE>getchar</CODE> just invokes the
primitive <CODE>readchar</CODE> to read a character from the file.<SUP>*</SUP>  But if there is a variable named <CODE>peekchar</CODE>, then <CODE>
getchar</CODE> just outputs whatever is in that variable without looking at the
file.  <CODE>Token</CODE> can now un-read a character by saying

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>I'm lying.
The real <CODE>getchar</CODE> is slightly more complicated because it checks for an
unexpected end of file and because it prints the characters that it reads
onto the screen.  The program listing at the end of the chapter tells the
whole story.</SMALL></BLOCKQUOTE></SMALL><P><PRE>make &quot;peekchar :char
</PRE>

<P>This technique only allows <CODE>token</CODE> to un-read a single
character at a time.  It would be possible to replace <CODE>peekchar</CODE> with a
<EM>list</EM> of pre-read characters to be recycled.  But in fact one is
enough.  When a program &quot;peeks at&quot; characters before they're read &quot;for
real,&quot; the technique is called <EM>lookahead.</EM>  <CODE>Getchar</CODE> uses
<EM>one-character lookahead</EM> because <CODE>peekchar</CODE> only stores a single
character.

<P>It turns out that, for similar reasons, the Pascal parser will occasionally
find it convenient to peek at a <EM>token</EM> and re-read it later.  <CODE>
Token</CODE> therefore provides for one-<EM>token</EM> lookahead using a similar
mechanism:

<P><PRE>to token
local [token char]
if namep &quot;peektoken [make &quot;token :peektoken
                     ern &quot;peektoken   output :token]
make &quot;char getchar
if equalp :char &quot;|{| [skipcomment output token]
if equalp :char char 32 [output token]
if equalp :char char 13 [output token]
if equalp :char char 10 [output token]
if equalp :char &quot;' [output string &quot;']
if memberp :char [+ - * / = ( , ) |[| |]| |;|] [output :char]
if equalp :char &quot;|&lt;| [output twochar &quot;|&lt;| [= &gt;]]
if equalp :char &quot;|&gt;| [output twochar &quot;|&gt;| [=]]
if equalp :char &quot;. [output twochar &quot;. [.]]
if equalp :char &quot;: [output twochar &quot;: [=]]
if numberp :char [output number :char]
if letterp ascii :char [output token1 lowercase :char]
(throw &quot;error sentence [unrecognized character:] :char)
end

to twochar :old :ok
localmake &quot;char getchar
if memberp :char :ok [output word :old :char]
make &quot;peekchar :char
output :old
end
</PRE>

<P>As you can see, <CODE>token</CODE> is mainly a selection of special cases.
<CODE>Char 32</CODE> is a space; <CODE>char 13</CODE> or <CODE>char 10</CODE> is the end-of-line
character.  <CODE>Skipcomment</CODE> skips over characters until it sees a right
brace.  <CODE>String</CODE> accumulates characters up to and including a single
quote (apostrophe), except that two single quotes in a row become one single
quote inside the string and don't end the string.  <CODE>Number</CODE> is a little
tricky because of decimal points (the string of characters <CODE>1.10</CODE> is a
single token, but the string <CODE>1..10</CODE> is three tokens!) and exponent
notation.  I'm not showing you all the details because the compiler is a
very large program and we'll never get through it if I annotate every
procedure.  But I did want to show you <CODE>twochar</CODE> because it's a good,
simple example of character lookahead at work.  If the character <CODE>&lt;</CODE> is
seen in the source program, it may be a token by itself or it may be part of
the two-character tokens <CODE>&lt;=</CODE> or <CODE>&lt;&gt;</CODE>.  <CODE>Twochar</CODE> takes a peek
at the next character in the file to decide.

<P>If the character that <CODE>token</CODE> reads isn't part of any recognizable
token, the procedure generates an error.  (The error is caught by the
toplevel procedure <CODE>compile</CODE> so that it can close the source file.)
This extremely primitive error handling is one of the most serious
deficiencies in my compiler; it would be better if the compilation process
continued, despite the error, so that any other errors in the program could
also be discovered.  In a real compiler, more than half of the parsing
effort goes into error handling; it's relatively trivial to parse a correct
source program.

<P><H2>Parsing</H2>


<P>There are general techniques for turning a formal language specification,
such as a set of production rules, into an algorithm for parsing the
language so specified.  These techniques are analogous to the program in
Chapter 1 that translates a regular expression into a finite-state machine.
A program that turns a formal specification into a parser is called a
<EM>parser generator.</EM>

<P>The trouble is that the techniques that work for <EM>any</EM> set of rules
are quite slow.  The time required to parse a sequence of length <EM>n</EM> is
O(<EM>n</EM><SUP><SMALL>2</SMALL></SUP>) if the grammar is unambiguous or O(<EM>n</EM><SUP><SMALL>3</SMALL></SUP>) if it's
ambiguous.  A grammar is <EM>ambiguous</EM> if the same input sequence
can be parsed correctly in more than one way.  For example, if the
production rule

<P><PRE>idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>identifier</EM>
</PRE>

<P>is applied to the string

<P><PRE>Beatles,Who,Zombies,Kinks
</PRE>

<P>then the only possible application of the rule to accept the
string produces this left-to-right grouping:

<P><CENTER><IMG SRC="unambiguous.gif" ALT="figure: unambiguous"></CENTER>

<P>However, if the rule were

<P><PRE>idlist:  <EM>identifier</EM> | <EM>idlist</EM> , <EM>idlist</EM>
</PRE>

<P>this new rule would accept the same strings, but would allow
alternative groupings like

<P><CENTER><IMG SRC="ambiguous.gif" ALT="figure: ambiguous"></CENTER>

<P>The former rule could be part of an unambiguous grammar; the new
rule makes the grammar that contains it ambiguous.

<P>

<P>It's usually not hard to devise an unambiguous grammar for any practical
programming language, but even a quadratic algorithm is too slow.  Luckily,
most programming languages have <EM>deterministic grammars,</EM>
which is a condition even stricter than being unambiguous.  It means that a
parser can read a program from left to right, and can figure out what to do
with the next token using only a fixed amount of lookahead.  A parser for a
deterministic grammar can run in linear time, which is a lot better than
quadratic.

<P>When I said &quot;figure out what to do with the next token,&quot; I was being
deliberately vague.  A deterministic parser doesn't necessarily know exactly
how a token will fit into the complete program--which production rules will
be branch nodes in a parse tree having this token as a leaf node--as soon
as it reads the token.  As a somewhat silly example, pretend that the word
<CODE>if</CODE> is not a &quot;reserved word&quot; in Pascal; suppose it could be the
name of a variable.  Then, when the parser is expecting the beginning of a
new statement and the next token is the word <CODE>if</CODE>, the parser doesn't
know whether it is seeing the beginning of a conditional statement such as
<CODE>if x &gt; 0 then writeln('positive')</CODE> or the beginning of an assignment
statement such as <CODE>if := 87</CODE>.  But the parser could still be deterministic.
Upon seeing the word <CODE>if</CODE>, it would enter a state (as in a finite state
machine) from which there are two exits.  If the next token turned out to be
the <CODE>:=</CODE> assignment operator, the parser would follow one transition; if
the next token was a variable or constant value, the parser would choose a
different next state.

<P>The real Pascal, though, contains no such syntactic cliffhangers.  A Pascal
compiler can always tell which production rule the next token requires.
That's why the language includes keywords like <CODE>var</CODE>, <CODE>
procedure</CODE>, and <CODE>function</CODE>.  For the most part, you could figure out
which kind of declaration you're reading without those keywords by looking
for clues like whether or not there are parentheses after the identifier
being declared.  (If so, it's a procedure or a function.)  But the keywords
let you know from the beginning what to expect next.  That means we can
write what's called a <EM>predictive grammar</EM> for Pascal,
even simpler to implement than a deterministic one.

<P>There are general algorithms for parsing deterministic languages, and there
are parser generators using these algorithms.  One widely used example is
the YACC (Yet Another Compiler Compiler) program that translates
production rules into a parser in the C programming language.<SUP>*</SUP> But because Pascal's
grammar is so simple I found it just as easy to do the translation by hand.
For each production rule in a formal description of Pascal, the compiler
includes a Logo procedure that parses each component part of the production
rule.  A parser written in this way is called a <EM>
recursive descent parser.</EM> Here's a sample:

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>A parser
generator is also called a <EM>compiler compiler</EM> because it treats
the formal specification as a kind of source program and produces a compiler
as the object program.  But the name isn't quite accurate because, as you
know, there's more to a compiler than the parser.</SMALL></BLOCKQUOTE></SMALL><P><PRE>to statement
local [token type]
ifbe &quot;begin [compound stop]
ifbe &quot;for [pfor stop]
ifbe &quot;if [pif stop]
ifbe &quot;while [pwhile stop]
ifbe &quot;repeat [prepeat stop]
ifbe &quot;write [pwrite stop]
ifbe &quot;writeln [pwriteln stop]
make &quot;token token
make &quot;peektoken :token
if memberp :token [|;| end until] [stop]
make &quot;type gettype :token
if emptyp :type [(throw &quot;error sentence :token [can't begin statement])]
if equalp :type &quot;procedure [pproccall stop]
if equalp :type &quot;function [pfunset stop]
passign
end

to pif
local [cond elsetag endtag]
make &quot;cond pboolean pexpr
make &quot;elsetag gensym
make &quot;endtag gensym
mustbe &quot;then
code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
regfree :cond
statement
code (list &quot;jump (word &quot;&quot; :endtag))
code :elsetag
ifbe &quot;else [statement]
code :endtag
end
</PRE>

<P>Many of the details of <CODE>pif</CODE> have to do with code
generation, but never mind those parts now.  For the
moment, my concern is with the parsing aspect of these procedures: how they
decide what to accept.

<P><CODE>Statement</CODE> is an important part of the parser; it is invoked whenever a
Pascal statement is expected.  It begins by checking the next token from the
source file.  If that token is <CODE>begin</CODE>, <CODE>for</CODE>, <CODE>if</CODE>, <CODE>while</CODE>,
or <CODE>repeat</CODE> then we're finished with the token and <CODE>statement</CODE> turns
to a subprocedure to handle the syntax of whatever structured statement type
we've found.  If the token isn't one of those, then the statement has to be
a simple statement and the token has to be an identifier, i.e., the name of
a procedure, a function, or a variable.  (One other trivial possibility is
that this is an <EM>empty</EM> statement, if we're already up to the
semicolon, <CODE>end</CODE>, or <CODE>until</CODE> that marks the end of a statement.)
In any of these cases, the token we've just read is important to the parsing
procedure that will handle the simple statement, so <CODE>statement</CODE> un-reads
it before deciding what to do next.  <CODE>Gettype</CODE> outputs the type of the
identifier, either a variable type like <CODE>real</CODE> or else <CODE>procedure</CODE>
or <CODE>function</CODE>.  (The compiler data structures that underlie the work of
<CODE>gettype</CODE> will be discussed later.)  If the token is a
procedure name, then this is a procedure call statement.  If the token is a
function name, then this is the special kind of assignment inside a function
definition that provides the return value for the function.  Otherwise, the
token must be a variable name and this is an ordinary assignment statement.

<P>The procedure <CODE>pif</CODE> parses <CODE>if</CODE> statements.  (The letter <CODE>p</CODE> in
its name stands for &quot;Pascal&quot;; many procedures in the compiler have such
names to avoid conflicts with Logo procedures with similar purposes.)  The
syntax of Pascal <CODE>if</CODE> is

<P><PRE>ifstatement:  if <EM>boolean</EM> then <EM>statement</EM> |
                 if <EM>boolean</EM> then <EM>statement</EM> else <EM>statement</EM>
</PRE>

<P>When <CODE>pif</CODE> begins, the token <CODE>if</CODE> has just been read by
<CODE>statement</CODE>.  So the first thing that's required is a boolean expression.
<CODE>Pexpr</CODE> parses an expression; that task is relatively complicated and
will be discussed in more detail later.  <CODE>Pboolean</CODE> ensures that the
expression just parsed does indeed produce a value of type <CODE>boolean</CODE>.

<P>The next token in the source file <EM>must</EM> be the word <CODE>then</CODE>.  The
instruction

<P><PRE>mustbe &quot;then
</PRE>

<P>in <CODE>pif</CODE> ensures that.  Here's <CODE>mustbe</CODE>:

<P><PRE>to mustbe :wanted
localmake &quot;token token
if equalp :token :wanted [stop]
(throw &quot;error (sentence &quot;expected :wanted &quot;got :token))
end
</PRE>

<P>If <CODE>mustbe</CODE> returns successfully, <CODE>pif</CODE> then invokes
<CODE>statement</CODE> recursively to parse the true branch of the conditional.
The production rule tells us that there is then an <EM>optional</EM> false
branch, signaled by the reserved word <CODE>else</CODE>.  The instruction

<P><PRE>ifbe &quot;else [statement]
</PRE>

<P>handles that possibility.  If the next token matches the first
input to <CODE>ifbe</CODE> then the second input, an instruction list, is carried
out.  Otherwise the token is un-read.  There is also an <CODE>ifbeelse</CODE> that
takes a third input, an instruction list to be carried out if the next token
isn't equal to the first input.  (<CODE>Ifbeelse</CODE> still un-reads the token in
that case, before it runs the third input.)  These must be macros so that
the instruction list inputs can include <CODE>output</CODE> or <CODE>stop</CODE>
instructions (as discussed in Volume 2), as in the invocations of <CODE>ifbe</CODE>
in <CODE>statement</CODE> seen a moment ago.

<P><PRE>.macro ifbe :wanted :action
localmake &quot;token token
if equalp :token :wanted [output :action]
make &quot;peektoken :token
output []
end

.macro ifbeelse :wanted :action :else
localmake &quot;token token
if equalp :token :wanted [output :action]
make &quot;peektoken :token
output :else
end
</PRE>

<P>If there were no code generation involved, <CODE>pif</CODE> would be written this
way:

<P><PRE>to pif
pboolean pexpr
mustbe &quot;then
statement
ifbe &quot;else [statement]
end
</PRE>

<P>This simplified procedure is a straightforward translation of the
RTN

<P><CENTER><IMG SRC="pif.gif" ALT="figure: pif"></CENTER>

<P>The need to generate object code complicates the parser.  But
don't let that distract you; in general you can see the formal structure of
Pascal syntax reflected in the sequence of instructions used to parse that
syntax.

<P>The procedures that handle other structured statements, such as <CODE>pfor</CODE>
and <CODE>pwhile</CODE>, are a lot like <CODE>pif</CODE>.  Procedure and function
declarations (procedures <CODE>procedure</CODE>, <CODE>function</CODE>, and <CODE>proc1</CODE> in
the compiler) also use the same straightforward parsing technique, but are a
little more complicated because of the need to keep track of type
declarations for each procedure's parameters and local variables.
Ironically, the hardest thing to compile is the &quot;simple&quot; assignment
statement, partly because of operator precedence (multiplication before
addition) in expressions (procedure <CODE>pexpr</CODE> in the compiler) and partly
because of the need to deal with the complexity of variables, including
special cases such as assignments to <CODE>var</CODE> parameters and array elements.

<P>I haven't yet showed you <CODE>pboolean</CODE> because you have to understand how
the compiler handles expressions first.  But it's worth noticing that Pascal
can check <EM>at compile time</EM> whether or not an expression is going to
produce a <CODE>boolean</CODE> value even though the program hasn't been run yet
and the variables in the expression don't have values yet.  It's the strict
variable typing of Pascal that makes this compile-time checking possible.
If we were writing a Logo compiler, the checking would have to be postponed
until run time because you can't, in general, know what type of datum will
be computed by a Logo expression until it's actually evaluated.

<P><H2>Expressions and Precedence</H2>

<P>

<P>Arithmetic or boolean expressions appear not only on the right side of
assignment statements but also as actual parameters, array index values, and
as &quot;phrases&quot; in structured statements.  One of the classic problems in
compiler construction is the translation of these expressions to executable
form.  The interesting difficulty concerns <EM>
operator precedence</EM>--the rule that in a string of alternating
operators and operands, multiplications are done before additions, so

<P><PRE>a + b * c + d
</PRE>

<P>means

<P><PRE>a + (b * c) + d
</PRE>

<P>Pascal has four levels of operator precedence.  The highest level, number 4,

is the <EM>unary</EM> operators <CODE>+</CODE>, <CODE>-</CODE>, and <CODE>not</CODE>.  (The first

two can be used as unary operators (<CODE>-3</CODE>) or <EM>binary</EM> ones (<CODE>
6-3</CODE>); it's only in the unary case that they have this
precedence.)<SUP>*</SUP> Then comes multiplication, division, and
logical <CODE>and</CODE> at level 3.  Level 2 has binary addition, subtraction, and
<CODE>or</CODE>.  And level 1 includes the relational operators like
<CODE>=</CODE>.

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>It's unfortunate that the word &quot;binary&quot; is used in
computer science both for base-2 numbers and for two-input operations.
 Kenneth Iverson, in his documentation for the
language APL, used the words <EM>monadic</EM> and <EM>
dyadic</EM> instead of unary and binary to avoid that ambiguity.  But
those terms haven't caught on.</SMALL></BLOCKQUOTE></SMALL><P>

<P>The formalization of precedence could be done using the mechanisms we've
already seen.  For example, here is a production rule grammar for
expressions using only the four basic arithmetic operations.

<P><PRE>expression:  <EM>term</EM> | <EM>expression</EM> + <EM>term</EM> | <EM>expression</EM> - <EM>term</EM>
term:  <EM>factor</EM> | <EM>term</EM> * <EM>factor</EM> | <EM>term</EM> / <EM>factor</EM>
factor:  <EM>variable</EM> | <EM>number</EM> | ( <EM>expression</EM> )
</PRE>

<P>This grammar also introduces into the discussion the fact that the
precedence of operations can be changed by using parentheses.

<P>This grammar, although formally correct, is not so easy to use in a
recursive descent parser.
One subtle but important problem is that it's <EM>left recursive:</EM>  Some
of the alternative forms for an <CODE>expression</CODE> start with an <CODE>
expression</CODE>.  If we tried to translate this into a Logo procedure it would
naturally start out

<P><PRE>to expression
local [left op right]
make &quot;left expression
ifbe &quot;+
  [make &quot;op &quot;+
   make &quot;right term]
  [ifbe &quot;-
     [make &quot;op &quot;-
      make &quot;right term]
     [make &quot;op []] ]
...
</PRE>

<P>But this procedure will never get past the first <CODE>make</CODE>; it's
an infinite loop.  It will never actually read a token from the source file;
instead it keeps invoking itself recursively.

<P>Left association is a problem for automatic compiler compilers, too.  There
are ways to solve the problem but I don't want to get into that because in
fact arithmetic expressions are generally handled by an entirely different
scheme, which I'll show you in a moment.  The problem wouldn't come up if
the order of the operands were reversed, so the rules said

<P><PRE>expression:  <EM>term</EM> | <EM>term</EM> + <EM>expression</EM> | <EM>term</EM> - <EM>expression</EM>
</PRE>

<P>and so on.  Unfortunately this changes the meaning, and the rules
of Pascal say that equal-precedence operations are performed left to right.

<P>In any case, the formalization of precedence with production rules gets more
complicated as the number of levels of precedence increases.  I showed you a
grammar with two levels.  Pascal, with four levels, might reasonably be done
in a similar way, but think about the C programming language, which has 15
levels of precedence!

<P><H2>The Two-Stack Algorithm for Expressions</H2>


<P>What we're after is an algorithm that will allow the compiler to read an
expression once, left to right, and group operators and operands correctly.
The algorithm involves the use of two stacks, one for operations and one for
data.  For each operation we need to know whether it's unary or binary and
what its precedence level is.  I'll use the notation &quot;<CODE>*</CODE><SUB><SMALL>2,3</SMALL></SUB>&quot; to
represent binary <CODE>*</CODE> at precedence level 3.  So the expression

<P><PRE>a + b * - c - d
</PRE>

<P>will be represented in this algorithm as

<P>
<IMG SRC="vdash.gif"><SUB><SMALL>0</SMALL></SUB> a +<SUB><SMALL>2,2</SMALL></SUB> b *<SUB><SMALL>2,3</SMALL></SUB> -<SUB><SMALL>1,4</SMALL></SUB> c -<SUB><SMALL>2,2</SMALL></SUB> d <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"><SUB><SMALL>0</SMALL></SUB>


<P>The symbols <IMG SRC="vdash.gif"> and <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"> aren't really part of the source
expression; they're imaginary markers for the beginning and end of the
expression.  When we read a token that doesn't make sense as part of an
expression, we can un-read that token and pretend we read a <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"> instead.
These markers are given precedence level zero because they form a boundary
for <EM>any</EM> operators inside them, just as a low-precedence operator like
<CODE>+</CODE> is a boundary for the operands of a higher-precedence operator like
<CODE>*</CODE>.  (For the same reason, you'll see that parentheses are considered
precedence zero.)

<P>The two minus signs in this expression have two different meanings.  As you
read the following algorithm description, you'll see how the algorithm knows
whether an operation symbol is unary or binary.

<P><STRONG>Step 1.</STRONG>  We initialize the two stacks this way:

<P><PRE>operation:  [ <IMG SRC="vdash.gif"><SUB><SMALL>0</SMALL></SUB> ]

data:       [ ]
</PRE>

<P><STRONG>Step 2.</STRONG>  We are now expecting a datum, such as a variable.  Read a
token.  If it's an operation, it must be unary; subscript it accordingly and
go to step 4.  If it's a datum, push it onto the data stack.  (If it's
neither an operation nor a datum, something's wrong.)

<P><STRONG>Step 3.</STRONG>  We are now expecting a binary operation.  Read a token.  If
it's an operation, subscript it as binary and go to step 4.  If not, we've
reached the end of the expression.  Un-read the token, and go to step 4 with
the token <IMG SRC="https://people.eecs.berkeley.edu/~bh/v3ch5/dashv.gif"><SUB><SMALL>0</SMALL></SUB>.

<P><STRONG>Step 4.</STRONG>  We have an operation in hand at this point and we know its
precedence level and how many arguments it needs.  Compare its precedence
level with that of the topmost (most recently pushed) operation on the stack.
If the precedence of the new operation is less than or equal to that of the
one on the stack, go to step 5.  If it's greater, go to step 6.

<P><STRONG>Step 5.</STRONG>  The topmost operation on the stack has higher precedence than
the one we just read, so we should do it right away.  (For example, we've
just read the <CODE>+</CODE> in <CODE>a*b+c</CODE>; the multiplication operation and both
of its operands are ready on the stacks.)  Pop the operation off
the stack, pop either one or two items off the data stack depending on the
first subscript of the popped operation, then compile machine instructions to
perform the indicated computation.  Push the result on the data stack
as a single quantity.  However, if the operation we popped is <IMG SRC="vdash.gif">, then
we're finished.  There should be only one thing on the data stack, and it's
the completely compiled expression.  Otherwise, we still have the new
operation waiting to be processed, so return to step 4.

<P><STRONG>Step 6.</STRONG>  The topmost operation on the stack has lower precedence than
the one we just read, so we can't do it yet because we're still reading its
right operand.  (For example, we've just read the <CODE>*</CODE> in <CODE>a+b*c</CODE>;
we're not ready to do either operation until we read the <CODE>c</CODE> later.)
Push the new operation onto the operation stack, then return to step 2.

<P>Here's how this algorithm works out with the sample expression above.  In
the data stack, a boxed entry like <IMG SRC="a+b.gif"> means the result from
translating that subexpression into the object language.

<P><IMG SRC="stack1.gif">
<BR><IMG SRC="stack2.gif">

<P>The final value on the data stack is the translation of the
entire expression.

<P>The algorithm so far does not deal with parentheses.  They're handled
somewhat like operations, but with slightly different rules.  A left
parenthesis is stored on the operation stack as <CODE>(</CODE><SUB><SMALL>0</SMALL></SUB>, like the
special marker at the beginning of the expression, but it does not invoke
step 5 of the algorithm before being pushed on the stack.  A right
parenthesis <EM>does</EM> invoke step 5, but only as far down the stack as
the first matching left parenthesis; if it were an ordinary operation of
precedence zero it would pop everything off the stack.  You might try to
express precisely how to modify the algorithm to allow for parentheses.

<P>Here are the procedures that embody this algorithm in the compiler.
<CODE>Pgetunary</CODE> and <CODE>pgetbinary</CODE> output a list like

<P><PRE>[sub 2 2]
</PRE>

<P>for binary <CODE>-</CODE> or

<P><PRE>[minus 1 4]
</PRE>

<P>for unary minus.  (I'm leaving out some complications
having to do with type checking.)  They work by looking for a <CODE>unary</CODE> or
<CODE>binary</CODE> property on the property list of the operation symbol.
Procedures with names like <CODE>op.prec</CODE> are selectors for the members
of these lists.

<P>In this algorithm, only step 5 actually generates any instructions in the
object program.  This is the step in which an operation is removed from
the operation stack and actually performed.  Step 5 is carried out by the
procedure <CODE>ppopop</CODE> (Pascal pop operation); most of that procedure deals
with code generation, but I've omitted that part of the procedure in the
following listing because right now we're concerned with the parsing
algorithm.  We'll return to code generation shortly.

<P><CODE>Pexpr1</CODE> invokes <CODE>pdata</CODE> when it expects to read an operand, which
could be a number, a variable, or a function call.  <CODE>Pdata</CODE>, which I'm
not showing here, generates code to make the operand available and
outputs the location of the result in the simulated computer, in a form
that can be used by <CODE>pexpr</CODE>.

<P><PRE>to pexpr
local [opstack datastack parenlevel]
make &quot;opstack [[popen 1 0]]                       ; step 1
make &quot;datastack []
make &quot;parenlevel 0
output pexpr1
end

to pexpr1 
local [token op]
make &quot;token token                                 ; step 2
while [equalp :token &quot;|(|] [popen  make &quot;token token]
make &quot;op pgetunary :token
if not emptyp :op [output pexprop :op]
push &quot;datastack pdata :token
make &quot;token token                                 ; step 3
while [and (:parenlevel &gt; 0) (equalp :token &quot;|)| )]
   [pclose  make &quot;token token]
make &quot;op pgetbinary :token
if not emptyp :op [output pexprop :op]
make &quot;peektoken :token
pclose
if not emptyp :opstack [(throw &quot;error [too many operators])]
if not emptyp butfirst :datastack [(throw &quot;error [too many operands])]
output pop &quot;datastack
end

to pexprop :op                                    ; step 4
while [(op.prec :op) &lt; (1 + op.prec first :opstack)] [ppopop]
push &quot;opstack :op                                 ; step 6
output pexpr1
end

to ppopop                                         ; step 5
local [op function args left right type reg]
make &quot;op pop &quot;opstack
make &quot;function op.instr :op
if equalp :function &quot;plus [stop]
make &quot;args op.nargs :op
make &quot;right pop &quot;datastack
make &quot;left (ifelse equalp :args 2 [pop &quot;datastack] [[[] []]])
make &quot;type pnewtype :op exp.type :left exp.type :right
... code generation omitted ...
push &quot;datastack (list :type &quot;register :reg)
end

to popen
push &quot;opstack [popen 1 0]
make &quot;parenlevel :parenlevel+1
end

to pclose
while [(op.prec first :opstack) &gt; 0] [ppopop]
ignore pop &quot;opstack
make &quot;parenlevel :parenlevel - 1
end
</PRE>

<P><H2>The Simulated Machine</H2>

<P>We're ready to move from parsing to code generation, but first you must
understand what a computer's native language is like.  Most computer models
in use today have a very similar structure, although there are differences
in details.  My simulated computer design makes these detail choices in
favor of simplicity rather than efficiency.  (It wouldn't be very efficient
no matter what, compared to real computers.  This &quot;computer&quot; is actually an
interpreter, written in Logo, which is itself an interpreter.  So we have
two levels of interpretation involved in each simulated instruction, whereas
on a real computer, each instruction is carried out directly by the
hardware.  Our compiled Pascal programs, as you've probably already noticed,
run very slowly.  That's not Pascal's fault, and it's not even primarily my
compiler's fault, even though the compiler doesn't include optimization
techniques.  The main slowdown is in the interpretation of the machine
instructions.)

<P>Every computer includes a <EM>processor,</EM> which decodes
instructions and carries out the indicated arithmetic operations, and a
<EM>memory,</EM> in which information (such as the values of
variables) is stored.  In modern computers, the processor is generally
a single <EM>integrated circuit,</EM> nicknamed a <EM>chip,</EM>
which is a rectangular black plastic housing one or two inches on a side
that contains thousands or even millions of tiny components made of silicon.
The memory is usually a <EM>circuit board</EM> containing several memory
chips.  Computers also include circuitry to connect with input and output
devices, but we're not going to have to think about those.  What makes one
computer model different from another is mainly the processor.  If you
have a PC, its processor is probably an Intel design with a name like
80486 or Pentium; if you have a Macintosh, the processor might be a Motorola
68040 or a Power PC chip.

<P>It turns out that the wiring connecting the processor to the memory is
often the main limiting factor on the speed of a computer.  Things happen at
great speed within the processor, and within the memory, but only one value
at a time can travel from one to the other.  Computer designers have invented
several ways to get around this problem, but the important one for our
purposes is that every modern processor includes a little bit of memory
within the processor chip itself.  By &quot;a little bit&quot; I mean that a typical
processor has enough memory in it to hold 32 values, compared to several
million values that can be stored in the computer's main memory.  The 32
memory slots within the processor are called <EM>
registers.</EM><SUP>*</SUP>

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>One current topic in computer architecture research
is the development of <EM>parallel</EM> computers with many processors working
together.  In some of these designs, each processor includes its own
medium-size memory within the processor chip.</SMALL></BLOCKQUOTE></SMALL><P>Whenever you want to perform an arithmetic operation, the operands must
already be within the processor, in registers.  So, for example, the Pascal
instruction

<P><PRE>c := a + b
</PRE>

<P>isn't compiled into a single machine instruction.  First we must
<EM>load</EM> the values of <CODE>a</CODE> and <CODE>b</CODE> from memory into registers,
then add the two registers, then <EM>store</EM> the result back into memory:

<P><PRE>rload 8 a
rload 9 b
add 10 8 9
store 10 c
</PRE>

<P>The first <CODE>rload</CODE> instruction loads the value from memory
location <CODE>a</CODE> into register 8.<SUP>*</SUP> The <CODE>add</CODE> instruction adds
the numbers in registers 8 and 9, putting the result into register 10.  (In
practice, you'll see that the compiler would be more likely to conserve
registers by reusing one of the operand registers for the result, but for
this first example I wanted to keep things simple.)  Finally we store the
result into the variable <CODE>c</CODE> in memory.

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>Really I should have called this
instruction <CODE>load</CODE>, but my machine simulator uses Logo procedures to
carry out the machine instructions, and I had to pick a name that wouldn't
conflict with the Logo <CODE>load</CODE> primitive.</SMALL></BLOCKQUOTE></SMALL><P>The instructions above are actually not machine language instructions, but
rather <EM>assembly language</EM> instructions, a kind of shorthand.
A program called an <EM>assembler</EM> translates assembly language into
machine language, in which each instruction is represented as a number.
For example, if the instruction code for <CODE>add</CODE> is <CODE>0023</CODE>, then
the <CODE>add</CODE> instruction above might be translated into <CODE>0023100809</CODE>,
with four digits for the instruction code and two digits for each of the three
register numbers.  (In reality the encoding would use binary numbers rather
than the decimal numbers I've shown in this example.)  Since a machine
language instruction is just a number, the instructions that make up a
computer program are stored in memory along with the program's data values.
But one of the simplifications I've made in my simulated computer is that
the simulator deals directly with assembly language instructions, and those
instructions are stored in a Logo list, separate from the program's data memory.

<P>The simulated computer has 32 processor registers plus 3000 locations of
main memory; it's a very small computer, but big enough for my sample
Pascal programs.  (You can change these sizes by editing procedure <CODE>
opsetup</CODE> in the compiler.)  The registers are numbered from 0 to 31, and
the memory locations are numbered from 0 to 2999.  The number of a memory
location is called its <EM>address.</EM>  Each memory location can hold
one numeric value.<SUP>*</SUP>  A Pascal array will be represented by a contiguous block of
memory locations, one for each member of the array.  Each register, too, can
hold one numeric value.  In this machine, as in some real computers, register
number 0 is special; it always contains the value zero.

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>This, too, is a simplification.  In real computers,
different data types require different amounts of memory.  A character
value, for example, fits into eight <EM>bits</EM> (binary digits) of memory,
whereas an integer requires 32 bits in most current computers.  Instead of a
single <CODE>load</CODE> instruction, a real computer has a separate one for each
datum size.</SMALL></BLOCKQUOTE></SMALL><P>The simulated computer understands 50 instruction codes, fewer than most
real computers.  The first group we'll consider are the 14 binary arithmetic
instructions: <CODE>add</CODE>, <CODE>sub</CODE>, <CODE>mul</CODE>, <CODE>div</CODE> (real quotient),
<CODE>quo</CODE> (integer quotient), <CODE>rem</CODE> (remainder), <CODE>land</CODE> (logical
and), <CODE>lor</CODE> (logical or), <CODE>eql</CODE> (compare two operands for equality),
<CODE>neq</CODE> (not equal), <CODE>less</CODE>, <CODE>gtr</CODE> (greater than), <CODE>leq</CODE> (less
than or equal), and <CODE>geq</CODE> (greater than or equal).  The result of each
of the six comparison operators is <CODE>0</CODE> for false or <CODE>1</CODE> for true.
The machine also has four unary arithmetic instructions: <CODE>lnot</CODE>
(logical not), <CODE>sint</CODE> (truncate to integer), <CODE>sround</CODE> (round to
integer), and <CODE>srandom</CODE>.  Each of these 18 arithmetic instructions takes
its operands from registers and puts its result into a register.

<P>All but the last three of these are typical instructions of real
computers.<SUP>*</SUP>
(Not every computer has all of them; for example, if a computer has <CODE>eql</CODE>
and <CODE>lnot</CODE>, then it doesn't really need a <CODE>neq</CODE> instruction because
the same value can be computed by a sequence of two instructions.)  The
operations <CODE>sint</CODE>, <CODE>sround</CODE>, and <CODE>srandom</CODE> are less likely to be
machine instructions on actual computers.  On the other hand, most real
computers have a <EM>system call</EM> mechanism, which is a machine
instruction that switches the computer from the user's program to a part of
the operating system that performs some task on behalf of the user.  System
calls are used mainly for input and output, but we can pretend that there are
system calls to compute these Pascal library functions.  (The letter <CODE>s</CODE>
in the instruction names stands for &quot;system call&quot; to remind us.)

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>One important simplification is that in the simulated
computer, the same instructions are used for all kinds of numbers.  A typical
computer has three <CODE>add</CODE> instructions: one for integers, one for short
reals (32 bits), and one for long reals (64 bits).</SMALL></BLOCKQUOTE></SMALL><P>The simulated computer also has another set of 18 <EM>immediate</EM>
instructions, with the letter <CODE>i</CODE> added to the instruction name: <CODE>
addi</CODE>, <CODE>subi</CODE>, and so on.  In these instructions, the rightmost operand
in the instruction is the actual value desired, rather than the number of
a register containing the operand.  For example,

<P><PRE>add 10 8 9
</PRE>

<P>means, &quot;add the number in register 8 and the number in register 9,
putting the result into register 10.&quot;  But

<P><PRE>addi 10 8 9
</PRE>

<P>means, &quot;add the number in register 8 to the value 9, putting
the result in register 10.&quot;

<P>It's only the right operand that can be made immediate.  So, for example,
the Pascal assignment

<P><PRE>y := x - 5
</PRE>

<P>can be translated into

<P><PRE>rload 8 x
subi 8 8 5
store 8 y
</PRE>

<P>but the Pascal assignment

<P><PRE>y := 5 - x
</PRE>

<P>must be translated as

<P><PRE>addi 8 0 5
rload 9 x
sub 8 8 9
store 8 y
</PRE>

<P>This example illustrates one situation in which it's useful to
have register 0 guaranteed to contain the value 0.

<P>Our simulated machine has six more system call instructions having to do
with printing results.  One of them, <CODE>newline</CODE>, uses no operands and
simply prints a newline character, moving to the beginning of a new line
on the screen.  Four more are for printing the value in a register; the
instruction used depends on the data type of the value in the register.
The instructions are <CODE>putch</CODE> for a character, <CODE>puttf</CODE> for a boolean
(true or false) value, <CODE>putint</CODE> for an integer, and <CODE>putreal</CODE> for a
real number.  Each takes two operands; the first, an immediate value, gives
the minimum width in which to print the value, and the second is a register
number.  So the instruction

<P><PRE>putint 10 8
</PRE>

<P>means, &quot;print the integer value in register 8, using at least
10 character positions on the line.&quot;  The sixth printing instruction,
<CODE>putstr</CODE>, is used only for constant character strings in the Pascal
program; its first operand is a width, as for the others, but its second
is a Logo list containing the string to print:

<P><PRE>putstr 1 [The shuffled deck:]
</PRE>

<P>This is, of course, unrealistic; in a real computer the second
operand would have to be the memory address of the beginning of the
array of characters to print.  But the way I handle printing isn't very
realistic in any case; I wanted to do the simplest possible thing, because
worrying about printing really doesn't add anything to your understanding
of the process of compilation, which is the point of this chapter.

<P>The next group of instructions has to do with the flow of control in the
computer program.  Ordinarily the computer carries out its instructions
in sequence, that is, in the order in which they appear in the program.
But in order to implement conditionals (such as <CODE>if</CODE>), loops (such as
<CODE>while</CODE>), and procedure calls, we must be able to jump out of sequence.
The <CODE>jump</CODE> instruction takes a single operand, a <EM>label</EM>
that appears somewhere in the program.  When the computer carries out a jump
instruction, it looks for the specified label and starts reading instructions
just after where that label appears in the program.  (We saw an example of
labels at the beginning of this chapter.)

<P>The <CODE>jump</CODE> instruction is used for unconditional jumps.  In order to
implement conditionals and loops, we need a way to jump if some condition
is true.  The instruction <CODE>jumpt</CODE> (jump if true) has two operands,
a register number and a label.  It jumps to the specified label if and only
if the given register contains a true value.  (Since registers hold only
numbers, we use the value 1 to represent true, and 0 to represent false.)
Similarly, <CODE>jumpf</CODE> jumps if the value in the given register is false.

<P>For procedure and function calls, we need a different mechanism.  The jump
is unconditional, but the computer must remember where it came from, so that
it can continue where it left off once the called procedure or function
returns.  The instruction <CODE>jal</CODE> (jump and link) takes two operands, a
register and a label.  It puts into the register the address of the
instruction following the <CODE>jal</CODE> instruction.<SUP>*</SUP> Then it jumps to the specified label.
To return from the called procedure, we use the <CODE>jr</CODE> (jump register)
instruction.  It has one operand, a register number; it jumps to the
instruction whose address is in the register.

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>In a real computer,
each instruction is stored in a particular memory location, so the address
of an instruction is the address of the memory location in which it's stored.
In this simulated computer, I keep the program in the form of a Logo list,
and so I cheat and put the sublist starting at the next instruction into
the register.  This isn't quite as much of a cheat as it may seem, though,
since you know from Chapter 3 that Logo represents a list with the memory
address of the first pair of the list.</SMALL></BLOCKQUOTE></SMALL><P>One final instruction that affects the flow of control is the <CODE>exit</CODE>
system call.  It requires no operands; it terminates the running of the
program.  In this simulated computer, it returns to a Logo prompt; in a
real computer, the operating system would start running another user program.

<P>The only remaining instructions are <CODE>rload</CODE> and <CODE>store</CODE>.  You already
know what these do, but I've been showing them in oversimplified form so far.
The second operand can't just be a variable name, because that variable might
not be in the same place in memory every time the procedure is called. Think,
for example, about a recursive procedure.  Several invocations may be in
progress at once, all of them carrying out the same compiled instructions,
but each referring to a separate set of local variables.  The solution to
this problem is that the compiler arranges to load into a register
the address of a block of memory containing all the local variables for a
given procedure call.  If the variable <CODE>c</CODE>, for example, is in the sixth
memory location of that block, an instruction to load or store that variable
must be able to say &quot;the memory location whose address is the contents of
register 4 (let's say) plus five.&quot; So each load and store instruction
contains an <EM>index register</EM> in parentheses following an
<EM>offset</EM> to be added to the contents of that register.  We'd say

<P><PRE>store 8 5(4)
</PRE>

<P>to store the contents of register 8 into the variable <CODE>c</CODE>,
provided that register 4 points to the correct procedure invocation's
local variables and that <CODE>c</CODE> is in the sixth position in the block.
(The first position in the block would have offset 0, and so on.)

<P><H2>Stack Frames</H2>

<P>The first step in invoking a procedure or function is to set aside, or
<EM>allocate,</EM> a block of memory locations for use by that invocation.
This block will include the procedure's local variables, its arguments, and
room to save the values of registers as needed.  The compiler's data
structures include, for each procedure, how much memory that procedure needs
when it's invoked.  That block of memory is called a <EM>frame.</EM>

<P>In most programming languages, including Pascal and Logo (but not, as it
turns out, Lisp), the frame allocated when a procedure invocation begins can
be released, or <EM>deallocated,</EM> when that invocation returns to its
caller.  In other words, the procedure's local variables no longer exist
once the invocation is finished.  In these languages, the frames for all the
active procedure invocations can be viewed as a <EM>stack,</EM> a data structure
to which new elements are added by a Push operation, and elements are removed
using a Pop operation that removes the most recently pushed element.  (In
this case, the elements are the frames.)  That is, suppose that procedure A
invokes B, which invokes C, which invokes D.  For each of these invocations
a new frame is pushed onto the stack.  Which procedure finishes first?  It
has to be D, the last one invoked.  When D returns, its frame can be popped
off the stack.  Procedure C returns next, and its frame is popped, and so on.
The phrase <EM>stack frame</EM> is used to refer to frames that
behave like elements of a stack.

<P>My Pascal compiler allocates memory starting at location 0 and working
upward.  At the beginning of the program, a <EM>global frame</EM> is allocated
to hold the program's global variables.  Register 3, the
<EM>global pointer,</EM> always contains the
address of the beginning of the global frame, so that every procedure can
easily make use of global variables.  (Since the global frame is the first
thing in memory, its address is always zero, so the value in register 3 is
always 0.  But in a more realistic implementation the program itself would
appear in memory before the global frame, so its address would be greater
than zero.)  

<P>At any point in the program, register 4, the <EM>frame pointer,</EM>
contains the address of the beginning of the current frame, that is, the
frame that was created for the current procedure invocation.  Register 2,
the <EM>stack pointer,</EM> contains the address of the first
currently unused location in memory.

<P>My compiler is a little unusual in that when a procedure is called, the
stack frame for the new invocation is allocated by the caller, not by the
called procedure.  This simplifies things because the procedure's arguments
can be stored in its own frame; if each procedure allocates its own frame,
then the caller must store argument values in its (the caller's) frame,
because the callee's frame doesn't exist yet.  So, in my compiler, the
first step in a procedure call is to set register 5, the <EM>new frame
pointer,</EM> to point to the first free memory location, and change the
stack pointer to allocate the needed space.  If <CODE>N</CODE> memory locations
are needed for the new frame, the calling procedure will contain the
following instructions:

<P><PRE>add 5 2 0
addi 2 2 N
</PRE>

<P>The first instruction copies the value from register 2 (the
first free memory location) into register 5; the second adds <CODE>N</CODE> to
register 2.  (I've left out a complication, which is that the old value
in register 5 must be saved somewhere before putting this new value
into it.  You can read the code generation instructions at the beginning
of <CODE>pproccall1</CODE>, in the program listing at the end of the chapter,
for all the details.)  The current frame pointer is also saved in
location 3 of the new frame:

<P><PRE>store 4 3(5)
</PRE>

<P>The compiler uses data abstraction to refer to these register
numbers and frame slots; for example, the procedure <CODE>reg.frameptr</CODE> takes
no arguments and always outputs 4, while <CODE>frame.prevframe</CODE> outputs 3.

<P>The next step is to put the argument values into the new frame.  During
this process, the calling procedure must use register 4 to refer to its own
variables, and register 5 to refer to the callee's variables.  The final
step, just before calling the procedure, is to make the
frame pointer (register 4) point to the new frame:

<P><PRE>add 4 5 0
</PRE>

<P>Once the caller has set up the new frame and saved the necessary registers,
it can call the desired procedure, putting the return address in register 1:

<P><PRE>jal 1 &quot;proclabel
</PRE>

<P>The first step in the called procedure is to save the return
address in location zero of its frame:

<P><PRE>store 1 0(4)
</PRE>

<P>The procedure then carries out the instructions in its body.  When it's
ready to return, it must load the saved return address back into register
1, then restore the old stack pointer and frame pointer to deallocate
its frame, and finally return to the caller:

<P><PRE>rload 1 0(4)
add 2 4 0
rload 4 3(2)
jr 1
</PRE>

<P>(Procedure <CODE>proc1</CODE> in the compiler generates these
instructions for each procedure.)

<P>One final complication about stack frames comes from Pascal's block
structure.  Suppose we have a program with internal procedures arranged
in this structure:

<P><CENTER><IMG SRC="blocks.gif" ALT="figure: blocks"></CENTER>

<P>Then suppose that the main program calls procedure A, which
calls B, which calls C, which calls itself recursively.  The current (inner)
invocation of C has access to its own variables, those of procedure A, and
the global variables, but not to procedure B's variables.  How does
procedure C know where procedure A's stack frame is located?  The answer is
that every frame, in addition to saving a pointer to the previous frame,
must include a pointer to the <EM>lexically enclosing</EM> frame.  The
calling procedure sets this up; it can do this because it knows its own
lexical depth and that of the called procedure.  For example, when procedure
B calls procedure C, C's lexically enclosing frame will be the same as B's
(namely, the frame for the invocation of A), because B and C are at the
same lexical depth.  (They are both declared inside A.)  But when procedure
A calls procedure B, which is declared within itself, A must store its own
frame pointer as B's lexically enclosing frame.  Here is a picture of what's
where in memory:

<P><CENTER><IMG SRC="memory.gif" ALT="figure: memory"></CENTER>

<P>If all these pointers between frames confuse you, it might help to keep in
mind that the two kinds of pointers have very different purposes.  The
pointer to the previous frame is used only when a procedure returns, to
help in putting everything back the way it was before the procedure was
called (in particular, restoring the old value of register 4).  The pointer
to the lexically enclosing frame is used while the procedure is running,
whenever the procedure makes reference to a variable that belongs to some
outer procedure (for example, a reference in procedure B or C to a variable
that belongs to procedure A).<SUP>*</SUP>

<P><SMALL><BLOCKQUOTE><SMALL><SUP>*</SUP>If procedures used the previous-frame
pointers to make variable references, we would be compiling a dynamically
scoped language!  In this example, because Pascal is lexically scoped,
procedure C can't refer to procedure B's variables, even though B called C.</SMALL></BLOCKQUOTE></SMALL><P><H2>Data Structures</H2>

<P>In this section I'll describe the main data structures used during
compilation (abstract data types for identifiers and for expressions)
and during the running of the program (registers and frames).

<P>The main body of information that the compiler must maintain is the list of
Pascal identifiers (variable, procedure, and function names).  Since Pascal
is lexically scoped, some attention is necessary to ensure that each compiled
Pascal procedure has access to precisely the variables that it should.
At any point during the compilation, the value of <CODE>:idlist</CODE> is a list of
just those identifiers that may be used in the part of the program being
compiled.  We'll see in a moment how that's accomplished.

<P>There are two main categories of identifier: procedure names (including the
main program and functions in this category) and variable names.  The
information maintained for a procedure name looks like this example:

<P><PRE>[myproc procedure %myproc [2 46]]
</PRE>

<P>The first member of this list is the Pascal name of the program,
procedure, or function.  The second member is the type indicator, which
will be one of the words <CODE>program</CODE>, <CODE>procedure</CODE>, or <CODE>function</CODE>.
The third member is the procedure's &quot;Logo name,&quot; the unique name used within
the compiler to represent this program or procedure.  The program's Logo name
is used as the variable name whose value will be the compiled program; the
Logo names for procedures and functions are used as the labels in the
compiled program at which each procedure or function begins.  The fourth
member of the list contains the frame information for the procedure; it's
a list of two numbers, the lexical depth and the frame size.  The lexical
depth is 0 for the main program, 1 for a procedure declared inside the
main program, 2 for a procedure declared inside a depth-1 procedure, and
so on.  The frame size indicates how many memory locations must be allocated
for each invocation of the procedure.  (For the main program, the frame size
indicates the size of the global frame.)

<P>Because of the Pascal scope rules, there can be two procedures with the same
name, each declared within a different region of the program.  But there is
no scoping of labels in the compiled program; each label must be unique.
The simplest solution would be to use a distinct program-generated name for
every Pascal procedure; the Pascal <CODE>doit</CODE> would become the
Logo <CODE>g14</CODE>.  In fact I chose to modify this approach somewhat.  When an
identifier <CODE>symbol</CODE> is declared in the source program, the compiler
looks to see whether another identifier with the same name has appeared
anywhere in the program.  If not, the Logo name <CODE>%symbol</CODE> is used; if
so, a generated symbol is used.  This rule makes the compiled
program a little easier to read, while preserving the rule that all Logo
names must be unique.  The percent sign in <CODE>%symbol</CODE> ensures that this
Logo name doesn't conflict with any names used in the
compiler itself.  Procedure <CODE>newlname</CODE> in the compiler takes a Pascal
identifier as input and generates a new Logo name to correspond.

<P>The selectors <CODE>id.type</CODE>, <CODE>id.lname</CODE>, and <CODE>id.frame</CODE> are used
for the second through fourth members of these lists.  There's no selector
for the first member, the Pascal name, because the compiler never
extracts this information explicitly.  Instead, the Pascal name is used
by procedure <CODE>getid</CODE>, which takes a Pascal name as its input and
returns the corresponding identifier list.

<P>For variable names, the identifier information looks a little different:

<P><PRE>[i integer [1 41] false]
</PRE>

<P>The first two members of this list are the Pascal name and the
type, the same as for a procedure.  The third member is the <EM>pointer</EM>
information for the variable: its lexical depth and the offset within
a frame where it should be kept.  The compiler will use this information
to issue instructions to load or store the value of the variable.  The
fourth member of the list is <CODE>true</CODE> if this variable is a <CODE>var</CODE>
(call by reference) parameter, <CODE>false</CODE> otherwise.

<P>The variable <CODE>i</CODE> above has a scalar type, so its type indicator is
a word.  Had it been an array, the type indicator would be a list such as

<P><PRE>[integer [0 6] [5 3]]
</PRE>

<P>for a variable declared as <CODE>array [0..5, 5..7] of integer</CODE>.

<P>For each dimension of the array, the first number in the list
is the smallest possible index, while the second number is the number of
possible index values in this dimension.  That is, the range <CODE>[3..7]</CODE>
is represented by the list <CODE>[3 5]</CODE> because there are five possible
values starting from 3.  Notice that there is no &quot;Logo name&quot; for a
variable; in the compiled program, a variable is represented as an
offset and an index register, such as <CODE>41(4)</CODE>.

<P>For variables, the selectors used are <CODE>id.type</CODE>, <CODE>id.pointer</CODE>,
and <CODE>id.varp</CODE>.

<P>The information about currently accessible identifiers is kept in the list
<CODE>idlist</CODE>.  This variable holds a list of lists; each Pascal identifier
is represented by a list as indicated above.  <CODE>Idlist</CODE> is a local
variable in the compiler procedures <CODE>program</CODE>, <CODE>procedure</CODE>, and <CODE>
function</CODE>.  That is, there is a separate version for each block of the
Pascal source program.  Each local version starts out with the same value as
the higher-level version; identifiers declared within a block are added to
the local version but not to the outer one.  When the compiler finishes a
block, the (Logo) procedure in charge of that block stops and the outer <CODE>
idlist</CODE> becomes current again.

<P>This arrangement may or may not seem strange to you.  Recall that we had to
invent this <CODE>idlist</CODE> mechanism because Pascal's
lexical scope is different from Logo's dynamic scope.  The reason we have
these different versions of <CODE>idlist</CODE> is to keep track of which
identifiers are lexically available to which blocks.  And yet we are using
Logo's dynamic scope to determine which <CODE>idlist</CODE> is available at any
point in the compilation.  The reason this works is that <EM>the
dynamic environment at compile time reflects the
lexical environment at run time.</EM> For example, in the <CODE>tower</CODE>
program, the fact that <CODE>tower</CODE> <EM>contains</EM> <CODE>hanoi</CODE>, which in
turn contains <CODE>movedisk</CODE>, is reflected in the fact that <CODE>program</CODE>
(compiling <CODE>tower</CODE>) <EM>invokes</EM> <CODE>procedure</CODE> (compiling <CODE>
hanoi</CODE>), which in turn <EM>invokes</EM> <CODE>procedure</CODE> recursively
(compiling <CODE>movedisk</CODE>).  Earlier I said that lexical scope is easier for
a compiler than dynamic scope; this paragraph may help you see why that's
true.  Even dynamically scoped Logo naturally falls into providing lexical
scope for a Pascal compiler.

<P>Here is how procedure and function declarations are compiled:

<P><PRE>to procedure
proc1 &quot;procedure framesize.proc
end

to function
proc1 &quot;function framesize.fun
end

to proc1 :proctype :framesize
localmake &quot;procname token
localmake &quot;lexical.depth :lexical.depth+1
localmake &quot;frame (list :lexical.depth 0)
push &quot;idlist (list :procname :proctype (newlname :procname) :frame)
localmake &quot;idlist :idlist
 ...
end
</PRE>

<P>(I'm leaving out the code generation part for now.)  What I want
to be sure you understand is that the <CODE>push</CODE> instruction adds the new
procedure name to the <EM>outer</EM> <CODE>idlist</CODE>; after that, it creates a
new <CODE>idlist</CODE> whose initial value is the same as the old one.  It's very
important that the instruction

<P><PRE>localmake &quot;idlist :idlist
</PRE>

<P>comes where it does and not at the beginning of the procedure.
<CODE>Proc1</CODE> needs access to the outer <CODE>idlist</CODE> when it starts, and
then later it &quot;shadows&quot; that variable with its own local version.  This
example shows that Logo's <CODE>local</CODE> command really is an executable
command and not a declaration like Pascal's <CODE>var</CODE> declaration.  In
Pascal it would be unthinkable to declare a new local variable in the middle
of a block.

<P><CODE>Getid</CODE> depends on Logo's dynamic scope to give it access to the right
version of <CODE>idlist</CODE>.  Think about writing a Pascal compiler in Pascal.
There would be a large block for <CODE>program</CODE> with many other procedures
inside it.  Two of those inner procedures would be the ones for <CODE>
procedure</CODE> and <CODE>function</CODE>.  (Of course they couldn't have those names,
because they're Pascal reserved words.  They'd be called <CODE>
compileprocedure</CODE> or some such thing.  But I think this will be easier to
follow if I stick with the names used in the Logo version of the compiler.)
Those two procedures should be at the same level of block structure; neither
should be lexically within the other.  That's because a Pascal procedure
block can include a function definition or vice versa.  Now, where in the
lexical structure does <CODE>getid</CODE> belong?  It needs access to the local
<CODE>idlist</CODE> of either <CODE>procedure</CODE> or <CODE>function</CODE>, whichever is
currently active.  Similarly, things like <CODE>statement</CODE> need to be
lexically within both <CODE>procedure</CODE> and <CODE>function</CODE>, and actually also
within <CODE>program</CODE> because the outermost program block has statements too.
It would theoretically be possible to solve the problem by writing three
identical versions of each of these subprocedures, but that solution is too
horrible to contemplate.  Instead a more common technique is to have only
one <CODE>idlist</CODE> variable, a global one, and write the compiler so that it
explicitly maintains a stack of old values of that variable.  The Pascal
programmer has to do the work that the programming language should be doing
automatically.  This is an example in which dynamic scope, while not
absolutely essential, makes the program much easier to write and more
straightforward to understand.

<P>For every procedure or function in the Pascal source program, the compiler
creates a global Logo variable with the same name as the corresponding
label--that is, either a percent-prefix name or a generated symbol.  The
value of this variable is a list of types, one for each argument to the
procedure or function.  (For a function, the first member of the list is the
type of the function itself; the butfirst is the list of types of its
arguments.)  The compiler examines this &quot;type signature&quot; variable when a
procedure or function is invoked, to make sure that the types of the actual
arguments match the types of the formal parameters.

<P>The other important compile-time data structure is the one that represents
a compiled expression.  When the compiler calls <CODE>pexpr</CODE>, its job is to
parse an expression from the Pascal source program and generate code to
compute (when the compiled program runs!) the value of the expression.

The generated code leaves the computed value in some register.  What <CODE>
pexpr</CODE> returns to its caller is a data structure indicating which register
and what type the expression has, like this:

<P>

<P><PRE>[real register 8]
</PRE>

<P>The first member of this list is the type of the expression.
Most of the time, the second member is the word <CODE>register</CODE> and the
third member is the register number in which the expression's value
can be found.  The only exception is for a constant expression; if
the expression is, for example, <CODE>15</CODE> then <CODE>pexpr</CODE> will output

<P><PRE>[integer immediate 15]
</PRE>

<P>For the most part, these immediate expressions are useful
only within recursive calls to <CODE>pexpr</CODE>.  In compiling the Pascal
assignment

<P><PRE>x := 15
</PRE>

<P>we're going to have to get the value 15 into a register anyway
in order to be able to store it into <CODE>x</CODE>; the generated code will be
something like

<P><PRE>addi 7 0 15
store 7 48(4)
</PRE>

<P>An immediate expression is most useful in compiling something like

<P><PRE>x := a+15
</PRE>

<P>in which we can avoid loading the value 15 into a register, but
can directly add it to the register containing <CODE>a</CODE>:

<P><PRE>rload 7 53(4)
addi 7 7 15
store 7 48(4)
</PRE>

<P>The members of an expression list are examined using the selectors <CODE>
exp.type</CODE>, <CODE>exp.mode</CODE> (the word <CODE>register</CODE> or <CODE>immediate</CODE>),
and <CODE>exp.value</CODE> (the register number or immediate value).

<P>In this compiler an &quot;expression&quot; is always a
<EM>scalar</EM> type; although the formal definition of Pascal allows for
array expressions, there are no operations that act on arrays the way
operations like <CODE>+</CODE> act on scalars, and so an array expression can only
be the name of an array variable.  (<EM>Members</EM> of arrays can, of
course, be part of a scalar expression.)  <CODE>Passign</CODE>, the compiler
procedure that handles assignment statements, first checks for the special
case of an array assignment and then, only if the left side of the
assignment is a scalar, invokes <CODE>pexpr</CODE> to parse a scalar expression.

<P>In order to understand the code generated by the compiler, you should also
know about the <EM>runtime</EM> data structures used by compiled programs.
First, certain registers are reserved for special purposes:

<P><TABLE>
<TR><TH align="right">number<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">name<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">purpose
<TR><TD>&nbsp;
<TR><TD align="right">0<TD><TD><CODE>reg.zero</CODE><TD><TD>always contains zero
<TR><TD align="right">1<TD><TD><CODE>reg.retaddr</CODE><TD><TD>return address from procedure call
<TR><TD align="right">2<TD><TD><CODE>reg.stackptr</CODE><TD><TD>first free memory address
<TR><TD align="right">3<TD><TD><CODE>reg.globalptr</CODE><TD><TD>address of global frame
<TR><TD align="right">4<TD><TD><CODE>reg.frameptr</CODE><TD><TD>address of current frame
<TR><TD align="right">5<TD><TD><CODE>reg.newfp</CODE><TD><TD>address of frame being made for procedure call
<TR><TD align="right">6<TD><TD><CODE>reg.retval</CODE><TD><TD>return value from function
<TR><TD align="right">7<TD><TD><CODE>reg.firstfree</CODE><TD><TD>first register available for expressions
</TABLE>

<P>We've already seen most of these while discussing stack frames.
A Pascal function returns its result in register 6; the caller immediately
copies the return value into some other register so that it won't be
lost if the program calls another function, for a case like

<P><PRE>x := f(3)+f(4)
</PRE>

<P>Whenever a register is needed to hold some computed value, the compiler
calls the Logo procedure <CODE>newregister</CODE>, which finds the first register
number starting from 7 that isn't currently in use.  When the value in
a register is no longer needed, the compiler calls <CODE>regfree</CODE> to indicate
that that register can be reassigned by <CODE>newregister</CODE>.

<P>The other noteworthy runtime data structure is the use of slots within each
frame for special purposes:

<P><TABLE>
<TR><TH align="right">number<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">name<TH>&nbsp;&nbsp;&nbsp;&nbsp;<TH align="left">purpose
<TR><TD>&nbsp;
<TR><TD align="right">0<TD><TD><CODE>frame.retaddr</CODE><TD><TD>address from which this procedure was called
<TR><TD align="right">1<TD><TD><CODE>frame.save.newfp</CODE><TD><TD>saved register 3 while filling this new frame
<TR><TD align="right">2<TD><TD><CODE>frame.outerframe</CODE><TD><TD>the frame lexically enclosing this one
<TR><TD align="right">3<TD><TD><CODE>frame.prevframe</CODE><TD><TD>the frame from which this one was called
<TR><TD align="right">4-35<TD><TD><CODE>frame.regsave</CODE><TD><TD>space for saving registers
<TR><TD align="right">36<TD><TD><CODE>frame.retval</CODE><TD><TD>function return value
</TABLE>

<P>Why is there both a register and a frame slot for a function's return
value?  Remember that the way you indicate the return value in a Pascal
function is by assigning to the function's name as if it were a variable.
Such an assignment is not necessarily the last instruction in the function;
it may do more work after computing the return value.  The compiler notices
as assignment to the function name and generates code to save the computed
value in slot 36 of the current frame.  Then, when the function actually
returns, the compiler generates the instruction

<P><PRE>rload 6 36(4)
</PRE>

<P>to copy the return value into register 6.  The function's frame
is about to be freed, so the caller can't look there
for the return value; that's why a register is used.

<P>Each frame includes a block of space for saving registers when another
procedure is called.  That's because each procedure allocates register
numbers independently; each starts with register 7 as the first free one.
So if the registers weren't saved before a procedure call and restored
after the call, the values in the registers would be lost.  (Although
the frame has enough room to save all 32 registers, to make things simple,
not all 32 are actually saved.  The compiler knows which registers contain
active expression values at the moment of the procedure call, and it
generates code to save and restore only the necessary ones.)

<P>You might think it would be easier to have each procedure use a separate
set of registers, so saving wouldn't be necessary.  But this
doesn't work for two reasons.  First, there are only a few registers, and
in a large program we'd run out.  Even more important, the compiled
code for a <EM>recursive</EM> procedure is going to use the same registers
in each invocation, so we certainly can't avoid saving registers in that
situation.

<P><H2>Code Generation</H2>


<P>Let's look again at how the compiler handles a Pascal <CODE>if</CODE> statement:

<P><PRE>to pif
local [cond elsetag endtag]
make &quot;cond pboolean pexpr
make &quot;elsetag gensym
make &quot;endtag gensym
mustbe &quot;then
code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
regfree :cond
statement
code (list &quot;jump (word &quot;&quot; :endtag))
code :elsetag
ifbe &quot;else [statement]
code :endtag
end
</PRE>

<P>I showed you this procedure while talking about parsing, asking you to
ignore the parts about code generation.  Now we'll come back to that part of
the process.

<P>The format of the <CODE>if</CODE> statement is either of these:

<P><PRE>if <EM>condition</EM> then <EM>statement</EM>
if <EM>condition</EM> then <EM>statement</EM> else <EM>statement</EM>
</PRE>

<P>(There is probably a semicolon after the statement, but it's
not officially part of the <CODE>if</CODE>; it's part of the compound statement
that contains the <CODE>if</CODE>.)  When we get to <CODE>pif</CODE>, the compiler has
already read the token <CODE>if</CODE>; the next thing to read is an expression,
which must be of type <CODE>boolean</CODE>, providing the condition part of the
statement.

<P>In the instruction

<P><PRE>make &quot;cond pboolean pexpr
</PRE>

<P>the call to <CODE>pexpr</CODE> generates code for the expression and
returns an expression list, in the format shown earlier.  The procedure <CODE>
pboolean</CODE> does three things:  First, it checks the mode of the expression;
if it's immediate, the value is loaded into a register.  Second, it checks
the type of the expression to ensure that it really is boolean.  Third,
<CODE>pboolean</CODE> returns just the register number, which will be used in code
generated by <CODE>pif</CODE>.

<P><PRE>to pboolean :expr [:pval noimmediate :expr]
if equalp exp.type :pval &quot;boolean [output exp.value :pval]
(throw &quot;error sentence exp.type :pval [not true or false])
end

to noimmediate :value
if equalp exp.mode :value &quot;immediate ~
   [localmake &quot;reg newregister
    code (list &quot;addi :reg reg.zero exp.value :value)
    output (list exp.type :value &quot;register :reg)]
output :value
end
</PRE>

<P>Overall, the code compiled for the <CODE>if</CODE> statement will look like this:

<P><PRE>    ... get condition into register <EM>cond</EM> ...
    jumpf <EM>cond</EM> &quot;g5
    ... code for <CODE>then</CODE> statement ...
    jump &quot;g6
g5
    ... code for <CODE>else</CODE> statement ...
g6
</PRE>

<P>The labels <CODE>g5</CODE> and <CODE>g6</CODE> in this example are generated
symbols; they'll be different each time.  The labels are generated by the
instructions

<P><PRE>make &quot;elsetag gensym
make &quot;endtag gensym
</PRE>

<P>in <CODE>pif</CODE>.  After we call <CODE>pexpr</CODE> to generate the code
for the conditional expression, we explicitly generate the <CODE>jumpf</CODE>
instruction:

<P><PRE>code (list &quot;jumpf :cond (word &quot;&quot; :elsetag))
regfree :cond
</PRE>

<P>Notice that once we've generated the <CODE>jumpf</CODE> instruction, we
no longer need the value in register <CODE>:cond</CODE>, and we call <CODE>regfree</CODE>
to say so.  The rest of this code generation process should be easy to work
out.  All of the structured statements (<CODE>for</CODE>, <CODE>while</CODE>, and <CODE>
repeat</CODE>) are similarly simple.

<P>The code generation for expressions is all in <CODE>ppopop</CODE>.  Most of the
complexity of dealing with expressions is in the parsing, not in the code
generation; by the time we get to <CODE>ppopop</CODE>, we know that we want to
carry out a single operation on two values, both of which are either in
registers or immediate values.  The simple case is that both are in
registers; suppose, for example, that we are given the subtraction operation
and the two operands are in registers 8 and 9.  Then we just generate the
instruction

<P><PRE>sub 8 8 9
</PRE>

<P>and declare register 9 free.  <CODE>Ppopop</CODE> is a little long,
because it has to check for special cases such as immediate operands.
Also, a unary minus is turned into a subtraction from register zero,
since there is no unary <CODE>minus</CODE> operation in our simulated machine.

<P>Ironically, it's the &quot;simple&quot; statements that are hardest to
compile: assignment and procedure calling.  For procedure (or function)
calling, the difficulty is in matching actual argument expressions with
formal parameters.  Procedure <CODE>pproccall1</CODE> generates the instructions to
manipulate frame pointers, as described earlier, and procedure <CODE>
procargs</CODE> fills the newly-created frame with the actual argument values.
(If an argument is an array passed by value, each member of the array must
be copied into the new frame.)  Assignment, handled by procedure <CODE>
passign</CODE> in the compiler, is similar to argument passing; a value must be
computed and then stored into a frame.  I wouldn't be too upset if you
decide to stop here and take code generation for memory references on faith.

<P>Suppose we are compiling the assignment

<P><PRE>x := <EM>expression</EM>
</PRE>

<P><CODE>Passign</CODE> reads the name <CODE>x</CODE> and uses <CODE>getid</CODE> to find
the information associated with that name.  If the assignment is to an array
member, then <CODE>passign</CODE> must also read the array indices, but let's say
that we are assigning to a scalar variable, to keep it simple.

<P><PRE>to passign
local [name id type index value pointer target]
<U>make &quot;name token</U>
make &quot;index []
ifbe &quot;|[| [make &quot;index commalist [pexpr] mustbe &quot;|]|]
mustbe &quot;|:=|
<U>make &quot;id getid :name</U>
<U>make &quot;pointer id.pointer :id</U>
<U>make &quot;type id.type :id</U>
passign1
end
</PRE>

<P>Procedure <CODE>passign1</CODE> contains the steps that are in common between
ordinary assignment (handled by <CODE>passign</CODE>) and assignment to the name of
the current function, to set the return value (handled by <CODE>pfunset</CODE>,
which you can read in the complete listing at the end of the chapter).

<P><PRE>to passign1
if and (listp :type) (emptyp :index) [parrayassign :id  stop]
setindex &quot;false
<U>make &quot;value check.type :type pexpr</U>
<U>codestore :value (id.pointer :id) (id.varp :id) :index</U>
regfree :value
end
</PRE>

<P>We call <CODE>pexpr</CODE> to generate the code to compute the
expression. <CODE>Check.type</CODE> is like <CODE>pboolean</CODE>, which you saw earlier,
except that it takes the desired type as an argument.  It returns the
number of the register that contains the expression value.

<P>The real work is done by <CODE>codestore</CODE>, which takes four inputs.
The first is the register number whose value should be stored; the other
three inputs indicate where in memory the value should go.  First comes the
pointer from the identifier list; this, you'll recall, tells us the lexical
depth at which the variable was declared and the offset within its frame
where the variable is kept.  Next is a true or false value indicating
whether or not this variable is a <CODE>var</CODE> parameter; if so, then its value
is a pointer to the variable whose value we <EM>really</EM> want to change.
Finally, the <EM>index</EM> input will be zero for a scalar variable, or the
number of a register containing the
array index for an array member.  (Procedure <CODE>lindex</CODE>, whose name stands
for &quot;linear index,&quot; has been called to generate code to convert the possible
multi-dimensional indices, with possibly varying starting values, into a
single number indicating the position within the array, starting from zero
for the first member.)

<P><PRE>to codestore :reg :pointer :varflag :index
localmake &quot;target memsetup :pointer :varflag :index
code (list &quot;store :reg targetaddr)
regfree last :target
end
</PRE>

<P>(There is a similar procedure <CODE>codeload</CODE> used to generate
the code to load a variable's value into a register.)  <CODE>Codestore</CODE>
invokes a subprocedure <CODE>memsetup</CODE> whose job is to work out an
appropriate operand for an <CODE>rload</CODE> or <CODE>store</CODE> machine instruction.
That operand must be an offset and an index register, such as <CODE>41(4)</CODE>.
What <CODE>memsetup</CODE> returns is a list of the two numbers, in this case
<CODE>[41 4]</CODE>.  Procedure <CODE>targetaddr</CODE> turns that into the right notation
for use in the instruction.

<P><CODE>Memsetup</CODE> is the most complicated procedure in the compiler, because
there are so many special cases.  I'll describe the easy cases here.  Suppose
that we are dealing with a scalar variable that isn't a <CODE>var</CODE> parameter.
Then there are three cases.  If the lexical depth of that variable is equal
to the current lexical depth, then this variable is declared in the same
block that we're compiling.  In that case, we use register 4 (the current
frame pointer) as the index register, and the variable's frame slot as the
offset.  If the variable's lexical depth is zero, then it's a global
variable.  In that case, we use register 3 (the global frame pointer) as
the index register, and the variable's frame slot as the offset.  If the
variable's depth is something other than zero or the current depth, then
we have to find a pointer to the variable's own frame by looking in the
current frame's <CODE>frame.outerframe</CODE> slot, and perhaps in <EM>that</EM>
frame's <CODE>frame.outerframe</CODE> slot, as many times as the difference between
the current depth and the variable's depth.

<P>If the variable is a <CODE>var</CODE> parameter, then we go through the same cases
just described, and then load the value of that variable (which is a pointer
to the variable we really want) into a register.  We use that new register
as the index register, and zero as the offset.

<P>If the variable is an array member, then we must add the linear index
(which is already in a register) to the offset as computed so far.

<P>Perhaps an example will help sort this out.  Here is the compiled version
of the <CODE>tower</CODE> program, with annotations:

<P><PRE>[           [add 3 0 0]                   set up initial pointers
            [add 4 0 0]
            [addi 2 0 36]
            [jump &quot;g1]                    jump to main program

%hanoi      [store 1 0(4)]                save return value
            [jump &quot;g2]                    jump to body of hanoi

%movedisk   [store 1 0(4)]
            [jump &quot;g3]

g3          [putstr 1 [Move disk ]]       body of movedisk
            [rload 7 36(4)]
            [putint 1 7]                  write(number:1)
            [putstr 1 [ from ]]
            [rload 7 37(4)]
            [putch 1 7]                   write(from:1)
            [putstr 1 [ to ]]
            [rload 7 38(4)]
            [putch 1 7]                   write(to:1)
            [newline]
            [rload 1 0(4)]                reload return address
            [add 2 4 0]                   free stack frame
            [rload 4 3(2)]
            [jr 1]                        return to caller

g2          [rload 7 36(4)]               body of hanoi
            [neqi 7 7 0]                  if number &lt;&gt; 0
            [jumpf 7 &quot;g4]
            [store 5 1(2)]                allocate new frame
            [add 5 2 0]
            [addi 2 2 40]
            [store 4 3(5)]                set previous frame
            [rload 7 2(4)]
            [store 7 2(5)]                set enclosing frame
            [rload 7 36(4)]
            [subi 7 7 1]
            [store 7 36(5)]               first arg is number-1
            [rload 7 37(4)]
            [store 7 37(5)]               next arg is from
            [rload 7 39(4)]
            [store 7 38(5)]               next arg is other
            [rload 7 38(4)]
            [store 7 39(5)]               next arg is onto
            [add 4 5 0]                   switch to new frame
            [rload 5 1(4)]
            [jal 1 &quot;%hanoi]               recursive call

            [store 5 1(2)]                set up for movedisk
            [add 5 2 0]
            [addi 2 2 39]
            [store 4 3(5)]
            [store 4 2(5)]                note different enclosing frame
            [rload 7 36(4)]
            [store 7 36(5)]               copy args
            [rload 7 37(4)]
            [store 7 37(5)]
            [rload 7 38(4)]
            [store 7 38(5)]
            [add 4 5 0]
            [rload 5 1(4)]
            [jal 1 &quot;%movedisk]            call movedisk

            [store 5 1(2)]                second recursive call
            [add 5 2 0]
            [addi 2 2 40]
            [store 4 3(5)]
            [rload 7 2(4)]
            [store 7 2(5)]
            [rload 7 36(4)]
            [subi 7 7 1]
            [store 7 36(5)]
            [rload 7 39(4)]
            [store 7 37(5)]
            [rload 7 38(4)]
            [store 7 38(5)]
            [rload 7 37(4)]
            [store 7 39(5)]
            [add 4 5 0]
            [rload 5 1(4)]
            [jal 1 &quot;%hanoi]
            [jump &quot;g5]                    end of if...then

g4
g5          [rload 1 0(4)]                return to caller
            [add 2 4 0]
            [rload 4 3(2)]
            [jr 1]

g1          [store 5 1(2)]                body of main program
            [add 5 2 0]                   prepare to call hanoi
            [addi 2 2 40]
            [store 4 3(5)]
            [store 4 2(5)]
            [addi 7 0 5]                  constant argument 5
            [store 7 36(5)]
            [addi 7 0 97]                 ASCII code for 'a'
            [store 7 37(5)]
            [addi 7 0 98]                 ASCII code for 'b'
            [store 7 38(5)]
            [addi 7 0 99]                 ASCII code for 'c'
            [store 7 39(5)]
            [add 4 5 0]
            [rload 5 1(4)]
            [jal 1 &quot;%hanoi]               call hanoi
            [exit]
]
</PRE>

<P>
<TABLE width="100%"><TR><TD><A HREF="../v3-toc2.html">(back to Table of Contents)</A>
<TD align="right"><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>
</TABLE>

<P><H2>Program Listing</H2>

<P><PRE>
to compile :file
if namep "peekchar [ern "peekchar]
if namep "peektoken [ern "peektoken]
if not namep "idlist [opsetup]
if not emptyp :file [openread :file]
setread :file
ignore error
catch "error [program]
localmake "error error
if not emptyp :error [print first butfirst :error]
setread []
if not emptyp :file [close :file]
end

;; Global setup

to opsetup
make "numregs 32
make "memsize 3000
pprop "|=| "binary [eql 2 [boolean []] 1]
pprop "|<>| "binary [neq 2 [boolean []] 1]
pprop "|<| "binary [less 2 [boolean []] 1]
pprop "|>| "binary [gtr 2 [boolean []] 1]
pprop "|<=| "binary [leq 2 [boolean []] 1]
pprop "|>=| "binary [geq 2 [boolean []] 1]
pprop "|+| "binary [add 2 [[] []] 2]
pprop "|-| "binary [sub 2 [[] []] 2]
pprop "or "binary [lor 2 [boolean boolean] 2]
pprop "|*| "binary [mul 2 [[] []] 3]
pprop "|/| "binary [quo 2 [real []] 3]
pprop "div "binary [div 2 [integer integer] 3]
pprop "mod "binary [rem 2 [integer integer] 3]
pprop "and "binary [land 2 [boolean boolean] 3]
pprop "|+| "unary [plus 1 [[] []] 4]
pprop "|-| "unary [minus 1 [[] []] 4]
pprop "not "unary [lnot 1 [boolean boolean] 4]
make "idlist `[[trunc function int [1 ,[framesize.fun+1]]]
               [round function round [1 ,[framesize.fun+1]]]
               [random function random [1 ,[framesize.fun+1]]]]
make "int [integer real]
make "round [integer real]
make "random [integer integer]
end

;; Block structure

to program
mustbe "program
localmake "progname token
ifbe "|(| [ignore commalist [id]  mustbe "|)|]
mustbe "|;|
localmake "lexical.depth 0
localmake "namesused []
localmake "needint "false
localmake "needround "false
localmake "needrandom "false
localmake "idlist :idlist
localmake "frame [0 0]
localmake "id (list :progname "program (newlname :progname) :frame)
push "idlist :id
localmake "codeinto word "% :progname
make :codeinto []
localmake "framesize framesize.proc
program1
mustbe ".
code [exit]
foreach [int round random] "plibrary
make :codeinto reverse thing :codeinto
end

to program1
localmake "regsused (array :numregs 0)
for [i reg.firstfree :numregs-1] [setitem :i :regsused "false]
ifbe "var [varpart]
.setfirst butfirst :frame :framesize
if :lexical.depth = 0 [code (list "add reg.globalptr reg.zero reg.zero)
                       code (list "add reg.frameptr reg.zero reg.zero)
                       code (list "addi reg.stackptr reg.zero :framesize)]
localmake "bodytag gensym
code (list "jump (word "" :bodytag))
tryprocpart
code :bodytag
mustbe "begin
blockbody "end
end

to plibrary :func
if not thing (word "need :func) [stop]
code :func
code (list "rload reg.firstfree (memaddr framesize.fun reg.frameptr))
code (list (word "s :func) reg.retval reg.firstfree)
code (list "add reg.stackptr reg.frameptr reg.zero)
code (list "rload reg.frameptr (memaddr frame.prevframe reg.stackptr))
code (list "jr reg.retaddr)
end

;; Variable declarations

to varpart
local [token namelist type]
make "token token
make "peektoken :token
if reservedp :token [stop]
vargroup
foreach :namelist [newvar ? :type]
mustbe "|;|
varpart
end

to vargroup
make "namelist commalist [id]
mustbe ":
ifbe "packed []
make "type token
ifelse equalp :type "array [make "type arraytype] [typecheck :type]
end

to id
local "token
make "token token
if letterp ascii first :token [output :token]
make "peektoken :token
output []
end

to arraytype
local [ranges type]
mustbe "|[|
make "ranges commalist [range]
mustbe "|]|
mustbe "of
make "type token
typecheck :type
output list :type :ranges
end

to range
local [first last]
make "first range1
mustbe "..
make "last range1
if :first > :last ~  
   [(throw "error (sentence [array bounds not increasing:]
                            :first ".. :last))]
output list :first (1 + :last - :first)
end

to range1
local "bound
make "bound token
if equalp first :bound "' [output ascii first butfirst :bound]
if equalp :bound "|-| [make "bound minus token]
if equalp :bound int :bound [output :bound]
(throw "error sentence [array bound not ordinal:] :bound)
end

to typecheck :type
if memberp :type [real integer char boolean] [stop]
(throw "error sentence [undefined type] :type)
end

to newvar :pname :type
if reservedp :pname [(throw "error sentence :pname [reserved word])]
push "idlist (list :pname :type (list :lexical.depth :framesize) "false)
make "framesize :framesize + ifelse listp :type [arraysize :type] [1]
end

to arraysize :type
output reduce "product map [last ?] last :type
end

;; Procedure and function declarations

to tryprocpart
ifbeelse "procedure ~
         [procedure tryprocpart] ~
         [ifbe "function [function tryprocpart]]
end

to procedure
proc1 "procedure framesize.proc
end

to function
proc1 "function framesize.fun
end

to proc1 :proctype :framesize
localmake "procname token
localmake "lexical.depth :lexical.depth+1
localmake "frame (list :lexical.depth 0)
push "idlist (list :procname :proctype (newlname :procname) :frame)
localmake "idlist :idlist
make lname :procname []
ifbe "|(| [arglist]
if equalp :proctype "function ~
   [mustbe ":
    localmake "type token
    typecheck :type
    make lname :procname fput :type thing lname :procname]
mustbe "|;|
code lname :procname
code (list "store reg.retaddr (memaddr frame.retaddr reg.frameptr))
program1
if equalp :proctype "function ~
   [code (list "rload reg.retval (memaddr frame.retval reg.frameptr))]
code (list "rload reg.retaddr (memaddr frame.retaddr reg.frameptr))
code (list "add reg.stackptr reg.frameptr reg.zero)
code (list "rload reg.frameptr (memaddr frame.prevframe reg.stackptr))
code (list "jr reg.retaddr)
mustbe "|;|
end

to arglist
local [token namelist type varflag]
make "varflag "false
ifbe "var [make "varflag "true]
vargroup
foreach :namelist [newarg ? :type :varflag]
ifbeelse "|;| [arglist] [mustbe "|)|]
end

to newarg :pname :type :varflag
if reservedp :pname [(throw "error sentence :pname [reserved word])]
localmake "pointer (list :lexical.depth :framesize)
push "idlist (list :pname :type :pointer :varflag)
make "framesize :framesize + ifelse (and listp :type not :varflag) ~
                                    [arraysize :type] [1]
queue lname :procname ifelse :varflag [list "var :type] [:type]
end

;; Statement part

to blockbody :endword
statement
ifbeelse "|;| [blockbody :endword] [mustbe :endword]
end

to statement
local [token type]
ifbe "begin [compound stop]
ifbe "for [pfor stop]
ifbe "if [pif stop]
ifbe "while [pwhile stop]
ifbe "repeat [prepeat stop]
ifbe "write [pwrite stop]
ifbe "writeln [pwriteln stop]
make "token token
make "peektoken :token
if memberp :token [|;| end until] [stop]
make "type gettype :token
if emptyp :type [(throw "error sentence :token [can't begin statement])]
if equalp :type "procedure [pproccall stop]
if equalp :type "function [pfunset stop]
passign
end

;; Compound statement

to compound
blockbody "end
end

;; Structured statements

to pfor
local [var init step final looptag endtag testreg]
make "var token
mustbe "|:=|
make "init pinteger pexpr
make "step 1
ifbeelse "downto [make "step -1] [mustbe "to]
make "final pinteger pexpr
mustbe "do
make "looptag gensym
make "endtag gensym
code :looptag
localmake "id getid :var
codestore :init (id.pointer :id) (id.varp :id) 0
make "testreg newregister
code (list (ifelse :step<0 ["less] ["gtr]) :testreg :init :final)
code (list "jumpt :testreg (word "" :endtag))
regfree :testreg
statement
code (list "addi :init :init :step)
code (list "jump (word "" :looptag))
code :endtag
regfree :init
regfree :final
end

to prepeat
local [cond looptag]
make "looptag gensym
code :looptag
blockbody "until
make "cond pboolean pexpr
code (list "jumpf :cond (word "" :looptag))
regfree :cond
end

to pif
local [cond elsetag endtag]
make "cond pboolean pexpr
make "elsetag gensym
make "endtag gensym
mustbe "then
code (list "jumpf :cond (word "" :elsetag))
regfree :cond
statement
code (list "jump (word "" :endtag))
code :elsetag
ifbe "else [statement]
code :endtag
end

to pwhile
local [cond looptag endtag]
make "looptag gensym
make "endtag gensym
code :looptag
make "cond pboolean pexpr
code (list "jumpf :cond (word "" :endtag))
regfree :cond
mustbe "do
statement
code (list "jump (word "" :looptag))
code :endtag
end

;; Simple statements: write and writeln

to pwrite
mustbe "|(|
pwrite1
end

to pwrite1
pwrite2
ifbe "|)| [stop]
ifbeelse ", [pwrite1] [(throw "error [missing comma])]
end

to pwrite2
localmake "result pwrite3
ifbe ": [.setfirst (butfirst :result) token]
code :result
if not equalp first :result "putstr [regfree last :result]
end

to pwrite3
localmake "token token
if equalp first :token "' ~
   [output (list "putstr 1 (list butlast butfirst :token))]
make "peektoken :token
localmake "result pexpr
if equalp first :result "char [output (list "putch 1 pchar :result)]
if equalp first :result "boolean [output (list "puttf 1 pboolean :result)]
if equalp first :result "integer [output (list "putint 10 pinteger :result)]
output (list "putreal 20 preal :result)
end

to pwriteln
ifbe "|(| [pwrite1]
code [newline]
end

;; Simple statements: procedure call

to pproccall
localmake "pname token
localmake "id getid :pname
localmake "lname id.lname :id
localmake "vartypes thing :lname
pproccall1 framesize.proc
end

to pproccall1 :offset
code (list "store reg.newfp (memaddr frame.save.newfp reg.stackptr))
code (list "add reg.newfp reg.stackptr reg.zero)
code (list "addi reg.stackptr reg.stackptr (last id.frame :id))
code (list "store reg.frameptr (memaddr frame.prevframe reg.newfp))
localmake "newdepth first id.frame :id
ifelse :newdepth > :lexical.depth ~
       [code (list "store reg.frameptr
                   (memaddr frame.outerframe reg.newfp))] ~
       [localmake "tempreg newregister
        code (list "rload :tempreg (memaddr frame.outerframe reg.frameptr))
        repeat (:lexical.depth - :newdepth)
               [code (list "rload :tempreg 
                           (memaddr frame.outerframe :tempreg))]
        code (list "store :tempreg (memaddr frame.outerframe reg.newfp))
        regfree :tempreg]
if not emptyp :vartypes [mustbe "|(|  procargs :vartypes :offset]
for [i reg.firstfree :numregs-1] ~
    [if item :i :regsused
        [code (list "store :i (memaddr frame.regsave+:i reg.frameptr))]]
code (list "add reg.frameptr reg.newfp reg.zero)
code (list "rload reg.newfp (memaddr frame.save.newfp reg.frameptr))
code (list "jal reg.retaddr (word "" :lname))
for [i reg.firstfree :numregs-1] ~
    [if item :i :regsused
        [code (list "rload :i (memaddr frame.regsave+:i reg.frameptr))]]
end

to procargs :types :offset
if emptyp :types [mustbe "|)|  stop]
localmake "next procarg first :types :offset
if not emptyp butfirst :types [mustbe ",]
procargs butfirst :types :offset+:next
end

to procarg :type :offset
local "result
if equalp first :type "var [output procvararg last :type]
if listp :type [output procarrayarg :type]
make "result check.type :type pexpr
code (list "store :result (memaddr :offset reg.newfp))
regfree :result
output 1
end

to procvararg :ftype
local [pname id type index]
make "pname token
make "id getid :pname
make "type id.type :id
ifelse wordp :ftype ~
       [setindex "true] ~
       [make "index 0]
if not equalp :type :ftype ~
   [(throw "error sentence :pname [arg wrong type])]
localmake "target memsetup (id.pointer :id) (id.varp :id) :index
localmake "tempreg newregister
code (list "addi :tempreg (last :target) (first :target))
code (list "store :tempreg (memaddr :offset reg.newfp))
regfree last :target
regfree :tempreg
output 1
end

to procarrayarg :type
localmake "pname token
localmake "id getid :pname
if not equalp :type (id.type :id) ~
   [(throw "error (sentence "array :pname [wrong type for arg]))]
localmake "size arraysize :type
localmake "rtarget memsetup (id.pointer :id) (id.varp :id) 0
localmake "pointreg newregister
code (list "addi :pointreg reg.newfp :offset)
localmake "ltarget (list 0 :pointreg)
copyarray
output :size
end

;; Simple statements: assignment statement (including function value)

to passign
local [name id type index value pointer target]
make "name token
make "index []
ifbe "|[| [make "index commalist [pexpr] mustbe "|]|]
mustbe "|:=|
make "id getid :name
make "pointer id.pointer :id
make "type id.type :id
passign1
end

to pfunset
local [name id type index value pointer target]
make "name token
make "index []
if not equalp :name :procname ~
   [(throw "error sentence [assign to wrong function] :name)]
mustbe "|:=|
make "pointer (list :lexical.depth frame.retval)
make "type first thing lname :name
make "id (list :name :type :pointer "false)
passign1
end

to passign1
if and (listp :type) (emptyp :index) [parrayassign :id  stop]
setindex "false
make "value check.type :type pexpr
codestore :value (id.pointer :id) (id.varp :id) :index
regfree :value
end

to noimmediate :value
if equalp exp.mode :value "immediate ~
   [localmake "reg newregister
    code (list "addi :reg reg.zero exp.value :value)
    output (list exp.type :value "register :reg)]
output :value
end

to check.type :type :result
if equalp :type "real [output preal :result]
if equalp :type "integer [output pinteger :result]
if equalp :type "char [output pchar :result]
if equalp :type "boolean [output pboolean :result]
end

to preal :expr [:pval noimmediate :expr]
if equalp exp.type :pval "real [output exp.value :pval]
output pinteger :pval
end

to pinteger :expr [:pval noimmediate :expr]
local "type
make "type exp.type :pval
if memberp :type [integer boolean char] [output exp.value :pval]
(throw "error sentence exp.type :pval [isn't ordinal])
end

to pchar :expr [:pval noimmediate :expr]
if equalp exp.type :pval "char [output exp.value :pval]
(throw "error sentence exp.type :pval [not character value])
end

to pboolean :expr [:pval noimmediate :expr]
if equalp exp.type :pval "boolean [output exp.value :pval]
(throw "error sentence exp.type :pval [not true or false])
end

to parrayassign :id
localmake "right token
if equalp first :right "' ~
   [pstringassign :type (butlast butfirst :right)  stop]
localmake "rid getid :right
if not equalp (id.type :id) (id.type :rid) ~
   [(throw "error (sentence "arrays :name "and :right [unequal types]))]
localmake "size arraysize id.type :id
localmake "ltarget memsetup (id.pointer :id) (id.varp :id) 0
localmake "rtarget memsetup (id.pointer :rid) (id.varp :rid) 0
copyarray
end

to pstringassign :type :string
if not equalp first :type "char [stringlose]
if not emptyp butfirst last :type [stringlose]
if not equalp (last first last :type) (count :string) [stringlose]
localmake "ltarget memsetup (id.pointer :id) (id.varp :id) 0
pstringassign1 newregister (first :ltarget) (last :ltarget) :string
regfree last :ltarget
end

to pstringassign1 :tempreg :offset :reg :string
if emptyp :string [regfree :tempreg  stop]
code (list "addi :tempreg reg.zero ascii first :string)
code (list "store :tempreg (memaddr :offset :reg))
pstringassign1 :tempreg :offset+1 :reg (butfirst :string)
end

to stringlose
(throw "error sentence :name [not string array or wrong size])
end

;; Multiple array indices to linear index computation

to setindex :parseflag
ifelse listp :type ~
       [if :parseflag
           [mustbe "|[|  make "index commalist [pexpr]  mustbe "|]| ]
        make "index lindex last :type :index
        make "type first :type] ~
       [make "index 0]
end

to lindex :bounds :index
output lindex1 (offset pinteger noimmediate first :index
                       first first :bounds) ~
               butfirst :bounds butfirst :index
end

to lindex1 :sofar :bounds :index
if emptyp :bounds [output :sofar]
output lindex1 (nextindex :sofar
                          last first :bounds
                          pinteger noimmediate first :index
                          first first :bounds) ~
               butfirst :bounds butfirst :index
end

to nextindex :old :factor :new :offset
code (list "muli :old :old :factor)
localmake "newreg offset :new :offset
code (list "add :old :old :newreg)
regfree :newreg
output :old
end

to offset :indexreg :lowbound
if not equalp :lowbound 0 [code (list "subi :indexreg :indexreg :lowbound)]
output :indexreg
end

;; Memory interface: load and store instructions

to codeload :reg :pointer :varflag :index
localmake "target memsetup :pointer :varflag :index
code (list "rload :reg targetaddr)
regfree last :target
end

to codestore :reg :pointer :varflag :index
localmake "target memsetup :pointer :varflag :index
code (list "store :reg targetaddr)
regfree last :target
end

to targetaddr
output memaddr (first :target) (last :target)
end

to memaddr :offset :index
output (word :offset "\( :index "\))
end

to memsetup :pointer :varflag :index
localmake "depth first :pointer
localmake "offset last :pointer
local "newreg
ifelse equalp :depth 0 ~
       [make "newreg reg.globalptr] ~
       [ifelse equalp :depth :lexical.depth
               [make "newreg reg.frameptr]
               [make "newreg newregister
                code (list "rload :newreg
                           (memaddr frame.outerframe reg.frameptr))
                repeat (:lexical.depth - :depth) - 1
                       [code (list "rload :newreg
                                   (memaddr frame.outerframe :newreg))]]]
if :varflag ~
   [ifelse :newreg = reg.frameptr
           [make "newreg newregister
            code (list "rload :newreg (memaddr :offset reg.frameptr))]
           [code (list "rload :newreg (memaddr :offset :newreg))]
    make "offset 0]
if not equalp :index 0 ~
   [code (list "add :index :index :newreg)
    regfree :newreg
    make "newreg :index]
output list :offset :newreg
end

to copyarray
localmake "looptag gensym
localmake "sizereg newregister
code (list "addi :sizereg reg.zero :size)
code :looptag
localmake "tempreg newregister
code (list "rload :tempreg (memaddr (first :rtarget) (last :rtarget)))
code (list "store :tempreg (memaddr (first :ltarget) (last :ltarget)))
code (list "addi (last :rtarget) (last :rtarget) 1)
code (list "addi (last :ltarget) (last :ltarget) 1)
code (list "subi :sizereg :sizereg 1)
code (list "gtr :tempreg :sizereg reg.zero)
code (list "jumpt :tempreg (word "" :looptag))
regfree :sizereg
regfree :tempreg
regfree last :ltarget
regfree last :rtarget
end

;; Expressions

to pexpr
local [opstack datastack parenlevel]
make "opstack [[popen 1 0]]
make "datastack []
make "parenlevel 0
output pexpr1
end

to pexpr1
local [token op]
make "token token
while [equalp :token "|(|] [popen  make "token token]
make "op pgetunary :token
if not emptyp :op [output pexprop :op]
push "datastack pdata :token
make "token token
while [and (:parenlevel > 0) (equalp :token "|)| )] ~
      [pclose  make "token token]
make "op pgetbinary :token
if not emptyp :op [output pexprop :op]
make "peektoken :token
pclose
if not emptyp :opstack [(throw "error [too many operators])]
if not emptyp butfirst :datastack [(throw "error [too many operands])]
output pop "datastack
end

to pexprop :op
while [(op.prec :op) < (1 + op.prec first :opstack)] [ppopop]
push "opstack :op
output pexpr1
end

to popen
push "opstack [popen 1 0]
make "parenlevel :parenlevel + 1
end

to pclose
while [(op.prec first :opstack) > 0] [ppopop]
ignore pop "opstack
make "parenlevel :parenlevel - 1
end

to pgetunary :token
output gprop :token "unary
end

to pgetbinary :token
output gprop :token "binary
end

to ppopop
local [op function args left right type reg]
make "op pop "opstack
make "function op.instr :op
if equalp :function "plus [stop]
make "args op.nargs :op
make "right pop "datastack
make "left (ifelse equalp :args 2 [pop "datastack] [[[] []]])
make "type pnewtype :op exp.type :left exp.type :right
if equalp exp.mode :left "immediate ~
   [localmake "leftreg newregister
    code (list "addi :leftreg reg.zero exp.value :left)
    make "left (list exp.type :left "register :leftreg)]
ifelse equalp exp.mode :left "register ~
       [make "reg exp.value :left] ~
       [ifelse equalp exp.mode :right "register
               [make "reg exp.value :right]
               [make "reg newregister]]
if equalp :function "minus ~
   [make "left (list exp.type :right "register reg.zero)
    make "function "sub
    make "args 2]
if equalp exp.mode :right "immediate ~
   [make "function word :function "i]
ifelse equalp :args 2 ~
       [code (list :function :reg exp.value :left exp.value :right)] ~
       [code (list :function :reg exp.value :right)]
if not equalp :reg exp.value :left [regfree exp.value :left]
if (and (equalp exp.mode :right "register)
        (not equalp :reg exp.value :right)) ~
   [regfree exp.value :right]
push "datastack (list :type "register :reg)
end

to pnewtype :op :ltype :rtype
local "type
make "type op.types :op
if emptyp :ltype [make "ltype :rtype]
if not emptyp last :type [pchecktype last :type :ltype :rtype]
if and (equalp :ltype "real) (equalp :rtype "integer) [make "rtype "real]
if and (equalp :ltype "integer) (equalp :rtype "real) [make "ltype "real]
if not equalp :ltype :rtype [(throw "error [type clash])]
if emptyp last :type ~
   [if not memberp :rtype [integer real]
       [(throw "error [nonarithmetic type])]]
if emptyp first :type [output :rtype]
output first :type
end

to pchecktype :want :left :right
if not equalp :want :left [(throw "error (sentence :left "isn't :want))]
if not equalp :want :right [(throw "error (sentence :right "isn't :want))]
end

;; Expression elements

to pdata :token
if equalp :token "true [output [boolean immediate 1]]
if equalp :token "false [output [boolean immediate 0]]
if equalp first :token "' [output pchardata :token]
if numberp :token [output (list numtype :token "immediate :token)]
localmake "id getid :token
if emptyp :id [(throw "error sentence [undefined symbol] :token)]
localmake "type id.type :id
if equalp :type "function [output pfuncall :token]
local "index
setindex "true
localmake "reg newregister
codeload :reg (id.pointer :id) (id.varp :id) :index
output (list :type "register :reg)
end

to pchardata :token
if not equalp count :token 3 ~
   [(throw "error sentence :token [not single character])]
output (list "char "immediate ascii first butfirst :token)
end

to numtype :number
if memberp ". :number [output "real]
if memberp "e :number [output "real]
output "integer
end

to pfuncall :pname
localmake "id getid :pname
localmake "lname id.lname :id
if namep (word "need :lname) [make (word "need :lname) "true]
localmake "vartypes thing :lname
localmake "returntype first :vartypes
make "vartypes butfirst :vartypes
pproccall1 framesize.fun
localmake "reg newregister
code (list "add :reg reg.retval reg.zero)
output (list :returntype "register :reg)
end

;; Parsing assistance

to code :stuff
if emptyp :stuff [stop]
push :codeinto :stuff
end

to commalist :test [:sofar []]
local [result token]
make "result run :test
if emptyp :result [output :sofar]
ifbe ", [output (commalist :test (lput :result :sofar))]
output lput :result :sofar
end

.macro ifbe :wanted :action
localmake "token token
if equalp :token :wanted [output :action]
make "peektoken :token
output []
end

.macro ifbeelse :wanted :action :else
localmake "token token
if equalp :token :wanted [output :action]
make "peektoken :token
output :else
end

to mustbe :wanted
localmake "token token
if equalp :token :wanted [stop]
(throw "error (sentence "expected :wanted "got :token))
end

to newregister
for [i reg.firstfree :numregs-1] ~
    [if not item :i :regsused [setitem :i :regsused "true  output :i]]
(throw "error [not enough registers available])
end

to regfree :reg
setitem :reg :regsused "false
end

to reservedp :word
output memberp :word [and array begin case const div do downto else end ~
                      file for forward function goto if in label mod nil ~
                      not of packed procedure program record repeat set ~
                      then to type until var while with]
end

;; Lexical analysis

to token
local [token char]
if namep "peektoken [make "token :peektoken
                     ern "peektoken   output :token]
make "char getchar
if equalp :char "|{| [skipcomment output token]
if equalp :char char 32 [output token]
if equalp :char char 13 [output token]
if equalp :char char 10 [output token]
if equalp :char "' [output string "']
if memberp :char [+ - * / = ( , ) |[| |]| |;|] [output :char]
if equalp :char "|<| [output twochar "|<| [= >]]
if equalp :char "|>| [output twochar "|>| [=]]
if equalp :char ". [output twochar ". [.]]
if equalp :char ": [output twochar ": [=]]
if numberp :char [output number :char]
if letterp ascii :char [output token1 lowercase :char]
(throw "error sentence [unrecognized character:] :char)
end

to skipcomment
if equalp getchar "|}| [stop]
skipcomment
end

to string :string
local "char
make "char getchar
if not equalp :char "' [output string word :string :char]
make "char getchar
if equalp :char "' [output string word :string :char]
make "peekchar :char
output word :string "'
end

to twochar :old :ok
localmake "char getchar
if memberp :char :ok [output word :old :char]
make "peekchar :char
output :old
end

to number :num
local "char
make "char getchar
if equalp :char ". ~
   [make "char getchar ~
    ifelse equalp :char ". ~
           [make "peektoken ".. output :num] ~
           [make "peekchar :char output number word :num ".]]
if equalp :char "e [output number word :num twochar "e [+ -]]
if numberp :char [output number word :num :char]
make "peekchar :char
output :num
end

to token1 :token
local "char
make "char getchar
if or letterp ascii :char numberp :char ~
   [output token1 word :token lowercase :char]
make "peekchar :char
output :token
end

to letterp :code
if and (:code > 64) (:code < 91) [output "true]
output and (:code > 96) (:code < 123)
end

to getchar
local "char
if namep "peekchar [make "char :peekchar  ern "peekchar  output :char]
if eofp [output char 1]
output rc1
end

to rc1
local "result
make "result readchar
type :result
output :result
end

;; Data abstraction: ID List

to newlname :word
if memberp :word :namesused [output gensym]
if namep word "% :word [output gensym]
push "namesused :word
output word "% :word
end

to lname :word
local "result
make "result getid :word
if not emptyp :result [output item 3 :result]
(throw "error sentence [unrecognized identifier] :word)
end

to gettype :word
local "result
make "result getid :word
if not emptyp :result [output item 2 :result]
(throw "error sentence [unrecognized identifier] :word)
end

to getid :word [:list :idlist]
if emptyp :list [output []]
if equalp :word first first :list [output first :list]
output (getid :word butfirst :list)
end

to id.type :id
output item 2 :id
end

to id.pointer :id
output item 3 :id
end

to id.lname :id
output item 3 :id
end

to id.varp :id
output item 4 :id
end

to id.frame :id
output item 4 :id
end

;; Data abstraction: Frame slots

to frame.retaddr
output 0
end

to frame.save.newfp
output 1
end

to frame.outerframe
output 2
end

to frame.prevframe
output 3
end

to frame.regsave
output 4
end

to framesize.proc
output 4+:numregs
end

to frame.retval
output 4+:numregs
end

to framesize.fun
output 5+:numregs
end

;; Data abstraction: Operators

to op.instr :op
output first :op
end

to op.nargs :op
output first bf :op
end

to op.types :op
output item 3 :op
end

to op.prec :op
output last :op
end

;; Data abstraction: Expressions

to exp.type :exp
output first :exp
end

to exp.mode :exp
output first butfirst :exp
end

to exp.value :exp
output last :exp
end

;; Data abstraction: Registers

to reg.zero
output 0
end

to reg.retaddr
output 1
end

to reg.stackptr
output 2
end

to reg.globalptr
output 3
end

to reg.frameptr
output 4
end

to reg.newfp
output 5
end

to reg.retval
output 6
end

to reg.firstfree
output 7
end

;; Runtime (machine simulation)

to prun :progname
localmake "prog thing word "% :progname
localmake "regs (array :numregs 0)
local filter "wordp :prog
foreach :prog [if wordp ? [make ? ?rest]]
localmake "memory (array :memsize 0)
setitem 0 :regs 0
if not procedurep "add [runsetup]
prun1 :prog
end

to prun1 :pc
if emptyp :pc [stop]
if listp first :pc [run first :pc]
prun1 butfirst :pc
end

to rload :reg :offset :index
setitem :reg :regs (item (item :index :regs)+:offset :memory)
end

to store :reg :offset :index
setitem (item :index :regs)+:offset :memory (item :reg :regs)
end

to runsetup
foreach [[add sum] [sub difference] [mul product] [quo quotient]
         [div [int quotient]] [rem remainder] [land product]
         [lor [tobool lessp 0 sum]] [eql [tobool equalp]]
         [neq [tobool not equalp]] [less [tobool lessp]]
         [gtr [tobool greaterp]] [leq [tobool not greaterp]]
         [geq [tobool not lessp]]] ~
        [define first ?
                `[[dest src1 src2]
                  [setitem :dest :regs ,@[last ?] (item :src1 :regs)
                                                  (item :src2 :regs)]]
         define word first ? "i
                `[[dest src1 immed]
                  [setitem :dest :regs ,@[last ?] (item :src1 :regs)
                                                  :immed]]]
foreach [[lnot [difference 1]] [sint int] [sround round] [srandom random]] ~
        [define first ?
                `[[dest src]
                  [setitem :dest :regs ,@[last ?] (item :src :regs)]]
         define word first ? "i
                `[[dest immed]
                  [setitem :dest :regs ,@[last ?] :immed]]]
end

to tobool :tf
output ifelse :tf [1] [0]
end

to jump :label
make "pc fput :label thing :label
end

to jumpt :reg :label
if (item :reg :regs)=1 [jump :label]
end

to jumpf :reg :label
if (item :reg :regs)=0 [jump :label]
end

to jr :reg
make "pc item :reg :regs
end

to jal :reg :label
setitem :reg :regs :pc
jump :label
end

to putch :width :reg
spaces :width 1
type char (item :reg :regs)
end

to putstr :width :string
spaces :width (count first :string)
type :string
end

to puttf :width :bool
spaces :width 1
type ifelse (item :bool :regs)=0 ["F] ["T]
end

to putint :width :reg
localmake "num (item :reg :regs)
spaces :width count :num
type :num
end

to putreal :width :reg
putint :width :reg
end

to spaces :width :count
if :width > :count [repeat :width - :count [type "| |]]
end

to newline
print []
end

to exit
make "pc [exit]
end
</PRE>

<P><A HREF="../v3-toc2.html">(back to Table of Contents)</A>
<P><A HREF="https://people.eecs.berkeley.edu/~bh/v3ch4/v3ch4.html"><STRONG>BACK</STRONG></A>
chapter thread <A HREF="../v3ch6/v3ch6.html"><STRONG>NEXT</STRONG></A>

<P>
<ADDRESS>
<A HREF="../index.html">Brian Harvey</A>, 
<CODE>bh@cs.berkeley.edu</CODE>
</ADDRESS>
</BODY>
</HTML>