1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
|
<HTML>
<HEAD>
<TITLE>Lisp Operating System</TITLE>
<LINK rel="stylesheet" type="text/css" href="../mm.css">
</HEAD>
<BODY>
<H1>Lisp Operating System</H1>
<p>
A Lisp Operating System (LispOS for short) is not just another
operating system that happens to be written in Lisp (although that
would be a good thing in itself). A LispOS is also an operating
system that uses the Lisp interactive environment as an inspiration
for the interface between the user and the system, and between
applications and the system.
</p>
<p>
Below, we give some ideas on what a LispOS might contain, how it
would be different from existing operating systems, and how such a
system might be created.
</p>
<H2>The problem with existing systems</H2>
<H3>The concept of a <i>process</i></H3>
<p>
Most popular existing operating systems are derived from Unix which
was written in the 1970s. The computers for which Unix was intended
has a very small address space; too small for most usable end-user
applications. To solve this problem, the creators of Unix used the
concept of a <i>process</i>. A large application was written so
that it consisted of several smaller programs, each of which ran in
its own address space. These smaller programs would communicate by
having one application write text to its output stream for another
application to read. This method of communication was called
a <i>pipe</i> and a sequence of small applications was called
a <i>pipeline</i>. As a typical example of a chain of applications,
consider the pipeline for producing a typeset document (one of the
main applications for which Unix was designed). This chain had a
program for creating tables (called <tt>tbl</tt>), a program for
generating pictures (called <tt>pic</tt>), a program for generating
equations (called <tt>eqn</tt>), and of course the typesetting program
itself (called <tt>troff</tt>).
</p>
<p>
Using pipes to communicate between different components of an
application has several disadvantages:
<ul>
<li>
To communicate complex data structures (such as trees or
graphs), they must be converted to a stream of bytes by the
creating component, and it must be analyzed and parsed into an
equivalent data structure by the using component. Not only is
this unparsing/parsing inefficient in terms of computing
resources, but it is also problematic from a
software-engineering point of view, because the external format
must be specified and maintained as a separate aspect of each
component.
</li>
<li>
An artificial <i>order</i> between the different components is
imposed, so that components can not work as libraries that other
components can use in any order. Sometimes (as in the example
of the <tt>troff</tt> chain) the end result of a computation
depends in subtle ways on the order between the components of
the chain. Introducing a new component may require other
components to be modified.
</li>
</ul>
</p>
<p>
It is an interesting observation that in most text books on
operating systems, the concept of a process is presented as playing
a central role in operating-system design, whereas it ought to be
presented as an unfortunate necessity due to the limited address
space of existing minicomputers in the 1970s. It is also presented
as <i>the</i> method for obtaining some kind of <i>security</i>,
preventing one application from intentionally or accidentally
modifying the data of some other application. In reality, there are
several ways of obtaining such security, and separate address spaces
should be considered to be a method with too many disadvantages.
</p>
<p>
Nowadays, computers have addresses that are 64 bit wide, making it
possible to address almost 20 exabytes of data. To get an idea of
the order of magnitude of such a number, consider a fairly large
disc that can hold a terabyte of data. Then 20 million such discs
can be directly addressed by the processor. We can thus consider
the problem of too small an address space to be solved.
</p>
<H3>Hierarchical file systems</H3>
<p>
Existing operating system come with a <i>hierarchical file
system</i>. There are two significant problems,
namely <i>hierarchical</i> and <i>file</i>.
</p>
<p>
The <i> hierarchy</i> is also a concept that dates back to the
1970s, and it was considered a vast improvement on flat file
systems. However,
as <a href="http://www.shirky.com/writings/ontology_overrated.html">this
article</a> clearly explains, most things are not naturally
hierarchical. A hierarchical organization imposes an artificial
order between names. Whether a document is called
Lisp/Programs/2013/stuff, Programs/Lisp/2013/stuff,
2013/Programs/Lisp/stuff, etc, is usually not important.
</p>
<p>
The problem with a <i>file</i> is that it is only a sequence of
bytes with no structure. This lack of structure fits the Unix pipe
model very well, because intermediate steps between individual
software components can be saved to a file without changing the
result. But it also means that in order for complex data structures
to be stored in the file system, they have to be transformed into a
sequence of bytes. And whenever such a structure needs to be
modified by some application, it must again be parsed and
transformed into an in-memory structure.
</p>
<H3>Distinction between primary and secondary memory</H3>
<p>
Current system (at least for desktop computers) make a very clear
distinction between primary and secondary memory. Not only are the
two not the same, but they also have totally different semantics:
<ul>
<li> Primary memory is <i>volatile</i>. When power is turned off,
whatever was in primary memory is lost.
</li>
<li> Secondary memory is <i>permanent</i>. Stored data will not
disappear when power is turned off.
</ul>
This distinction coupled with the semantics of the two memories
creates a permanent conundrum for the user of most applications, in
that if current application data is <i>not</i> saved, then it will
be lost in case of power loss, and if it <i>is</i> saved, then
previously saved data is forever lost.
</p>
<p>
Techniques were developed as early in the 1960s for presenting
primary and secondary memory as a single abstraction to the user.
For example, the Multics system had a single hierarchy of fixed-size
byte arrays (called segments) that served as permanent storage, but
that could also be treated as any in-memory array by applications.
As operating systems derived from Unix became widespread, these
techniques were largely forgotten.
</p>
<H3>The concept of a kernel</H3>
<p>
The <i>kernel</i> of an operating system is a fairly large,
monolithic program that is started when the computer is powered on.
The kernel is not an ordinary program of the computer. It executes
in a privileged state so that it has direct access to devices and to
data structures that must be protected from direct use by user-level
programs.
</p>
<p>
The very existence of a kernel is problematic because the computer
needs to be restarted whenever the kernel is updated, and then all
existing state is lost, including open files and data structures
that reside in volatile memory. Some programs, such as web
browsers, compensate somewhat for this problem by remembering the
open windows and the links that were associated with each window.
</p>
<p>
The fact that the kernel is monolithic poses a problem because when
code needs to be added to the kernel in the form of <i>a kernel
module</i>, such code has full access to the entire computer
system. This universal access represents a security risk, of
course, but more commonly, it can be <i>defective</i> and then it
will <i>fail</i> often by crashing the entire computer.
</p>
<p>
We have had solutions to this problem for many decades. The Multics
system, for example, did not have a kernel at all. An interrupt or
a system call was executed by the user-level process that issued the
system call or that happened to be executing when the interrupt
arrived. The code that executed then was not part of a monolithic
kernel, but existed as independent programs that could be added or
replaced without restarting the system. The system could still
crash, of course, if some essential system-wide data structure was
corrupted, but most of the time, only the user-level process that
issued the request would crash.
</p>
<H3>Monolithic applications</H3>
<p>
Applications in current operating systems are written in low-level
languages such as C or C++. An application is built using
techniques from more than half a century ago where source code is
compiled to <i>object code</i> and then <i>linked</i> to produce
an <i>executable file</i> meant to run in its own
dedicated <i>address space</i>.
</p>
<p>
Aside from system calls, all code in an application is run in one
single address space. Together with the fact that low-level
languages are used, this organization makes the application
vulnerable to <i>viruses</i> and other security-related attacks. A
typical attack exploits the vulnerability to <i>buffer overflow</i>
which is due to the fact that the programming language used to write
the application does not insert boundary checks for arrays,
requiring the programmer to do that with explicit code, which is
therefore frequently not done.
</p>
<p>
A buffer overflow in such an application can be exploited in order
to modify the execution of the program so that code that was not
intended by the application writer is executed in place of the
intended application code. Such modification is possible because
the execution stack is part of the application address space, and
the execution stack contains addresses of code to be executed later,
so that the application has direct access to these code addresses.
</p>
<p>
In a Lisp operating system, the stack is not accessible to
application code. It is therefore not possible to alter addresses
on the stack representing code to be executed later. Furthermore,
the programming language automatically checks boundaries of arrays,
so that buffer overflows are not possible.
</p>
<p>
An application in a Lisp operating system is not built as a
monolithic executable meant to execute in its own address space.
Instead, an application consists of a large number of objects whose
addresses are protected by the system and not directly accessible to
application code. The most common techniques for security attacks
are therefore not possible in such a system.
</p>
<H2>Objectives for a Lisp operating system</H2>
<p>
The three main objectives of a Lisp operating system correspond to
solutions to the two main problems with exiting systems as indicated
in the previous section.
</p>
<H3>Single address space</H3>
<p>
Instead of each application having its own address space, we propose
that all applications share a single large address space. This way,
applications can share data simply by passing pointers around,
because a pointer is globally valid, unlike pointers in current
operating systems.
</p>
<p>
Clearly, if there is a single address space shared by all
applications, there needs to be a different mechanism to
ensure <i>protection</i> between them so that one application can
not intentionally or accidentally destroy the data of another
application. Most high-level programming languages (in particular
Lisp, but also Java, and many more) propose a solution to this
problem by simply not allowing users to execute arbitrary machine
code. Instead, they allow only code that has been produced from the
high-level notation of the language and which excludes arbitrary
pointer arithmetic so that the application can only address its own
data. This technique is sometimes called "trusted compiler".
</p>
<p>
It might sometimes be desirable to write an application in a
low-level language like C or even assembler, or it might be
necessary to run applications that have been written for other
systems. Such applications could co-exist with the normal ones, but
they would have to work in their own address space as with current
operating systems, and with the same difficulties of communicating
with other applications.
</p>
<H3>Object store based on tags</H3>
<p>
Instead of a hierarchical file system, we propose an <i>object
store</i> which can contain any objects. If a file (i.e. a
sequence of bytes) is desired, it would be stored as an array of
bytes.
</p>
<p>
Instead of organizing the objects into a hierarchy, objects in the
store can optionally be associated with an arbitrary number
of <i>tags</i>. These tags are <i>key/value</i> pairs, such as for
example the date of creation of the archive entry, the creator (a
user) of the archive entry, and the <i>access permissions</i> for
the entry. Notice that tags are not properties of the objects
themselves, but only of the archive entry that allows an object to
be accessed. Some tags might be derived from the contents of the
object being stored such as the <i>sender</i> or the <i>date</i> of
an email message. It should be possible to accomplish most searches
of the store without accessing the objects themselves, but only the
tags. Occasionally, contents must be accessed such as when a raw
search of the contents of a text is wanted.
</p>
<p>
It is sometimes desirable to group related objects together as
with <i>directories</i> of current operating systems. Should a user
want such a group, it would simply be another object (say instances
of the class <tt>directory</tt>) in the store. Users who can not
adapt to a non-hierarchical organization can even store such
directories as one of the objects inside another directory.
</p>
<p>Here are some examples of possible keyword/value pairs, how they
might be used, and what kinds of values are permitted:</p>
<table border="1">
<tr>
<td>
Keyword
</td>
<td>
Possible values
</td>
</tr>
<tr>
<td>
<b>category</b>
</td>
<td>
The nature of the object such
as <b>movie</b>, <b>music</b>, <b>article</b>, <b>book</b>, <b>user
manual</b>, <b>dictionary</b>, <b>course</b>, <b>lecture</b>,
<b>recipe</b>, <b>program</b>, <b>bank statement</b>,
<b>email</b>. These would be chosen from an
editable set that is defined per user.
</td>
</tr>
<tr>
<td>
<b>name</b>
</td>
<td>
A string that is displayed with the object, such as "A Dramatic
Turn of Events", "Three seasons", "Alternative energy".
</td>
</tr>
<tr>
<td>
<b>author</b>
</td>
<td>
An object identifying a person, an organization, a company,
etc.
</td>
</tr>
<tr>
<td>
<b>genre</b>. Can be used for movies, music albums, programs,
articles, etc.
</td>
<td>
<b>progressive
metal</b>, <b>science</b>, <b>algorithms</b>, <b>garbage
collection</b>, <b>game</b>, <b>programming language
implementation</b>, <b>operating system</b>. These would be
chosen from an editable set that is defined per user.
</td>
</tr>
<tr>
<td>
<b>format</b>
</td>
<td>
This tag can be used to identify the file type of documents such
as <b>PDF</b>, <b>ogg/vorbis</b>, <b>MPEG4</b> <b>PNG</b>, in
which case the tag can be assigned automatically, but also to
identify the source format of files in a directory containing
things like articles or user manuals, for
example <b>LaTeX</b>, <b>Texinfo</b>, <b>HTML</b>. These would
be chosen from an editable set that is defined per user.
</td>
</tr>
<tr>
<td>
<b>date of creation</b>
</td>
<td>
A date interval.
</td>
</tr>
<tr>
<td>
<b>composer</b>. Used for music.
</td>
<td>
An object representing a person. On a compilation album there
can be more than one tag of this kind.
</td>
</tr>
<tr>
<td>
<b>language</b>.
</td>
<td>
An object representing a natural language such
as <b>English</b>, <b>Vietnamese</b>, or a programming languages
such as <b>Lisp</b>, <b>Python</b>. These would
be chosen from an editable set that is defined per user. If
appropriate, a document can have several of these tags, for
instance if some program uses multiple programming languages, or
if a document is written using several languages, such as a
dictionary.
</td>
</tr>
<tr>
<td>
<b>duration</b>. Can be used for things like movies or music
albums.
</td>
<td>
An object representing a duration.
</td>
</tr>
<tr>
<td>
<b>source control</b>. Can be used for any documents that use
source control such a programs, etc.
</td>
<td>
<b>GIT</b>, <b>SVN</b>, <b>CVS</b>, <b>darks</b>, etc. These
would be chosen from an editable set that is defined per user.
</td>
</tr>
<tr>
</tr>
</table>
<p>
When (a pointer to) an object is returned to a user as a result of a
search of the object store, it is actually similar to what is called
a "capability" in the operating-system literature. Such a
capability is essentially only a pointer with a few bits indicating
what <i>access rights</i> the user has to the objects. Each creator
may interpret the contents of those bits as he or she likes, but
typically they would be used to restrict access, so that for
instance executing a <i>reader</i> method is allowed, but executing
a <i>writer</i> method is not.
</p>
<H3>Single memory abstraction</H3>
<p>
Instead of two different memory abstractions (primary and
secondary), the Lisp operating system would contain a single
abstraction which looks like any interactive Lisp system, except
that data is permanent.
</p>
<p>
Since data is permanent, application writers are encouraged to
provide a sophisticated <i>undo</i> facility.
</p>
<p>
The physical main (semiconductor) memory of the computer simply acts
as a <i>cache</i> for the disk(s), so that the address of an object
uniquely determines where on the disk it is stored. The cache is
managed as an ordinary <i>virtual memory</i> with existing
algorithms.
</p>
<H3>Other features</H3>
<H4>Crash proof (maybe)</H4>
<p>
There is extensive work on crash-proof systems, be it operating
systems or data base systems. In our opinion, this work is
confusing in that the objective is not clearly stated.
</p>
<p>
Sometimes the objective is stated as the desire that no data be lost
when power is lost. But the solution to that problem already exists
in every laptop computer; it simply provides a <i>battery</i> that
allow the system to continue to work, or to be <i>shut down</i> in a
controlled way.
</p>
<p>
Other times, the objective is stated as a protection against
defective software, so that data is stored at regular intervals
(checkpointing) perhaps combined with a <i>transaction log</i> so
that the state of the system immediately before a crash can always
be recovered. But it is very hard to protect oneself against
defective software. There can be defects in the checkpointing code
or in the code for logging transactions, and there can be defects in
the underlying file system. We believe that it is a better use of
developer time to find and eliminate defects than to aim for a
recovery as a result of existing defects.
</p>
<H4>Multiple simultaneous environments</H4>
<p>
To allow for a user to add methods to standard generic functions
(such as <tt>print-object</tt>) without interfering with other
users, we suggest that each user gets a different <i>global
environment</i>. The environment maps <i>names</i>
to <i>objects</i> such as functions, classes, types, packages, and
more. Immutable objects (such as the <tt>common-lisp</tt> package)
can exist in several different environments simultaneously, but
objects (such as the generic function <tt>print-object</tt> would be
different in different environments.
</p>
<p>
Multiple environments would also provide more safety for users in
that if a user inadvertently removes some system feature, then it
can be recovered from a default environment, and in the worst case a
fresh default environment could be installed for a user who
inadvertently destroyed large parts of his or her environment.
</p>
<p>
Finally, multiple environments would simplify experimentation with
new features without running the risk of destroying the entire
system. Different versions of a single package could exist in
different environments.
</p>
<H2>How to accomplish it</H2>
<p>
The most important aspect of a Lisp operating system is not that all
the code be written in Lisp, but rather to present a Lisp-like
interface between users and the system and between applications and
the system. It is therefore legitimate to take advantage of some
existing system (probably Linux or some BSD version) in order to
provide services such as device drivers, network communication,
thread scheduling, etc.
</p>
<H3>Create a Lisp system to be used as basis</H3>
<p>
The first step is to create a Common Lisp system that can be used as
a basis for the Lisp operating system. It should already allow for
multiple environments, and it should be available on 64-bit
platforms. Preferably, this system should use as little C code as
possible and interact directly with the system calls of the
underlying kernel.
</p>
<H3>Create a single-user system as a Unix process</H3>
<p>
The next step is to transform the Common Lisp system into an
operating system in the sense of the API for users and
applications. This system would contain the object store, but
perhaps not access control functionality.
</p>
<p>
When this step is accomplished, it is possible to write or adapt
applications such as text editors, inspectors, debuggers, GUI
interface libraries, etc. for the system.
</p>
<H3>Create device drivers</H3>
<p>
The final step is to replace the temporary Unix kernel with native
device drivers for the new system and to turn the system into a full
multi-user operating system.
</p>
<HR>
<address>
robert.strandh@gmail.com
</address>
</BODY>
</HTML>
|