From a519e5320bec2f2df42b7356740e33189dd13d29 Mon Sep 17 00:00:00 2001 From: Andrew Yu Date: Tue, 26 Jul 2022 01:17:29 +0800 Subject: backup public domain --- random/lispos.html | 597 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 597 insertions(+) create mode 100644 random/lispos.html (limited to 'random/lispos.html') diff --git a/random/lispos.html b/random/lispos.html new file mode 100644 index 0000000..cd114fa --- /dev/null +++ b/random/lispos.html @@ -0,0 +1,597 @@ + + +Lisp Operating System + + + +

Lisp Operating System

+ +

+ A Lisp Operating System (LispOS for short) is not just another + operating system that happens to be written in Lisp (although that + would be a good thing in itself). A LispOS is also an operating + system that uses the Lisp interactive environment as an inspiration + for the interface between the user and the system, and between + applications and the system. +

+ +

+ Below, we give some ideas on what a LispOS might contain, how it + would be different from existing operating systems, and how such a + system might be created. +

+ +

The problem with existing systems

+ +

The concept of a process

+ +

+ Most popular existing operating systems are derived from Unix which + was written in the 1970s. The computers for which Unix was intended + has a very small address space; too small for most usable end-user + applications. To solve this problem, the creators of Unix used the + concept of a process. A large application was written so + that it consisted of several smaller programs, each of which ran in + its own address space. These smaller programs would communicate by + having one application write text to its output stream for another + application to read. This method of communication was called + a pipe and a sequence of small applications was called + a pipeline. As a typical example of a chain of applications, + consider the pipeline for producing a typeset document (one of the + main applications for which Unix was designed). This chain had a + program for creating tables (called tbl), a program for + generating pictures (called pic), a program for generating + equations (called eqn), and of course the typesetting program + itself (called troff). +

+ +

+ Using pipes to communicate between different components of an + application has several disadvantages: +

+

+ +

+ It is an interesting observation that in most text books on + operating systems, the concept of a process is presented as playing + a central role in operating-system design, whereas it ought to be + presented as an unfortunate necessity due to the limited address + space of existing minicomputers in the 1970s. It is also presented + as the method for obtaining some kind of security, + preventing one application from intentionally or accidentally + modifying the data of some other application. In reality, there are + several ways of obtaining such security, and separate address spaces + should be considered to be a method with too many disadvantages. +

+ +

+ Nowadays, computers have addresses that are 64 bit wide, making it + possible to address almost 20 exabytes of data. To get an idea of + the order of magnitude of such a number, consider a fairly large + disc that can hold a terabyte of data. Then 20 million such discs + can be directly addressed by the processor. We can thus consider + the problem of too small an address space to be solved. +

+ +

Hierarchical file systems

+ +

+ Existing operating system come with a hierarchical file + system. There are two significant problems, + namely hierarchical and file. +

+ +

+ The hierarchy is also a concept that dates back to the + 1970s, and it was considered a vast improvement on flat file + systems. However, + as this + article clearly explains, most things are not naturally + hierarchical. A hierarchical organization imposes an artificial + order between names. Whether a document is called + Lisp/Programs/2013/stuff, Programs/Lisp/2013/stuff, + 2013/Programs/Lisp/stuff, etc, is usually not important. +

+ +

+ The problem with a file is that it is only a sequence of + bytes with no structure. This lack of structure fits the Unix pipe + model very well, because intermediate steps between individual + software components can be saved to a file without changing the + result. But it also means that in order for complex data structures + to be stored in the file system, they have to be transformed into a + sequence of bytes. And whenever such a structure needs to be + modified by some application, it must again be parsed and + transformed into an in-memory structure. +

+ +

Distinction between primary and secondary memory

+ +

+ Current system (at least for desktop computers) make a very clear + distinction between primary and secondary memory. Not only are the + two not the same, but they also have totally different semantics: +

+ This distinction coupled with the semantics of the two memories + creates a permanent conundrum for the user of most applications, in + that if current application data is not saved, then it will + be lost in case of power loss, and if it is saved, then + previously saved data is forever lost. +

+ +

+ Techniques were developed as early in the 1960s for presenting + primary and secondary memory as a single abstraction to the user. + For example, the Multics system had a single hierarchy of fixed-size + byte arrays (called segments) that served as permanent storage, but + that could also be treated as any in-memory array by applications. + As operating systems derived from Unix became widespread, these + techniques were largely forgotten. +

+ +

The concept of a kernel

+ +

+ The kernel of an operating system is a fairly large, + monolithic program that is started when the computer is powered on. + The kernel is not an ordinary program of the computer. It executes + in a privileged state so that it has direct access to devices and to + data structures that must be protected from direct use by user-level + programs. +

+ +

+ The very existence of a kernel is problematic because the computer + needs to be restarted whenever the kernel is updated, and then all + existing state is lost, including open files and data structures + that reside in volatile memory. Some programs, such as web + browsers, compensate somewhat for this problem by remembering the + open windows and the links that were associated with each window. +

+ +

+ The fact that the kernel is monolithic poses a problem because when + code needs to be added to the kernel in the form of a kernel + module, such code has full access to the entire computer + system. This universal access represents a security risk, of + course, but more commonly, it can be defective and then it + will fail often by crashing the entire computer. +

+ +

+ We have had solutions to this problem for many decades. The Multics + system, for example, did not have a kernel at all. An interrupt or + a system call was executed by the user-level process that issued the + system call or that happened to be executing when the interrupt + arrived. The code that executed then was not part of a monolithic + kernel, but existed as independent programs that could be added or + replaced without restarting the system. The system could still + crash, of course, if some essential system-wide data structure was + corrupted, but most of the time, only the user-level process that + issued the request would crash. +

+ +

Monolithic applications

+ +

+ Applications in current operating systems are written in low-level + languages such as C or C++. An application is built using + techniques from more than half a century ago where source code is + compiled to object code and then linked to produce + an executable file meant to run in its own + dedicated address space. +

+ +

+ Aside from system calls, all code in an application is run in one + single address space. Together with the fact that low-level + languages are used, this organization makes the application + vulnerable to viruses and other security-related attacks. A + typical attack exploits the vulnerability to buffer overflow + which is due to the fact that the programming language used to write + the application does not insert boundary checks for arrays, + requiring the programmer to do that with explicit code, which is + therefore frequently not done. +

+ +

+ A buffer overflow in such an application can be exploited in order + to modify the execution of the program so that code that was not + intended by the application writer is executed in place of the + intended application code. Such modification is possible because + the execution stack is part of the application address space, and + the execution stack contains addresses of code to be executed later, + so that the application has direct access to these code addresses. +

+ +

+ In a Lisp operating system, the stack is not accessible to + application code. It is therefore not possible to alter addresses + on the stack representing code to be executed later. Furthermore, + the programming language automatically checks boundaries of arrays, + so that buffer overflows are not possible. +

+ +

+ An application in a Lisp operating system is not built as a + monolithic executable meant to execute in its own address space. + Instead, an application consists of a large number of objects whose + addresses are protected by the system and not directly accessible to + application code. The most common techniques for security attacks + are therefore not possible in such a system. +

+ +

Objectives for a Lisp operating system

+ +

+ The three main objectives of a Lisp operating system correspond to + solutions to the two main problems with exiting systems as indicated + in the previous section. +

+ +

Single address space

+ +

+ Instead of each application having its own address space, we propose + that all applications share a single large address space. This way, + applications can share data simply by passing pointers around, + because a pointer is globally valid, unlike pointers in current + operating systems. +

+ +

+ Clearly, if there is a single address space shared by all + applications, there needs to be a different mechanism to + ensure protection between them so that one application can + not intentionally or accidentally destroy the data of another + application. Most high-level programming languages (in particular + Lisp, but also Java, and many more) propose a solution to this + problem by simply not allowing users to execute arbitrary machine + code. Instead, they allow only code that has been produced from the + high-level notation of the language and which excludes arbitrary + pointer arithmetic so that the application can only address its own + data. This technique is sometimes called "trusted compiler". +

+ +

+ It might sometimes be desirable to write an application in a + low-level language like C or even assembler, or it might be + necessary to run applications that have been written for other + systems. Such applications could co-exist with the normal ones, but + they would have to work in their own address space as with current + operating systems, and with the same difficulties of communicating + with other applications. +

+ +

Object store based on tags

+ +

+ Instead of a hierarchical file system, we propose an object + store which can contain any objects. If a file (i.e. a + sequence of bytes) is desired, it would be stored as an array of + bytes. +

+ +

+ Instead of organizing the objects into a hierarchy, objects in the + store can optionally be associated with an arbitrary number + of tags. These tags are key/value pairs, such as for + example the date of creation of the archive entry, the creator (a + user) of the archive entry, and the access permissions for + the entry. Notice that tags are not properties of the objects + themselves, but only of the archive entry that allows an object to + be accessed. Some tags might be derived from the contents of the + object being stored such as the sender or the date of + an email message. It should be possible to accomplish most searches + of the store without accessing the objects themselves, but only the + tags. Occasionally, contents must be accessed such as when a raw + search of the contents of a text is wanted. +

+ +

+ It is sometimes desirable to group related objects together as + with directories of current operating systems. Should a user + want such a group, it would simply be another object (say instances + of the class directory) in the store. Users who can not + adapt to a non-hierarchical organization can even store such + directories as one of the objects inside another directory. +

+ +

Here are some examples of possible keyword/value pairs, how they + might be used, and what kinds of values are permitted:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ Keyword + + Possible values +
+ category + + The nature of the object such + as movie, music, article, book, user + manual, dictionary, course, lecture, + recipe, program, bank statement, + email. These would be chosen from an + editable set that is defined per user. +
+ name + + A string that is displayed with the object, such as "A Dramatic + Turn of Events", "Three seasons", "Alternative energy". +
+ author + + An object identifying a person, an organization, a company, + etc. +
+ genre. Can be used for movies, music albums, programs, + articles, etc. + + progressive + metal, science, algorithms, garbage + collection, game, programming language + implementation, operating system. These would be + chosen from an editable set that is defined per user. +
+ format + + This tag can be used to identify the file type of documents such + as PDF, ogg/vorbis, MPEG4 PNG, in + which case the tag can be assigned automatically, but also to + identify the source format of files in a directory containing + things like articles or user manuals, for + example LaTeX, Texinfo, HTML. These would + be chosen from an editable set that is defined per user. +
+ date of creation + + A date interval. +
+ composer. Used for music. + + An object representing a person. On a compilation album there + can be more than one tag of this kind. +
+ language. + + An object representing a natural language such + as English, Vietnamese, or a programming languages + such as Lisp, Python. These would + be chosen from an editable set that is defined per user. If + appropriate, a document can have several of these tags, for + instance if some program uses multiple programming languages, or + if a document is written using several languages, such as a + dictionary. +
+ duration. Can be used for things like movies or music + albums. + + An object representing a duration. +
+ source control. Can be used for any documents that use + source control such a programs, etc. + + GIT, SVN, CVS, darks, etc. These + would be chosen from an editable set that is defined per user. +
+ +

+ When (a pointer to) an object is returned to a user as a result of a + search of the object store, it is actually similar to what is called + a "capability" in the operating-system literature. Such a + capability is essentially only a pointer with a few bits indicating + what access rights the user has to the objects. Each creator + may interpret the contents of those bits as he or she likes, but + typically they would be used to restrict access, so that for + instance executing a reader method is allowed, but executing + a writer method is not. +

+ +

Single memory abstraction

+ +

+ Instead of two different memory abstractions (primary and + secondary), the Lisp operating system would contain a single + abstraction which looks like any interactive Lisp system, except + that data is permanent. +

+ +

+ Since data is permanent, application writers are encouraged to + provide a sophisticated undo facility. +

+ +

+ The physical main (semiconductor) memory of the computer simply acts + as a cache for the disk(s), so that the address of an object + uniquely determines where on the disk it is stored. The cache is + managed as an ordinary virtual memory with existing + algorithms. +

+ +

Other features

+ +

Crash proof (maybe)

+ +

+ There is extensive work on crash-proof systems, be it operating + systems or data base systems. In our opinion, this work is + confusing in that the objective is not clearly stated. +

+ +

+ Sometimes the objective is stated as the desire that no data be lost + when power is lost. But the solution to that problem already exists + in every laptop computer; it simply provides a battery that + allow the system to continue to work, or to be shut down in a + controlled way. +

+ +

+ Other times, the objective is stated as a protection against + defective software, so that data is stored at regular intervals + (checkpointing) perhaps combined with a transaction log so + that the state of the system immediately before a crash can always + be recovered. But it is very hard to protect oneself against + defective software. There can be defects in the checkpointing code + or in the code for logging transactions, and there can be defects in + the underlying file system. We believe that it is a better use of + developer time to find and eliminate defects than to aim for a + recovery as a result of existing defects. +

+ +

Multiple simultaneous environments

+ +

+ To allow for a user to add methods to standard generic functions + (such as print-object) without interfering with other + users, we suggest that each user gets a different global + environment. The environment maps names + to objects such as functions, classes, types, packages, and + more. Immutable objects (such as the common-lisp package) + can exist in several different environments simultaneously, but + objects (such as the generic function print-object would be + different in different environments. +

+ +

+ Multiple environments would also provide more safety for users in + that if a user inadvertently removes some system feature, then it + can be recovered from a default environment, and in the worst case a + fresh default environment could be installed for a user who + inadvertently destroyed large parts of his or her environment. +

+ +

+ Finally, multiple environments would simplify experimentation with + new features without running the risk of destroying the entire + system. Different versions of a single package could exist in + different environments. +

+ +

How to accomplish it

+ +

+ The most important aspect of a Lisp operating system is not that all + the code be written in Lisp, but rather to present a Lisp-like + interface between users and the system and between applications and + the system. It is therefore legitimate to take advantage of some + existing system (probably Linux or some BSD version) in order to + provide services such as device drivers, network communication, + thread scheduling, etc. +

+ +

Create a Lisp system to be used as basis

+ +

+ The first step is to create a Common Lisp system that can be used as + a basis for the Lisp operating system. It should already allow for + multiple environments, and it should be available on 64-bit + platforms. Preferably, this system should use as little C code as + possible and interact directly with the system calls of the + underlying kernel. +

+ +

Create a single-user system as a Unix process

+ +

+ The next step is to transform the Common Lisp system into an + operating system in the sense of the API for users and + applications. This system would contain the object store, but + perhaps not access control functionality. +

+ +

+ When this step is accomplished, it is possible to write or adapt + applications such as text editors, inspectors, debuggers, GUI + interface libraries, etc. for the system. +

+ +

Create device drivers

+ +

+ The final step is to replace the temporary Unix kernel with native + device drivers for the new system and to turn the system into a full + multi-user operating system. +

+ +
+
+robert.strandh@gmail.com +
+ + -- cgit 1.4.1-2-gfad0