Towards An Emergent Operating System
Numerical Investigation of Innovative, Non-Hierarchical, Massively Parallel OS Designs using Swarm
Jan Hauser - Strategic Programs, Sun
Manor Askenazi - Swarm Team, SFI
The Elusive General Purpose Supercomputer
Commercially viable general-purpose computers have been with us for the last forty years. Computer architectures exist today which can support anything from database applications, scientific code, and desktop publishing to professional graphics, animation, and even games...
There is no equivalent architecture in the field of supercomputing.
To be fair, the last twenty years have seen the development of impressive database machines, high-performance number crunchers, and heavily optimized graphics engines. As one would expect, these machines have all necessitated highly specific components and custom internal architectures. Unfortunately, no approach has yet been developed which allows the integration of these special-purpose components within the scope of a single operating system. In other words, and in practical terms, there is no way to cost-effectively upgrade a graphics engine with database components or number-crunching modules. The only way to upgrade contemporary high-performance computing platforms is by adding new stand-alone computers to the local area network and running them as independent nodes, each with its own operating system.
The Operating System Barrier
Surprisingly, the difficulty in providing truly scalable systems does not stem from hardware limitations. In fact, there are many examples of processors and communication chips which are specifically designed with scalability in mind... A good example is the S3.mp project, which demonstrates the low latency, high bandwidth, and high scalability achievable using current technology [1,2].
As it turns out, the difficulty stems from an operating systems barrier. Classic OS techniques are simply unable to address the issues raised by massively parallel, highly-distributed, scalable systems. For example, there is no known method for automatically placing processes on the appropriate nodes in a system so as to minimize both latency and processor load. Other problems include memory allocation, process migration, parallel scheduling and control/administration concerns (how to specify policy etc...).
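To make the placement problem concrete, here is a toy sketch in Python (every name, rate, and weight below is our own invention, not taken from any real OS): even for three processes and two nodes, finding the assignment that balances communication latency against processor load means searching an exponential space of placements.

```python
# Toy process-placement problem: trade off remote-communication cost
# against load imbalance. Brute force works here only because the
# system is tiny; the search space grows as nodes**processes.
from itertools import product

NODES = [0, 1]                              # two processors
PROCS = ["a", "b", "c"]                     # three processes
TALKS = {("a", "b"): 10, ("b", "c"): 1}     # messages/sec between pairs
LOADS = {"a": 3.0, "b": 1.0, "c": 2.0}      # CPU demand of each process

def cost(placement, latency_weight=1.0, balance_weight=1.0):
    # communication cost: messages between different nodes pay a latency penalty
    comm = sum(rate for (p, q), rate in TALKS.items()
               if placement[p] != placement[q])
    # imbalance cost: spread between the busiest and idlest node
    per_node = [sum(LOADS[p] for p in PROCS if placement[p] == n)
                for n in NODES]
    imbalance = max(per_node) - min(per_node)
    return latency_weight * comm + balance_weight * imbalance

best = min((dict(zip(PROCS, assign))
            for assign in product(NODES, repeat=len(PROCS))),
           key=cost)
print(best, cost(best))   # -> {'a': 0, 'b': 0, 'c': 1} 3.0
```

The optimum co-locates the chatty pair (a, b) and exiles c; note that neither the pure latency optimum (everything on one node) nor the pure balance optimum wins, which is exactly why no simple automatic rule suffices.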
Systems which do attempt to provide some form of distribution seem to rely heavily on hierarchical variants of the known (sequential) OS algorithms. This means, for example, that almost all MIMD supercomputers use a host machine which has special privileges with regard to process creation etc. The other nodes in the system will typically run a very limited version of the OS and require interaction with the host node when performing certain system calls. This approach is highly unsatisfactory since it is:
- Inefficient: The OS behaves sequentially despite the natural parallelism existing in a multi-node system.
- Non-adaptive: Local changes in the system are not taken into account by the higher levels of the hierarchy for quite a while (notification takes time, especially in centralized, heavily-loaded systems).
- Non-scalable: Hierarchical schemes are not practical in very large systems because too much control is required over the finest elements in the system.
Object Orientation: An Enabling Software Technology
Object orientation has been heralded as a breakthrough in software engineering for many reasons... The benefits of OO concepts such as inheritance, encapsulation and software reuse have been applied in many areas where traditional programming techniques left something to be desired. It is our belief that, thanks to an OO perspective, a paradigm change is brewing which will allow us to overcome the OS barrier. A feel for the sort of changes that OO enables in the field of operating systems can be gained by studying the Spring project developed by Sun [3]. Under Spring, every process (whether at the OS or the application level) is embodied in an object.
This means that, for example, every aspect of OS-application interaction is achieved through messages, and that objects can be moved around the system and run on different processors (since object messages can be translated into network messages transparently). This potent mixture of concurrency and OO (all the way down to the micro-kernel) provides us with the basic tools we need in order to revolutionize the field of massively parallel OS design.
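The location-transparency idea can be sketched in a few lines of Python (our own illustration only; Spring's actual door/IDL machinery is far richer): the caller sends the same message regardless of where the object lives, and a reference object decides whether that becomes a direct call or a marshalled network message.

```python
# Sketch of location-transparent object messaging: LocalRef dispatches
# directly, RemoteRef marshals the message as it would for the wire.
# The caller cannot tell the difference.
import json

class Counter:
    def __init__(self):
        self.n = 0
    def incr(self, by):
        self.n += by
        return self.n

class LocalRef:
    def __init__(self, obj):
        self.obj = obj
    def send(self, selector, *args):
        return getattr(self.obj, selector)(*args)    # plain method call

class RemoteRef:
    def __init__(self, obj):
        self.obj = obj                               # stand-in for a network peer
    def send(self, selector, *args):
        wire = json.dumps({"selector": selector, "args": list(args)})
        msg = json.loads(wire)                       # "receive" on the far side
        return getattr(self.obj, msg["selector"])(*msg["args"])

for ref in (LocalRef(Counter()), RemoteRef(Counter())):
    print(ref.send("incr", 5))                       # same message either way
```

Because the caller's code is identical in both branches, an object can be migrated between processors without touching its clients.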
The New Trend: Non-Hierarchical Techniques
What sort of OS mechanisms are being developed today which will cope with the requirements of a truly scalable system? Some innovative examples can be found in the work of Mario Tokoro on the Computational Field Model [4,5,6]. In this model, all software entities are encapsulated in concurrent objects and are continuously shifted around the parallel machine (or network) in order to minimize the overheads of message-passing while at the same time guarding against poor load-balancing. This is achieved by sustaining a form of physics which maintains forces of 'attraction' between communicating objects while applying repulsive forces between objects which reside on the same processor. Finally, inertia is added to the system to avoid a continuous re-shuffling of the objects. Though these concepts may seem ill-defined or overly picturesque, the MUSE operating system [7,8], already implemented at Sony CS Labs, could be made to support this form of process control...
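These dynamics are easy to demonstrate numerically. The following one-dimensional Python sketch uses our own toy parameters and force laws, not Tokoro's actual equations: communicating objects attract like springs, near-by (co-resident) objects repel, and velocity damping plays the role of inertia, preventing endless reshuffling.

```python
# 1-D Computational-Field-Model sketch: objects live on a line of
# processors; forces move them, damping settles them.
import math

pos = {"a": 0.0, "b": 5.0, "c": 5.1}     # object positions on a processor line
vel = {k: 0.0 for k in pos}
comm = {("a", "b"): 1.0}                 # a and b exchange messages heavily

def step(dt=0.1, k_attract=1.0, k_repel=2.0, damping=0.8):
    force = {k: 0.0 for k in pos}
    # attraction between communicating objects (spring-like, scaled by rate)
    for (p, q), rate in comm.items():
        f = k_attract * rate * (pos[q] - pos[p])
        force[p] += f
        force[q] -= f
    # repulsion between objects close enough to share a processor
    names = list(pos)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            d = pos[q] - pos[p]
            if abs(d) < 1.0:
                f = k_repel * math.copysign(1.0 - abs(d), d)
                force[p] -= f
                force[q] += f
    # damped integration: the "inertia" that prevents endless reshuffling
    for k in pos:
        vel[k] = damping * vel[k] + dt * force[k]
        pos[k] += dt * vel[k]

for _ in range(200):
    step()
```

After a few hundred steps the communicating pair a and b settle close together (at the distance where attraction and repulsion balance), while the silent object c has been pushed off their processor; the system then stops moving rather than thrashing.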
Another innovative approach to non-hierarchical control, called the Computational Market, is being developed at Xerox PARC by Huberman and Hogg [9,10]. According to their vision, an efficient and scalable way to coordinate and control distributed computations can be achieved by viewing the system as an economy and allowing the forces of supply and demand to dictate resource allocation. A prototype system called Spawn has been implemented [11].
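A toy market round can be sketched as follows (this is our own simplified mechanism for illustration, not the actual Spawn protocol): agents bid funds for CPU slices, each processor sells its slices to the highest bidders, and the clearing price on a busy processor rises, steering future demand toward idle ones without any central scheduler.

```python
# One round of a toy computational market: sealed bids per CPU,
# highest bidders win slices, oversubscribed CPUs report a price.
def auction_round(bids, slices_per_cpu):
    """bids: {agent: (cpu, amount)}; returns ({agent: slices_won}, {cpu: price})."""
    winners, prices, by_cpu = {}, {}, {}
    for agent, (cpu, amount) in bids.items():
        by_cpu.setdefault(cpu, []).append((amount, agent))
    for cpu, offers in by_cpu.items():
        offers.sort(reverse=True)                    # highest bids first
        capacity = slices_per_cpu[cpu]
        winning = offers[:capacity]
        # uniform clearing price: lowest winning bid (0 if underbooked)
        prices[cpu] = winning[-1][0] if len(offers) > capacity else 0
        for amount, agent in winning:
            winners[agent] = winners.get(agent, 0) + 1
    return winners, prices

bids = {"render": ("cpu0", 9), "db": ("cpu0", 7), "batch": ("cpu0", 1),
        "sim": ("cpu1", 2)}
won, prices = auction_round(bids, {"cpu0": 2, "cpu1": 4})
print(won)     # render and db win cpu0's two slices; batch is priced out
print(prices)  # cpu0 clears at 7, idle cpu1 at 0
```

The priced-out agent ("batch") now has an incentive to rebid on the cheap processor, which is precisely the decentralized load-balancing effect the market metaphor is after.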
As a final remark, we would like to point out that even at the application level, there seems to be an opportunity to introduce more flexible, and dynamic control schemes - even in fields such as scientific computation where systems are normally solved according to extremely rigid frameworks. The Piranha system developed at Yale by Gelernter et al. demonstrates how scientific computation can be broken down into small chunks of code which are arranged in a task graph and executed opportunistically according to the partial order described by the graph [12,13,14,15].
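The Piranha scheduling idea can be illustrated with a minimal Python model (our own sketch; the Yale implementation is built on Linda and is considerably more sophisticated): the computation is a partial order of small tasks, and whichever workers happen to be idle at the moment grab any task whose predecessors have completed, so machines can join or retreat between steps without invalidating the result.

```python
# Opportunistic execution of a task graph: a varying pool of idle
# workers repeatedly claims any task whose dependencies are done.
deps = {"split": [], "chunk1": ["split"], "chunk2": ["split"],
        "chunk3": ["split"], "merge": ["chunk1", "chunk2", "chunk3"]}
done, order = set(), []

def ready():
    # tasks not yet run whose predecessors have all completed
    return [t for t in deps if t not in done and all(d in done for d in deps[t])]

workers = 2                        # machines idle at this moment
while len(done) < len(deps):
    for task in ready()[:workers]: # idle workers opportunistically claim tasks
        done.add(task)             # (real workers would execute the chunk here)
        order.append(task)
    workers = 3                    # e.g. a workstation became idle overnight

print(order)  # -> ['split', 'chunk1', 'chunk2', 'chunk3', 'merge']
```

Note that only the partial order is fixed; how many chunks run "simultaneously" depends entirely on how many workers happened to be available, which is the essence of adaptive parallelism.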
Swarm: A Vital OS Research Tool
The astute reader will notice that, despite the fact that all the techniques proposed in the previous section are based on very different metaphors (n-body physical systems, economies, agent-based systems), they all share one common trait, namely, that they are all complex adaptive systems [10,16]. At first, this may seem unfortunate, given science's limited understanding of complex systems (as opposed to their preferred and fully understood linear counterparts). However, it is our belief that, by adopting new design methods and tools, we can overcome and harness the complexity of these models in order to create a viable basis for truly scalable operating systems.
One extremely useful tool with which to explore the behavior of complex non-hierarchical OS designs is the Swarm simulation software developed at the Santa Fe Institute [17]. This package was designed with complex multi-agent systems in mind and therefore provides unparalleled support for precisely the sort of simulations we require!
The design methodology would then consist of an iterative process whereby the candidate OS framework is modeled, simulated and evaluated using the Swarm system. If the results are favorable (i.e. the OS shows stable and consistently good performance) a more detailed model can be designed. If, on the other hand, the results are not satisfactory, the process is restarted with a new framework based on a different metaphor.
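The iterative methodology can be written down as a loop (the metaphor names and the stand-in scoring function below are placeholders of our own, not real systems or a real Swarm API):

```python
# Iterative design cycle: model a candidate framework, simulate it,
# and either refine it or restart with a different metaphor.
def simulate(framework):
    # stand-in for a Swarm run; returns a made-up performance score
    return {"chemical": 0.9, "market": 0.7, "field": 0.4}[framework]

def design_cycle(candidate_frameworks, threshold=0.8):
    for framework in candidate_frameworks:
        score = simulate(framework)      # model + simulate + evaluate
        if score >= threshold:           # stable, consistently good performance?
            return framework             # keep it: design a more detailed model
    return None                          # every metaphor rejected; rethink

print(design_cycle(["field", "market", "chemical"]))  # -> chemical
```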
In order to demonstrate the feasibility of this methodology, a simple OS model based on a chemical metaphor is currently being designed.
Note: I will be serving in the Israeli Army from Sept. '97 onwards and will consequently not be able to answer email or pursue these projects in the foreseeable future (approx. until Sept. '98).
For more information please email: mailto:firstname.lastname@example.org
References
[1] Nowatzyk, A., Aybay, G., Browne, M., Kelly, E., Parkin, M., Radke, B., Vishin, S., "The S3.mp Scalable Shared Memory Multiprocessor", ICPP'95.
[2] Nowatzyk, A., Browne, M., Kelly, E., Parkin, M., "S-Connect: from Networks of Workstations to Supercomputer Performance", ISCA'95.
[3] J. Mitchell, J. Gibbons, G. Hamilton, P. Kessler, Y. Khalidi, P. Kougiouris, P. Madany, M. Nelson, M. Powell, and S. Radia, "An Overview of the Spring System", Proceedings of Compcon Spring 1994, February 1994.
[4] Mario Tokoro, "The Society of Objects", Sony CSL technical report SCSL-TR-93-018; invited talk at OOPSLA'93, to be published in the Addendum to the OOPSLA'93 Proceedings, December 1993.
[5] M. Tokoro and K. Honda, "The Computational Field Model for Open Distributed Environments", in Concurrency: Theory, Language, and Architecture, LNCS No. 491, Springer Verlag, 1991.
[6] Mario Tokoro, "Computational Field Model: Toward a New Computing Model/Methodology for Open Distributed Environment", Sony CSL technical report SCSL-TR-90-006, June 1990; in Proceedings of 2nd Workshop on Future Trends in Distributed Computing Systems, Cairo, Sep. 1990.
[7] Yasuhiko Yokote, Fumio Teraoka, Masaki Yamada, Hiroshi Tezuka and Mario Tokoro, "The Design and Implementation of the Muse Object-Oriented Distributed Operating System", Sony CSL technical report SCSL-TR-89-010; in Proceedings of First Conference on Technology of Object-Oriented Languages and Systems, October 1989.
[8] Yasuhiko Yokote, Fumio Teraoka, Atsushi Mitsuzawa, Nobuhisa Fujinami and Mario Tokoro, "The Muse Object Architecture: A New Operating System Structuring Concept", Sony CSL technical report SCSL-TR-91-002, February 1991; also in Operating Systems Review, Vol. 25, No. 2, April 1991.
[9] Bernardo A. Huberman and Tad Hogg, "The Emergence of Computational Ecologies", in 1992 Lectures in Complex Systems.
[10] Tad Hogg and Bernardo A. Huberman, "Controlling Chaos in Distributed Systems", IEEE Trans. on Systems, Man and Cybernetics, 21, 6, pp. 1325-1332, Nov/Dec 1991.
[11] Carl A. Waldspurger, Tad Hogg, Bernardo A. Huberman, Jeffery O. Kephart and W. Scott Stornetta, "Spawn: A Distributed Computational Economy", IEEE Trans. on Software Engineering, 18, 2, pp. 103-117, Feb 1992.
[12] Nicholas Carriero, David Gelernter, David Kaminsky and Jeffery Westbrook, "Adaptive Parallelism with Piranha", Technical Report 954, Yale University, Feb 1993.
[13] Nick Carriero, Eric Freeman, David Gelernter and David Kaminsky, "Adaptive Parallelism and Piranha", Yale University, Feb 1994.
[14] David Gelernter and David Kaminsky, "Supercomputing out of Recycled Garbage: Preliminary Experience with Piranha", Technical Report 883, Yale University, Dec 1991.
[15] Nicholas Carriero, Eric Freeman, and David Gelernter, "Adaptive Parallelism on Multiprocessors: Preliminary Experience with Piranha on the CM-5", Technical Report 969, Yale University, May 1993.
[16] Iqbal Adjali, Jose-Luis Fernandez-Villacanas, Michael Gell, "Nonlinear Dynamics in Distributed Systems", Systems Research Division, BT Laboratories, Martlesham Heath, Ipswich IP5 7RE, United Kingdom (available as LANL tech report adap-org/9405004).
[17] Chris Langton, Nelson Minar, Roger Burkhart, "The Swarm Simulation System: A tool for studying complex systems", June 1, 1995 (DRAFT).