
  • 7/31/2019 Unix KISS CaseStudy

    1/8

    THE UNIX KISS: A CASE STUDY

    Franco Milicchio
    Dept. of Computer Science and Engineering, University Roma Tre
    Via della Vasca Navale, 79, 00146 Roma, Italy

    [email protected]

    ABSTRACT

    In this paper we show that the philosophy that guided the initial design and development of UNIX has been forgotten through hasty practices. We question the leitmotif that microkernels, though adherent by design to the KISS principle, incur a higher number of context switches than their monolithic counterparts, by running a test suite and verifying the results with standard statistical validation tests. We advocate a wiser distribution of shared libraries by statistically analyzing the weight of each shared object in a typical UNIX system, showing that the majority of shared libraries exist in a common space with no real evidence of need. Finally, we examine the UNIX heritage from a historical point of view, noticing how habits swiftly replaced the intents of the original authors, moving the focus away from the earliest purpose of avoiding complications and keeping a system simple to use and maintain.

    KEYWORDS

    UNIX; Statistics; Case Studies; Operating Systems.

    1. INTRODUCTION

    UNIX is the oldest operating system still in use, having its roots in the 1960s Multics system. Not by chance, its original name was Unics, later changed to its renowned denomination. It was designed and developed at Bell Labs by Thompson, Ritchie and McIlroy, who tried to avoid some complications its ancestor introduced, keeping the system small and simple. This philosophy, originating in complex-systems engineering, gained fame under the acronym KISS, "keep it simple, stupid", and dates back to the 14th century with the lex parsimoniae of the philosopher William of Ockham, who stated entia non sunt multiplicanda praeter necessitatem, best known as Ockham's razor: entities should not be multiplied beyond necessity. We can easily recognize that the whole project followed this rule of thumb even from its first version. Small programs were preferred over big ones, each doing a single task very efficiently. To run complex jobs, these small applications could be, and still are, connected by I/O redirection. After many years, and after many death prophecies, UNIX is still one of the most used operating systems. A question may arise: whether UNIX has observed the KISS principle throughout its history, or has forgotten this basic rule and followed other habits.
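    As an illustration of this composition model (an example of ours, not taken from the paper), the shell pipeline `printf 'b\na\nb\n' | sort | uniq -c` can be built explicitly from three small programs, each doing one job, connected by redirecting each stdout into the next stdin:

    ```python
    import subprocess

    # Illustrative only: three small UNIX tools composed through pipes,
    # the equivalent of `printf 'b\na\nb\n' | sort | uniq -c`.
    p1 = subprocess.Popen(["printf", "b\\na\\nb\\n"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["sort"], stdin=p1.stdout, stdout=subprocess.PIPE)
    p3 = subprocess.Popen(["uniq", "-c"], stdin=p2.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # let sort see EOF when printf exits
    p2.stdout.close()  # let uniq see EOF when sort exits
    out = p3.communicate()[0].decode()
    # out now holds the sorted, counted lines: 1 occurrence of "a", 2 of "b".
    ```

    No single program in the chain knows about the others; the pipe is the only interface, which is precisely the simplicity the original authors intended.
    
    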

    The operating system core is one of the major concerns. Microkernels have been developed over years of research, but the common understanding of them was poor. They were always regarded as neat and simple academic design projects, following the KISS principle, but with bad performance due to the number of context switches necessary to run an application. Even though there is commercial evidence that microkernels are not just academic proofs of concept, with mission-critical real-time operating systems like QNX, and even end-user UNIX systems like MacOS X, the leitmotif of monolithic kernels being better never faded.

    Dynamic linking is one of the features UNIX inherited from its ancestor Multics. Reusing code through shared libraries is a common practice not only in the UNIX world, but in all modern operating systems. This praxis on one hand simplifies the developer's job, but on the other hand can add a high degree of complexity to a system if all software packages actually share their code. Again, another common opinion is that sharing prevents a system from being overwhelmed by an ungovernable duplication of resources. However, this belief has never been proven right or wrong.


    In this paper we address these issues from a historical and statistical point of view. We will evaluate the number of context switches with a set of tests, validating the results with state-of-the-art statistical analyses. On the libraries side, we will take a survey of all shared resources present on our systems, inspecting the weight and impact of such libraries to corroborate or contradict the habit of sharing them. Finally, we will point out the historical heritage of UNIX, showing how the KISS philosophy has become less important than common habits. For our survey we chose Linux and MacOS X. This decision was taken to avoid biasing any statistical analysis by choosing a niche OS. Thus, by selecting two UNIX (or UNIX-like) operating systems available to the general public, we achieve an impartial and fair comparison.

    2. KERNEL WARS

    A kernel is the core of an operating system, providing the minimum abstraction layer for hardware handling, inter-process communication and memory management. Along with these, a kernel may provide other services such as device management, sound and network control. On monolithic systems all services are implemented in kernel space, while on microkernels all these aspects are delegated to userland servers. A monolithic kernel is thus less adherent to the KISS principle, involving a high degree of complexity, evident even in the number of lines of code needed. By construction, monolithic kernels have tight and often non-trivial dependencies between their components, affecting the whole system in case of bugs. Microkernels, on the other hand, tend to keep all aspects simple and neat, delegating all services to the servers. This quality affects microkernels on the design side, requiring great care in planning their features.

    In the history of operating systems, the debate between the supporters of microkernel design and those of the monolithic approach has been one of the major points of contention. The most famous flame war on the topic started between Andrew Tanenbaum and Linus Torvalds, the creators of Minix and Linux respectively. Much of the discussion between these two major schools is about performance: a monolithic kernel should be more efficient than a microkernel because it requires fewer context switches for all tasks. This claim has always been repeated but never sufficiently investigated or statistically proven.

    We analyzed both MacOS X and Linux with a suite of tests to confirm or refute this efficiency claim. All tests were then evaluated for statistical significance to validate the results.

    2.1 Test Suite

    The suite consisted of 31 tests, divided into 7 categories. Each suite has been repeated fifty times on a freshly-rebooted machine in order to obtain a sufficient number of cases to achieve statistical significance (Casella and Berger, 2001; Freedman, 2005). The chosen tests are the following:

    1. Multiplication of two randomly-generated integers with 1024, 2048, 5120, 10240, and 20480 digits;
    2. Creation of a random file from /dev/random with sizes of 1024 KB, 2048 KB, 5120 KB, 10 MB, 20 MB, 50 MB, and 100 MB;
    3. Conversion of images from PNG to JPEG, TIFF and PostScript;
    4. Download of a 650 MB CD-ROM ISO image via the HTTP protocol;
    5. Compression of random files with sizes of 1024 KB, 2048 KB, 5120 KB, and compression of the same random sequence repeated 15 times;
    6. Behavior of the Secure Shell daemon while uploading and downloading a 650 MB CD-ROM ISO image from a client, running on both high and low TCP ports;
    7. Process generation with the classical UNIX calls fork, wait, and exec, and thread spawning.

    The chosen hardware for the tests was an IBM IntelliStation M Pro running Linux Ubuntu 5.04, based on the Linux 2.6.10 kernel, and an Apple eMac PowerPC running Apple MacOS X 10.4.6, based on the Darwin 8.6.0 kernel. The test suite was designed to be independent of the underlying hardware, focusing exclusively on context switch counts and not on performance (e.g. execution times).
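    The paper does not publish its instrumentation. On Linux, one plausible way to obtain per-process context switch counts (a sketch of ours, assuming a /proc filesystem; the fields below are what the Linux kernel actually exposes) is:

    ```python
    # Sketch: the Linux kernel reports per-process voluntary and
    # involuntary context switch counts in /proc/<pid>/status.
    def context_switches(pid="self"):
        counts = {}
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if "ctxt_switches" in line:
                    key, _, value = line.partition(":")
                    counts[key.strip()] = int(value)
        return counts

    before = context_switches()
    sum(i * i for i in range(100_000))   # some CPU-bound work
    after = context_switches()
    # Counters are monotonic, so each delta is non-negative.
    delta = {k: after[k] - before[k] for k in after}
    ```

    Sampling these counters before and after each workload, as sketched here, yields one context-switch count per run.
    
    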

  • 7/31/2019 Unix KISS CaseStudy

    3/8

    2.2 Statistical Analysis

    A statistical analysis of the test suite has been conducted to check the validity of each test. The results are presented in Table 1 for both Linux and Darwin (MacOS X), where mean and variance are shown in detail for each set of tests.

    2.2.1 Test Results

    As shown in Table 1, the Linux kernel has an overall number of context switches greater than MacOS X. In one test, specifically the HTTP download, the number of context switches on the MacOS X system was greater than on the Linux counterpart, with a significant difference. Apart from the number of context switches, the variance of MacOS X shows that its kernel switches irregularly compared to Linux, which behaves smoothly with very sparse outliers. This fact was also confirmed by the confidence intervals and skewness of both systems.

    These results are a clear indication that the assertion that a microkernel has a greater number of context switches than a monolithic one has no real statistical evidence. As a matter of fact, a well-designed microkernel has a number of context switches lower than or, at worst, comparable to its counterpart. These results do not provide any performance comparison between the two kernels, but they supply the important information that if a performance loss is present, it is not due to the number of context switches.

    Table 1. The statistical results of the test series

    Linux MacOS X

    Test Mean Variance Mean Variance

    bc 1024 87.46 5.58 0.38 0.85

    bc 2048 158.20 9.15 0.44 1.01

    bc 5120 381.14 21.27 1.96 4.72

    bc 10240 779.44 50.65 1.54 3.47

    bc 20480 1459.74 75.86 3.74 6.95

    dd 1024 K 10.68 3.42 0.40 0.83
    dd 2048 K 7.44 4.42 0.42 0.78

    dd 5120 K 22.82 6.96 0.64 1.17

    dd 10240 K 60.10 244.60 0.50 0.86

    dd 20480 K 64.60 14.76 0.78 0.97

    dd 51200 K 127.58 11.36 2.00 5.96

    dd 102400 K 237.04 76.71 1.40 1.93

    PNG to JPG (1) 121.42 10.90 2.64 4.28

    PNG to TIFF (1) 4.26 0.90 0.52 0.50

    PNG to PS (1) 8.68 1.36 0.52 0.54

    PNG to JPG (2) 8.24 2.41 1.30 2.72

    PNG to TIFF (2) 5.82 4.29 0.68 0.71

    PNG to PS (2) 2.08 0.40 0.52 0.61

    HTTP Download 454540.26 14104.41 466149.94 16854.00

    zip 1024 K 6.96 1.64 1.84 2.06

    zip 1024 K (15 times) 16.82 1.51 1.18 1.85

    zip 2048 K 4.64 0.94 0.96 2.04

    zip 2048 K (15 times) 27.18 3.38 3.62 13.86

    zip 5120 K 5.00 0.83 0.68 1.36

    zip 5120 K (15 times) 223.48 25.01 1.96 1.83

    scp (Download, High) 121994.78 10812.82 89920.74 2868.35

    scp (Download, Low) 158212.84 7542.38 88784.98 2774.65

    scp (Upload, High) 273975.60 11675.82 85522.62 2026.85

    scp (Upload, Low) 282313.78 8000.92 87982.98 2703.27

    Processes (fork) 3.36 1.17 1.10 0.36

    Processes (threads) 16.36 13.01 0.22 0.42


    2.2.2 Test Validation

    To validate the results we performed a standard t-Test (Hastie et al., 2003) to examine whether the two series of results have statistically significant differences; in particular, we conducted both one- and two-tailed t-Tests. In Table 2 we show the results for those tests that exhibited a non-significant difference in the one-tailed test, alongside the corresponding two-tailed study.

    Table 2. The t-Test p-values of the test series

    Test 1-tailed 2-tailed

    dd 10240 K 0.05 0.09
    PNG to PS (1) 0.02 < 10^-6
    PNG to JPG (2) 0.45 < 10^-6
    PNG to TIFF (2) 0.19 < 10^-6
    HTTP Download 0.27 < 10^-4
    zip 1024 K 0.71 < 10^-6
    zip 1024 K (15 times) 0.87 < 10^-6
    zip 2048 K 0.32 < 10^-6
    zip 2048 K (15 times) 0.04 < 10^-6
    zip 5120 K 0.17 < 10^-6
    Processes (threads) 0.01 < 10^-6

    All tests were proven different in means and variances with statistical significance, having a p-value not exceeding the standard statistical limit of 0.05. Only one test, the creation of a 10 MB file from /dev/random, has a higher p-value of 0.09, signifying that the two series have no significant difference: an analysis of the data showed the presence of an outlier in the Linux series with a total context switch count of 1755. After filtering out the outlier, the test showed a p-value < 10^-6.
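    The validation step can be sketched in a few lines (with made-up toy numbers, not the paper's data): a Welch two-sample t statistic, with a normal approximation for the two-tailed p-value, which is adequate at fifty runs per test.

    ```python
    import math
    from statistics import mean, variance

    def welch_t(a, b):
        # Two-sample t statistic for series with unequal variances (Welch).
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        return (mean(a) - mean(b)) / se

    def p_two_tailed(t):
        # Normal approximation to the two-tailed p-value; reasonable at n = 50.
        return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

    # Toy stand-ins for per-run context-switch counts (not the paper's data).
    linux = [87, 90, 85, 88, 92, 86, 89, 91, 84, 88]
    darwin = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1]
    t_stat = welch_t(linux, darwin)
    p_value = p_two_tailed(t_stat)
    ```

    A p-value below 0.05 then marks the two series as significantly different, exactly the criterion used in Table 2.
    
    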

    Additionally, a Sign test (Abdi, 2006) was performed in order to compare the two systems over all contexts. In this case a positive match was assigned to MacOS X if the one-tailed p-value of the t-Test was less than the standard threshold of 0.05. The sign test put the probability of the two series not being statistically different at p < 2.94 × 10^-3.
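    The sign test itself is a binomial computation; a sketch with a hypothetical split (the paper does not report the per-test win counts, so the 25-of-31 figure below is ours for illustration):

    ```python
    from math import comb

    def sign_test_p(wins, n):
        # Two-sided sign test: probability of a split at least this extreme
        # under the null hypothesis that either system "wins" with p = 1/2.
        k = max(wins, n - wins)
        tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        return min(1.0, 2 * tail)

    # Hypothetical: one system scoring the significant one-tailed
    # p-value in 25 of the 31 tests.
    p = sign_test_p(25, 31)
    ```

    A lopsided split drives the p-value toward zero, while an even split yields 1.0, i.e. no evidence that either system systematically wins.
    
    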

    3. SHARED LIBRARIES

    A common practice in the UNIX world is to subdivide a complex program into small pieces, usually libraries. This approach obviously conforms to the KISS principle, and of course is not limited to the UNIX world. Over the years, the use of shared libraries has become so widespread that there is an ongoing belief that a program using a shared library should install it system-wide, so that other programs can use it as well. Another opinion is that using system-wide shared libraries decreases the amount of disk space wasted by an uncontrollable duplication of resources. The consequence is that on a typical UNIX system we cannot immediately recognize which programs use a library, and worse, whether any do. In the next section we analyze the distribution of shared libraries and their relative weight on a UNIX system, determining whether the opinions about sharing system-wide libraries have a real foundation.

    3.1 Statistical Analysis

    We took a survey of all the shared libraries on our Linux system to find out whether those beliefs are effectively supported by real evidence. We analyzed all the explicitly shared libraries present on the system, meaning that self-contained applications like VMWare or Matlab were not taken into account, since they do not share their dynamic libraries. In order to obtain the total users of a library, we followed the linkage up to the second order, thus also counting other shared libraries as users of a library.
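    The paper's counting scripts are not published; a first-order approximation of such a survey (our sketch, assuming a Linux system where the `ldd` tool is available) might look like this:

    ```python
    import subprocess
    from collections import Counter

    def shared_libs(path):
        # Parse `ldd` output, keeping shared-object base names such as "libc".
        result = subprocess.run(["ldd", path], capture_output=True, text=True)
        libs = set()
        for line in result.stdout.splitlines():
            parts = line.split()
            if parts and parts[0].startswith(("lib", "ld-")):
                libs.add(parts[0].split(".so")[0])
        return libs

    # Count how many of a handful of binaries link each library; the paper's
    # survey walks every binary and also counts second-order
    # (library-to-library) linkage, which this sketch omits.
    users = Counter()
    for binary in ["/bin/ls", "/bin/cat", "/bin/sh"]:
        users.update(shared_libs(binary))
    ```

    Sorting the resulting counter by count reproduces the kind of ranking shown in Table 4.
    
    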


    3.1.1 Self and Non-self

    Roughly counting the number of shared libraries, we reach a total of 500 common resources present on a modern UNIX or UNIX-like system. A mere sorting by number of users shows which libraries can be counted as self, and which ones should be classified as non-self on an operating system. An operating system by its very definition manages software and hardware resources, and on modern systems provides a minimal set of libraries for third-party software development. Software other than the operating system itself has no reason to be linked a number of times comparable to a system resource; if this happens, the library classified as non-self should be considered for inclusion in the operating system.

    In Table 4 we show the ten most used libraries on both operating systems. In the table we read the total count of users of each library, i.e. the number of times it is dynamically linked by other applications and libraries. As expected, the most used libraries concern low-level operations, the graphic environment and dynamic linking.

    Table 4. The ten most used libraries with their respective linking count

    Linux MacOS X

    Library Count Library Count

    ld-linux 3294 libSystem 1159
    libc 3291 libiconv 185
    libm 1879 libgcc_s 127
    libdl 1871 CoreFoundation 109
    libpthreads 1561 libncurses 104
    libz 989 libcups 92
    libX11 804 libsasl2 83
    libglib 781 libssl 78
    libgobject 759 DirectoryService 75
    libXext 749 Kerberos 70

    3.1.2 Library Weight

    An interesting insight is given by the distribution of shared libraries and their relative weight on the Linux system. We divided the libraries into six main categories by number of users; the results are shown in Figures 1 and 2. About half of the libraries present on the system are used fewer than 10 times, and the most used libraries (with more than 500 linkers) amount to about 2% of the total, as pictured in Figure 1. Shared libraries linked only once are about 28% of the total, and they can hardly be classified as shared. To better understand the importance of these shared objects, we focused on the weight of each library group. The weight of a library is the size it would have occupied on a storage medium in case of static linkage. Figure 2 shows the weight of each group. The most used libraries, linked more than 100 times, account for 85% of the total weight while amounting to 12% of the number of libraries. The least used libraries, linked fewer than 10 times, account for only 2% of the weight but 54% of the total number.

    This fact casts a shadow over the claim that shared libraries are a quasi-necessity, reducing disk space and of course helping the system remain under the KISS principle. It turns out that more than one fourth of the libraries are effectively shared in name only, and half are shared fewer than 10 times. In this situation it is hard to classify a library as part of the operating system (the self), which adds confusion to the maintainability of a system, keeping its structure neither simple nor stupid. To avoid this intrinsic disorganization, we can move the least used libraries from a system-wide location to the application itself, either with static linkage or by bundling them into an application directory à la NeXT. The great benefit is evident: reducing the number of unknown libraries, avoiding possible orphans, and having a better view of what is part of the system and what is not. The disk space overhead caused by the duplication of resources clearly depends on the limit we impose on the number of library users. For instance, if we had statically linked all libraries with fewer than 20 users on our system, we would have added 450 MB of space while removing 340 shared objects. Although the disk space evidently increases, it remains far within acceptable bounds: removing libraries used fewer than 20 times on a 40 GB hard drive, nowadays considered a small one, would consume an additional 1% of the disk space while decreasing the number of shared objects by 68%.
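    The trade-off can be made concrete with a small computation (hypothetical library sizes of ours, not the measured ones): statically linking every library with fewer than a chosen number of users removes those shared objects at the cost of duplicating each one per user.

    ```python
    def static_link_cost(libraries, limit=20):
        # libraries: (user_count, size_in_KB) pairs; hypothetical data below.
        removed = [(u, kb) for u, kb in libraries if u < limit]
        # u static copies replace 1 shared copy, so the overhead is (u - 1)
        # extra copies of each removed library.
        extra_kb = sum((u - 1) * kb for u, kb in removed)
        return len(removed), extra_kb

    libs = [(3, 120), (1, 40), (15, 300), (700, 1500), (8, 60)]
    n_removed, extra = static_link_cost(libs)
    ```

    Note how a once-linked library (the `(1, 40)` entry) costs nothing at all to absorb, which is why the 28% of libraries shared in name only are the cheapest candidates for removal.
    
    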


    We strongly stress the fact that we chose one of the cleanest and most coherent Linux distributions, having one and only one desktop environment and thus not adding extra shared libraries for just a single application. Other UNIX systems that include more environments, such as Solaris with both CDE and JDE, or AIX with KDE as well as Gnome, fare worse. On MacOS X the number of shared libraries is far smaller than on other systems: since all applications bundle all their resources, there is no library installed system-wide by applications, with few obvious exceptions (e.g. device drivers, kernel extensions).

    Figure 1. Number of libraries per number of linkers

    4. UNIX HERITAGE

    As UNIX was developed, it followed the KISS principle in almost every aspect, although this was not a requirement. The everything-is-a-file philosophy, characteristic of every UNIX system, was present from the very beginning. The first UNIX system already contained the dev directory with the special device files. This abstract approach to devices, files and directories is clearly KISS-compliant, as it pursues an extreme simplicity and coherence in handling files, directories, devices and even IPC-related files with a simple API.
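    A minimal illustration of this uniform API (our example, assuming a UNIX system with the usual /dev nodes): device files answer exactly the same open/read/write calls as regular files.

    ```python
    # "Everything is a file": device nodes accept the same calls as plain files.
    with open("/dev/null", "wb") as sink:
        written = sink.write(b"discarded")   # accepted, then silently dropped

    with open("/dev/zero", "rb") as zeros:
        data = zeros.read(4)                 # reads yield zero bytes
    ```

    No special "device API" is needed; the file abstraction carries the whole interaction, which is the simplicity the section above describes.
    
    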

    At its birth the directory structure of UNIX was also very simple, as we can see in (Thompson and Ritchie, 1971). In the first UNIX there were just a few directories: bin, etc, and usr. The first two were an integral part of the operating system, being respectively the place where the system binaries were stored, and where other things regarding the system were to be found (e.g. system libraries, configuration files). The usr directory was the place where users had their own personal space, thus properly the users directory. Reading the manual we get a clear perspective of the authors' intention: a clear distinction of roles between system and users, as the authors themselves say that "user-maintained programs are not considered part of the UNIX system", referring to Section 6 of the UNIX manual.


    header files. Although it might at first seem convincing, this habit is comparable to the practice of separating application resources from the application itself. A header by itself has almost no use without the library it describes, so by their purpose the two should not be stored in different places. As with applications, storing all available header files in a single location does not help in keeping a resource simple and immediately recognizable. Bundling headers and the respective binary library in a single location is again a possible solution, avoiding the spread of files across many directories and increasing the system's simplicity.

    Again, the NeXT approach to software deployment gives a solution to this unreasonable complexity, which keeps a system certainly not simple and stupid to understand and maintain. We address NeXT in particular because it was a UNIX operating system, but bundles were actually not limited to the NeXT OS: for example, BeOS applications were bundles even though BeOS was not a UNIX. Bundling all the application-related files in a single location makes it simple to distinguish an application resource from a system one. In the modern NeXT descendant, MacOS X, we can clearly see an effort to simplify the system by using application and library bundles. Moreover, it introduced locations with significant names like Applications, System, and Library, and re-established the Users directory. Despite these efforts, the system still has the common UNIX directories, which of course could have been easily avoided while retaining compatibility with the past.

    5. CONCLUSION

    We have analyzed some of the main concerns about adherence to the KISS principle by two of the most used UNIX operating systems available to the general public, MacOS X and Linux. We have shown, with statistical evidence and validation, that a microkernel, complying with the KISS principle by design, has no more context switches than a classic monolithic kernel, thus negating the main objection to this family of operating system cores. This also shows that if there is a performance difference between the two, it is not due to the number of context switches. In addition, we examined the current shared library situation, showing that a simplification process is needed to satisfy the simplicity of maintenance that modern systems require. Moreover, the space required by strictly limiting the number of library users not only has an insignificant impact on modern storage media, but reduces the number of shared resources by at least 50%. Born following the KISS principle, UNIX has become a huge and habit-prone system. Vestigial heritages are still present, as in the MacOS X system, but have no reason to exist anymore. The KISS principle was of course present even in the first version of Bell Labs UNIX, but evidently it was swiftly replaced by habits that still taint, in our times, the simplicity and logic of the original intents.

    ACKNOWLEDGEMENT

    A brief acknowledgement.

    REFERENCES

    Abdi, H., 2006. Binomial Distribution: Binomial and Sign Tests. In Encyclopedia of Measurement and Statistics, Neil J. Salkind (Ed.), Sage Publications, Inc.

    Casella, G. and Berger, R. L., 2001. Statistical Inference. Duxbury Press, Duxbury Advanced Series, USA.

    Freedman, D., 2005. Statistical Models: Theory and Practice. Cambridge University Press, New York, NY, USA.

    Hastie, T. et al., 2003. The Elements of Statistical Learning. Springer, New York, NY, USA.

    IEEE, 2003. Standard for Information Technology - Standardized Application Environment Profile - POSIX Realtime and Embedded Application Support (AEP). Institute of Electrical and Electronics Engineers, STD 1003.13-2003.

    Ritchie, D. M. and Thompson, K., 1983. The UNIX time-sharing system. Communications of the ACM, Vol. 26, No. 1, pp. 84-89.

    Salus, P. H., 1994. A Quarter Century of UNIX. Addison-Wesley Professional, Boston, MA, USA.

    Thompson, K. and Ritchie, D. M., 1971. UNIX Programmer's Manual. Bell Laboratories.