An Autonomic Hybrid Multiagent Service Architecture to Reduce IT Troubleshooting
Fernández Carrasco, Luis M.
This dissertation is submitted to the Graduate Programs in Mechatronics and Information Technologies of the School of Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information Technologies and Communications, with a major in Intelligent Systems. This document describes a novel architectural design that provides an operating-system-like service with autonomic computing features. The objective of this new service is to reduce troubleshooting in general by promoting a self-managing paradigm. The use of computing systems is nowadays taken for granted. One need only look around to see that computation is going on almost everywhere. Moreover, such processes are no longer restricted to personal computers but also run on devices such as cellular phones, PDAs, and laptops. In other words, computing systems are now ubiquitous. Furthermore, these devices are not isolated processing units but are interconnected and can send and receive information at any time, anywhere. The Internet adds another layer of complexity to this already labyrinthine setting. The demands that people and current business models place on computing systems range from running a simple application on hardware that was not built specifically for it to operating a cooperative network in which all constituents use a variety of systems and commands. Managing all these networked devices as a whole in a robust and transparent manner demands a great deal of resources and time. Nevertheless, it has to be done. IBM recognized the problem posed by managing sets of heterogeneous devices that need to work, cooperate, and communicate with each other. It perceived this problem as the main obstacle to further progress in the IT industry, i.e., complexity was threatening the development of better IT solutions.
Consequently, in 2001, IBM launched its autonomic computing initiative, whose main objective is to have self-managing systems, more specifically, systems that are self-configuring, self-optimizing, self-healing and self-protecting. Such systems leave the truly important tasks to humans and delegate administrative tasks to the system itself, much as the human autonomic nervous system does. The IBM initiative has caught the attention of a number of institutions in both the IT industry and academia, as it is widely recognized that managing current and future IT environments will demand a new paradigm. The research project that this dissertation presents tackles one aspect of the problem described above. The objective is to reduce the troubleshooting that a typical user faces when using a personal computer. Consequently, the proposed solution is to design a new system architecture that provides an operating-system-like service exhibiting all four characteristics that autonomic computing seeks, i.e., self-configuration, self-optimization, self-protection and self-healing. This approach is supported by the fact that operating systems handle most computing resources at a low level and are present in all devices, which ensures the applicability and impact of the proposed solution. Moreover, current autonomic computing systems are usually built on top of non-autonomic ones. That approach may not be right, since the goal is a fully autonomic system. Furthermore, an operating-system-like service that is itself autonomic allows the development of other autonomic systems that could run on top of it. What is more, current autonomic systems initiatives do not fully integrate all four characteristics, whereas this research does so. The model is a combination of a multiagent design, selection methods found in nature-based algorithms, and learning techniques.
The main idea is to model each component that a typical operating system manages as an agent, incorporate performance criterion evaluators to select the best candidate to perform a task (self-optimization), provide a flexible yet robust communication protocol among agents that allows the execution of any job (self-configuration), and implement specialized agents that supervise and learn from threats and normal program executions in order to keep the system running (self-protection and self-healing). The system was evaluated using a multiagent simulation and, although there might be some objections to this testing approach, considerable effort was made to have the simulated environment behave similarly to real-life systems. To that end, a programming language named HAL was created, which allowed the simulation of applications, both benign and harmful, running in a multiagent environment and requesting resources. Thus, the simulated prototype is very close to a computer that runs applications, allowing a proper evaluation of the proposed design. It is worth pointing out that autonomic computing in general is very task dependent and that there are very few approaches to an autonomic operating system service provider (e.g., Unity). This prevented a precise one-to-one comparison with other closely related works, as the autonomic computing paradigm is being applied to a variety of systems that differ in how the autonomic features are achieved. Nevertheless, this research has made several contributions to the field, namely, a novel autonomic architecture for future OS design, a low-level service that exhibits all four autonomic features, a framework for the simulation of autonomic systems, a programming language that can be used to implement testing and evaluation of autonomic systems, and a way to measure and evaluate self-* properties in computing systems.
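As a rough illustration of the self-optimization idea above (a performance criterion evaluator choosing the best candidate agent for a task), the following Python sketch may help. It is not taken from the dissertation: the `ResourceAgent` class, its `load` and `latency_ms` fields, the weights, and the scoring formula are all hypothetical, chosen only to show the general shape of such a selection step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResourceAgent:
    """Hypothetical agent wrapping one OS-managed component."""
    name: str
    load: float        # assumed metric: fraction of capacity in use, 0.0 to 1.0
    latency_ms: float  # assumed metric: expected time to service a request


def score(agent: ResourceAgent, w_load: float = 0.5, w_latency: float = 0.5) -> float:
    """Assumed performance criterion: weighted cost, lower is better."""
    # Latency is scaled by 100 ms so both terms are of comparable magnitude.
    return w_load * agent.load + w_latency * (agent.latency_ms / 100.0)


def select_best(agents: Iterable[ResourceAgent]) -> ResourceAgent:
    """Return the candidate with the lowest cost under the criterion."""
    candidates = list(agents)
    if not candidates:
        raise ValueError("no candidate agents available")
    return min(candidates, key=score)


if __name__ == "__main__":
    pool = [
        ResourceAgent("disk_a", load=0.9, latency_ms=20.0),
        ResourceAgent("disk_b", load=0.2, latency_ms=30.0),
    ]
    print(select_best(pool).name)
```

In this toy setting the lightly loaded agent is chosen even though its latency is slightly higher, because the assumed criterion weighs both factors equally; the actual evaluators in the dissertation may use entirely different metrics and selection methods.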
The following chapters of this document present the work that was conducted to achieve what has been described above, including the first approach to this problem, excitable media. Excitable media provided the guidelines and set the path toward the results of this research, results that reaffirm the idea that steered this project from the beginning: a multiagent autonomic operating system service is a good way to reduce troubleshooting by providing a self-managing environment.