The node monitoring component of a scalable systems software environment
This research describes Fountain, a suite of software used to monitor the resources of a cluster. A cluster is a collection of individual computers connected by a high-speed communication network. Clusters are traditionally used by those who need more resources, such as processing power and memory, than any single computer can provide. A common obstacle to using such a large-scale system effectively is its management infrastructure, which often does not scale well as the system grows. Large-scale parallel systems pose new research challenges in the area of systems software: the programs and tools that manage the system from boot-up to running a parallel job. The approach presented in this thesis uses a collection of separate components that communicate with one another to achieve a common goal. While systems software comprises a broad array of components, this thesis focuses on the design choices for a node monitoring component. Specifically, we describe Fountain, an implementation of the Scalable Systems Software (SSS) node monitor specification. Fountain is targeted at aggregate node monitoring for clusters, with scalability and fault tolerance as its design goals. It uses widely adopted technologies such as XML and HTTP to present an interface to the other components in the SSS environment.
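As a rough illustration of the final point, the sketch below shows one way a monitor might aggregate per-node samples into a single XML document and serve it to other components over HTTP. Everything here is a hypothetical assumption for illustration: the `/status` path, the `cluster-status` and `node` element names, the `NODE_STATUS` sample data, and the helper names `build_status_xml` and `serve_in_background` are not taken from the SSS node monitor specification or from Fountain itself.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-node samples; a real monitor would collect these from
# agents running on each node rather than from a static dictionary.
NODE_STATUS = {
    "node01": {"state": "up", "load": "0.42", "mem_free_mb": "3072"},
    "node02": {"state": "down"},
}


def build_status_xml(nodes):
    """Aggregate per-node samples into one XML document (hypothetical schema)."""
    root = ET.Element("cluster-status")
    for name in sorted(nodes):
        node = ET.SubElement(root, "node", name=name)
        for key, value in sorted(nodes[name].items()):
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


class StatusHandler(BaseHTTPRequestHandler):
    """Answers GET /status with the aggregated XML; any other path gets a 404."""

    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        body = build_status_xml(NODE_STATUS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def serve_in_background(port=0):
    """Start the server on a daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_in_background()
    url = f"http://127.0.0.1:{server.server_address[1]}/status"
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8"))
    server.shutdown()
    server.server_close()
```

Returning one aggregated document per request, rather than having each client poll every node, reflects the "aggregate node monitoring" goal in spirit; how Fountain actually gathers and aggregates data is described by the thesis, not by this sketch.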