Security and Performance Modeling: An Environment and Tools for Measurement-Based Evaluation of the Availability of Field Systems
The thrust of this work is to develop an automated methodology to monitor, measure, and assess the reliability of the Internet from different perspectives. Measuring the accessibility of Internet sites provides feedback needed for planning future expansion of the Internet. Data collection on the accessibility of a set of more than 100 of the most frequently visited Web servers is currently in progress. Possible causes of a failed connection include congestion along the route to the server, the server being down for maintenance, and a gateway timeout. The collected data is being used to study failure patterns both in the behavior of the network as a whole and from a user's perspective.

The research is supported by an automated facility, ANALYZE-NOW, which consists of an extensive monitoring and data-gathering environment (see Figure 1). This environment is capable of identifying:
- major error and failure modes,
- performance and availability bottlenecks,
- interactions between hardware and software,
- correlation between availability and performance,
- critical error-propagation paths,
- failure recurrence, and
- measurement-based performance and availability models.
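As an illustration of the kind of accessibility probe described above, the following is a minimal sketch in Python. It is not the ANALYZE-NOW implementation, which is not described here. The function names (`classify`, `probe`) and the failure categories are assumptions made for this example. Note that a client-side timeout alone cannot tell route congestion apart from a server that is down, so both fall into the same "timeout" category in this sketch.

```python
import socket
import urllib.error
import urllib.request


def classify(exc):
    """Map an exception raised during a probe to a coarse failure category.

    The category names are hypothetical labels chosen for this sketch.
    """
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 504:
            return "gateway_timeout"
        if exc.code == 503:
            return "server_unavailable"
        return "http_error"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, urllib.error.URLError):
        reason = exc.reason
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "timeout"
        if isinstance(reason, socket.gaierror):
            return "dns_failure"
        if isinstance(reason, ConnectionRefusedError):
            return "connection_refused"
        return "network_error"
    return "other"


def probe(url, timeout=10.0):
    """Try to fetch one byte from url; return "ok" or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)
        return "ok"
    except Exception as exc:  # broad on purpose: every failure gets a label
        return classify(exc)
```

Running `probe` periodically against each monitored server and logging the returned label with a timestamp would yield the kind of record from which failure patterns can be studied.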
Future work includes implementing user-level recovery for distributed Web-based applications, defining metrics that describe the reliability of Internet hosts, and developing new tools to evaluate host reliability.
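To make the idea of host reliability metrics concrete, the sketch below computes three simple candidates from a time-ordered series of probe outcomes for one host. These are illustrative choices only, not the metrics proposed by this project. The function name `host_metrics` and the input format (a list of booleans, `True` meaning the host was reachable) are assumptions.

```python
def host_metrics(samples):
    """Summarize a time-ordered list of probe outcomes for one host.

    samples: list of bool, True = reachable at that probe.
    Returns the fraction of successful probes, the number of outage
    episodes (maximal runs of failures), and their mean length in probes.
    """
    if not samples:
        raise ValueError("no samples")

    outages = []  # lengths of consecutive-failure runs
    run = 0
    for up in samples:
        if up:
            if run:
                outages.append(run)
            run = 0
        else:
            run += 1
    if run:  # series ended while the host was still unreachable
        outages.append(run)

    availability = sum(samples) / len(samples)
    mean_outage = sum(outages) / len(outages) if outages else 0.0
    return {
        "availability": availability,
        "outages": len(outages),
        "mean_outage_probes": mean_outage,
    }
```

Counting outage episodes separately from raw availability distinguishes a host with one long outage from a host with many short ones, which bears on the failure-recurrence question raised above.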