- Professor Ravishankar K. Iyer
- Zbigniew Kalbarczyk (Research Scientist)
- Keith Whisnant (Graduate Student)
- Qilun Liu (Graduate Student)
The proposed research will develop methodologies and tools for designing and implementing very large-scale, real-time embedded computer systems that:
- achieve ultra high computational performance through use of parallel hardware architectures;
- achieve and maintain functional integrity via distributed, hierarchical monitoring and control;
- are required to be highly available; and
- are dynamically reconfigurable, maintainable and evolvable.
The specific application that will drive this research and provide a test platform for it is the trigger and data acquisition system for BTeV, an accelerator-based High Energy Physics (HEP) experiment to study matter-antimatter asymmetries (also known as Charge-Parity violation) in the decays of particles containing the bottom quark. BTeV was recently approved by Fermilab and will be constructed over the next 5-6 years to run in conjunction with the Fermilab Tevatron Collider. The experiment is expected to run for at least 5 years. It requires a massively parallel, heterogeneous cluster of computing elements to reconstruct 15 million particle interactions (events) per second and to use the reconstructions to decide which events to retain for further data analysis. Creating usable software for this type of real-time embedded system will require research into solutions to general problems in the fields of computer science and engineering. We plan to approach these problems in a general way and to produce methodologies and tools that can be applied to many scientific and commercial problems.
The classes of systems targeted by this research include those embedded in environments, like BTeV, that produce very large streams of data which must be processed in real time using data-dependent computation strategies. Such systems are inextricably tied to the environment in which they must operate, and must perform complex computations within the timing constraints mandated by that environment. These systems require ultra-high performance (on the order of 10^12 operations per second). This level of performance requires parallel hardware architectures, which in the case of BTeV comprise a mix of thousands of commodity processors, special-purpose processors such as Digital Signal Processors (DSPs), and specialized hardware such as Field Programmable Gate Arrays (FPGAs), all connected by very high-speed networks. The systems must be dynamically reconfigurable, so that maximum performance can be delivered from the available and potentially changing resources. The systems must be highly available, since the environments produce their data streams continuously over long periods of time, and interesting phenomena important to the analysis are rare and could occur in the data at any time. To achieve this high availability, the systems must be fault tolerant, self-aware, and fault adaptive, since any malfunction of the processing elements, the interconnection switches, or the front-end sensors (which provide the input stream) can result in unrecoverable loss of data. Faults must be corrected in the shortest possible time, and corrected semi-autonomously (i.e., with as little human intervention as possible). Hence distributed and hierarchical monitoring and control are vital.
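The hierarchical monitoring and semi-autonomous recovery described above can be sketched in miniature. The following is an illustrative Python model only, not BTeV's actual design: all class names (`Node`, `RegionalMonitor`, `GlobalManager`), the heartbeat timeout, and the restart policy are assumptions chosen to show the two-tier pattern, in which a first-tier monitor watches a group of processing elements and restarts a silent one locally, while a second tier aggregates status across groups.

```python
class Node:
    """A hypothetical processing element that periodically reports a heartbeat."""
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        # A failed node stops reporting; the monitor infers failure from silence.
        if self.alive:
            self.last_heartbeat = now


class RegionalMonitor:
    """First tier: watches one group of nodes and recovers failures locally,
    without operator intervention (semi-autonomous recovery)."""
    def __init__(self, nodes, timeout):
        self.nodes = nodes
        self.timeout = timeout
        self.restarts = []  # log of (node name, detection time)

    def check(self, now):
        for node in self.nodes:
            if now - node.last_heartbeat > self.timeout:
                # Restart the silent node and record the action for the tier above.
                node.alive = True
                node.last_heartbeat = now
                self.restarts.append((node.name, now))


class GlobalManager:
    """Second tier: sweeps the regional monitors and aggregates their status,
    the level at which system-wide reconfiguration decisions would be made."""
    def __init__(self, monitors):
        self.monitors = monitors

    def sweep(self, now):
        for monitor in self.monitors:
            monitor.check(now)

    def total_restarts(self):
        return sum(len(m.restarts) for m in self.monitors)


# Simulated timeline: one DSP fails silently and is restarted on the next sweep.
nodes = [Node("dsp-0"), Node("dsp-1")]
monitor = RegionalMonitor(nodes, timeout=2.0)
manager = GlobalManager([monitor])

for node in nodes:
    node.heartbeat(0.0)   # t=0: both nodes healthy
nodes[1].alive = False    # t=1: dsp-1 fails and falls silent
nodes[0].heartbeat(2.5)   # t=2.5: dsp-0 still reporting
manager.sweep(3.0)        # t=3: dsp-1's heartbeat is 3.0s stale -> restarted
```

The point of the two tiers is locality: the common case (a single stalled node) is handled entirely within its region, while only aggregated information flows upward, which is what makes the scheme scale to thousands of heterogeneous elements.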