Work in the DEPEND group aims to fundamentally rethink the design of computer systems, from hardware to applications. The group designs systems that use artificial intelligence and machine learning to control and optimize large-scale heterogeneous computer systems so that they meet the performance and resiliency requirements of emerging applications.
Two recent publications, one on scheduling application kernels to achieve high performance and one on detecting failures and performance anomalies in heterogeneous systems, were accepted at ICML 2020 and Supercomputing 2020, respectively.
- Saurabh Jha, Shengkun Cui, Subho Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer (2020). Understanding, Detecting, and Localizing Failures in High-Performance Storage Systems. Proceedings of the International Conference for High-Performance Computing, Networking, Storage and Analysis (SC 2020), November 17, 2020.
- Subho S. Banerjee, Saurabh Jha, Zbigniew Kalbarczyk, Ravishankar K. Iyer (2020). Inductive Bias-driven Reinforcement Learning for Efficient Schedules in Heterogeneous Clusters. Thirty-seventh International Conference on Machine Learning (ICML 2020).