Access Restriction Subscribed

Author Mueller, Frank ♦ Leangsuksun, Chokchai ♦ Engelmann, Christian ♦ Varma, Jyothish ♦ Bernholdt, David E. ♦ Wang, Chao ♦ Scott, Stephen L. ♦ Gottumukkala, Narasimha R. ♦ Shet, Aniruddha G. ♦ Sadayappan, P.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Availability ♦ High-end computing ♦ Fault tolerance ♦ RAS ♦ Group membership ♦ Reliability ♦ Monitoring
Abstract MOLAR is a multi-institutional research effort that concentrates on adaptive, reliable, and efficient operating and runtime system (OS/R) solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined in FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by exploring the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions, and by advancing computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. This paper describes recent research of the MOLAR team in advancing RAS for high-end computing OS/Rs.
Description Affiliation: Louisiana Tech University, Ruston, LA (Gottumukkala, Narasimha R.; Leangsuksun, Chokchai) || Oak Ridge National Laboratory, Oak Ridge, TN (Engelmann, Christian; Scott, Stephen L.; Bernholdt, David E.) || North Carolina State University, Raleigh, NC (Varma, Jyothish; Wang, Chao; Mueller, Frank) || The Ohio State University, Columbus, OH (Shet, Aniruddha G.; Sadayappan, P.)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2006-04-01
Publisher Place New York
Journal ACM SIGOPS Operating Systems Review (OPSR)
Volume Number 40
Issue Number 2
Page Count 10
Starting Page 63
Ending Page 72

