Monitoring Analytics for In Situ Workflows at the Exascale
The MONA project is a collaboration between Georgia Tech, the University of Oregon, Oak Ridge National Laboratory, and Princeton Plasma Physics Laboratory. Led by Dr. Greg Eisenhauer, the project aims to provide scalable platform and application monitoring for in situ workflows at the exascale.
Personnel:
- Karsten Schwan – Georgia Tech (GT), passed away 10/28/2015
- Hasan Abbasi – Oak Ridge National Lab (ORNL)
- Greg Eisenhauer – Georgia Tech (GT)
- Stephane Ethier – Princeton Plasma Physics Lab (PPPL)
- Kevin Huck – University of Oregon (UO)
- Allen Malony – University of Oregon (UO)
- Matthew Wolf – Georgia Tech (GT)
Project Description. Future exascale machines will create enormous data volumes. Even if each program produced just one GB of data per machine node (roughly the size of a video captured on a smartphone) at each of its many output steps, moving the resulting terabytes of data to storage would incur inordinate storage and energy costs, as well as long delays in extracting useful scientific results from the many terabytes stored by every single program run. This has led to a revolution in data management, from earlier 'offline' solutions to new 'online' methods that inspect and analyze output data on the same machine where it is created. Such online data management, co-running with simulations, accelerates the scientific process, provides rapid and timely insights, and can help avoid unnecessary or scientifically invalid simulation computations.
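To make the scale of the estimate above concrete, total output volume for one run is simply nodes × GB per node per output step × number of output steps. The short Python sketch below evaluates that product. The machine size (10,000 nodes) and run length (100 output steps) are hypothetical values chosen only for illustration, not figures from the project, and the function name `output_volume_tb` is likewise hypothetical.

```python
# Back-of-the-envelope estimate of simulation output volume.
# All parameter values below are hypothetical illustrations, not
# measurements from any specific machine, code, or MONA experiment.

GB_PER_TB = 1000  # decimal (SI) units


def output_volume_tb(nodes: int, gb_per_node_per_step: float, output_steps: int) -> float:
    """Total output in TB: nodes x GB per node per step x output steps."""
    return nodes * gb_per_node_per_step * output_steps / GB_PER_TB


# Hypothetical machine: 10,000 nodes, each writing 1 GB per output step.
per_step = output_volume_tb(nodes=10_000, gb_per_node_per_step=1.0, output_steps=1)
# Hypothetical run: 100 output steps.
per_run = output_volume_tb(nodes=10_000, gb_per_node_per_step=1.0, output_steps=100)

print(f"per output step: {per_step:.0f} TB")  # 10 TB
print(f"per run: {per_run:.0f} TB")           # 1000 TB
```

Under these assumptions a single output step already produces 10 TB, and one run produces about a petabyte, which illustrates why moving every byte to storage for offline analysis quickly becomes impractical.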
The MONA(lytics) project seeks to understand, evaluate, and, ultimately, control the online data flows generated by future exascale applications and the analytics processing applied to those flows: their volumes, speeds, and processing needs; the energy saved by online vs. offline data processing; the effects of next-generation computer hardware and of new ways of performing data management; and the tradeoffs between how well data is analyzed and the cost of doing so, when approximate methods are sufficient for the immediate scientific insights being sought.
The project's approach to understanding future data management methods is experimental:
- create new methods for performance monitoring and analysis, termed monitoring analytics (monalytics), that give developers and scientists a deep understanding of the performance properties of the data management actions taken by the exascale simulations being run;
- implement those methods and apply them to key high-end simulations driving U.S. efforts toward the exascale regime; and
- ensure that those methods are efficient by co-running them with simulations in ways that do not hinder the simulation, and thus scientific progress, all with the goal of improving exascale data management and its utility to the science applications being run.
MONA team members have extensive experience with online and performance monitoring: they have produced widely used software tools (e.g., ADIOS at ORNL, TAU at UO, EVPath at GT), interacted with a broad range of science code developers and users (e.g., the fusion modeling community – Ethier at PPPL), worked with high-end machines at the U.S. DOE National Labs (e.g., Titan at ORNL), and been involved in U.S. industry efforts to create future computing technologies (e.g., Intel Corp.).
Project Goals and Impact
- Evaluate, analyze, and understand the performance of online data management workflows
- Explore metrics, such as energy saved, for online vs. offline data processing to better understand their tradeoffs
- Evaluate and understand co-scheduling of analysis and visualization tasks with simulations to improve utility of online data management workflows
- MONA aims to provide workflow orchestration systems with end-to-end feedback on performance metrics
- MONA’s models will provide a framework for smarter resource utilization in data management workflows
- MONA will optimize end-to-end metrics, such as energy consumption and total time to insight, for complex exascale workflows
- MONA will provide a framework that lets users easily create proxy workflows for use within the co-design pipeline