A Cloud-based Software Infrastructure for Distributed Simulation

Cloud computing services range from simple data backup to the deployment of entire compute clusters or data centers in a remote environment. Companies can use cloud computing to serve peak loads on their IT infrastructure by automatically renting the needed resources on demand from cloud computing providers. The flexibility, cost-efficiency and user-friendliness of cloud services also make them an attractive model for addressing computational challenges in the scientific community. Individual research groups do not need to provide and maintain IT infrastructure on their own, but can instead rely on cloud services to satisfy their needs.

Cloud services exist at different abstraction layers, which are commonly referred to as “Infrastructure-as-a-Service” (IaaS), “Platform-as-a-Service” (PaaS) and “Software-as-a-Service” (SaaS). A specialized cloud service, “Simulation-Platform-as-a-Service” (SimPaaS), can be implemented inside that cloud stack to ease and optimize cloud utilization for scientific simulations.

Our SimPaaS cloud will implement a platform on top of IaaS that provides virtualized resources for the automatic, distributed and scalable deployment of simulation applications.

Specific simulation applications can be implemented as SaaS on top of the simulation platform.

For evaluation purposes, we set up a prototype IaaS cloud based on the open-source cloud operating system OpenStack, which is also widely used in industry. The long-term goal is to provide a cloud-based software infrastructure that supports the simulations conducted in the context of the SWZ Clausthal-Göttingen.

Our research interests comprise three aspects of a SimPaaS cloud, which are discussed in the following.

Ease of use

The abstraction level and flexibility of clouds offer the chance to free individual scientists from the burden of infrastructure provisioning and management. We are investigating how a cloud-based software infrastructure can support scientists throughout the whole simulation life-cycle, which involves modeling, deployment, execution, monitoring, and the collection and presentation of simulation results. In the modeling phase, software engineering techniques known from model-driven development can be utilized. One option is to extend a graphical modeling language (e.g. UML) to support the modeling process for simulations. Another option is to develop a domain-specific language (DSL) targeting the modeling of cloud-based simulations. To ease deployment, we are also investigating techniques to automatically map the developed simulation model to a cloud-based infrastructure.

Reliability

The reliability of our SimPaaS cloud and of the applications deployed on it will be assessed and improved by hierarchically modeling the entire cloud system, including its applications. In this model, dependencies between cloud components (physical servers, instances, web servers, services, etc.) will be established by exploring the execution process of the application and the structure of the cloud system.

Based on these dependencies, the reliability of applications will be estimated, and the influence of each cloud component's reliability on application reliability will be determined.
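The dependency-based estimation described above can be illustrated with a small sketch. It is not the project's model: the component names, the reliability values and the dependency structure (one physical server, two redundant instances, one web server) are hypothetical, and component failures are assumed to be independent. Influence is measured here with the Birnbaum importance, which is one common choice among several.

```python
from math import prod


def series(reliabilities):
    """All components are required: the group works only if every one works."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Redundant components: the group fails only if every one fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def app_reliability(r):
    """Hypothetical application: needs the server, at least one of two
    instances, and the web server (independent failures assumed)."""
    instances = parallel([r["instance_a"], r["instance_b"]])
    return series([r["server"], instances, r["web_server"]])


def influence(r, name):
    """Birnbaum importance: application reliability with the component
    always working minus application reliability with it always failed."""
    working = dict(r, **{name: 1.0})
    failed = dict(r, **{name: 0.0})
    return app_reliability(working) - app_reliability(failed)


# Illustrative values only, not measurements.
components = {
    "server": 0.99,
    "instance_a": 0.95,
    "instance_b": 0.95,
    "web_server": 0.98,
}

if __name__ == "__main__":
    print(f"application reliability: {app_reliability(components):.6f}")
    for name in components:
        print(f"influence of {name}: {influence(components, name):.5f}")
```

In this toy structure the non-redundant server has a much larger influence than either of the redundant instances, which is the kind of ranking such a model is meant to expose.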

Then, based on these influences and on cloud- and application-specific constraints, such as the required reliability level, resource usage limits and performance requirements, strategies for improving reliability will be engineered. These strategies will give optimal suggestions for specific scenarios. The suggestions will show how to improve the structure (e.g. where and how many instances should be started) or which components' reliability should be increased in order to improve the applications' reliability.
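As a minimal illustration of such a suggestion, the following sketch answers the question "how many instances should be started" for a single replicated component. It assumes identical instances that fail independently and a hypothetical resource limit `max_instances`. The function name and the numbers are illustrative, not part of the project.

```python
def replicas_needed(instance_reliability, target, max_instances):
    """Return the smallest number n of redundant instances for which
    1 - (1 - r)^n reaches the target reliability, or None if the
    target cannot be met within the resource limit."""
    for n in range(1, max_instances + 1):
        if 1.0 - (1.0 - instance_reliability) ** n >= target:
            return n
    return None


if __name__ == "__main__":
    # Illustrative values: each instance works with probability 0.95.
    print(replicas_needed(0.95, target=0.999, max_instances=5))
    print(replicas_needed(0.95, target=0.999, max_instances=2))
```

With these values, three instances are needed to reach 0.999. Under a limit of two instances the function returns `None`, which signals that a different strategy (for example, more reliable components) would be required.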

Efficiency

According to survey results released at the 2013 OpenStack Summit in Hong Kong, the three key business drivers for OpenStack clouds are cost savings, operational efficiency and an open platform. In our project we focus specifically on one of these factors: operational efficiency or, in other words, high performance of our cloud environment.

To achieve such efficiency in resource utilization, we are considering several options for investigation and implementation, including, but not limited to:

  • developing alternative scheduling mechanisms for Virtual Machines (VMs)
  • enabling live migration of VMs in the process of simulation runs
  • reducing networking and virtualization latencies (e.g. TCP/IP communication overhead)
  • raising the priority of some of the cloud services by running them inside a real-time OS

Alternative scheduling mechanisms for the VMs can be implemented in the cloud by modifying the nova scheduler to take into account the optimal hardware requirements of a given simulation. One objective is to enable VM scheduling with more control (e.g. with a given priority, or at a given time). Another objective could be to tailor these scheduling mechanisms to particular classes of simulations.
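The idea of priority-aware placement can be sketched in a few lines of stand-alone Python. This is not nova code and does not use the nova API. It only mimics the general pattern of first filtering out hosts that cannot satisfy a request and then weighing the remaining hosts. The `Host` and `Request` classes, the field names and the choice of clock speed as the weight are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int
    cpu_ghz: float  # hypothetical weight: faster hosts are preferred


@dataclass
class Request:
    name: str
    vcpus: int
    ram_mb: int
    priority: int  # higher value is scheduled first


def schedule(requests, hosts):
    """Place requests in priority order. Returns {request name: host name},
    with None for requests that fit nowhere. Host capacities are updated
    in place as requests are placed."""
    placement = {}
    for req in sorted(requests, key=lambda r: -r.priority):
        # Filter step: keep only hosts with enough free resources.
        candidates = [
            h for h in hosts
            if h.free_vcpus >= req.vcpus and h.free_ram_mb >= req.ram_mb
        ]
        if not candidates:
            placement[req.name] = None
            continue
        # Weigh step: pick the fastest remaining host.
        best = max(candidates, key=lambda h: h.cpu_ghz)
        best.free_vcpus -= req.vcpus
        best.free_ram_mb -= req.ram_mb
        placement[req.name] = best.name
    return placement


def demo():
    hosts = [Host("fast", 4, 8192, 3.5), Host("slow", 8, 16384, 2.4)]
    requests = [
        Request("postprocessing", vcpus=4, ram_mb=4096, priority=1),
        Request("simulation", vcpus=4, ram_mb=4096, priority=10),
    ]
    return schedule(requests, hosts)


if __name__ == "__main__":
    print(demo())
```

In the demo, the high-priority simulation is placed first and receives the faster host, although it was submitted after the low-priority job.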

Live migration of VMs could make it possible to rearrange instances and reassign them to more suitable hardware that was unavailable when the instances were initially placed. This could potentially lead to better utilization of resources and increased performance.
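A possible decision step behind such a reassignment is sketched below. The sketch only plans moves and does not trigger any migration in a real cloud. The host scores, the one-slot-per-instance capacity model and the function name `plan_migrations` are hypothetical simplifications.

```python
def plan_migrations(placement, host_score, free_slots):
    """Propose moves of instances to better-scored hosts with free capacity.

    placement:  {instance name: current host name}
    host_score: {host name: suitability score, higher is better}
    free_slots: {host name: number of additional instances it can take}
    Returns a list of (instance, source host, target host) tuples.
    """
    free = dict(free_slots)  # work on a copy, leave the input untouched
    moves = []
    # Consider the worst-placed instances first.
    for instance, current in sorted(
        placement.items(), key=lambda item: host_score[item[1]]
    ):
        better = [
            h for h in free
            if free[h] > 0 and host_score[h] > host_score[current]
        ]
        if not better:
            continue
        target = max(better, key=lambda h: host_score[h])
        free[target] -= 1
        free[current] = free.get(current, 0) + 1  # the old slot is released
        moves.append((instance, current, target))
    return moves


if __name__ == "__main__":
    # Illustrative scenario: host "new" became available after placement.
    placement = {"vm1": "old", "vm2": "mid"}
    host_score = {"old": 1, "mid": 2, "new": 3}
    free_slots = {"old": 0, "mid": 0, "new": 1}
    print(plan_migrations(placement, host_score, free_slots))
```

In this scenario only `vm1` is moved: it is the worst-placed instance and takes the single free slot on the newly available host, while the slot it releases is no improvement for `vm2`.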

One way to reduce networking and virtualization latencies would be to address the TCP/IP overhead, either by minimizing its impact or by excluding TCP/IP from the workflow altogether.

Yet another way to achieve higher cloud efficiency would be to patch the host operating system (in our case Ubuntu) with real-time capabilities and to rearrange the priorities of the services running inside the cloud, for instance by giving the nova scheduler the highest priority to make sure it completes its tasks before other services start using hardware resources.

Use-case Simulations

To evaluate our approach, we are collaborating with scientists from different research areas, who have provided example simulations from their fields of study. Three applications have been selected to provide a heterogeneous testbed for our SimPaaS cloud:

  • Material Simulation (Prof. Brenner, TU Clausthal)
  • Monte-Carlo Simulation in High Energy Physics (Prof. Quadt, Uni Göttingen)
  • Modeling and Optimization of Public Transport Networks (Prof. Schöbel, Uni Göttingen)

Involved Scientist

Publication list

2016

  • F. Glaser, Domain Model Optimized Deployment and Execution of Cloud Applications with TOSCA, Proceedings of the 9th System Analysis and Modelling Conference (SAM 2016), Saint-Malo, France, 2016.

2015

  • F. Glaser, J. N. Serrano, J. Grabowski, A. Quadt, ATLAS user analysis on private cloud resources at GoeGrid, Proceedings of the 21st Conference on Computing in High Energy Physics and Nuclear Physics (CHEP 2015), 13-17 April, Okinawa, Japan, available online: http://iopscience.iop.org/article/10.1088/1742-6596/664/2/022020, 2015.
  • F. Glaser, Towards Domain-Model Optimized Deployment and Execution of Scientific Applications in Cloud Environments, Proceedings of the Doctoral Symposium at the 5th Conference on Cloud Computing and Services Sciences (DCCLOSER 2015), Lisbon, Portugal, 2015.
  • M. Göttsche, F. Glaser, S. Herbold, J. Grabowski, Automated Deployment and Parallel Execution of Legacy Applications in Cloud Environments, Proceedings of the 8th IEEE International Conference on Service Oriented Computing & Applications (SOCA 2015), Rome, Italy, 2015.
  • H. Richter, About the Suitability of Clouds in High-Performance Computing, to be published in Proc. ISC Cloud&Big Data, Sept. 28–30, Frankfurt, Germany, 2015.
  • H. Richter, A. Keidel, R. Ledyayev, Über die Eignung von Clouds für das Hochleistungsrechnen (HPC), IfI Technical Report Series, ISSN 1860-8477, IfI-15-03, editor: Department of Computer Science, Clausthal University of Technology, Germany, 2015.
