Wednesday, November 5 • 13:50 - 14:30
Cold Start Booting of 1000 VMs Under 10 Minutes




The Worldwide LHC Computing Grid (WLCG) is a global collaboration that analyzes data from CERN's LHC. It consists of more than 170 computing centers in 40 countries.

 

The CMS experiment maintains a large infrastructure to read out and filter the data from the detector. The High Level Trigger (HLT) cluster consists of 15k cores in 1500 compute nodes dedicated to online data and event filtering. However, because of the accelerator duty cycle and the various maintenance periods, this resource is used only about 30% of the time and is free for the rest. Only during these unused periods is an OpenStack cloud started on top of the cluster, allowing it to join the WLCG for offline data analysis.

 

To run the software required for the offline data analysis, specific system images are generated and distributed through the cloud. OpenStack's image service (Glance) has to distribute a 1.7 GB image to almost 1500 servers so that the VMs can boot. Every time images are added or changed, Glance has to redistribute the new images to all Nova compute nodes, which poses a challenge. Because the cluster is sometimes available for only a few hours, a rapid startup is needed to maximize the usable cluster time.
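To give a sense of the scale involved, the figures above can be turned into a rough estimate. The sketch below is not taken from the talk: it assumes decimal units, that every node receives its own full copy of the image, and that the entire 10-minute target from the title is available for transfer. Since booting itself also takes time, the result is a lower bound on the aggregate rate. The function names are hypothetical.

```python
# Back-of-the-envelope estimate. Figures come from the abstract;
# the transfer model (one full copy per node) is an assumption.
IMAGE_SIZE_GB = 1.7      # image size stated in the abstract
NODE_COUNT = 1500        # "almost 1500 servers", rounded up
TIME_BUDGET_S = 10 * 60  # 10-minute target from the talk title


def total_transfer_gb(image_gb: float, nodes: int) -> float:
    """Data volume if every node receives its own full copy of the image."""
    return image_gb * nodes


def required_throughput_gbit_s(image_gb: float, nodes: int, seconds: float) -> float:
    """Aggregate rate needed to finish within the budget (8 bits per byte)."""
    return total_transfer_gb(image_gb, nodes) * 8 / seconds


if __name__ == "__main__":
    total = total_transfer_gb(IMAGE_SIZE_GB, NODE_COUNT)
    rate = required_throughput_gbit_s(IMAGE_SIZE_GB, NODE_COUNT, TIME_BUDGET_S)
    print(f"total: {total:.0f} GB, aggregate rate: {rate:.1f} Gbit/s")
```

Under these assumptions the cluster has to move roughly 2.5 TB in total, which corresponds to an aggregate rate of about 34 Gbit/s. A figure of that size suggests why serving every node from a single image source is difficult.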

 

In this presentation we will discuss how we managed to boot all the available Nova compute nodes in under 10 minutes. We will also explain the specifics of the standard cluster usage and the opportunistic cloud usage, along with the details of the infrastructure deployment, the issues encountered, and their solutions.




Speakers

Anastasios Andronidis

Technical Student, CERN
March 2010, 9th Quattor Workshop: Quattor workshop focused on current status of the production release of the compiler and new features available in the development release. Participant. October 2012, RO-LCG 2012: Using a cloud infrastructure for the on demand provisioning of worker...


Wednesday November 5, 2014 13:50 - 14:30 CET
Room 251
