Tuesday, June 2, 2015
HG F30, 10:30 - 11:00
The ADES Model for Computational Science; Nicola Marzari (EPFL, Switzerland)
Co-Authors: Giovanni Pizzi (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Nicolas Mounet (EPFL, Switzerland); Riccardo Sabatini (EPFL, Switzerland); Martin Uhrin (EPFL, Switzerland); Boris Kozinsky (Robert Bosch RTC, Cambridge MA, USA)
Computational science has seen a meteoric rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop under the guidance of an established practitioner. Great benefits could follow from adopting concepts and tools from computer science to manage, preserve, and share these computational efforts. I will illustrate here our vision for the four pillars that should sustain such an effort (the ADES model: Automation, Data, Environment, and Sharing) and discuss their implementation in the open-source AiiDA platform (http://www.aiida.net).
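To make the Automation and Data pillars concrete, the following minimal Python sketch (written for this program; it does not use AiiDA's actual API, and all class and function names are hypothetical) shows how a calculation's inputs, outputs, and provenance links could be recorded automatically so that results remain queryable, reproducible, and shareable.

import json, uuid, datetime

class ProvenanceStore:
    """Toy in-memory store: nodes hold inputs/outputs, edges record which calculation produced what."""
    def __init__(self):
        self.nodes = {}   # node_id -> data payload
        self.edges = []   # provenance records

    def add_node(self, payload):
        node_id = str(uuid.uuid4())
        self.nodes[node_id] = payload
        return node_id

    def run_calculation(self, func, input_ids):
        # Automation pillar: run func on stored inputs and record the provenance of the result.
        inputs = [self.nodes[i] for i in input_ids]
        result = func(*inputs)
        output_id = self.add_node(result)
        self.edges.append({
            "calculation": func.__name__,
            "inputs": input_ids,
            "output": output_id,
            "timestamp": datetime.datetime.utcnow().isoformat(),
        })
        return output_id

# Hypothetical usage: a trivial "simulation" whose provenance is tracked end to end.
def total_energy(structure, cutoff):
    return {"energy_eV": -2.5 * len(structure["atoms"]) - 0.001 * cutoff}

store = ProvenanceStore()
s_id = store.add_node({"atoms": ["Si", "Si"], "cell": [5.43, 5.43, 5.43]})
c_id = store.add_node(40.0)
e_id = store.run_calculation(total_energy, [s_id, c_id])
print(json.dumps(store.edges, indent=2))   # Sharing pillar: the provenance graph itself can be exported.

In AiiDA the analogous idea is realized by persisting the full provenance graph of calculations and data in a database, so it can be queried and shared.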
Tuesday, June 2, 2015
HG F30, 11:00 - 11:30
Data Management in Climate Science - Cost-Benefit Considerations at DKRZ; Thomas Ludwig (DKRZ, Germany)
Climate science is highly data intensive, and data is the raw material from which scientists gain new insights. The German Climate Computing Centre DKRZ stores dozens of petabytes of modelling output on disks and tapes. Costs for this part of the infrastructure are increasing, and we are investigating methods to reduce them. The talk will introduce the most important aspects of data management in climate science and follow the data through its lifecycle. We will explain how a well-balanced HPC system needs to be configured and which investment and operational costs it generates. A look at Exascale systems will conclude the presentation.
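As a hedged illustration of the cost-benefit reasoning discussed in the talk, the sketch below compares the annual cost per petabyte of two storage tiers; every number is a placeholder assumption chosen for illustration only, not a DKRZ figure.

# Hypothetical cost-per-petabyte-year comparison of disk vs. tape storage tiers.
# All numbers are illustrative placeholders, not DKRZ figures.

def annual_cost_per_pb(capex_per_pb, lifetime_years, power_kw_per_pb,
                       electricity_per_kwh, maintenance_fraction):
    """Amortized investment plus operational (power, maintenance) cost per PB per year."""
    amortized = capex_per_pb / lifetime_years
    power = power_kw_per_pb * 24 * 365 * electricity_per_kwh
    maintenance = maintenance_fraction * capex_per_pb
    return amortized + power + maintenance

disk = annual_cost_per_pb(capex_per_pb=100_000, lifetime_years=5,
                          power_kw_per_pb=5.0, electricity_per_kwh=0.20,
                          maintenance_fraction=0.05)
tape = annual_cost_per_pb(capex_per_pb=30_000, lifetime_years=8,
                          power_kw_per_pb=0.5, electricity_per_kwh=0.20,
                          maintenance_fraction=0.05)
print(f"disk: {disk:,.0f} EUR/PB/year, tape: {tape:,.0f} EUR/PB/year")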
Tuesday, June 2, 2015
HG F30, 11:30 - 12:00
Big & Smart, High Energy Data; Jean-Roch Vlimant (California Institute of Technology, USA)
Co-Authors: Maria Spiropulu (Caltech, USA)
The raw data rate at the LHC is 1 Petabyte/sec. In terms of producing, capturing, communicating, aggregating, storing, and analyzing these data, this is arguably the biggest and most challenging science data frontier, one that offers solutions for other fields and affords innovation as we move to the next era of the High Luminosity LHC, when the needs and requirements can be an order of magnitude bigger. I will discuss Big and Smart Data and intelligent data handling systems in high energy physics as we move forward towards the High Luminosity LHC and beyond, and I will highlight the importance of the validation and verification that we can afford in this science domain.
Tuesday, June 2, 2015
HG F30, 12:00 - 12:30
The Human Brain Project; Sean Hill (EPFL, Switzerland)
The aim of the Human Brain Project (HBP) is to integrate global neuroscience knowledge and data into supercomputer-based models and simulations to accelerate our understanding of the human brain. To do this, the HBP will deliver six collaborative ICT platforms: Neuroinformatics, Brain Simulation, High Performance Computing, Medical Informatics, Neuromorphic Computing, and Neurorobotics. The HBP will create new technologies for interactive supercomputing, visualization and big data analytics; federated analysis of globally distributed data; simulation of the brain; objective classification of disease; and neuromorphic computing systems based on brain-like principles.
Tuesday, June 2, 2015
HG F30, 13:30 - 14:00
Scientific Big Data Analytics at the John von Neumann-Institute for Computing (NIC); Thomas Lippert (Forschungszentrum Jülich, Germany)
Data analytics, management, sharing, and preservation of very big, often heterogeneous or distributed data sets - beyond the basic technical requirements of transfer and storage - are of increasing significance for science, research, and industry. The John von Neumann Institute for Computing, a joint institute of DESY, GSI, and Forschungszentrum Jülich in Germany, is going to establish a call for project submissions in the field of scientific big data analytics (SBDA). The goal is to extend and optimize the existing HPC and data services. A call for expressions of interest has been launched in order to identify and analyze the needs of the scientific communities.
Tuesday, June 2, 2015
HG F30, 14:00 - 14:30
Big Data Based Materials Discovery; Peter W. J. Staar (IBM Research, Switzerland)
Traditionally, the discovery of new materials is extremely labor-intensive. This approach is not scalable, and as such this field of research is ideally suited for a big-data-based cognitive computing approach. In this talk, we will present how this approach is applied in practice. We will discuss in detail the various aspects, ranging from the data models used to the algorithms applied. Special attention will be given to the design and construction of the knowledge graph, which encodes the underlying data and knowledge model. We will discuss in detail how the nodes and edges in the graph are designed and how the algorithms can be used to refine and optimize the edges of the graph.
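As an illustration only (not the system presented in the talk), the following Python sketch shows one possible way to represent such a knowledge graph and to refine its edges; the node names, the evidence counts, and the refine_edges heuristic are all hypothetical.

import networkx as nx

# Toy materials knowledge graph: nodes are materials, properties, and applications,
# edges carry evidence counts that are refined into confidence weights.
G = nx.DiGraph()
G.add_node("GaN", kind="material")
G.add_node("wide band gap", kind="property")
G.add_node("LED", kind="application")

# Each edge stores how many independent sources support the relation.
G.add_edge("GaN", "wide band gap", relation="has_property", evidence=3)
G.add_edge("GaN", "LED", relation="used_in", evidence=1)

def refine_edges(graph):
    """Hypothetical refinement step: normalize raw evidence counts into
    confidence weights and prune relations below a threshold."""
    max_evidence = max(d["evidence"] for _, _, d in graph.edges(data=True))
    weak = []
    for u, v, d in graph.edges(data=True):
        d["confidence"] = d["evidence"] / max_evidence
        if d["confidence"] < 0.2:
            weak.append((u, v))
    graph.remove_edges_from(weak)
    return graph

refine_edges(G)
for u, v, d in G.edges(data=True):
    print(u, "->", v, d)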
Tuesday, June 2, 2015
HG F30, 14:30 - 15:00
Communication Efficient Distributed Training of Machine Learning Models; Martin Jaggi (ETH Zurich, Switzerland)
Co-Authors: Virginia Smith (UC Berkeley, USA); Martin Takáč (Lehigh University, USA); Jonathan Terhorst (UC Berkeley, USA); Sanjay Krishnan (UC Berkeley, USA); Thomas Hofmann (ETH Zurich, Switzerland); Michael I. Jordan (UC Berkeley, USA)
Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. We propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that, compared to state-of-the-art mini-batch versions of the SGD and SDCA algorithms, COCOA converges to the same 0.001-accurate solution quality on average 25× as quickly.
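A minimal, single-process NumPy sketch of the communication pattern follows, using ridge regression with local SDCA steps as the inner solver; it mimics the algorithmic idea only, is not the Spark implementation, and the problem sizes and hyper-parameters are arbitrary assumptions.

import numpy as np

# Single-process simulation of a CoCoA-style scheme for ridge regression:
# each "worker" runs H local dual coordinate (SDCA) steps on its own data
# partition; only the resulting primal updates delta_w are then communicated
# and averaged.

rng = np.random.default_rng(0)
n, d, K, lam, H, rounds = 200, 10, 4, 0.1, 50, 30
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

alpha = np.zeros(n)                        # dual variables, one per example
w = np.zeros(d)                            # shared model, w = X.T @ alpha / (lam * n)
parts = np.array_split(np.arange(n), K)    # data partitioned across K workers

for _ in range(rounds):
    updates = []
    for idx in parts:                      # would run in parallel on real workers
        w_loc = w.copy()
        d_alpha = np.zeros(n)
        for i in rng.choice(idx, size=H):  # H local SDCA steps on this partition
            resid = y[i] - X[i] @ w_loc - (alpha[i] + d_alpha[i])
            step = resid / (1.0 + X[i] @ X[i] / (lam * n))
            d_alpha[i] += step
            w_loc += step * X[i] / (lam * n)
        updates.append((d_alpha, w_loc - w))
    for d_alpha, d_w in updates:           # communication: exchange K small updates
        alpha += d_alpha / K               # conservative averaging (the safe choice)
        w += d_w / K

print("primal objective:", np.mean((X @ w - y) ** 2) / 2 + lam / 2 * (w @ w))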
Tuesday, June 2, 2015
HG F30, 15:00 - 15:30
Hierarchical Bayesian Models on HPC Platforms; Panagiotis Hadjidoukas (ETH Zurich, Switzerland)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Steven Wu (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
Hierarchical Bayesian modeling provides an inference framework to fuse heterogeneous data into engineering applications. Hierarchical models suffer from extensive computational demands arising from multiple model evaluations and data intensity. We tackle both challenges using our parallel framework for uncertainty quantification. We combine state-of-the-art parallelized sampling schemes to achieve multiple levels of nested parallelism and to manage the large data volumes generated by the simulations. We demonstrate our approach on the calibration of pharmacokinetic models using heterogeneous experimental measurements on supercomputing platforms.
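To illustrate the structure of such a calibration (a toy sketch, not the framework used in the talk), the Python code below fits a one-compartment pharmacokinetic model C(t) = (dose/V)*exp(-k*t) hierarchically with a deliberately simple Metropolis sampler; the data are synthetic and all priors and step sizes are hypothetical. The per-subject likelihood evaluations are independent, which is exactly where nested parallelism over HPC workers enters.

import numpy as np

rng = np.random.default_rng(1)
dose = 100.0
times = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
sigma = 0.5          # measurement noise, assumed known here for simplicity
n_subj = 5

# Synthetic "measurements" for each subject (placeholder data, not real experiments).
true_k = rng.lognormal(np.log(0.3), 0.2, n_subj)
true_V = rng.lognormal(np.log(20.0), 0.2, n_subj)
data = [(dose / V) * np.exp(-k * times) + sigma * rng.normal(size=times.size)
        for k, V in zip(true_k, true_V)]

def subject_loglike(theta_i, y_i):
    # One model evaluation per subject; in a UQ framework these independent,
    # expensive evaluations are dispatched in parallel to HPC workers.
    k, V = np.exp(theta_i)
    pred = (dose / V) * np.exp(-k * times)
    return -0.5 * np.sum((y_i - pred) ** 2) / sigma ** 2

def log_post(mu, log_tau, thetas):
    # Hierarchical posterior: hyperparameters (mu, tau) describe the population,
    # thetas holds each subject's (log k, log V).
    tau = np.exp(log_tau)
    hyper = -0.5 * np.sum(mu ** 2) / 100.0 - 0.5 * np.sum(log_tau ** 2)
    pop = -0.5 * np.sum(((thetas - mu) / tau) ** 2) - n_subj * np.sum(log_tau)
    like = sum(subject_loglike(thetas[i], data[i]) for i in range(n_subj))
    return hyper + pop + like

# Deliberately simple joint random-walk Metropolis over all unknowns.
mu = np.array([np.log(0.3), np.log(20.0)])
log_tau = np.log(np.array([0.3, 0.3]))
thetas = np.tile(mu, (n_subj, 1))
cur = log_post(mu, log_tau, thetas)
for _ in range(5000):
    mu_p = mu + 0.05 * rng.normal(size=2)
    log_tau_p = log_tau + 0.05 * rng.normal(size=2)
    thetas_p = thetas + 0.05 * rng.normal(size=thetas.shape)
    prop = log_post(mu_p, log_tau_p, thetas_p)
    if np.log(rng.uniform()) < prop - cur:
        mu, log_tau, thetas, cur = mu_p, log_tau_p, thetas_p, prop
print("last posterior sample of the population log-parameters:", mu)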