# Interdisciplinary Research Unit D1

# Data-driven materials science

## Research Unit Leaders: Sergio Conti, Michael Griebel

## PIs: Sergio Conti, Michael Griebel, Stefan Müller

## Contributions by Michael Ortiz (Bonn/Caltech), Marc Alexander Schweitzer

## Topic and goals

The IRU on data-driven materials science (DDMS) combines (big) data analysis of empirical and simulation data with multiscale modeling techniques to understand material behavior and design new materials and processes for their synthesis. The general aim is to build simulation strategies around large data sets, and not around specific empirical material models. In particular, we will focus on molecular dynamics with data-mined potentials from electronic structure calculations, and on relaxation and microstructure formation in a data-driven setting, with application to multiscale crystal plasticity. DDMS will require new analytical frameworks, new computational methods, and new multiscale concepts in order to first generate fundamental model-free material data and then to use them.

## State of the art, our expertise

Currently, data science profoundly influences fields such as finance, marketing, social sciences, security, policy, and medical informatics. However, the full potential of data science as it relates to science, technology, engineering, and mathematics (STEM) in general, and materials science in particular, is yet to be realized. We shall contribute to this goal by developing a DDMS approach tailored to scientific computing and analysis, with the potential for changing the way in which material data is generated and utilized by science and industry. DDMS will also forge new and far-reaching connections with data mining, specifically, through the use of multiscale analysis for purposes of generating fundamental model-free material data. This IRU has a team with a broad background, ranging from analysis and calculus of variations to continuum mechanics, scientific computing, algorithmic developments, modeling, and applications to physics and engineering problems.

**Molecular dynamics with data-mined potentials**. Molecular dynamics is traditionally based on empirical potentials, which are models with parameters fitted to a few data points from experiments and electronic structure calculations. Here, the data-driven perspective consists in replacing these models by massive simulation data. The research group of Griebel at the INS and the Virtual Materials Design division of Hamaekers at Fraunhofer SCAI possess strong expertise in numerical simulation, multiscale methods, high-dimensional techniques, and machine learning approaches, especially for computational materials science and computational chemistry. In particular, a parallel molecular dynamics software code was developed [GKZ07] and successfully applied to several problems in materials science and nanotechnology [GH04]. Moreover, methods for dimensionality reduction of high-dimensional data [GH14, BGG16] and error estimates for multivariate regression [BG17] were introduced. In [BBHM17], local descriptors for machine learning approaches for many-body systems were developed. Together with a recently introduced sparse-grid-based adaptive multiscale approach for electronic structure methods [GHC], these descriptors provide the basis for data-driven molecular dynamics; see Figure (a).
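The descriptors of [BBHM17] are not reproduced here. As a hedged illustration of what a local descriptor of an atomic environment can look like, the following sketch computes the sorted eigenvalues of a Coulomb matrix restricted to the atoms within a cutoff radius of a central atom. This is the generic Coulomb-matrix construction from the machine-learning literature, not necessarily the LC-GAP descriptor; the function name, the cutoff, the padding length and the example geometry are assumptions.

```python
import numpy as np


def local_coulomb_descriptor(Z, R, center, cutoff, size):
    """Sorted Coulomb-matrix eigenvalues of the neighborhood of one atom.

    Z: nuclear charges (n,); R: Cartesian positions (n, 3);
    center: index of the central atom; cutoff: neighborhood radius;
    size: fixed descriptor length (zero-padded, extra neighbors dropped).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Atoms inside the cutoff sphere, ordered by distance (central atom first).
    d = np.linalg.norm(R - R[center], axis=1)
    idx = np.flatnonzero(d < cutoff)
    idx = idx[np.argsort(d[idx], kind="stable")][:size]
    Zl, Rl = Z[idx], R[idx]
    n = len(idx)
    dist = np.linalg.norm(Rl[:, None, :] - Rl[None, :, :], axis=-1)
    # Off-diagonal entries: Z_i Z_j / |R_i - R_j|;
    # diagonal entries: the customary 0.5 * Z_i**2.4 self-interaction term.
    M = np.zeros((n, n))
    off = ~np.eye(n, dtype=bool)
    M[off] = np.outer(Zl, Zl)[off] / dist[off]
    M[np.diag_indices(n)] = 0.5 * Zl ** 2.4
    # The spectrum is invariant under rotations, translations and
    # relabelings of the neighbors; sort it and pad to a fixed length.
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]
    out = np.zeros(size)
    out[:n] = eig
    return out


# Example: a water-like geometry (arbitrary length units, illustrative only).
Z = [8, 1, 1]
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
desc = local_coulomb_descriptor(Z, R, center=0, cutoff=5.0, size=4)
print(desc)
```

Rotating, translating or relabeling the neighbors leaves the output unchanged, which is the property a descriptor needs before it is passed to a regression model such as a Gaussian approximation potential.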

**Data-driven solid mechanics**. Solid mechanics is traditionally based on solving an initial-boundary value problem with models for the stress field obtained by fitting a few parameters to experimental data or atomistic simulations. For elasticity, the data-driven approach replaces the strain-stress model with massive data in strain-stress space; see Figure (b). On the analytical side, Conti, Müller and Ortiz have worked on the development of variational tools for the study of materials, such as the theory of relaxation, including in particular $\mathcal{A}$-quasiconvexity and $\Gamma$-convergence. They applied these general tools to specific problems in classical model-driven mechanics, such as shape-memory alloys, thin-film elasticity, dislocations and crystal plasticity [CO05, CGO15, CGM16]. At the modeling level, one approach to data-driven continuum mechanics was developed by Ortiz and coworkers [KO16, KO17a], including first applications to dynamics [KO17b]. In the case of geometrically linear elasticity, this amounts to finding the field of strain-stress pairs $(\varepsilon, \sigma)$ with the smallest distance from a given data set of pairs of matrices, where $\varepsilon = \tfrac{1}{2}(\nabla u + \nabla u^T)$ is the symmetrized gradient of a displacement field $u$ and $\sigma$ has prescribed divergence. Very recently, Conti, Müller, and Ortiz formulated a new mathematical approach to this type of problem, including in particular existence and approximation results and a new framework for relaxation [CMO18].
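As an illustration of the distance-minimization idea, the following is a minimal sketch of the fixed-point scheme of [KO16] on a toy problem chosen for this purpose: bars of unit length in parallel, sharing one strain $u$ (compatibility) and carrying a total force $F$ (equilibrium $\sum_e A_e \sigma_e = F$). Each bar is paired with a point $(\varepsilon^*, \sigma^*)$ of a discrete data set, and the local distance is $\tfrac{C}{2}(\varepsilon-\varepsilon^*)^2 + \tfrac{1}{2C}(\sigma-\sigma^*)^2$ with a numerical modulus $C$ chosen by the user. The function name, the geometry, the value of $C$ and the synthetic data are assumptions, and the sketch does not reproduce the analysis of [CMO18].

```python
import numpy as np


def data_driven_parallel_bars(areas, force, eps_data, sig_data, C, max_iter=100):
    """Toy distance-minimizing data-driven solver (hypothetical example).

    Bars of unit length in parallel share one strain u (compatibility) and
    must satisfy sum_e A_e * sigma_e = force (equilibrium). Each bar carries a
    material data point (eps*, sig*); the scheme alternates a global projection
    onto the constraints with a local nearest-data-point search.
    """
    areas = np.asarray(areas, dtype=float)
    eps_data = np.asarray(eps_data, dtype=float)
    sig_data = np.asarray(sig_data, dtype=float)
    assign = np.zeros(len(areas), dtype=int)  # start every bar at data point 0
    for it in range(1, max_iter + 1):
        eps_star, sig_star = eps_data[assign], sig_data[assign]
        # Global step, strain part: with volume weights A_e (unit length), the
        # closest compatible strain is the weighted mean of eps*.
        u = np.sum(areas * eps_star) / np.sum(areas)
        # Global step, stress part: the closest equilibrated stresses differ
        # from sig* by a uniform shift (Lagrange multiplier of the force balance).
        sig = sig_star + (force - np.sum(areas * sig_star)) / np.sum(areas)
        # Local step: nearest data point in the C-weighted distance.
        dist = (0.5 * C * (u - eps_data[None, :]) ** 2
                + 0.5 / C * (sig[:, None] - sig_data[None, :]) ** 2)
        new_assign = np.argmin(dist, axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignment unchanged: fixed point reached
        assign = new_assign
    return u, sig, it


# Synthetic data sampled from a linear law sigma = 100 * eps (illustrative only).
eps_data = np.linspace(0.0, 0.02, 201)
sig_data = 100.0 * eps_data
u, sig, n_iter = data_driven_parallel_bars([1.0, 2.0], 3.0, eps_data, sig_data, C=100.0)
print(u, sig, n_iter)
```

With data sampled from a linear law, the iterates should approach the linear-elastic solution $u = F/(E\sum_e A_e) = 0.01$ up to roughly the spacing of the data grid. The global step only projects onto linear constraints, so no material model enters the solver; all constitutive information sits in `eps_data` and `sig_data`.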

## Research program

**Molecular dynamics with data-mined potentials**. The underlying model on the scale of atoms is the Schrödinger equation. Because of its high dimensionality, approximate electronic structure methods like Hartree–Fock, configuration interaction, Møller–Plesset perturbation theory, coupled cluster, and density functional theory have been derived and applied with great success. Following the data-driven science paradigm, one main aim is the development of force fields which are generated from electronic structure calculations by data-mining techniques. To this end, we will first propose novel distances in configuration and phase space based on multiscale descriptors for atomic environments. These will be used in the framework of a multiscale many-body expansion as introduced in [GHC] to generate data-driven force fields, which account for short-range and long-range interactions. A further aim is to sample the force field space in a goal-oriented fashion. To this end, we will develop new importance sampling and active learning approaches. In this framework, we will also apply dimension reduction and manifold learning techniques to the high-dimensional chemical space of all possible materials and molecules. Here, a further aim is to describe this chemical space by a reduced number of appropriate continuous degrees of freedom, which can be used to develop predictive models with application in virtual screening and design approaches. In addition, our intention is to derive generative models to identify novel materials and molecules with desired properties. Finally, we will address uncertainty quantification across scales in data-driven molecular dynamics and multiscale modeling. This is linked to uncertainty quantification problems also studied in RA B2.
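To make the notion of a many-body expansion concrete, the following sketch computes the increments of a plain (not multiscale) expansion $E(S) = \sum_{\emptyset \neq T \subseteq S} \Delta E_T$, defined recursively by $\Delta E_S = E(S) - \sum_{\emptyset \neq T \subsetneq S} \Delta E_T$ and truncated at a given body order. It is not the multiscale construction of [GHC]; the `energy` callable stands in for an electronic-structure calculation on a subsystem and, like the toy pair potential in the example, is hypothetical.

```python
from itertools import combinations


def many_body_increments(energy, n, order):
    """Increments Delta E_S of a many-body expansion up to body order `order`.

    energy: callable mapping a frozenset of atom indices to an energy
    (e.g. an electronic-structure calculation on that subsystem);
    n: number of atoms.
    """
    inc = {}
    for k in range(1, order + 1):
        for S in combinations(range(n), k):
            # Subtract the increments of all nonempty proper subsets of S.
            lower = sum(inc[frozenset(T)]
                        for j in range(1, k)
                        for T in combinations(S, j))
            inc[frozenset(S)] = energy(frozenset(S)) - lower
    return inc


def truncated_energy(energy, n, order):
    """Approximate the energy of all n atoms by summing increments up to `order`."""
    return sum(many_body_increments(energy, n, order).values())


# Toy check with a hypothetical pair-additive energy (one-body + 1/r pair terms).
pos = [0.0, 1.0, 2.5, 4.0]


def toy_energy(S):
    one_body = sum(-(i + 1.0) for i in S)
    pair = sum(1.0 / abs(pos[i] - pos[j]) for i, j in combinations(sorted(S), 2))
    return one_body + pair


full = toy_energy(frozenset(range(len(pos))))
e1 = truncated_energy(toy_energy, len(pos), 1)
e2 = truncated_energy(toy_energy, len(pos), 2)
print(full, e1, e2)
```

For the pair-additive toy energy, truncation at order 2 reproduces the full energy, whereas order 1 keeps only the one-body terms. In a data-driven force field the increments would instead be learned from electronic-structure data, which is where descriptors and regression enter.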

**Data-driven solid mechanics**. We will develop a mathematical framework for a data-driven approach to continuum solid mechanics, going beyond the linear elasticity setting mentioned above. The classical formulation, based on the vectorial calculus of variations with the deformation field as the independent variable, is intimately related to the idea of having specific models for the elastic energy, the dissipation and the elastic stress as functions of the strain and of a certain set of internal variables and their rates of evolution. The data-driven formulation contains a larger set of independent variables and avoids specific models for their interdependencies, as already shown in our preliminary work on geometrically linear elasticity [CMO18]. The extension to viscoelasticity, damage, plasticity and fracture will require the introduction of history dependence. The space of histories is much too large to be sampled effectively; therefore, it is important to be able to select a few degrees of freedom which can enrich the local phase space. The theory needs to account for two types of parameters: measurable quantities with a clear physical meaning, such as stress, strain, temperature and energy density, and internal variables, which, consistent with our model-free approach, should not result from heuristic modeling but instead arise spontaneously from the data, much as in deep learning approaches. In this formulation, the DDMS problem will necessarily include a discrete data set of material points. This requires new discretization and convergence criteria; the varying reliability of different points in the data set calls for stochastic modeling components and relates naturally to importance sampling and uncertainty quantification. We intend to demonstrate the applicability and the usefulness of the general tools of DDMS to the study of polycrystal plasticity.
Crystal plasticity is a field with which the traditional multiscale approach has struggled; the development of a new, data-driven multiscale framework can potentially lead to improved simulation methods and an improved understanding of material behavior. The analytical and variational methods have strong connections to RA C1, although here the focus is on model-free, data-driven approaches, whereas C1 deals with explicit models.
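The remark on data points of differing reliability can be illustrated by a soft, max-entropy version of the local data-assignment step. The following sketch, in the spirit of the treatment of noisy data sets in [KO17a] but not a reproduction of it, replaces the nearest data point by a weighted mean with weights $p_i \propto r_i \exp(-\beta d_i^2)$, where $d_i$ is the distance from the local state to data point $i$ and $\beta$ is an inverse temperature. The reliability factors $r_i$, the function name and the example values are hypothetical.

```python
import numpy as np


def maxent_local_state(eps, sig, eps_data, sig_data, C, beta, reliability=None):
    """Soft (max-entropy) assignment of the local state (eps, sig) to data.

    Weights p_i ~ r_i * exp(-beta * d_i^2) with
    d_i^2 = C/2 (eps - eps_i)^2 + 1/(2C) (sig - sig_i)^2.
    `reliability` (r_i > 0) is a hypothetical per-point confidence factor.
    Returns the weighted-mean strain, the weighted-mean stress and the weights.
    """
    eps_data = np.asarray(eps_data, dtype=float)
    sig_data = np.asarray(sig_data, dtype=float)
    d2 = 0.5 * C * (eps - eps_data) ** 2 + 0.5 / C * (sig - sig_data) ** 2
    logw = -beta * d2
    if reliability is not None:
        logw = logw + np.log(np.asarray(reliability, dtype=float))
    # Shift so the largest log-weight is 0; the normalizing sum is then >= 1
    # and cannot underflow to zero for large beta.
    logw -= logw.max()
    w = np.exp(logw)
    w /= w.sum()
    return w @ eps_data, w @ sig_data, w


# Illustrative three-point data set.
eps_d = np.array([0.0, 0.01, 0.02])
sig_d = np.array([0.0, 1.0, 2.0])
e, s, w = maxent_local_state(0.011, 1.05, eps_d, sig_d, C=100.0, beta=1e6)
print(e, s, w)
```

As $\beta \to \infty$ the weights concentrate on the nearest data point, recovering the hard assignment of distance-minimizing schemes; $\beta = 0$ returns the reliability-weighted mean of the whole data set.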

## Summary

Data-driven approaches have recently achieved impressive successes in many fields of research. This IRU will explore their applicability to materials science over several length scales, ranging from molecular dynamics to macroscopic simulations in continuum mechanics. The IRU combines expertise from numerical simulation, analysis, and material modeling and aims both for abstract methodological development and for concrete application to specific problems.

**Structural remarks**. This IRU will be jointly funded by HCM and the Fraunhofer Institute SCAI. It will greatly benefit from part-time recruitment of Ortiz, a leading expert in mechanics and materials science, as a Bonn Research Chair for the period 2016–2025.

## Bibliography

[BBHM17] J. Barker, J. Bulin, J. Hamaekers, and S. Mathias. LC-GAP: Localized Coulomb descriptors for the Gaussian approximation potential. *In Scientific Computing and Algorithms in Industrial Simulations, pages 25–42. Springer, 2017*.

[BG17] B. Bohn and M. Griebel. Error estimates for multivariate regression on discretized function spaces. *SIAM J. Numer. Anal., 55(4):1843–1866, 2017*.

[BGG16] B. Bohn, J. Garcke, and M. Griebel. A sparse grid based method for generative dimensionality reduction of high-dimensional data. *J. Comput. Phys., 309:1–17, 2016*.

[CGM16] S. Conti, A. Garroni, and S. Müller. Dislocation microstructures and strain-gradient plasticity with one active slip plane. *J. Mech. Phys. Solids, 93:240–251, 2016*.

[CGO15] S. Conti, A. Garroni, and M. Ortiz. The line-tension approximation as the dilute limit of linear-elastic dislocations. *Arch. Ration. Mech. Anal., 218:699–755, 2015*.

[CMO18] S. Conti, S. Müller, and M. Ortiz. Data-driven problems in elasticity. *Arch. Ration. Mech. Anal., 2018. DOI:10.1007/s00205-017-1214-0*.

[CO05] S. Conti and M. Ortiz. Dislocation microstructures and the effective behavior of single crystals. *Arch. Ration. Mech. Anal., 176:103–147, 2005*.

[GH04] M. Griebel and J. Hamaekers. Molecular dynamics simulations of the elastic moduli of polymer-carbon nanotube composites. *Comput. Methods Appl. Mech. Engrg., 193:1773–1788, 2004*.

[GH14] M. Griebel and A. Hullmann. Dimensionality reduction of high-dimensional data with a nonlinear principal component aligned generative topographic mapping. *SIAM J. Sci. Comput., 36(3):A1027–A1047, 2014*.

[GHC] M. Griebel, J. Hamaekers, and R. Chinnamsetty. An adaptive multiscale approach for electronic structure methods. *Multiscale Model. Simul.*, to appear.

[GKZ07] M. Griebel, S. Knapek, and G. Zumbusch. *Numerical simulation in molecular dynamics, volume 5 of Texts in Computational Science and Engineering. Springer, Berlin, 2007*.

[KO16] T. Kirchdoerfer and M. Ortiz. Data-driven computational mechanics. *Comput. Methods Appl. Mech. Engrg., 304:81–101, 2016*.

[KO17a] T. Kirchdoerfer and M. Ortiz. Data driven computing with noisy material data sets. *Comput. Methods Appl. Mech. Engrg., 326:622–641, 2017*.

[KO17b] T. Kirchdoerfer and M. Ortiz. Data-driven computing in dynamics. *Internat. J. Numer. Methods Engrg., 2017. DOI:10.1002/nme.5716*.