# HCM Workshop: Nonsmooth Optimization and its Applications

## Monday, May 15

| Time | Event |
| --- | --- |
| 08:30 - 09:00 | Self-registration |
| 09:00 - 09:20 | Opening |
| 09:20 - 10:00 | Michael Hintermüller: (Pre)Dualization, Dense Embeddings of Convex Sets, and Applications in Image Processing |
| 10:00 - 10:40 | Yuri Ledyaev: Multidirectional mean-value inequalities and dynamic optimization |
| 10:40 - 11:10 | Coffee Break |
| 11:10 - 11:50 | Adil Baghirov: Nonsmooth DC Optimization: Methods and Applications |
| 11:50 - 12:30 | Helmut Gfrerer: On first-order optimality conditions for general optimization problems |
| 12:30 - 14:30 | Lunch Break |
| 14:30 - 15:10 | James Burke: Variational Analysis of Convexly Generated Spectral Max Functions |
| 15:10 - 15:50 | Lukas Adam: Chance constrained problems: A view from variational analysis |
| 15:50 - 16:20 | Coffee Break |
| 16:20 - 17:00 | Coralia Cartis: Universal regularization methods -- varying the smoothness, the power and the accuracy |

## Tuesday, May 16

| Time | Event |
| --- | --- |
| 09:20 - 10:00 | Pierre-Antoine Absil: Nonsmooth optimization on manifolds |
| 10:00 - 10:40 | Michael Ulbrich: A Proximal Gradient Method for Matrix Optimization Problems Arising in Ensemble Density Functional Theory |
| 10:40 - 11:10 | Group photo and coffee break |
| 11:10 - 11:50 | Michel Thera: Perturbation of error bounds |
| 11:50 - 12:30 | Stefan Ulbrich: Optimization of elastoplastic contact problems with application to deep drawing processes |
| 12:30 - 14:30 | Lunch Break |
| 14:30 - 15:10 | Diethard Pallaschke: Quasidifferentiability and the Minkowski-Radström-Hörmander Lattice |
| 15:10 - 15:50 | Miroslav Bacak: On proximal mappings and their generalizations |
| 15:50 - 16:20 | Coffee Break |
| 16:20 - 17:00 | David Salas: Characterizations of nonconvex sets with smooth boundary in terms of the metric projection in Hilbert spaces |

## Wednesday, May 17

| Time | Event |
| --- | --- |
| 09:20 - 10:00 | René Henrion: Subdifferential estimates for Gaussian probability functions |
| 10:00 - 10:40 | Dmitriy Drusvyatskiy: Expanding the reach of optimal methods |
| 10:40 - 11:10 | Coffee Break |
| 11:10 - 11:50 | Aris Daniilidis: Self-contracted curves: recent developments and applications |
| 11:50 - 12:30 | Napsu Karmitsa: Clustering in large data sets with the limited memory bundle method |
| 12:30 - 15:10 | Lunch Break |
| 15:10 - 19:00 | Excursion: a 90-minute boat trip from Alter Zoll to Königswinter and back, followed by a city tour from Alter Zoll to Bönnsch. Participants can meet either in front of the Mathematics department (Endenicher Allee 60) at 15:15 or at Alter Zoll at 15:45. The boat trip runs from 16:00 to 17:30; the city tour from 17:45 to 18:45. |
| 19:00 - | Dinner in the Restaurant "Brauhaus Bönnsch" (Sterntorbrücke 4) |

## Thursday, May 18

| Time | Event |
| --- | --- |
| 09:20 - 10:00 | Boris Mordukhovich: Critical multipliers in nonsmooth optimization via second-order variational analysis |
| 10:00 - 10:40 | Dominikus Noll: Non-smooth optimization to control infinite-dimensional systems |
| 10:40 - 11:10 | Coffee Break |
| 11:10 - 11:50 | Hermann Schichl: An SQP-inspired second order bundle method for nonsmooth nonconvex optimization |
| 11:50 - 12:30 | Mikhail Solodov: A globally convergent Linear-Programming-Newton method for piecewise smooth constrained equations |
| 12:30 - 14:30 | Lunch Break |
| 14:30 - 15:10 | Russell Luke: Quantitative Convergence Analysis of Picard Iterations: a View from Infinite Dimensions |
| 15:10 - 15:50 | Somayeh Hosseini: A gradient sampling method on algebraic varieties |
| 15:50 - 16:20 | Coffee Break |
| 16:20 - 17:00 | Sona Taheri: A DC Optimization Algorithm for Clusterwise Linear Regression |

## Friday, May 19

## Abstracts

## Pierre-Antoine Absil: Nonsmooth optimization on manifolds

Manifolds can be thought of as the mathematically adequate generalization of linear spaces as the search space of smooth optimization problems. However, there are also several important problems that can be phrased as the optimization of a nonsmooth function on a (smooth) manifold, e.g., sparse PCA, the optimal bounding box problem, some versions of the economic dispatch problem, and range-based independent component analysis. The purpose of this talk is to present some of these applications and to discuss algorithms available to address them.

## Lukas Adam: Chance constrained problems: A view from variational analysis

We consider problems with joint chance constraints where the data are nonlinear, possibly nonconvex, random functions with finite distribution. Such problems are generally hard to tackle as the feasible set is a union of certain sets. We introduce integer variables and by their relaxation we arrive at an optimization problem with a MPCC-like structure. We derive strong stationarity conditions for this problem and relate them to the stationarity conditions of the original problem. Based on MPCC regularizations, we propose an iterative algorithm and show its convergence.

## Miroslav Bacak: On proximal mappings and their generalizations

Proximal mappings were introduced by Moreau in the 1960's. Since then, they have become an indispensable tool in convex analysis and optimization in Hilbert spaces. In this talk I will present possible generalizations of proximal mappings outside of Hilbert spaces.
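As a point of reference (a standard formulation, not quoted from the abstract), Moreau's proximal mapping of a proper convex lower semicontinuous function $f$ on a Hilbert space $H$ is

```latex
\operatorname{prox}_{f}(x) \;=\; \operatorname*{arg\,min}_{y \in H}\, \Bigl\{ f(y) + \tfrac{1}{2}\,\|x - y\|^{2} \Bigr\},
```

which in this setting is single-valued and firmly nonexpansive; the generalizations discussed in the talk concern analogues of this construction outside of Hilbert spaces.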

## Adil Baghirov: Nonsmooth DC Optimization: Methods and Applications

An unconstrained optimization problem with the objective function represented, implicitly or explicitly, as a difference of two convex functions is considered. Numerical methods for solving such problems are designed and their convergence is discussed. Applications of these methods in machine learning and regression analysis are also considered.
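For concreteness (standard notation, not quoted from the abstract), the unconstrained DC program has the form

```latex
\min_{x \in \mathbb{R}^{n}} f(x), \qquad f(x) = g(x) - h(x),
```

where $g$ and $h$ are convex and possibly nonsmooth, and the decomposition $f = g - h$ may be given explicitly or only known to exist.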

## Radu Ioan Bot: The proximal-gradient approach in the nonconvex setting from both a discrete and continuous perspective

We address the minimization of the sum of a proper and lower semicontinuous function with a possibly nonconvex smooth function, first by means of a proximal-gradient algorithm with inertial and memory effects. We prove that the sequence of iterates converges to a critical point of the objective, provided that a regularization of the latter function is a so-called KL function; in other words, it satisfies the Kurdyka-Lojasiewicz inequality. We also approach the minimization problem from a continuous perspective via an implicit dynamical system of proximal-gradient type. Under similar circumstances, the generated trajectory is proved to asymptotically converge to a critical point of the objective. Convergence rates for the trajectory in terms of the Lojasiewicz exponent are also derived. To the class of KL functions belong semialgebraic, real subanalytic, uniformly convex and convex functions satisfying a growth condition.
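As an illustration of the discrete scheme (a generic template with hypothetical step-size and inertial parameters $\gamma_k$, $\alpha_k$, not necessarily the exact iteration of the talk), an inertial proximal-gradient step for minimizing $f + g$, with $f$ proper lower semicontinuous and $g$ smooth, reads

```latex
x_{k+1} \in \operatorname{prox}_{\gamma_k f}\bigl( x_k + \alpha_k (x_k - x_{k-1}) - \gamma_k \nabla g(x_k) \bigr),
```

where the inclusion reflects that the proximal mapping of a nonconvex $f$ may be set-valued.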

## James Burke: Variational Analysis of Convexly Generated Spectral Max Functions

The spectral abscissa is the largest real part of an eigenvalue of a matrix and the spectral radius is the largest modulus. Both are examples of spectral max functions---the maximum of a real-valued function over the spectrum of a matrix. These mappings arise in the control and stabilization of dynamical systems. In 2001, Burke and Overton characterized the regular subdifferential of the spectral abscissa and showed that the spectral abscissa is subdifferentially regular in the sense of Clarke if and only if all active eigenvalues are nonderogatory (geometric multiplicity of 1). In this talk we discuss how the same result can be obtained for the class of convexly generated spectral max functions.
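In symbols, with $\sigma(A)$ denoting the spectrum of $A \in \mathbb{C}^{n \times n}$, the two examples mentioned are

```latex
\alpha(A) = \max\{\operatorname{Re}\lambda : \lambda \in \sigma(A)\},
\qquad
\rho(A) = \max\{|\lambda| : \lambda \in \sigma(A)\},
```

and a general spectral max function takes the form $\varphi(A) = \max\{ f(\lambda) : \lambda \in \sigma(A) \}$ for a real-valued function $f$.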

## Coralia Cartis: Universal regularization methods -- varying the smoothness, the power and the accuracy

Adaptive cubic regularization methods have emerged as a credible alternative to linesearch and trust-region for smooth nonconvex optimization, with optimal complexity amongst second-order methods. Here we consider a general/new class of adaptive regularization methods, that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size and applied to convexly-constrained optimization problems.

We investigate the worst-case evaluation complexity/global rate of convergence of these algorithms when the level of sufficient smoothness of the objective may be unknown or may even be absent. We find that the methods accurately reflect in their complexity the degree of smoothness of the objective and satisfy increasingly better bounds with improving accuracy of the models. The bounds vary continuously and robustly with respect to the regularization power, the accuracy of the model and the degree of smoothness of the objective. This work is joint with Nick Gould (Rutherford Appleton Laboratory, UK) and Philippe Toint (University of Namur, Belgium).

## Aris Daniilidis: Self-contracted curves: recent developments and applications

Self-contracted curves intrinsically link to both continuous and discrete descent dynamics involving convex potentials. In this talk we shall recall initial motivations, main ideas and results concerning this notion, presenting the current state of the art for future developments.

## Dmitriy Drusvyatskiy: Expanding the reach of optimal methods

First-order methods with optimal convergence rates for smooth convex minimization have had a profound impact on computation.

Two important downsides of such accelerated methods, often remarked upon in the literature, are a lack of geometric intuition and non-monotone behavior. I will begin by describing a new intuitive viewpoint on acceleration based on optimally averaging quadratic lower models of the function, thereby highlighting an immediate relation to cutting plane ideas. Without convexity, best-known complexity bounds worsen to those achieved by simple gradient descent. In the second part of the talk, I will discuss the extent to which acceleration can still be useful outside of the convex setting, focusing on minimizing compositions of convex functions and smooth maps.

## Helmut Gfrerer: On first-order optimality conditions for general optimization problems

The basic first-order optimality condition states that the negative gradient of the objective belongs to the regular normal cone of the feasible region. In the nonconvex case the computation of the regular normal cone is a nontrivial task, and therefore one has to confine oneself to estimates of it. It is easy to find a lower estimate of the regular normal cone, yielding so-called S-stationarity conditions. For an upper estimate one often uses the limiting normal cone, which results in so-called M-stationarity conditions. Though a comprehensive calculus is available for computing limiting normals, it is sometimes very difficult or even impossible to give a point-based representation of the limiting normal cone. In this talk we provide a tighter estimate for the regular normal cone by means of regular normals to a series of tangent cones to tangent cones of the feasible set. We present an example of a mathematical program with equilibrium constraints where this approach yields manageable optimality conditions, whereas it is impossible to write down the M-stationarity conditions.
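For orientation (standard notation, not quoted from the abstract), the basic condition mentioned at the start reads

```latex
-\nabla f(\bar{x}) \in \widehat{N}_{\Omega}(\bar{x}),
```

where $\Omega$ is the feasible region and $\widehat{N}_{\Omega}(\bar{x})$ its regular (Fréchet) normal cone at $\bar{x}$; S- and M-stationarity conditions replace $\widehat{N}_{\Omega}(\bar{x})$ by a lower estimate and by the limiting normal cone $N_{\Omega}(\bar{x})$, respectively.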

## René Henrion: Subdifferential estimates for Gaussian probability functions

A probability function assigns to each value of some parameter (typically: decision vector) the probability of satisfying a given random inequality system depending on that parameter. Such functions play a prominent role in many engineering problems be it in the context of reliability maximization or of probabilistic constraints. Though continuity of probability functions can be easily enforced by mild conditions, differentiability or even Lipschitz continuity cannot be automatically taken for granted even under nicest problem data (smooth inequality system, Gaussian distribution of the random vector). In order to end up at Lipschitz continuity or differentiability, it is reasonable to analyze probability functions in a nonsmooth context first. The talk provides an upper estimate for the limiting (Mordukhovich) subdifferential of Gaussian probability functions in the general continuous case. Taking this as a starting point, suitable additional assumptions will be identified that guarantee the mentioned nicer properties along with subdifferential/gradient formulae ready for use in the algorithmic solution of optimization problems with probabilistic constraints.
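Schematically (standard notation, not quoted from the abstract), a probability function has the form

```latex
\varphi(x) = \mathbb{P}\bigl( g(x,\xi) \le 0 \bigr),
```

where $x$ is the decision vector, $\xi$ a random vector (here Gaussian), and $g(x,\xi) \le 0$ the random inequality system; a probabilistic constraint then requires $\varphi(x) \ge p$ for a prescribed probability level $p$.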

## Michael Hintermüller: (Pre)Dualization, Dense Embeddings of Convex Sets, and Applications in Image Processing

For a class of nonsmooth minimization problems in Banach spaces, predualization results and their connection to dense embeddings of convex sets are discussed. Motivating applications are related to nonsmooth filters in mathematical image processing. For this problem class, some numerical aspects are also highlighted, including primal/dual splitting and ADMM-type methods as well as proper (numerical) dissipation-reducing discretization.

## Somayeh Hosseini: A gradient sampling method on algebraic varieties

This talk is concerned with the numerical solution of nonsmooth optimization problems on real algebraic varieties. The method proposed in this work generalizes the gradient sampling method for Riemannian manifolds to problems on such sets. Our motivation comes from applications in low-rank matrix and tensor optimization, where one is faced with the fact that smooth manifolds of fixed rank, say, manifolds of rank-r matrices, are not closed, and hence convergence of Riemannian algorithms is difficult to establish even for smooth functions.

## Napsu Karmitsa: Clustering in large data sets with the limited memory bundle method

Clustering is among the most important tasks in data mining, and clustering in large data sets is challenging for most existing algorithms. Here we introduce a new algorithm based on nonsmooth optimization techniques for solving minimum sum-of-squares clustering problems in large data sets. The clustering problem is first formulated as a nonsmooth optimization problem. Then the limited memory bundle method [Haarala et al.: Math. Prog., Vol. 109, No. 1, pp. 181-205, 2007] is modified and combined with an incremental approach to solve this problem. The algorithm is evaluated on real-world data sets with both large numbers of attributes and large numbers of data points, and compared with other optimization-based clustering algorithms. The numerical results demonstrate that the new algorithm is both efficient and accurate, and that it can provide real-time clustering in large data sets.
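The minimum sum-of-squares clustering problem referred to here can be written in its standard nonsmooth form (notation is mine, not the authors'): for data points $a_1,\dots,a_m \in \mathbb{R}^n$ and $k$ cluster centers,

```latex
\min_{x_1,\dots,x_k \in \mathbb{R}^{n}} \;\sum_{i=1}^{m} \;\min_{j=1,\dots,k} \;\|a_i - x_j\|^{2}.
```

The inner minimum over clusters is what makes the objective nonsmooth.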

## Yuri Ledyaev: Multidirectional mean-value inequalities and dynamic optimization

We discuss recent results on generalized mean-value inequalities for lower semicontinuous functions on smooth Banach spaces.

Previously, such inequalities have been used to establish relations between Dini derivatives and subgradients of lower semicontinuous functions, to derive metric regularity of functional relations, and to prove the equivalence of viscosity and minimax solutions in the theory of generalized solutions of first-order PDEs.

In this talk we demonstrate how a new mean-value inequality can be used to provide shorter proofs of optimality conditions for nonsmooth calculus of variations problems. We also discuss applications of these results to nonsmooth dynamic optimization problems on manifolds.

## Russell Luke: Quantitative Convergence Analysis of Picard Iterations: a View from Infinite Dimensions

In recent work we proposed a framework for a quantitative convergence analysis of Picard iterations in finite dimensions. This strategy employed well-established regularity properties of mappings at the fixed point set, assumed to be nonempty. Another property that emerged is what we call submonotonicity. This property is new and generalizes prox-regularity and co-hypomonotonicity as well as, naturally, convexity. We report on the extension of these structures to infinite-dimensional settings and demonstrate their application to prototypical algorithms like the proximal point algorithm.
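For context (standard notation), a Picard iteration for a mapping $T$ is simply

```latex
x_{k+1} = T(x_k), \qquad k = 0, 1, 2, \dots,
```

and the quantitative analysis concerns the rate at which such sequences approach the fixed point set $\operatorname{Fix} T = \{x : T(x) = x\}$.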

## Marko Mäkelä: Practical Challenges for Nonsmooth Optimization Methods

Optimization problems arise in many fields of application, for example in economics, mechanics, engineering, the biosciences and data mining. These practical problems often involve different kinds of nonsmoothness caused by, for example, physical, technological, methodological or numerical reasons. Furthermore, instead of a single optimization criterion there are typically several goals to be optimized simultaneously, and real-world applications commonly feature constraints and both continuous and discrete variables.

A large number of variables also usually creates extra difficulties for the solution methods used. Thus we are faced with large-scale nonsmooth multiobjective mixed-integer nonlinear programming (MINLP) problems with constraints. In this talk, we present some methods attacking the above-mentioned challenges, together with some applications.

## Boris Mordukhovich: Critical multipliers in nonsmooth optimization via second-order variational analysis

We introduce the notions of critical and noncritical multipliers for subdifferential variational systems extending to a general framework the corresponding notions by Izmailov and Solodov developed for the classical KKT setting. It has been well recognized that critical multipliers are largely responsible for slow convergence of major primal-dual algorithms of optimization. The approach of this paper allows us to cover generalized KKT systems arising in various classes of constrained and nonsmooth optimization including composite and minimax problems, conic programming, etc. Concentrating on a polyhedral subdifferential case and employing recent results of second-order subdifferential theory, we obtain complete characterizations of critical and noncritical multipliers via the problem data. It is shown that noncriticality is equivalent to a certain error bound for a perturbed variational system and that critical multipliers can be ruled out by full stability of local minimizers in problems of composite optimization. For the latter class we establish the equivalence between noncriticality of multipliers and robust isolated calmness of the associated solution map and then derive explicit characterizations of these notions via appropriate second-order sufficient conditions. It is finally proved that the Lipschitz-like/Aubin property of solution maps yields their robust isolated calmness.

Based on joint work with Ebrahim Sarabi (Miami University, USA)

## Dominikus Noll: Non-smooth optimization to control infinite-dimensional systems

Well-designed feedback regulators maintain stability and performance of controlled systems in closed-loop. Computing such regulators is a challenging non-differentiable multi-objective optimization problem, which is why non-smooth optimization has played a key role in the development of efficient algorithms for robust control design [1,2].

In this talk we show that non-smooth optimization techniques [3] can also be used to control infinite-dimensional systems. This includes boundary and distributed control of partial differential equations, delay and dead-time systems, [4], but also feedback control of fractional order differential equations.

Current methods for infinite-dimensional systems rely heavily on system reduction or identification, which is unreliable procedurally and often not justified theoretically. Our approach avoids these pitfalls.

References:

1. P. Apkarian, D. Noll. Nonsmooth H∞ synthesis. IEEE Transactions on Automatic Control, vol. 51, no. 1, 2006, pp. 229 – 244.

2. P. Apkarian, M.N. Dao, D. Noll. Parametric robust structured control design. IEEE Transactions on Automatic Control, vol. 60, no. 7, 2015, pp. 1857 – 1869.

3. P. Apkarian, D. Noll, L. Ravanbod. Nonsmooth bundle trust-region algorithm with applications to robust stability. Set-Valued and Variational Analysis, vol. 24, no. 1, 2016, pp. 115-148.

4. R.F. Curtain, H.J. Zwart. An introduction to infinite-dimensional linear system theory. Springer Verlag 1995.

## Diethard Pallaschke: Quasidifferentiability and the Minkowski-Radström-Hörmander Lattice

For a finite-dimensional linear space X, the quasidifferential calculus of V.F. Demyanov and A.M. Rubinov has some similarities to the classical Fréchet differential calculus. More precisely, in the quasidifferential calculus the dual space X′ of gradients from the Fréchet calculus is replaced by the Minkowski-Radström-Hörmander lattice MRH(X) of differences of sublinear functions. First we show that the space MRH(X) can be endowed with a norm under which it becomes a Banach space. Since the elements of the Minkowski-Radström-Hörmander lattice are equivalence classes and have no unique representatives, we study two types of minimal representatives, namely reduced and inclusion-minimal representatives. Finally we consider the topological dual of MRH(X) and give, for the 2-dimensional case, an explicit example of a norm-continuous linear functional which is not continuous with respect to the supremum norm.

## Claudia Sagastizábal: VU Decomposition and Partial Smoothness for Sublinear Functions

VU decomposition and partial smoothness are related notions for generic nonsmooth functions. Sublinear functions are widely used in optimization algorithms as they themselves contain important structures. We discuss how to characterize several concepts related to partial smoothness for this class of functions. In particular, for a closed sublinear function h we introduce a relaxed decomposition depending on certain Vε and Uε subspaces that converge continuously to the V- and U-counterparts as ε tends to zero. To the new VεUε decomposition corresponds a relaxed smooth manifold that contains the manifold where h is partly smooth.

## David Salas: Characterizations of nonconvex sets with smooth boundary in terms of the metric projection in Hilbert spaces

The present communication is a continuation of previous work on conditions for regularity of the metric projection. Here we provide a full characterization of closed bodies in Hilbert spaces with smooth boundary in terms of the smoothness of the metric projection and the behavior of its first derivative. The smoothness of the metric projection onto smooth submanifolds of Hilbert spaces is also studied. Finally, we apply our results to the study of solutions of projected differential equations.

## Hermann Schichl: An SQP-inspired second order bundle method for nonsmooth nonconvex optimization

In this talk I will present the first step on a roadmap towards an algorithm for solving general nonsmooth nonconvex programs. Taking inspiration from the SQP-method for smooth optimization we develop a second-order bundle method for minimizing a nonsmooth objective function subject to nonsmooth inequality constraints starting from a strictly feasible point. Instead of using a penalty function or filter or a merit function to deal with the constraints, we determine the search direction by solving a convex quadratically constrained quadratic program to obtain good iteration points. Furthermore, global convergence of the method is proved under certain mild assumptions. For a concrete implementation numerical results will be presented, as well as an application to certificates of infeasibility and exclusion boxes for numerical constraint satisfaction problems.

## Mikhail Solodov: A globally convergent Linear-Programming-Newton method for piecewise smooth constrained equations

The LP-Newton method for constrained equations, introduced some years ago, has powerful properties of local superlinear convergence, covering both possibly nonisolated solutions and possibly nonsmooth equation mappings. We develop a related globally convergent algorithm, based on the LP-Newton subproblems and a linesearch for the equation's infinity-norm residual.

In the case of smooth equations, global convergence of this algorithm to B-stationary points of the residual over the constraint set is shown, which is a natural result: nothing better should generally be expected in variational settings. However, for the piecewise smooth case only a property weaker than B-stationarity can be guaranteed in general. We then develop an additional procedure for piecewise smooth equations that avoids undesirable accumulation points, thus achieving the intended property of B-stationarity.

## Sona Taheri: A DC Optimization Algorithm for Clusterwise Linear Regression

A new algorithm for solving the clusterwise linear regression problem using its nonsmooth optimization formulation is considered. The algorithm is based on a combination of a difference of convex algorithm (DC Algorithm) and the incremental approach. It is especially efficient for solving clusterwise linear regression problems in very large data sets. The proposed algorithm is tested on small to large synthetic and real world data sets and compared with other algorithms for clusterwise linear regression using numerical results.
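In its standard nonsmooth formulation (notation is mine, not the authors'), clusterwise linear regression fits $k$ affine models $(w_j, b_j)$ to data $(a_i, y_i)$, assigning each point to its best-fitting model:

```latex
\min_{(w_1,b_1),\dots,(w_k,b_k)} \;\sum_{i=1}^{m} \;\min_{j=1,\dots,k} \;\bigl( w_j^{\top} a_i + b_j - y_i \bigr)^{2}.
```

As with minimum sum-of-squares clustering, the inner minimum makes this a nonsmooth objective that admits a DC decomposition.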

## Michel Thera: Perturbation of error bounds

In this presentation, I intend to extend the developments in Kruger, Ngai and Thera, SIAM J. Optim. 20(6), 3280--3296 (2010) and, more precisely, to characterize, in the Banach space setting, the stability under data perturbations of the local and global error bound property of inequalities determined by proper lower semicontinuous functions.

I will propose new concepts of (arbitrary, convex and linear) perturbations of the given function defining the system under consideration, which turn out to be a useful tool in our analysis.

The characterizations of error bounds for families of perturbations can be interpreted as estimates of the 'radius of error bounds'.

The definitions and characterizations are illustrated by examples. This presentation summarizes recent joint works with Huynh Van Ngai, Marco Lopez Cerda and A. Kruger.

## Michael Ulbrich: A Proximal Gradient Method for Matrix Optimization Problems Arising in Ensemble Density Functional Theory

We present a proximal gradient method to solve a nonconvex optimization problem that is equivalent to the ensemble density functional theory (EDFT) model for electronic structure calculations in computational chemistry. The EDFT model is especially well suited for metallic systems. It can be cast as a matrix optimization problem with orthogonality constraints. Our algorithm uses an equivalent reformulation in which the singular values of the matrix-valued optimization variable arise in the cost function and the constraints. For the proposed proximal gradient method, convergence to stationary points is established. Numerical results show that this method can outperform the well-known self-consistent field iteration on many metallic systems.

## Stefan Ulbrich: Optimization of elastoplastic contact problems with application to deep drawing processes

Hydroforming of sheet metals involves elastoplasticity and frictional contact, which can be modeled by an evolutionary quasi-variational inequality (EQVI). The aim is to control the blank holder force and the fluid pressure in such a way that no damages occur during the forming process and that the final shape of the sheet metal product coincides with the desired geometry. This leads to a challenging optimization problem for an EQVI given by a multiplicative finite plasticity model with frictional contact. We propose a semismooth FE-discretization of the EQVI. For the solution of the resulting reduced optimization problem we apply a trust-region bundle method and give some analytical justification for a simpler model problem. To deal with the high computational costs, we discuss model reduction techniques to reduce the complexity of the discretized EQVI. Moreover, we show optimization results for a hydroforming process based on the full FE-discretization and discuss a strategy towards an efficient reduced order model based optimization approach.

This is joint work with Daniela Bratzke and Anna Walter within SFB 666.

## Xianfu Wang: On construction of c-splitting potentials in the multimarginal optimal transport

Given a c-cyclically monotone set, is it a c-splitting set? For two marginals, this follows from a generalization of Rockafellar's explicit construction of a convex antiderivative in the c-convexity theory. For multiple marginals, the question is closely related to the multi-marginal dual Kantorovich problem. I will study a special class of multi-marginal costs.