Casimir: Catalyst, smoothing, and inference

Casimir is a toolbox of optimization algorithms, including Casimir-SVRG (which recovers SVRG and Catalyst-SVRG as special cases), for unstructured tasks such as binary classification and for structured prediction tasks such as visual object localization and named entity recognition. It is the code accompanying the NeurIPS 2018 paper “A Smoother Way to Train Structured Prediction Models”. The package is licensed under the GPLv3 license.

Overview

This package considers minimizing convex functions of the form

\[f(w) = \frac{1}{n} \sum_{i=1}^n f_i(w) + r(w),\]

where each \(f_i: \mathbb{R}^d \to \mathbb{R}\) is convex and \(r: \mathbb{R}^d \to \mathbb{R}\) is a strongly convex regularizing function.
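
For example, \(\ell_2\)-regularized binary logistic regression fits this template with

\[f_i(w) = \log\left(1 + \exp(-y_i \, x_i^\top w)\right), \qquad r(w) = \frac{\lambda}{2} \|w\|_2^2,\]

where \((x_i, y_i) \in \mathbb{R}^d \times \{-1, +1\}\) is the \(i\)-th training example and \(\lambda > 0\) is the regularization strength.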

All primal incremental optimization algorithms require defining an incremental first order oracle (IFO) for \(f\). That is, given an integer \(i\) and a point \(w \in \mathbb{R}^d\), the IFO returns the function value \(f_i(w)\) and the gradient \(\nabla f_i(w)\) if it exists, or a subgradient \(g \in \partial f_i(w)\) otherwise.
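
To make the interface concrete, here is a minimal sketch of an IFO for the logistic regression example above. The class name and method names (function_value, gradient, __len__) are illustrative assumptions and need not match the package's actual API; the regularizer \(r\) is assumed to be handled by the optimizer rather than by the oracle.

import numpy as np


class LogisticLossIFO:
    """Illustrative IFO for the logistic regression example above.

    Given an index i and a point w, it returns f_i(w) and its gradient,
    where f_i(w) = log(1 + exp(-y_i <x_i, w>)). The interface is a
    hypothetical sketch, not the package's actual oracle class.
    """

    def __init__(self, X, y):
        self.X = np.asarray(X)  # (n, d) feature matrix
        self.y = np.asarray(y)  # (n,) labels in {-1, +1}

    def __len__(self):
        return self.X.shape[0]

    def function_value(self, i, w):
        margin = self.y[i] * self.X[i].dot(w)
        # Numerically stable evaluation of log(1 + exp(-margin)).
        return np.logaddexp(0.0, -margin)

    def gradient(self, i, w):
        margin = self.y[i] * self.X[i].dot(w)
        # d/dw log(1 + exp(-margin)) = -y_i * sigmoid(-margin) * x_i.
        return (-self.y[i] / (1.0 + np.exp(margin))) * self.X[i]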

This package also considers non-smooth functions of the form \(h(Aw)\), where \(h\) is smoothable, i.e., a smooth approximation \(h_\mu\) can be computed analytically or algorithmically. Examples include the structural hinge loss for structured prediction. In this case, the user must define a smoothed incremental first order oracle, which returns the smoothed function value \(f_{i, \mu}(w)\) and its gradient \(\nabla f_{i, \mu}(w)\).
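
For intuition about what such a smooth approximation can look like, the snippet below applies the standard entropy (log-sum-exp) smoothing to the max function \(h(z) = \max_j z_j\): it returns \(h_\mu(z) = \mu \log \sum_j \exp(z_j / \mu)\) and its gradient, the softmax of \(z / \mu\). This is a standalone illustration of that technique, not the package's smoothing code.

import numpy as np


def smoothed_max(z, mu):
    """Entropy (log-sum-exp) smoothing of h(z) = max_j z_j.

    Returns h_mu(z) = mu * log(sum_j exp(z_j / mu)) and its gradient,
    softmax(z / mu). As mu -> 0, h_mu(z) approaches max_j z_j.
    Standalone illustration; not the package's implementation.
    """
    z = np.asarray(z, dtype=float)
    z_max = z.max()
    weights = np.exp((z - z_max) / mu)  # shift by the max for stability
    total = weights.sum()
    value = mu * np.log(total) + z_max
    gradient = weights / total          # softmax(z / mu)
    return value, gradient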

This package provides primal incremental optimization algorithms to minimize \(f(w)\) defined above. The implemented algorithms include Casimir-SVRG, SVRG, and SGD. To use these optimization algorithms on new data and loss functions, the user only needs to define a (smoothed) IFO for the problem at hand. The framework of IFOs thus decouples the optimization algorithms from the data handling and loss function definitions. This idea is captured by the figure below.

[Figure: the (smoothed) IFO interface decouples the optimization algorithms from data handling and loss definitions.]
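
To illustrate this decoupling, the sketch below runs a generic SVRG-style loop that touches the data only through the oracle interface assumed earlier (len and gradient); it is a schematic under those assumptions, not the optimizer shipped with the package.

import numpy as np


def svrg_sketch(ifo, w0, learning_rate=0.1, reg=1e-3, num_epochs=5, seed=0):
    """Schematic SVRG loop built only on the (hypothetical) IFO interface.

    The l2 regularizer r(w) = (reg / 2) * ||w||^2 is folded directly into
    the update for simplicity. This is an illustration of the decoupling,
    not the optimizer implemented in the package.
    """
    rng = np.random.default_rng(seed)
    n = len(ifo)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(num_epochs):
        # Full gradient at the current snapshot point.
        snapshot = w.copy()
        full_grad = sum(ifo.gradient(i, snapshot) for i in range(n)) / n
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient estimate.
            g = ifo.gradient(i, w) - ifo.gradient(i, snapshot) + full_grad
            w -= learning_rate * (g + reg * w)
    return w

For instance, svrg_sketch(LogisticLossIFO(X, y), np.zeros(X.shape[1])) would run this loop on the logistic oracle sketched above.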

Authors

Krishna Pillutla, Vincent Roulet, Sham M. Kakade, and Zaid Harchaoui.

Cite

If you find this package useful, please cite the following works:

@incollection{pillutla-etal:casimir:neurips2018,
title = {{A Smoother Way to Train Structured Prediction Models}},
author = {Pillutla, Krishna and
          Roulet, Vincent and
          Kakade, Sham M. and
          Harchaoui, Zaid},
booktitle = {Advances in Neural Information Processing Systems},
year = {2018},
}

@incollection{pillutla-etal:casimir:ssp2023,
title = {{Modified Gauss-Newton Algorithms under Noise}},
author = {Pillutla, Krishna and
          Roulet, Vincent and
          Kakade, Sham M. and
          Harchaoui, Zaid},
booktitle = {IEEE Statistical Signal Processing Workshop (SSP)},
year = {2023},
}

Direct any questions, comments or concerns to Krishna Pillutla.

Acknowledgments

This work was supported by NSF Award CCF-1740551, the Washington Research Foundation for Innovation in Data-intensive Discovery, and the program “Learning in Machines and Brains” of CIFAR.
