# SQwash: Distributionally Robust Learning in PyTorch with 1 Additional Line of Code¶

This package implements reducers based on the superquantile a.k.a. Conditional Value at Risk (CVaR) for distributionally robust learning in PyTorch with GPU support. The package is licensed under the GPLv3 license.

The superquantile allows for distributional robustness by averaging over the worst $$\theta$$ fraction of the losses in each minibatch, as illustrated in the following figure.

## Installation¶

Once you have PyTorch >=1.7, you can grab SQwash from pip:

$pip install sqwash  Alternatively, if you would like to edit the package, clone the repository, cd into the main directory of the repository and run $ pip install -e .


The only dependency of SQwash is PyTorch, version 1.7 or higher. See the PyTorch webpage for install instructions.

## Quick Start¶

As the name suggests, it requires only a one-line modification to the usual PyTorch training loops. See the notebooks folder for a full example on CIFAR-10.

  1 2 3 4 5 6 7 8 9 10 11  from sqwash import SuperquantileReducer criterion = torch.nn.CrossEntropyLoss(reduction='none') # set reduction='none' reducer = SuperquantileReducer(superquantile_tail_fraction=0.5) # define the reducer # Training loop for x, y in dataloader: y_hat = model(x) batch_losses = criterion(y_hat, y) # shape: (batch_size,) loss = reducer(batch_losses) # Additional line to use the superquantile reducer loss.backward() # Proceed as usual from here ... 

The package also gives a functional version of the reducers, similar to torch.nn.functional:

 1 2 3 4 5 6 7 8 9  import torch.nn.functional as F from sqwash import reduce_superquantile for x, y in dataloader: y_hat = model(x) batch_losses = F.cross_entropy(y_hat, y, reduction='none') # must set reduction='none' loss = reduce_superquantile(batch_losses, superquantile_tail_fraction=0.5) # Additional line loss.backward() # Proceed as usual from here ... 

The package can also be used for distributionally robust learning over pre-specified groups of data. Simply obtain a tensor of losses for each element of the batch and use the reducers in this pacakge as follows:

 1 2  loss_per_group = ... # shape: (num_groups,) reducer = reduce_superquantile(loss_per_group, superquantile_tail_fraction=0.6) 

## Functionality¶

This package provides 3 reducers, which take a tensor of losses on a minibatch and reduce them to a single value.

• MeanReducer: the usual reduction, which is equivalent to specifying reduction='mean' in your criterion.

Given a torch.Tensor denoting a vector $$\ell = (\ell_1, \cdots, \ell_n)$$, the MeanReducer simply returns the mean $$\sum_{i=1}^n \ell_i / n$$. The functional equivalent of this is reduce_mean.

• SuperquantileReducer: computes the superquantile/CVaR of the batch losses.

Given a torch.Tensor denoting a vector $$\ell = (\ell_1, \cdots, \ell_n)$$, the SuperquantileReducer with a superquantile_tail_fraction denoted by $$\theta$$ returns the $$(1-\theta)-$$ superquantile $$\mathrm{SQ}_\theta$$ of $$\ell$$. See the Mathematical Definitions for its precise definition. Its functional counterpart is reduce_superquantile.

• SuperquantileSmoothReducer: computes a smooth counterpart of the superquantile/CVaR of the batch losses.

Given a torch.Tensor denoting a vector $$\ell = (\ell_1, \cdots, \ell_n)$$, the SuperquantileReducer with a superquantile_tail_fraction denoted by $$\theta$$ and a smoothing parameter denoted by $$\nu$$ returns the $$\nu-$$ smoothed $$(1-\theta)-$$ superquantile $$\mathrm{SQ}_\theta^\nu$$ of $$\ell$$. See the Mathematical Definitions for its precise definition. Its functional counterpart is reduce_superquantile_smooth.

See here for details of the API. Each of these reducers work just as well with cuda tensors for efficient distributionally robust learning on the GPU.

## Mathematical Definitions¶

The $$(1-\theta)-$$ superquantile of $$\ell=(\ell_1, \cdots, \ell_n)$$ to an average over the $$\theta$$ fraction of the largest elements of $$\ell$$, if $$n\theta$$ is an integer. See the figure at the top of the page. Formally, it is given by the two equivalent expressions (which are also valid when $$n\theta$$ is not an integer):

$\mathrm{SQ}_{\theta}(\ell) = \max\Bigg\{ q^\top \ell \, : \, q \in R^n_+, \, q^\top 1 = 1, \, q_i \le \frac{1}{n\theta} \Bigg\} = \min_{\eta \in R} \Bigg\{ \eta + \frac{1}{n\theta} \sum_{i=1}^n \max\{\ell_i - \eta, 0\} \Bigg\}.$

The $$\nu-$$ smoothed $$(1-\theta)-$$ superquantile of $$\ell=(\ell_1, \cdots, \ell_n)$$ is given by

$\mathrm{SQ}_{\theta}^\nu(\ell) = \max\Bigg\{ q^\top \ell - \frac{\nu}{2n}\big\|q - u \big\|^2_2 \, : \, q \in R^n_+, \, q^\top 1 = 1, \, q_i \le \frac{1}{n\theta} \Bigg\}.$

where $$u = \mathbf{1}_n / n$$ denotes the uniform distribution over $$n$$ atoms.

## Authors¶

For any questions or comments, please raise an issue on github, or contact Krishna Pillutla.

## Cite¶

If you found this package useful, please cite the following work. If you use this code, please cite:

@inproceedings{DBLP:conf/ciss/LPMH21,
author    = {Yassine Laguel and
Krishna Pillutla and
J{\'{e}}r{\^{o}}me Malick and
Zaid Harchaoui},
title     = {{A Superquantile Approach to Federated Learning with Heterogeneous
Devices}},
booktitle = {55th Annual Conference on Information Sciences and Systems, {CISS}
2021, Baltimore, MD, USA, March 24-26, 2021},
pages     = {1--6},
publisher = {{IEEE}},
year      = {2021},
}


## Acknowledgments¶

We acknowledge support from NSF DMS 2023166, DMS 1839371, CCF 2019844, the CIFAR program “Learning in Machines and Brains”, faculty research awards, and a JP Morgan PhD fellowship. This work has been partially supported by MIAI – Grenoble Alpes, (ANR-19-P3IA-0003).