Software

Packages

XBern Confidence Intervals
An XBern or exchangeable Bernoulli distribution is a probability distribution over binary vectors which is exchangeable, i.e., the probability mass does not change when the coordinates of the vector are permuted. This package gives confidence intervals for the mean of the XBern distribution.

For n binary vectors in k dimensions, the width of a confidence interval for the mean can range from 1 / sqrt(n) (if the k coordinates are copies of each other) to 1 / sqrt(nk) (if the k coordinates are independent). This software gives adaptive confidence intervals: their width is of order 1 / sqrt(n) when the coordinates are highly correlated and automatically improves to 1 / sqrt(nk) + 1 / n^{3/4} when the correlations are low.

Install as pip install xbern-confidence-intervals.
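
The two extreme rates quoted above, 1 / sqrt(n) and 1 / sqrt(nk), can be checked with a small simulation. The sketch below does not use the xbern-confidence-intervals package; the function name std_of_mean_estimate and the values of p, n, k, and trials are assumptions chosen for illustration. It draws many datasets of n binary k-vectors in the fully correlated and fully independent cases and compares the spread of the estimated mean.

```python
import numpy as np


def std_of_mean_estimate(n, k, copies, p=0.3, trials=2000, seed=0):
    """Empirical standard deviation of the grand mean of n binary k-vectors.

    Hypothetical helper for illustration; not part of any package.
    """
    rng = np.random.default_rng(seed)
    if copies:
        # Fully correlated case: all k coordinates copy one Bernoulli(p) draw.
        base = rng.random((trials, n, 1)) < p
        x = np.broadcast_to(base, (trials, n, k))
    else:
        # Independent case: every coordinate is its own Bernoulli(p) draw.
        x = rng.random((trials, n, k)) < p
    # One estimate of the mean per simulated dataset.
    estimates = x.mean(axis=(1, 2))
    return estimates.std()


n, k = 100, 16
std_copies = std_of_mean_estimate(n, k, copies=True)
std_indep = std_of_mean_estimate(n, k, copies=False)

# If the rates are 1/sqrt(n) and 1/sqrt(nk), the ratio should be near sqrt(k).
print(f"std with copied coordinates:      {std_copies:.4f}")
print(f"std with independent coordinates: {std_indep:.4f}")
print(f"ratio: {std_copies / std_indep:.2f}  (sqrt(k) = {np.sqrt(k):.2f})")
```

Both simulated distributions are exchangeable, so they are the two endpoints between which an adaptive interval would be expected to interpolate.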

Dataset Grouper
A scalable library to create, write, and iterate over group-partitioned (i.e., federated) datasets. It supports building federated datasets suitable for pretraining and finetuning large language models. Install as pip install dataset-grouper.

Mauve (Documentation)
A package to compute the Mauve score for neural text generation. Install as pip install mauve-text. It is also supported via the HuggingFace Evaluate package.

SQwash (Documentation)
Distributionally robust learning in PyTorch with one additional line of code. Install as pip install sqwash.

Geom-Median
Fast and Differentiable Geometric Median in PyTorch and NumPy. Install as pip install geom-median.
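
For readers unfamiliar with the geometric median (the point minimizing the sum of Euclidean distances to a set of points), here is a minimal NumPy sketch of the classical Weiszfeld iteration. It is not the geom-median package's API or necessarily its algorithm; the function name geometric_median and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def geometric_median(points, weights=None, max_iter=100, tol=1e-8, eps=1e-12):
    """Weiszfeld iteration for the (weighted) geometric median.

    Hypothetical helper for illustration; not the geom-median package API.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if weights is None:
        weights = np.ones(n)
    else:
        weights = np.asarray(weights, dtype=float)

    # Start from the weighted mean.
    median = np.average(points, axis=0, weights=weights)
    for _ in range(max_iter):
        # Clamp distances away from zero to avoid division by zero
        # when the iterate lands on a data point.
        dists = np.maximum(np.linalg.norm(points - median, axis=1), eps)
        w = weights / dists
        new_median = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_median - median) <= tol:
            return new_median
        median = new_median
    return median


# Three nearby points and one far outlier.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print("mean:            ", pts.mean(axis=0))
print("geometric median:", geometric_median(pts))
```

In this example the mean is dragged to (25.25, 25.25) by the outlier, while the geometric median stays near (0.5, 0.5), which illustrates why it is used for robust aggregation.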

Casimir (Documentation)
A toolbox of selected optimization algorithms for unstructured tasks such as binary classification, and structured prediction tasks such as visual object localization and named entity recognition.

RFA
TensorFlow Federated implementation of robust aggregation for federated learning using the geometric median.

Code to reproduce results from papers

LiDP Auditing: Code to reproduce the experimental results of this NeurIPS 2023 paper, which shows how to improve the sample complexity of auditing differential privacy (DP) by auditing the equivalent notion of Lifted DP with randomized hypothesis tests and adaptive confidence intervals.

Federated Learning with Partial Personalization: PyTorch implementation of various personalized federated learning algorithms and experiments on text, vision, and speech data. Reproduce results from this ICML 2022 paper.

Mauve Experiments: Implementation of Mauve and other similarity measures for neural text generation. Reproduce results from this NeurIPS 2021 paper.

RFA and PyTorch port tRFA: Implementation of RFA, a robust aggregation algorithm for federated learning, in simulation. Reproduce results from this paper published in IEEE Transactions on Signal Processing 2022.

Simplicial-FL: Implementation of Simplicial-FL to handle device heterogeneity in federated learning, in simulation. Reproduce results from this paper.

Krishna Pillutla
