A partial collection of publications. This Google Scholar page contains more information.


Preprints

BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models.
Qijun Luo, Hengxu Yu, Xiao Li.
[arXiv Preprint]   [Github page]
>>> BAdam is a block coordinate method embedded with Adam as the inner solver for large language models. It is memory efficient and allows training Llama 2-7B using a single RTX3090-24GB GPU. It also saves considerable backward time thanks to the chain rule.

High Probability Guarantees for Random Reshuffling.
Hengxu Yu, Xiao Li.
[arXiv Preprint]
>>> We estabilished a set of high probability finite-time compleixity guarantees for RR, including finding a stationary point, desinging a stopping criterion that yields the last iterate result, and avoding saddle points.

Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity.
Xiao Li, Lei Zhao, Daoli Zhu, Anthony Man-Cho So.
[arXiv Preprint]
>>> The subgradient method and some of its varaints possess convergence properties without Lipschitz continuity at all.

Distributed Stochastic Optimization under a General Variance Condition.
Kun Huang, Xiao Li, Shi Pu.
[arXiv Preprint]
>>> We established the convergence of FedAvg using a general variance condition, providing an informative measrue for charactering data heterogeneity. Interestingly, we revealed that SCAFFOLD also suffers from data heterogeneity under this general variance condition.


Refereed Articles

ReSync: Riemannian Subgradient-based Robust Rotation Synchronization.
Huikang Liu, Xiao Li, Anthony Man-Cho So.
Advances in Neural Information Processing Systems (NeurIPS 2023).   [NeurIPS Link]
>>> We present ReSync, a Riemannian subgradient-based method for solving the nonconvex nonsmooth robust rotation synchronization problem, and provide linear convergence guarantees for ReSync in terms of finding the underlying rotations.

Distributed Random Reshuffling over Networks.
Kun Huang, Xiao Li, Andre Milzarek, Shi Pu, Junwen Qiu.
IEEE Transactions on Signal Processing, 71, 1143-1158, 2023.   [IEEE Link]   [arXiv]
>>> We design a simple yet powerful decentralized random reshuffling method over networks. Similar complexity and convergence guarantees to the centralized random reshuffling method are established.

Finite-Time Analysis of Decentralized Single-Timescale Actor-Critic.
Qijun Luo, Xiao Li.
Transactions on Machine Learning Research, 2023.   [TMLR Link]   [arXiv]
>>> We showed the \(\tilde {\mathcal O}(\varepsilon^{-2})\) sample complexity for single-timescale AC algorithm (previous results are based on either double-loop or two-timescale scheme). The key step is to estalish the smoothness condition of the optimal critic variable.

Convergence of Random Reshuffling Under The Kurdyka-Lojasiewicz Inequality.
Xiao Li, Andre Milzarek, Junwen Qiu.
SIAM Journal on Optimization, 33(2), 1092-1120, 2023.   [SIAM Link]   [arXiv]   [Slides]
>>> The sequence convergence results are established for random reshuffling (stochastic). The key insights are: 1) derive subsequence convergence using diminishing step sizes and 2) combine diminishing step sizes with the traditional KL analysis.

A Unified Convergence Theorem for Stochastic Optimization Methods.
Xiao Li, Andre Milzarek.
Advances in Neural Information Processing Systems (NeurIPS 2022).   [NeurIPS Link]   [Slides]
>>> In this work, we provide a fundamental convergence theorem and apply it to obtain almost sure convergence results for SGD, RR, prox-SGD, and stochastic model-based methods.

A Provable Splitting Approach for Symmetric Nonnegative Matrix Factorization.
Xiao Li, Zhihui Zhu, Qiuwei Li, Kai Liu.
IEEE Transactions on Knowledge and Data Engineering, 35(3), 2206-2219, 2023.   [IEEE Link]   [arXiv]
>>> After reformulating the symmetric NMF as a nonsymmetric penalized version, we design a set of alternating algorithms and provide a unified framework for analyzing their convergence by modifying the existing KL analysis.

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods.
Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man Cho So.
SIAM Journal on Optimization, 31(3), 1605–1634, 2021.   [SIAM Link]   [arXiv]
>>> We provide the first complexity/convergence results for Riemannian subgradient-type methods. The key insight is that weakly convex function restricted on smooth embedded manifold is still weakly convex.

Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent.
Qing Qu, Xiao Li, Zhihui Zhu.
SIAM Journal on Imaging Sciences, 13(3), 1630-1652, 2020.   [SIAM Link]
>>> We show that the multi-channel sparse blind deconvolution problem has a local strong convexity-like property, which yields strong global convergence property to the optimal/underlying solution of gradient descent.

Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning.
Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu.
International Conference on Learning Representation (ICLR 2020).   [ICLR Link]
>>> We consider a \(\ell_4\) maximization problem over the sphere for learning sparsely used overcomplete dictionaries. We show this problem has benigh global geometry so that simple local search algorithms are guaranteed to find the global solution.

Nonconvex Robust Low-rank Matrix Recovery.
Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal.
SIAM Journal on Optimization, 30(1), 660–686, 2020.   [SIAM Link]   [arXiv]   [Code]
>>> We propose the robust nonconvex matrix recovery problem, and show that subgradient method will linearly converge to optima. The key insight is the recovery condition “\(\ell_1/\ell_2\)-RIP’’ leads to optimization properties — weak convexity and sharpness.

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution.
Qing Qu, Xiao Li, Zhihui Zhu.
Advances in Neural Information Processing Systems (NeurIPS 2019).   [NeurIPS Link]
>>> We give a comprehensive geometric characterization for the Multichannel Sparse Blind Deconvolution on the sphere, including a large gradient area, a benign area, and a perturbation area.

Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization.
Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li.
Advances in Neural Information Processing Systems (NeurIPS 2018).   [NeurIPS Link]   [Code]
>>> We formulate a nonsymmetric penalized version for the symmetric NMF and show that solving the nonsymmetric version returns a solution for the symmetric one, thus transfering symmetric NMF to a nonsymmetric one theoretically.

Designing Robust Sensing Matrix for Image Compression.
Gang Li, Xiao Li, Sheng Li, Huang Bai, Qianru Jiang, Xiongxiong He.
IEEE Transactions on Image Processing, 24(12), 5389-5400, 2015.   [IEEE Link]
>>> We consider the error matrix arising in the sparse representation, and merge it into the sensing matrix design procedure in compressed sensing.