A partial collection of publications. This Google Scholar page contains more information.


Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity.
Xiao Li, Lei Zhao, Daoli Zhu, Anthony Man-Cho So.
[arXiv Preprint]
>>> The subgradient method and some of its varaints possess convergence properties without Lipschitz continuity at all.

Randomized Coordinate Subgradient Method for Nonsmooth Composite Optimization.
Lei Zhao, Ding Chen, Daoli Zhu, Xiao Li.
[arXiv Preprint]
>>> The first coordinate-type subgradient method for weakly convex and nonsmooth composite functions.

Distributed Stochastic Optimization under a General Variance Condition.
Kun Huang, Xiao Li, Shi Pu.
[arXiv Preprint]
>>> We established the convergence of FedAvg in nonconvex setting using a general variance condition. This also provide an alternative informative measrue for charactering data heterogeneity in federated learning.

Refereed Articles

ReSync: Riemannian Subgradient-based Robust Rotation Synchronization.
Huikang Liu, Xiao Li, Anthony Man-Cho So.
Advances in Neural Information Processing Systems (NeurIPS 2023).   [arXiv]
>>> We present ReSync, a Riemannian subgradient-based method for robust rotation synchronization, and provide guarantees for finding underlying rotations under the random corruption model.

Distributed Random Reshuffling over Networks.
Kun Huang, Xiao Li, Andre Milzarek, Shi Pu, Junwen Qiu.
IEEE Transactions on Signal Processing, 71, 1143-1158, 2023.   [IEEE Link]   [arXiv]
>>> We design a simple yet powerful decentralized random reshuffling method over networks. Similar complexity and convergence guarantees to the centralized random reshuffling method are established.

Finite-Time Analysis of Decentralized Single-Timescale Actor-Critic.
Qijun Luo, Xiao Li.
Transactions on Machine Learning Research, 2023.   [TMLR Link]   [arXiv]
>>> We showed the \(\tilde {\mathcal O}(\varepsilon^{-2})\) sample complexity for single-timescale AC algorithm (previous results are based on either double-loop or two-timescale scheme). The key step is to estalish the smoothness condition of the optimal critic variable.

Convergence of Random Reshuffling Under The Kurdyka-Lojasiewicz Inequality.
Xiao Li, Andre Milzarek, Junwen Qiu.
SIAM Journal on Optimization, 33(2), 1092-1120, 2023.   [SIAM Link]   [arXiv]   [Slides]
>>> The sequence convergence results are established for random reshuffling (stochastic). The key insights are: 1) derive subsequence convergence using diminishing step sizes and 2) combine diminishing step sizes with the traditional KL analysis.

A Unified Convergence Theorem for Stochastic Optimization Methods.
Xiao Li, Andre Milzarek.
Advances in Neural Information Processing Systems (NeurIPS 2022).   [NeurIPS Link]   [Slides]
>>> In this work, we provide a fundamental convergence theorem and apply it to obtain almost sure convergence results for SGD, RR, prox-SGD, and stochastic model-based methods.

A Provable Splitting Approach for Symmetric Nonnegative Matrix Factorization.
Xiao Li, Zhihui Zhu, Qiuwei Li, Kai Liu.
IEEE Transactions on Knowledge and Data Engineering, 35(3), 2206-2219, 2023.   [IEEE Link]   [arXiv]
>>> We formulate a nonsymmetric penalized version for the symmetric NMF, and show that solving the nonsymmetric version returns a solution for the symmetric one, thus transfering symmetric NMF to a nonsymmetric one theoretically.

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods.
Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man Cho So.
SIAM Journal on Optimization, 31(3), 1605–1634, 2021.   [SIAM Link]   [arXiv]
>>> We provide the first complexity/convergence results for Riemannian subgradient-type methods. The key insight is that weakly convex function restricted on smooth embedded manifold is still weakly convex.

Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent.
Qing Qu, Xiao Li, Zhihui Zhu.
SIAM Journal on Imaging Sciences, 13(3), 1630-1652, 2020.   [SIAM Link]
>>> We show that the multi-channel sparse blind deconvolution problem has a local strong convexity-like property, which yields strong global convergence property to the optimal/underlying solution of gradient descent.

Nonconvex Robust Low-rank Matrix Recovery.
Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal.
SIAM Journal on Optimization, 30(1), 660–686, 2020.   [SIAM Link]   [arXiv]   [Code]
>>> We propose the robust nonconvex matrix recovery problem, and show that subgradient method will linearly converge to optima. The key insight is the recovery condition “\(\ell_1/\ell_2\)-RIP’’ leads to optimization properties — weak convexity and sharpness.

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution.
Qing Qu, Xiao Li, Zhihui Zhu.
Advances in Neural Information Processing Systems (NeurIPS 2019).   [NeurIPS Link]

Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization.
Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li.
Advances in Neural Information Processing Systems (NeurIPS 2018).   [NeurIPS Link]   [Code]