My research interests lie in several fundamental aspects of optimization and machine learning. They are often motivated by practical needs and/or observations. Here are currently ongoing research projects of my group.

## Design and Analysis of Stochastic Optimization Methods

Stochastic gradient descent (SGD) plays an important role in modern machine learning due to the large scale of the problems that arise. Most supervised learning problems amount to solving an empirical risk minimization problem, i.e., minimizing a finite-sum objective \(f(w) = \frac{1}{n}\sum_{i = 1}^n f_i(w)\). There are several variants of SGD for solving this type of problem. On the theory side, the existing analyses mainly apply to the following i.i.d. sampling version of SGD:

\[w_{t+1} = w_t - \alpha_t \nabla f_{i_t} (w_t).\]

Here, \(i_t\) is sampled from \(\{1,\ldots, n\}\) *uniformly at random*, and each sample is independent of the previous ones. Though such an SGD scheme is well known and widely analyzed, a quick criticism of this scheme is that it may not visit all the data points in one epoch, resulting in inefficiency in practice. What is instead often implemented in practice (e.g., in PyTorch) is the so-called *random reshuffling* (RR) scheme, which in each epoch randomly *permutes* \(\{1,\ldots, n\}\) (the permutation is denoted by \(\sigma_t\)) and updates by sweeping through the elements of \(\sigma_t\) sequentially; that is, \(i_t\) takes values in \(\sigma_t\) incrementally. It is easy to see that such a method has a much smaller sampling variance and utilizes all the data points in each epoch. In practice, one also uses adaptive step sizes (learning rates) and momentum for stabilizing and accelerating training, leading to the well-known Adam method. These stochastic optimization methods and their variants serve as the fundamental solvers of modern machine learning problems.
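As a toy illustration of the two sampling schemes (not tied to any particular paper; all problem sizes and step sizes below are illustrative), here is a NumPy sketch of i.i.d. SGD versus random reshuffling on a least-squares finite sum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective: f(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2
n, d = 100, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star  # noiseless targets, so w_star minimizes every f_i

def grad_i(w, i):
    """Gradient of the i-th component f_i(w) = 0.5 * (x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

def sgd_iid(w, alpha, epochs):
    """i.i.d. sampling: each i_t is drawn uniformly and independently."""
    for _ in range(epochs * n):
        i = rng.integers(n)
        w = w - alpha * grad_i(w, i)
    return w

def sgd_rr(w, alpha, epochs):
    """Random reshuffling: permute {1,...,n} each epoch, sweep sequentially."""
    for _ in range(epochs):
        for i in rng.permutation(n):
            w = w - alpha * grad_i(w, i)
    return w

w0 = np.zeros(d)
err_iid = np.linalg.norm(sgd_iid(w0, 0.01, 50) - w_star)
err_rr = np.linalg.norm(sgd_rr(w0, 0.01, 50) - w_star)
print(err_iid, err_rr)
```

Both runs recover \(w^*\) on this noiseless instance; the RR variant touches every data point exactly once per epoch, matching the default `shuffle=True` behavior of PyTorch-style data loaders.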

Understanding the properties of these methods is of practical importance, which is a major direction of this project. Another important direction is to design new stochastic optimization methods with better performance.

Here are some related papers of this project:

Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality.

Xiao Li, Andre Milzarek, Junwen Qiu.

SIAM Journal on Optimization, 33(2), 1092-1120, 2023. [SIAM Link] [arXiv] [Slides]

Distributed Random Reshuffling over Networks.

Kun Huang, Xiao Li, Andre Milzarek, Shi Pu, Junwen Qiu.

IEEE Transactions on Signal Processing, 71, 1143-1158, 2023. [IEEE Link] [arXiv]

A Unified Convergence Theorem for Stochastic Optimization Methods.

Xiao Li, Andre Milzarek.

Advances in Neural Information Processing Systems (NeurIPS 2022). [NeurIPS Link] [Slides]

## Nonconvex Optimization and Algorithm Design

In general, attaining global optimality in nonconvex optimization is intractable. Nevertheless, a set of nonconvex problems in machine learning and signal processing exhibit benign properties. For instance, a fundamental problem is to recover \(w^*\) from \(y = Xw^* + e\), given \(y\) and \(X\); we omit the dimensions here for simplicity. Note that the lower-case \(y\), \(w^*\), and \(e\) are not restricted to be vectors; they can also be matrices. Such a problem goes by various names in different communities, depending on the structure imposed on \(w^*\). A family of good examples are the variants of (robust) PCA, where \(w^*\) is assumed to be a low-rank matrix. In this project, we consider both the analysis of nonconvex optimization landscapes and the design of efficient algorithms.
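To make the recovery model concrete, here is a small sketch (an illustrative toy, not the method of any specific paper) of recovering a vector \(w^*\) from \(y = Xw^* + e\) when \(e\) contains sparse outliers, by running a subgradient method on the robust loss \(\frac{1}{n}\|Xw - y\|_1\) with geometrically decaying step sizes; all constants are ad hoc choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance of y = X w* + e, where e holds sparse outlier corruptions.
n, d = 200, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
e = np.zeros(n)
outliers = rng.choice(n, size=n // 10, replace=False)  # 10% of entries corrupted
e[outliers] = 10.0 * rng.standard_normal(len(outliers))
y = X @ w_star + e

def subgrad(w):
    """A subgradient of the robust loss f(w) = (1/n) * ||X w - y||_1."""
    return X.T @ np.sign(X @ w - y) / n

# Subgradient method with geometrically decaying step sizes.
w = np.zeros(d)
alpha = 1.0
for _ in range(500):
    w = w - alpha * subgrad(w)
    alpha *= 0.98

err = np.linalg.norm(w - w_star)
print(err)
```

The \(\ell_1\) loss is robust to the sparse corruptions that would badly bias a plain least-squares fit; the geometric step decay is the standard choice for subgradient methods on sharp, weakly convex losses of this kind.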

Here are some outcomes along this line of research:

ReSync: Riemannian Subgradient-based Robust Rotation Synchronization.

Huikang Liu, Xiao Li, Anthony Man-Cho So.

Advances in Neural Information Processing Systems (NeurIPS 2023). [arXiv]

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods.

Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man-Cho So.

SIAM Journal on Optimization, 31(3), 1605–1634, 2021. [arXiv]

Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent.

Qing Qu, Xiao Li, Zhihui Zhu.

SIAM Journal on Imaging Sciences, 13(3), 1630-1652, 2020. [SIAM Link]

Nonconvex Robust Low-rank Matrix Recovery.

Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal.

SIAM Journal on Optimization, 30(1), 660–686, 2020. [arXiv]

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution.

Qing Qu, Xiao Li, Zhihui Zhu.

Advances in Neural Information Processing Systems (NeurIPS 2019). [NeurIPS Link]