Jia (Kevin) Liu,

Assistant Professor of Electrical and Computer Engineering, The Ohio State University


ECE 8101: Nonconvex Optimization for Machine Learning
(Spring 2022)


Personnel

Instructor: Jia (Kevin) Liu, Assistant Professor, Dept. of Electrical and Computer Engineering
Contact: 420 Dreese Labs, liu@ece.osu.edu
Time & Location: TuTh 11:10AM - 12:30PM, Knowlton Hall 190
Office Hours: Wed 5:00PM - 6:00PM

Course Description

This course introduces algorithm design and convergence analysis in nonconvex optimization theory, as well as their applications to modern machine learning and data science problems. The goal is to equip graduate students with a solid theoretical and mathematical foundation at the intersection of optimization and machine learning, so that they can use optimization to solve advanced machine learning problems and/or conduct research in related fields. The course takes traditional linear, nonlinear, and convex optimization, as taught in operations research or related engineering fields (e.g., ECE, CSE), as a prerequisite, and focuses on topics in nonconvex optimization that are of special interest to the machine learning community.

Course Materials

There is no required textbook. Most of the materials covered in the class will be based on classical books and recently published papers and monographs. A list of historically important and/or trending papers on ML optimization theory will be provided on the course website.

Paper Reading Assignments

There will be an estimated six paper reading assignments, one assigned during each topic set. Reading assignments must be typeset in NeurIPS format. In each reading assignment, each student writes a review of a set of related papers in a topic set, published in recent major machine learning venues (e.g., ICML, NeurIPS, ICLR, AAAI) or on arXiv. Some papers may be drawn from those lectured in class. The reviews may include the following: 1) a summary of the papers and their connections; 2) strengths/weaknesses of the papers in terms of soundness of assumptions/theorems, empirical evaluation, novelty, and significance; 3) which parts are difficult to understand, and any questions about proofs/results/experiments; and 4) how the papers can be improved and extended.

Final Project

You may choose to complete a project individually or in a team of no more than two people. Final reports are due after the project presentations in the final week and should follow the NeurIPS format. Each project requires a 20-minute presentation in the final week. Attendance at your fellow students' presentations is required. Potential project ideas include, but are not limited to: i) a nontrivial extension of the results introduced in class; ii) a novel application in your own research area; iii) a new theoretical analysis of an existing algorithm. Each project should contain something new, and it is important that you justify its novelty.

Grading Policy

    • Class Participation: 10%; Paper Reading Assignments: 60%; Final project: 30%.

Late Policy

Without the consent of the instructor, late paper reading assignments or final reports will not be accepted and will result in a grade of zero. In the case of a conference deadline or the like, a written notice of extension is required five days in advance. In the case of an emergency (sudden sickness, family problems, etc.), an after-the-fact notice is acceptable, but we emphasize that this is reserved for true emergencies.

Schedule

The full class schedule is below. It follows the lecture progress and class interests, so the syllabus may be adjusted as the semester proceeds.

Class | Date | Topics | Lecture Topics | Lecture Notes | Lecture Recordings
1 | 1/11 | 1. Course Info & Introduction | 1. Course Info & Introduction | Lecture 1 | Video 01-01
2 | 1/13 | 2. First-Order Methods for Nonconvex Optimization | 2-1. Math Background Review | Lecture 2-1 | Video 01-02
3 | 1/18 | | (cont.) | | Video 02-01
4 | 1/20 | | 2-2. Convexity | Lecture 2-2 | Video 02-02
5 | 1/25 | | 2-3. Gradient Descent | Lecture 2-3 | Video 03-01
6 | 1/27 | | (cont.) | | Video 03-02
7 | 2/1 | | 2-4. Stochastic Gradient Descent (General Expectation Minimization, Finite-Sum Minimization) | Lecture 2-4 | Video 04-01
8 | 2/3 | | (cont.) | | Video 04-02
9 | 2/8 | | (cont.) | | Video 05-01
10 | 2/10 | | 2-5. Variance-Reduced Methods (SAG, SVRG, SAGA, SPIDER, PAGE) | Lecture 2-5 | Video 05-02
11 | 2/15 | | (cont.) | | Video 06-01
12 | 2/17 | | (cont.) | | Video 06-02
13 | 2/22 | | 2-6. Adaptive Methods (AdaGrad, RMSProp, Adam) | Lecture 2-6 | Video 07-01
14 | 2/24 | | (cont.) | | Video 07-02
15 | 3/1 | | (cont.) | | Video 08-01
16 | 3/3 | 3. Federated and Decentralized Learning | 3-1. Federated Learning (Distributed Learning, FedAvg) | Lecture 3-1 | Video 08-02
17 | 3/8 | | (cont.) | | Video 09-01
18 | 3/10 | | (cont.) | | Video 09-02
Spring Break
19 | 3/22 | 3. Federated and Decentralized Learning | 3-2. Decentralized Learning (Decentralized SGD, Gradient Tracking) | Lecture 3-2 | Video 11-01
20 | 3/24 | | (cont.) | | Video 11-02
21 | 3/29 | | (cont.) | | Video 12-01
22 | 3/31 | | (cont.) | | Video 12-02
23 | 4/5 | | (cont.) | | Video 13-01
24 | 4/7 | 4. Zeroth-Order Methods for Nonconvex Optimization | 4-1. ZO Methods with Random Directions of Gradient Estimation | Lecture 4-1 | Video 13-02
25 | 4/12 | | (cont.) | | Video 14-01
26 | 4/14 | | 4-2. Variance-Reduced Zeroth-Order Methods | Lecture 4-2 | Video 14-02
27 | 4/19 | 5. First-Order Nonconvex Optimization with Special Geometric Structure | 5-1. The PL Condition and NTK | Lecture 5-1 | Video 15-01
28 | 4/21 | | 5-2. NTK and Weak-Quasi-Convexity | Lecture 5-2 | Video 15-02
Final Exam Week (Project Presentations)
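To give a flavor of the first-order methods in topic set 2, here is a minimal illustrative sketch (not course material) contrasting full gradient descent with single-sample SGD on a finite-sum least-squares problem. The problem data, step sizes, and iteration counts below are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite-sum objective: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def full_grad(x):
    # Exact gradient of the finite-sum objective
    return A.T @ (A @ x - b) / n

def stoch_grad(x, i):
    # Unbiased single-sample estimate of the full gradient
    return A[i] * (A[i] @ x - b[i])

# Gradient descent: constant step size, one full-gradient pass per step
x = np.zeros(d)
for _ in range(500):
    x -= 0.05 * full_grad(x)

# SGD: diminishing step size, one randomly sampled component per step
y = np.zeros(d)
for t in range(5000):
    i = rng.integers(n)
    y -= (0.1 / (t + 1) ** 0.5) * stoch_grad(y, i)

# Both iterates approach a stationary point (small gradient norm);
# GD converges to high accuracy, SGD hovers at a noise floor.
print(np.linalg.norm(full_grad(x)), np.linalg.norm(full_grad(y)))
```

The contrast in final gradient norms mirrors the convergence rates covered in lectures 2-3 and 2-4: deterministic GD drives the gradient to machine-level accuracy, while plain SGD stalls at a variance-dependent neighborhood, which motivates the variance-reduced methods of lecture 2-5.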

Academic Integrity

This course will follow OSU's Code of Academic Conduct. Discussions of homework assignments and final projects are encouraged. However, what you turn in must be your own. You should not directly copy solutions from others. Any reference (including online resources) used in your solution must be clearly cited.

 
Copyright © 2004- Jia (Kevin) Liu. All rights reserved.