RAD Collaboratory SURF
New Optimization Approaches to Large Language Model Pretraining
Project Summary
We will explore several new optimization approaches to accelerate the pretraining of large language models (LLMs). Candidate approaches include second-order optimization and gradient orthogonalization. The student will examine the performance of these optimizers on GPT model families. We will also design new approaches to lower the communication cost of these optimizers.
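As one illustration of the kind of technique mentioned above, gradient orthogonalization replaces a gradient matrix with a nearby (semi-)orthogonal matrix before the update. A minimal sketch of one known scheme for this, the Newton-Schulz iteration, is below; the function name and step count are illustrative assumptions, not the project's actual method.

```python
import numpy as np

def orthogonalize(grad: np.ndarray, steps: int = 20) -> np.ndarray:
    """Approximate the nearest semi-orthogonal matrix to `grad` via the
    Newton-Schulz iteration. This is one standard orthogonalization
    scheme; the project may use a different or tuned variant."""
    # Normalizing by the Frobenius norm bounds the spectral norm by 1,
    # which keeps all singular values inside the iteration's convergence region.
    X = grad / (np.linalg.norm(grad) + 1e-12)
    for _ in range(steps):
        # Each step pushes every singular value of X toward 1,
        # leaving the singular vectors unchanged.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))   # stand-in for a weight-matrix gradient
O = orthogonalize(G)
print(np.round(O.T @ O, 3))       # columns are now approximately orthonormal
```

Because the iteration uses only matrix multiplications, it maps well onto accelerators, which is one reason orthogonalization-based optimizers are attractive for LLM pretraining.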
