We will explore new optimization approaches to accelerate the pretraining of large language models (LLMs). Candidate methods include second-order optimization and gradient orthogonalization. The student will evaluate the performance of these optimizers on GPT model families, and we will also design new techniques to reduce the communication cost of these optimizers.
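
As a concrete illustration of one candidate method, the sketch below orthogonalizes a weight matrix's gradient with a quintic Newton-Schulz iteration, as popularized by the Muon optimizer. This is a minimal sketch, not the project's actual design: the coefficients follow Muon's public implementation, while the function name, iteration count, and learning rate are illustrative assumptions.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map a 2-D gradient G = U S V^T to U V^T, pushing all
    singular values toward 1 while keeping the singular vectors."""
    assert g.ndim == 2
    # Normalize by the Frobenius norm so the iteration converges
    # (spectral norm <= 1 after scaling).
    x = g / (g.norm() + 1e-7)
    # Quintic Newton-Schulz coefficients (taken from the Muon optimizer).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Work with the wide orientation so x @ x.T is the smaller Gram matrix.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

# Usage: orthogonalize a weight matrix's gradient before an SGD-style update.
w = torch.randn(256, 512, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
update = newton_schulz_orthogonalize(w.grad)
with torch.no_grad():
    w -= 0.02 * update  # learning rate is an arbitrary placeholder
```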