Events
Visitor Talk: Atlas Wang (UT Austin)
Efficiently Scaling Up and Training LLMs
As the sizes of Large Language Models (LLMs) continue to grow exponentially, it becomes imperative to explore novel computing paradigms that can address the dual challenge of scaling these models while adhering to the constraints posed by compute and data resources. This presentation will delve into several strategies aimed at alleviating this dilemma: (1) refraining from training models entirely from scratch, and instead making use of readily available pre-trained models to kickstart the training of a new, larger model; (2) extending this concept to enhance data efficiency during the neural scaling process; and (3) integrating more efficient, sparse dropout-inspired training algorithms. The talk will conclude with a few (admittedly scattered) thoughts and reflections.
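To make strategy (1) concrete, the sketch below illustrates one common way to "grow" a small pre-trained model into a wider one with a function-preserving initialization, in the spirit of Net2Net-style expansion. This is an illustrative assumption, not the specific method presented in the talk; all names, shapes, and the two-layer MLP setting are hypothetical.

```python
# Minimal sketch (assumed example, not the speaker's method): widen the hidden
# layer of a pre-trained two-layer MLP so the larger model starts from the
# smaller model's function rather than from scratch.
import torch
import torch.nn as nn


def widen_mlp(fc1: nn.Linear, fc2: nn.Linear, new_hidden: int):
    """Expand the hidden width of an MLP computing fc2(relu(fc1(x))) from
    fc1.out_features to new_hidden, copying pre-trained weights so the wider
    network computes the same function at initialization."""
    old_hidden = fc1.out_features
    assert new_hidden >= old_hidden

    # Pick existing hidden units to replicate for the extra capacity.
    idx = torch.randint(0, old_hidden, (new_hidden - old_hidden,))
    mapping = torch.cat([torch.arange(old_hidden), idx])  # new unit -> source unit

    wide_fc1 = nn.Linear(fc1.in_features, new_hidden)
    wide_fc2 = nn.Linear(new_hidden, fc2.out_features)

    with torch.no_grad():
        # Incoming weights/biases of a duplicated unit are copied verbatim.
        wide_fc1.weight.copy_(fc1.weight[mapping])
        wide_fc1.bias.copy_(fc1.bias[mapping])

        # Outgoing weights are divided by each source unit's replication count,
        # so the summed contribution to the output layer is unchanged.
        counts = torch.bincount(mapping, minlength=old_hidden).float()
        wide_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        wide_fc2.bias.copy_(fc2.bias)

    return wide_fc1, wide_fc2
```

In this sketch, the widened model reproduces the small model's outputs exactly before fine-tuning continues, which is one way such "kickstarted" training can avoid restarting from random initialization.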
Speaker Bio
Professor Zhangyang “Atlas” Wang is a tenured Associate Professor and holds the Temple Foundation Endowed Faculty Fellowship #7, in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. He is also a faculty member of UT Computer Science (GSC) and the Oden Institute CSEM program.
Meanwhile, in a part-time role, he serves as the Director of AI Research & Technology for Picsart, where he leads the development of cutting-edge, GenAI-powered tools for creative visual editing. He was the Jack Kilby/Texas Instruments Endowed Assistant Professor in the same department from 2020 to 2023. From 2017 to 2020, he was an Assistant Professor of Computer Science and Engineering at Texas A&M University. During 2021–2022, he also held a visiting researcher position at Amazon Search. He received his Ph.D. degree in ECE from UIUC in 2016, advised by Professor Thomas S. Huang, and his B.E. degree in EEIS from USTC in 2012.
Prof. Wang has broad research interests spanning from the theory to the application aspects of machine learning (ML). At present, his core research mission is to leverage, understand, and expand the role of sparsity, from classical optimization to modern neural networks, whose impact spans many important topics such as efficient training/inference/transfer (especially of large foundation models), robustness and trustworthiness, learning to optimize (L2O), generative AI, and graph learning. His research is gratefully supported by NSF, DARPA, ARL, ARO, IARPA, DOE, as well as dozens of industry and university grants. Prof. Wang co-founded the Conference on Parsimony and Learning (CPAL) and serves as its inaugural Program Chair. He is an elected technical committee member of IEEE MLSP and IEEE CI, and regularly serves as an area chair, invited speaker, tutorial/workshop organizer, panelist, and reviewer. He is an ACM Distinguished Speaker and an IEEE senior member.
Prof. Wang has received many research awards, including an NSF CAREER Award, an ARO Young Investigator Award, an IEEE AI’s 10 To Watch Award, an INNS Aharon Katzir Young Investigator Award, a Google Research Scholar award, an IBM Faculty Research Award, a J. P. Morgan Faculty Research Award, an Amazon Research Award, an Adobe Data Science Research Award, a Meta Reality Labs Research Award, and two Google TensorFlow Model Garden Awards. His team has won the Best Paper Award from the inaugural Learning on Graphs (LoG) Conference 2022; and has also won five research competition prizes from CVPR/ICCV/ECCV since 2018. He feels most proud of being surrounded by some of the world’s most brilliant students: his Ph.D. students include winners of seven prestigious fellowships (NSF GRFP, IBM, Apple, Adobe, Amazon, Qualcomm, and Snap), among many other honors.