Welcome!

I am currently a researcher in the AI Frontiers group at Microsoft Research. Previously, I earned my PhD from Stanford University, advised by Matei Zaharia and James Zou. My research focuses on machine learning, with a recent emphasis on AI marketplaces, an exciting topic at the intersection of ML, economics, and data systems.

Email: lingjiaochen [at] [microsoft] [dot] [com]

2025-02: New! Preprints on LLM selection and synthetic data generation
2025-01: New! Two papers accepted at TMLR and DMLR
2024-12: New! Two papers presented at ICML and NeurIPS 2024
2024-04: Super excited to co-organize the inaugural Compound AI Systems Workshop on June 13th. Please send your work on designing and optimizing these systems!
2024-03: Two papers cited by the White House 2024 Economic Report!
2024-03: LLM drift is accepted by Harvard Data Science Review!
2024-03: Preprint on scaling properties of compound AI systems.
2024-03: Preprint on the impact of ChatGPT on peer reviews.
2024-01: The implications of LLM Drift are featured by The Wall Street Journal
2024-01: We are excited to announce the ICLR 2024 Workshop on Data Problems for Foundation Models (DPFM)!
2023-09: LLM drift is featured by The New York Times.
2023-08: I recorded a podcast on FrugalGPT with MLOps.community. Thank you Demetrios for the enjoyable discussion!
2023-08: LLM drift is featured by The Wall Street Journal.
2023-08: LLM drift is featured by Scientific American.
2023-07: LLM drift is featured by Fortune.
2022-12: Four papers are presented at ICLR, ICML and NeurIPS 2022
2022-09: HAPI website is online. Feedback is welcome!
2022-12: I attended the Data Science Rising Star Workshop. Thank you, University of Chicago!
2020-07: FrugalML is accepted by NeurIPS 2020 (Oral, top 1% of submissions)
2019-09: Morpheus is being integrated into GraalVM by Oracle

Journal Publications

Data Acquisition: A New Frontier in Data-centric AI.
Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou.
Data-centric Machine Learning Research (To appear), 2025.
[PDF] [Code and Data]
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance.
Lingjiao Chen, Matei Zaharia, James Zou.
Transactions on Machine Learning Research, 2024.
[PDF] [Code and Data]
How is ChatGPT's behavior changing over time?
Lingjiao Chen, Matei Zaharia, James Zou.
Harvard Data Science Review Issue 6.2, 2024.
[PDF] [Code and Data]
Towards Linear Algebra over Normalized Data.
Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel.
Proceedings of the VLDB Endowment Volume 10 Issue 11, 2017.
[PDF] [Technical Report] [Code and Data]
Distributed User-centric Scheduling for Visible Light Communication Networks.
Lingjiao Chen, Jiaheng Wang, Jiantao Zhou, Derrick Wing Kwan Ng, Robert Schober, Chunming Zhao.
Optics Express Volume 24 Issue 14, 2016.
[PDF]

Conference and Workshop Publications

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems.
Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou.
NeurIPS Conference on Neural Information Processing Systems, 2024.
[PDF] [Code and Data]
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews.
Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel McFarland, James Zou.
ICML International Conference on Machine Learning, 2024.
[PDF] [Code and Data]
Analyzing ChatGPT’s Behavior Shifts Over Time.
Lingjiao Chen, Matei Zaharia, James Zou.
NeurIPS Conference on Neural Information Processing Systems R0-FoMo Workshop, 2023.
[PDF]
DataPerf: Benchmarks for Data-centric AI Development.
The DataPerf team.
NeurIPS Conference on Neural Information Processing Systems, 2023.
[PDF] [Website]
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions.
Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Re, Matei Zaharia, James Zou.
NeurIPS Conference on Neural Information Processing Systems, 2022.
[PDF] [Website]
Estimating and Explaining Model Performance When Both Covariates and Labels Shift.
Lingjiao Chen, Matei Zaharia, James Zou.
NeurIPS Conference on Neural Information Processing Systems, 2022.
[PDF]
Efficient Online ML API Selection for Multi-Label Classification Tasks.
Lingjiao Chen, Matei Zaharia, James Zou.
ICML International Conference on Machine Learning, 2022.
[PDF]
How Did the Model Change? Efficiently Assessing Machine Learning API Shifts.
Lingjiao Chen, Matei Zaharia, James Zou.
ICLR International Conference on Learning Representations, 2022.
[PDF]
SEAL: Interactive Tool for Semantic Error Analysis and Labeling.
Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou.
EMNLP Conference on Empirical Methods in Natural Language Processing, 2022.
[PDF]
ML API Shift Assessments: Change is Coming!
Lingjiao Chen, Matei Zaharia, James Zou.
ICML International Conference on Machine Learning SRML Workshop, 2021 (Oral).
[PDF]
Have the Cake and Eat It Too? Higher Accuracy and Less Expense when Using Multi-label ML APIs Online.
Lingjiao Chen, Matei Zaharia, James Zou.
ICML International Conference on Machine Learning DMMLSYS Workshop, 2021.
[PDF]
SOLON: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients.
Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban.
ISCA International Symposium on Computer Architecture SPSL Workshop, 2021.
[PDF]
FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply.
Lingjiao Chen, Matei Zaharia, James Zou.
NeurIPS Conference on Neural Information Processing Systems, 2020 (Oral).
[PDF]
To Call or not to Call? Using ML Prediction APIs more Accurately and Economically.
Lingjiao Chen, Matei Zaharia, James Zou.
ICML International Conference on Machine Learning EcoPaDL Workshop, 2020.
[PDF]
Towards Model-based Pricing for Machine Learning in a Data Marketplace.
Lingjiao Chen, Paraschos Koutris, Arun Kumar.
ACM SIGMOD International Conference on Management of Data, 2019.
[PDF] [Technical Report]
Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace.
Lingjiao Chen, Hongyi Wang, Leshang Chen, Paraschos Koutris, Arun Kumar.
ACM SIGMOD International Conference on Management of Data, 2019.
[PDF] [Code and Data]
Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra.
Side Li, Lingjiao Chen, Arun Kumar.
ACM SIGMOD International Conference on Management of Data, 2019.
[PDF] [Code and Data]
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent.
Fengan Li, Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu.
ACM SIGMOD International Conference on Management of Data, 2019.
[PDF] [Technical Report] [Code and Data]
The Effect of Network Width on the Performance of Large-batch Training.
Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris.
NeurIPS Conference on Neural Information Processing Systems, 2018.
[PDF] [Technical Report]
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients.
Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos.
ICML International Conference on Machine Learning, 2018.
[PDF] [Technical Report]
Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training.
Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha.
ICML International Conference on Machine Learning, 2018.
[PDF] [Technical Report]
Draco: Robust Distributed Training against Adversaries.
Lingjiao Chen, Hongyi Wang, Dimitris Papailiopoulos.
SysML, 2018.
[PDF]
Accelerating Linear Algebra over Normalized Data.
Lingjiao Chen.
ACM SIGMOD International Conference on Management of Data Student Research Competition, 2017.
[PDF] Second Runner-up Award Winner
Model-based Pricing: Do Not Pay for More than What You Learn!
Lingjiao Chen, Paraschos Koutris, Arun Kumar.
ACM SIGMOD International Conference on Management of Data DEEM Workshop, 2017.
[PDF]

Technical Reports and Preprints

Optimizing Model Selection for Compound AI Systems.
Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica.
arXiv, 2025.
[PDF] [Code and Data]
BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation.
Alan Zhu, Parth Asawa, Jared Quincy Davis, Lingjiao Chen, Ion Stoica, Joseph E Gonzalez, Matei Zaharia.
arXiv, 2025.
[PDF] [Code and Data]
Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients.
Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban.
arXiv, 2021.
[PDF]