About Me

I am a Ph.D. candidate in the Computer Science Department at Virginia Tech, advised by Professor Dawei Zhou. I received my M.S. in Biostatistics from the University of North Carolina at Chapel Hill in 2024, where I was advised by Professor Yun Li, and my B.S. in Statistics and B.A. in English Literature and Linguistics from Zhejiang University in 2022.

My research focuses on Machine Learning and Data Mining, with an emphasis on developing statistically grounded, computationally efficient frameworks to address challenges in open-world learning. I am particularly interested in advancing the theoretical and empirical understanding of large language models (LLMs) to improve their reliability and generalization in real-world applications.

What I'm Focusing On

  • Open-world LLM

    Developing methods that make large language models more adaptable and precise in open-world settings. [LensLLM]

  • AI4Science

    Applying ML and statistical modeling to biological data for structure prediction, biomarker discovery, and disease analysis. [DISPROTBENCH]

  • Quant

    Designing statistical and machine learning models for quantitative research, risk analysis, and trading strategies.

  • Exploring

    Investigating interdisciplinary problems across CS and statistics to drive real-world innovation.

Latest News

  • Received GPSS Travel Fund from VT Graduate School

    Honored to receive a travel grant from the Virginia Tech Graduate School to attend and present at ICML'25 in Vancouver.

  • Received Student Travel Award from CS Department

    Grateful to the Virginia Tech Computer Science Department for supporting my ICML'25 travel through their student award program.

  • One Paper Accepted at ICML'25

    My first first-author paper, "LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection," has been accepted at ICML'25.

Resume

Education

  1. Virginia Tech

    2024 — 2029

    College of Engineering
    PhD candidate in Computer Science

  2. University of North Carolina at Chapel Hill

    2022 — 2024

    Gillings School of Public Health
    Master of Science in Biostatistics

  3. Zhejiang University

    2018 — 2022

    School of Mathematical Sciences
    Bachelor of Science in Statistics
    &
    School of International Studies
    Bachelor of Arts in English Literature and Linguistics

Experience

  1. Graduate Research Assistant

    July 2024 — Present

    Computer Science, Virginia Tech, Blacksburg, USA

    • Applied rigorous statistical methods and core computer science principles to investigate large language model (LLM) behavior, bridging the gap between theoretical guarantees and real-world performance in open-world settings.

  2. Graduate Research Assistant

    Aug 2022 — May 2024

    Biostatistics, UNC-Chapel Hill, Chapel Hill, USA

    • Architected scalable AI/ML frameworks that unify heterogeneous biomedical data (single-cell & bulk RNA-seq, EHRs) to generate precise patient-level insights and streamline large-scale comparative analyses.

  3. Student Researcher

    Oct 2022 — May 2023

    Statistics, Zhejiang University, Hangzhou, China

    • Applied advanced ML/DL techniques to both quantitative finance and computational biology, delivering state-of-the-art market-forecasting and trading systems as well as precision RNA-seq and biomarker pipelines that markedly improved predictive accuracy and operational efficiency.

  4. Software Design Engineer Intern

    July 2021 — Sep 2021

    Alibaba, Hangzhou, China

    • Replaced a legacy static-path system with a Java-based Dynamic Source Routing (DSR) algorithm that embeds a node-energy score, raising end-to-end packet delivery from 88% to 97% and cutting average latency by 30% in 500-node stress tests under highly dynamic traffic.

  5. Quantitative Research Intern

    Jan 2021 — Feb 2021

    Tenbagger Capital Management, Hangzhou, China

    • Built and maintained a web-scraped financial data warehouse (5M+ records across 1,000 equities) capturing key metrics such as transaction rate, gross-profit margin, and leverage ratio to enable rapid factor research.
    • Developed an automated long-short strategy in Python driven by those factors, delivering ≈15% annualized alpha and a Sharpe ratio of 1.3 in 3-year walk-forward tests.

Projects

  1. LensLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    Aug 2024 — Feb 2025

    Virginia Tech (Prof. Dawei Zhou), VLOG Lab

    • Formulated a PAC-Bayesian generalization bound that separates the pre-power and power phases of LLM fine-tuning and pinpoints the phase-transition threshold.
    • Designed LENSLLM, an NTK-based rectified scaling law that predicts a model's full-data performance after observing only a small training subset.
    • Built a progressive sampling algorithm that cuts fine-tuning compute by up to 88.5%.
    • Validated on 7 LLM families (OPT, T5, GPT-2, mT5, BART, etc.), achieving 91.1% relative accuracy and 85.8% Pearson correlation and outperforming five baselines across FLAN, WikiText, and Gigaword tasks.

  2. Cell-Type-Specific RNA-Sequencing Genetic Deconvolution

    May 2023 — May 2024

    UNC-Chapel Hill (Prof. Yun Li)

    • Spearheaded a scalable Bayesian Gaussian-based probabilistic framework that integrates single-cell and bulk RNA-seq data through a nonlinear dynamic covariate system, producing precise sample-level cell-type expression estimates.
    • Delivered up to 123.1% higher Pearson correlation than leading methods across multiple datasets, evidencing superior accuracy and robustness.

  3. Mixed Effects Modeling via Deep Learning

    Oct 2022 — May 2023

    UNC-Chapel Hill (Prof. Baiming Zou)

    • Engineered a deep-learning semiparametric regression pipeline in R that extracts high-impact biomarkers from EHRs while adjusting for complex, nonlinear confounding.
    • Built Linux-shell workflows that scale comparative-effectiveness analyses from simulated to real-world datasets, sharply reducing preprocessing time and ensuring reproducibility.

  4. Development of Stock Investment System Based on Big Data Analysis and Statistical Optimization

    May 2021 — July 2022

    Zhejiang University (Prof. Peng Zhang), Lead Researcher

    • Engineered a mid-frequency stock-trend forecasting framework for CSI 300 and CSI 500 constituents (RSRS filter ➜ GBDT ➜ 4-state Gaussian HMM) on a 10-year minute-bar corpus of 800+ Chinese equities, raising the directional hit rate from 52% to 67%, with α = 8.4%, a Sharpe ratio of 1.6, and maximum drawdown limited to 5.8%.
    • Curated 30+ technical, sector-beta, and macro features to detect market regime shifts, achieving stable cross-validated gains across rolling 3-year windows.
    • Built a low-latency algorithmic trading platform (MATLAB UI + Java backend, Redis queue, JDBC risk ledger) with under 100 ms tick-to-execution latency, executing 1,000+ orders per day and supporting 1,500 concurrent orders on a two-node high-availability cluster (32 cores, 256 GB RAM).
    • Scaled operations to a ¥2M (~$300K) notional portfolio, eliminated manual-entry errors, and cut post-trade reconciliation time by 85% through automated nightly ETL, model retraining, and walk-forward backtesting on 50 GB of fresh tick data.

Publications

  1. DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts

    2025

    Xinyue Zeng, Tuo Wang, Adithya Kulkarni, Alexander Lu, Alexandra Ni, Phoebe Xing, Junhan Zhao, Siwei Chen, Dawei Zhou
    Preprint'25

  2. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    2025

    Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
    International Conference on Machine Learning (ICML)

  3. Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions

    2025

    Adithya Kulkarni, Fatimah Alotaibi, Xinyue Zeng, Longfeng Wu, Tong Zeng, Barry Menglong Yao, Minqian Liu, Shuaicheng Zhang, Lifu Huang, Dawei Zhou

  4. Cell-type specific inference from bulk RNA-sequencing data by integrating single cell reference profiles via EPIC-unmix

    2024

    Chenwei Tang, Quan Sun, Xinyue Zeng, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Bixiang Liu, Jia Wen, Yun Li

  5. Development of stock investment system based on big data analysis and statistical optimization

    2022

    Xinyue Zeng, Yuting Fu, Haiyun Zou, Peng Zhang
    National Innovation and Entrepreneurship Program at Zhejiang University

Portfolio

Awards & Honors

  1. GPSS Travel Fund

    2024 — 2025

    Virginia Tech Graduate School

  2. CS Travel Award

    2024 — 2025

    Virginia Tech CS Department

  3. Biostatistics Travel Award

    2023 — 2024

    UNC-Chapel Hill

  4. Undergraduate Innovation and Entrepreneurship Award

    2021 — 2022

    Zhejiang University

Presentations

  1. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    2025

    International Conference on Machine Learning (ICML)

  2. Gaussian-Process-Based Cell Type Specific Unmixing of Bulk Expression Profiles

    2024

    ENAR 2024 Spring Meeting

  3. Wildlife Recognition Based on Deep Learning

    2022

    Undergraduate Thesis Defense, Zhejiang University

Invited Talks

  1. Guest Lecture: CS 4824

    March 2025

    Virginia Tech

Copyrights

  1. Statistical Analysis Platform

    2023

    Software Copyright in China
