About Me

I am a first-year Ph.D. student in the Computer Science Department at Virginia Tech, advised by Professor Dawei Zhou. I received my M.S. in Biostatistics from the University of North Carolina at Chapel Hill in 2024, where I was advised by Professor Yun Li, and my B.S. in Statistics and B.A. in English Literature and Linguistics from Zhejiang University in 2022.

My research focuses on Machine Learning and Data Mining, with an emphasis on developing statistically grounded, computationally efficient frameworks to address challenges in open-world learning. I am particularly interested in advancing the theoretical and empirical understanding of LLMs to improve their reliability and generalization in real-world applications.

What I'm Focusing On

  • Open-world LLM

    Developing methods that make large language models more adaptable and precise in open-world settings.

  • AI4Bio

    Applying ML and statistical modeling to biological data for structure prediction, biomarker discovery, and disease analysis.

  • Quant

    Designing statistical and machine learning models for quantitative research, risk analysis, and trading strategies.

  • Exploring

    Investigating interdisciplinary problems across CS and statistics to drive real-world innovation.

Resume

Education

  1. Virginia Tech

    2024 — 2029

    College of Engineering
    Ph.D. student in Computer Science

  2. University of North Carolina at Chapel Hill

    2022 — 2024

    Gillings School of Public Health
    Master of Science in Biostatistics

  3. Zhejiang University

    2018 — 2022

    School of Mathematical Sciences
    Bachelor of Science in Statistics
    and
    School of International Studies
    Bachelor of Arts in English Literature and Linguistics

Experience

  1. Graduate Research Assistant in AI

    July 2024 — Present

    Virginia Tech, Blacksburg, USA
    • Applied rigorous statistical methods and core computer science principles to investigate large language model (LLM) behavior, bridging the gap between theoretical guarantees and real-world performance in open-world settings.

  2. Graduate Research Assistant in Cell-type Specific RNA-sequence Genetic Deconvolution

    Aug 2022 — May 2024

    UNC-Chapel Hill, Chapel Hill, USA
    • Designed and led the development of a scalable probabilistic modeling framework that integrates single-cell and bulk RNA-seq data using a nonlinear dynamic covariate system for accurate, sample-specific cell-type expression inference. Achieved up to a 123.1% improvement in Pearson correlation over existing methods across diverse datasets.

  3. Research Assistant in Deep Learning Mixed Effects Modeling

    Oct 2022 — May 2023

    UNC-Chapel Hill, Chapel Hill, USA
    • Developed and implemented a deep learning–based semiparametric regression framework in R to identify key biomarkers from electronic health records, adjusting for complex confounding; streamlined comparative effectiveness analysis across simulated and real-world datasets using Linux shell scripting for scalable data processing.

  4. Lead Researcher in Stock Investment System Based on Big Data Analysis and Statistical Optimization

    Mar 2021 — May 2022

    Zhejiang University, Hangzhou, China
    • Designed and implemented a mid-frequency stock trend forecasting system using statistical learning methods—including RSRS features, decision trees, and hidden Markov models—based on industry-specific factors; built a fully automated trading platform with an interactive interface using MATLAB and Java, integrating predictive analytics with real-time execution.

  5. Software Design Engineer in Test Intern

    July 2021 — Sep 2021

    Alibaba, Hangzhou, China
    • Engineered and deployed a Java-based Dynamic Source Routing (DSR) algorithm to replace a static routing system, incorporating a statistically informed node energy metric to optimize route evaluation and ensure adaptive, business-aligned performance under dynamic network conditions.

  6. Quantitative Research Intern

    Jan 2021 — Feb 2021

    Tenbagger Capital Management, Hangzhou, China
    • Built and maintained a structured financial database using web crawlers to collect key indicators (e.g., transaction rate, gross profit margin, leverage ratio); developed a Python-based long-short trading strategy leveraging statistical analysis and automated data pipelines.

Projects

  1. Deep-Learning-based Wildlife Image Recognition

    Sep 2021 — May 2022

    Zhejiang University, Hangzhou, China
    • Developed a deep learning pipeline integrating UNet-based image segmentation and Z-score normalization to enhance data quality; conducted comparative evaluation of CNN architectures (VGG, ResNet, MobileNet, InceptionV3) on the segmented dataset, resulting in a robust target recognition model with improved accuracy and reduced loss.

  2. Thyroid Medical Image & Video Classification

    Sep 2021 — Jan 2022

    Zhejiang University, Hangzhou, China
    • Collaborated with clinical staff to collect and preprocess medical imaging data; conducted comparative analysis of CNN architectures for thyroid image classification and successfully adapted the best-performing model to video data for real-time detection of potential thyroid abnormalities.

  3. Deep Learning for Large-Scale Image Classification

    July 2021 — Nov 2021

    McGill University, Remote
    • Benchmarked CNN models on large-scale network image data using statistical evaluation of loss and accuracy metrics; enhanced EfficientNet with a Convolutional Block Attention Module (CBAM), raising accuracy from less than 50% to 84.24% by strengthening spatial and channel-wise feature representation.

  4. Personalized Fashion Recommendation via Facial Analysis

    Mar 2020 — Mar 2021

    Zhejiang University, Hangzhou, China
    • Built a data-driven personalized recommendation system by collecting user preferences and image data via web crawlers; implemented a face recognition pipeline combining SVM and PCA, and developed a Local Adversarial Disentangling Network (LADN) for makeup transfer, demonstrating the statistical and visual impact of personalized styling.

Publications

  1. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    2025

    Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
    International Conference on Machine Learning (ICML)

  2. Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions

    2025

    Adithya Kulkarni, Fatimah Alotaibi, Xinyue Zeng, Longfeng Wu, Tong Zeng, Barry Menglong Yao, Minqian Liu, Shuaicheng Zhang, Lifu Huang, Dawei Zhou

  3. Cell-type specific inference from bulk RNA-sequencing data by integrating single cell reference profiles via EPIC-unmix

    2024

    Chenwei Tang, Quan Sun, Xinyue Zeng, Xiaoyu Yang, Fei Liu, Jinying Zhao, Yin Shen, Bixiang Liu, Jia Wen, Yun Li

  4. Development of stock investment system based on big data analysis and statistical optimization

    2022

    Xinyue Zeng, Yuting Fu, Haiyun Zou, Peng Zhang
    National Innovation and Entrepreneurship Program at Zhejiang University

  5. Future Cities Report 2021

    2021

    Xinyue Zeng
    Academy of Internet Finance at Zhejiang University

  6. Analysis and Research on Macroeconomic Regulation and Control under Market Fluctuations

    2020

    China's Strategic Emerging, Volume 202009, Page 3

  7. Personalized Fashion Recommendation Software Based on Big Data

    2020

    Xueer Ni and Xinyue Zeng
    Student Research Training Program at Zhejiang University

Portfolio

Awards & Honors

  1. Biostatistics Travel Award

    2023 — 2024

    UNC-Chapel Hill

  2. Undergraduate Innovation and Entrepreneurship Award

    2021 — 2022

    Zhejiang University

  3. Third Prize of the Fifth "LSCAT" Cup Zhejiang Translation Contest

    2020

    China Translators Association

  4. Public Service Model

    2019 — 2020

    Zhejiang University

Presentations

  1. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    2025

    International Conference on Machine Learning (ICML)

  2. Gaussian-Process-Based Cell Type Specific Unmixing of Bulk Expression Profiles

    2024

    ENAR 2024 Spring Meeting

  3. Wildlife Recognition Based on Deep Learning

    2022

    Undergraduate Thesis Defense at Zhejiang University

Talks

  1. Guest Lecture: CS 4824

    March 2025

    Virginia Tech

Copyrights

  1. Statistical Analysis Platform

    2023

    Software Copyright in China
