I am Yu Zhang (张彧). I earned my PhD from the College of Computer Science and Technology, Zhejiang University (浙江大学计算机学院), under the supervision of Prof. Zhou Zhao (赵洲). Previously, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor’s degrees in Computer Science and Automation. I have also served as a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.
My research focuses primarily on Multi-Modal Generative AI, specifically Spatial Audio, Music, Singing, and Speech. I have published first-author papers at top international AI conferences, such as NeurIPS, ACL, and AAAI. Currently, I am working on spatial audio generation with multimodal prompts and streaming voice conversion.
I am actively seeking research collaborations. Please feel free to contact me via email at [email protected].
🔥 News
2025.07: We released the full dataset and evaluation code of ISDrama (Immersive Spatial Drama Generation through Multimodal Prompting)!
2025.07: We released the code of TCSinger2 (Customizable Multilingual Zero-shot Singing Voice Synthesis)!
2025.07: 🎉 2 papers were accepted by ACM-MM 2025!
2025.06: 🎉 I earned my PhD in Computer Science from Zhejiang University!
2025.05: 🎉 2 papers were accepted by ACL 2025!
2025.04: I joined the University of Rochester as a visiting scholar, working with Prof. Zhiyao Duan.
2024.12: 🎉 1 paper was accepted by AAAI 2025!
2024.11: We released the code of TCSinger (Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control)!
2024.09: We released the full dataset and code of GTSinger (A Global Multi-Technique Singing Corpus for all singing tasks)!
2024.09: 🎉 1 paper was accepted by NeurIPS 2024 (Spotlight)!
2024.09: 🎉 1 paper was accepted by EMNLP 2024!
2024.05: 🎉 1 paper was accepted by ACL 2024!
2024.05: We released the code of StyleSinger (Style Transfer for Out-of-Domain Singing Voice Synthesis)!
2023.12: 🎉 1 paper was accepted by AAAI 2024!
📝 Publications
*denotes co-first authors
🔊 Spatial Audio
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yu Zhang
, Wenxiang Guo, Changhao Pan, et al.
Project
MRSDrama is the first multimodal recorded spatial drama dataset, containing binaural drama audio, scripts, videos, geometric poses, and textual prompts.
ISDrama is the first immersive spatial drama generation model through multimodal prompting.
Our work has been featured by multiple media outlets and forums.
ACM-MM 2025
A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference
, Changhao Pan*, Wenxiang Guo*,
Yu Zhang*
, et al.
🎼 Music Generation
Versatile Framework for Song Generation with Prompt-based Control
Yu Zhang
, Wenxiang Guo, Changhao Pan, et al.
Project
VersBand is a multi-task song generation framework for synthesizing high-quality, aligned songs with prompt-based control.
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Yu Zhang
, Ziyue Jiang, Ruiqi Li, et al.
Project
TCSinger 2 is a multi-task multilingual zero-shot SVS model with style transfer and style control based on various prompts.
Our work has been featured by multiple media outlets and forums.
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
, Ziyue Jiang, Ruiqi Li, et al.
Project
TCSinger is the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control.
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang
, Changhao Pan, Wenxiang Guo, et al.
Project
GTSinger is a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks.
Our work has been featured by multiple media outlets and forums.
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang
, Rongjie Huang, Ruiqi Li, et al.
Project
StyleSinger is the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples.
ACL 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
, Wenxiang Guo*,
Yu Zhang*
, Changhao Pan*, et al. |
Project
AAAI 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
, Wenxiang Guo,
Yu Zhang
, Changhao Pan, et al. |
Project
ACL 2024
Robust Singing Voice Transcription Serves Synthesis
, Ruiqi Li,
Yu Zhang
, Yongqi Wang, et al. |
Project
💬 Speech Synthesis
Preprint
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
,
Yu Zhang
, Baotong Tian, Zhiyao Duan. |
Project
Preprint
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
, Ziyue Jiang, Yi Ren, Ruiqi Li, Shengpeng Ji, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo,
Yu Zhang
, et al.
💡 Others
IJCAI 2025
Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly
, Ruiyuan Zhang, Qi Wang, Jiaxiang Liu,
Yu Zhang
, et al.
📖 Education
2020.09 - 2025.06, PhD, Computer Science, College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang
2016.09 - 2020.06, Undergraduate, Computer Science & Automation, Chu Kochen Honors College, Zhejiang University, Hangzhou, Zhejiang
💻 Industry Experience
🔍 Research Experience
2025.04-2025.06 Visiting Scholar at University of Rochester, working with Prof. Zhiyao Duan.
2020.06-2020.09 Research Intern at Alibaba-Zhejiang University Joint Institute of Frontier Technologies, working with Prof. Jianke Zhu.
2019.07-2020.01 Visiting Scholar at University of Massachusetts Amherst, working with Prof. Przemyslaw Grabowicz.
2018.09-2019.06 Research Assistant at Institute of Cyber-Systems and Control in Zhejiang University, working with Prof. Chunlin Zhou.
2018.09-2019.06 Research Assistant at Institute of Computer System Architecture in Zhejiang University, working with Prof. Chunming Wu.
🎖 Honors and Awards
2024.09 Outstanding PhD Student Scholarship of Zhejiang University (Top 10%)
2020.06 Outstanding Graduate of Zhejiang University (Undergraduate) (Top 5%)
2019.09 First-Class Academic Scholarship of Zhejiang University (Undergraduate) (Top 5%)
📚 Academic Services
Conference Reviewer: NeurIPS (2024, 2025), ICLR (2025), ACL (2024, 2025), ACM-MM (2025), EMNLP (2024, 2025)
Journal Reviewer: IEEE TASLP.