
I am Yu Zhang (张彧).

I earned my PhD from the College of Computer Science and Technology, Zhejiang University (浙江大学计算机学院), under the supervision of Prof. Zhou Zhao (赵洲). Previously, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor’s degrees in Computer Science and Automation. I have also served as a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.

My research focuses on Multi-Modal Generative AI, specifically Spatial Audio, Music, Singing, and Speech. I have published first-author papers at top international AI conferences, such as NeurIPS, ACL, and AAAI. Currently, I am working on spatial audio generation with multimodal prompts and streaming voice conversion.

I am actively seeking research collaborations. Please feel free to contact me via email at [email protected].

🔥 News

  • 2025.07: We released the full dataset and evaluation code of ISDrama (Immersive Spatial Drama Generation through Multimodal Prompting)!
  • 2025.07: We released the code of TCSinger2 (Customizable Multilingual Zero-shot Singing Voice Synthesis)!
  • 2025.07: 🎉 2 papers were accepted by ACM-MM 2025!
  • 2025.06: 🎉 I earned my PhD in Computer Science from Zhejiang University!
  • 2025.05: 🎉 2 papers were accepted by ACL 2025!
  • 2025.04: I joined the University of Rochester as a visiting scholar, working with Prof. Zhiyao Duan.
  • 2024.12: 🎉 1 paper was accepted by AAAI 2025!
  • 2024.11: We released the code of TCSinger (Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control)!
  • 2024.09: We released the full dataset and code of GTSinger (A Global Multi-Technique Singing Corpus for all singing tasks)!
  • 2024.09: 🎉 1 paper was accepted by NeurIPS 2024 (Spotlight)!
  • 2024.09: 🎉 1 paper was accepted by EMNLP 2024!
  • 2024.05: 🎉 1 paper was accepted by ACL 2024!
  • 2024.05: We released the code of StyleSinger (Style Transfer for Out-of-Domain Singing Voice Synthesis)!
  • 2023.12: 🎉 1 paper was accepted by AAAI 2024!
📝 Publications

    *denotes co-first authors

    🔊 Spatial Audio

    ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
    Yu Zhang, Wenxiang Guo, Changhao Pan, et al.

    Project | Hugging Face

  • MRSDrama is the first multimodal recorded spatial drama dataset, containing binaural drama audios, scripts, videos, geometric poses, and textual prompts.
  • ISDrama is the first immersive spatial drama generation model through multimodal prompting.
  • Our work is promoted by multiple media and forums, such as weixin and
  • ACM-MM 2025 A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference, Changhao Pan*, Wenxiang Guo*, Yu Zhang*, et al.
    🎼 Music Generation

    Versatile Framework for Song Generation with Prompt-based Control
    Yu Zhang, Wenxiang Guo, Changhao Pan, et al.

    Project

  • VersBand is a multi-task song generation framework for synthesizing high-quality, aligned songs with prompt-based control.
    TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
    Yu Zhang, Ziyue Jiang, Ruiqi Li, et al.

    Project

  • TCSinger 2 is a multi-task multilingual zero-shot SVS model with style transfer and style control based on various prompts.
  • Our work is promoted by multiple media and forums, such as weixin and

    TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
    Yu Zhang, Ziyue Jiang, Ruiqi Li, et al.

    Project

  • TCSinger is the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control.
    GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
    Yu Zhang, Changhao Pan, Wenxiang Guo, et al.

    Project | Hugging Face

  • GTSinger is a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks.
  • Our work is promoted by multiple media and forums, such as weixin, weixin, and zhihu.
    StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
    Yu Zhang, Rongjie Huang, Ruiqi Li, et al.

    Project

  • StyleSinger is the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples.
  • ACL 2025 STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation, Wenxiang Guo*, Yu Zhang*, Changhao Pan*, et al. | Project
  • AAAI 2025 TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching, Wenxiang Guo, Yu Zhang, Changhao Pan, et al. | Project
  • ACL 2024 Robust Singing Voice Transcription Serves Synthesis, Ruiqi Li, Yu Zhang, Yongqi Wang, et al. | Project
    💬 Speech Synthesis

  • Preprint Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion, Yu Zhang, Baotong Tian, Zhiyao Duan. | Project
  • Preprint MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis, Ziyue Jiang, Yi Ren, Ruiqi Li, Shengpeng Ji, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo, Yu Zhang, et al.
    💡 Others

  • IJCAI 2025 Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly, Ruiyuan Zhang, Qi Wang, Jiaxiang Liu, Yu Zhang, et al.
📖 Education

  • 2020.09 - 2025.06, PhD, Computer Science, College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang
  • 2016.09 - 2020.06, Undergraduate, Computer Science & Automation, Chu Kochen Honors College, Zhejiang University, Hangzhou, Zhejiang
💻 Industrial Experiences

🔍 Research Experiences

  • 2025.04-2025.06 Visiting Scholar at University of Rochester, working with Prof. Zhiyao Duan
  • 2020.06-2020.09 Research Intern at Alibaba-Zhejiang University Joint Institute of Frontier Technologies, working with Prof. Jianke Zhu
  • 2019.07-2020.01 Visiting Scholar at University of Massachusetts Amherst, working with Prof. Przemyslaw Grabowicz
  • 2018.09-2019.06 Research Assistant at Institute of Cyber-Systems and Control in Zhejiang University, working with Prof. Chunlin Zhou
  • 2018.09-2019.06 Research Assistant at Institute of Computer System Architecture in Zhejiang University, working with Prof. Chunming Wu
🎖 Honors and Awards

  • 2024.09 Outstanding PhD Student Scholarship of Zhejiang University (Top 10%)
  • 2020.06 Outstanding Graduate of Zhejiang University (Undergraduate) (Top 5%)
  • 2019.09 First-Class Academic Scholarship of Zhejiang University (Undergraduate) (Top 5%)
📚 Academic Services

  • Conference Reviewer: NeurIPS (2024, 2025), ICLR (2025), ACL (2024, 2025), ACM-MM (2025), EMNLP (2024, 2025)
  • Journal Reviewer: IEEE TASLP