I am a third-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Jiwen Lu and Prof. Jie Zhou. In 2020, I obtained my B.Eng. from the Department of Automation, Tsinghua University.
I am broadly interested in computer vision and deep learning. My current research focuses on model architectures and generative models.
Email / Google Scholar / GitHub
2022-03: Check out our work at CVPR 2022 on language-guided dense prediction (DenseCLIP).
2021-09: GFNet and DynamicViT are accepted to NeurIPS 2021.
2021-07: Two papers on video understanding and interpretable metric learning are accepted to ICCV 2021.
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
Wenliang Zhao*, Lujia Bai*, Yongming Rao, Jie Zhou, Jiwen Lu
preprint
[arXiv] [Code] [Project Page]
UniPC is a training-free framework for fast sampling of diffusion models. It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao*, Wenliang Zhao*, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu
NeurIPS, 2022
[arXiv] [Code] [Project Page] [Explanation in Chinese]
HorNet is a family of generic vision backbones that perform explicit high-order spatial interactions based on Recursive Gated Convolution.
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv] [Code] [Project Page] [Explanation in Chinese]
DenseCLIP is a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from CLIP.
Global Filter Networks for Image Classification
Yongming Rao*, Wenliang Zhao*, Zheng Zhu, Jiwen Lu, Jie Zhou
Conference on Neural Information Processing Systems (NeurIPS), 2021
[arXiv] [Code] [Project Page] [Explanation in Chinese (by HappyAIWalker)]
The Global Filter Network (GFNet) is a transformer-style architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
Conference on Neural Information Processing Systems (NeurIPS), 2021
[arXiv] [Code] [Project Page] [Zhihu]
We present a dynamic token sparsification framework to prune redundant tokens in vision transformers progressively and dynamically based on the input.
Towards Interpretable Deep Metric Learning with Structural Matching
Wenliang Zhao*, Yongming Rao*, Ziyi Wang, Jiwen Lu, Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
[arXiv] [Code]
We present a framework (DIML) to add interpretability to metric learning and improve the performance of deep metric learning models.
Group-aware Contrastive Regression for Action Quality Assessment
Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
We propose a new contrastive regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for assessment.
2020 Outstanding Undergraduate, Tsinghua University
2018 Tang Lixin Scholarship, Tsinghua University
2019 Tsinghua Presidential Award Nomination, Tsinghua University
2018 Zheng Weimin Scholarship, Tsinghua University
2018 Jiang Nanxiang Scholarship, Tsinghua University
2018 First Prize in the 36th Challenge Cup, Tsinghua University
2017 Qualcomm Scholarship
2017 National Scholarship, Tsinghua University