My general research interests cover the broad area of computer vision, machine learning, and artificial intelligence, with special emphasis on building intelligent visual systems. My research goal is to use artificial intelligence techniques to make machines perceive, understand, imagine, and interact with the surrounding environment, and ultimately to have a strong positive impact on various fields. Our current research focus includes: 1. visual scene understanding, perception, reconstruction, representation learning, and multimodal learning; 2. generative modeling and visual content creation, generation, and manipulation (image/video/3D); 3. autonomous driving, embodied AI, robot learning, and LLM applications.
Our Depth Anything is integrated into Apple Core ML Models for fantastic applications.
Our Depth Anything won the CVPR 2024 Best Demo Honorable Mention.
Our Point Transformer V3 won the CVPR 2024 Waymo 3D Semantic Segmentation Challenge.
Invited talk at the CVPR 2024 Area Chair Workshop.
Invited talk at the ECCV 2024 Workshop on Large-scale Video Object Segmentation.
Invited talk at the ECCV 2024 Workshop on Synthetic Data for Computer Vision.
We are organizing the CVPR 2024 Tutorial on All You Need To Know About Point Cloud Understanding.
We are organizing the ICML 2024 Workshop on Multimodal Foundation Models for Embodied Agents.
I am recognized as one of the most influential scholars in computer vision by AI 2000 in 2022, 2023, and 2024.
I serve as an Area Chair for CVPR 2023, NeurIPS 2023, WACV 2023, CVPR 2024, ECCV 2024, NeurIPS 2024, and ACMMM 2024.
I serve as a Senior Program Committee member for AAAI 2023, AAAI 2024, and AAAI 2025.
I serve as an Associate Editor for Pattern Recognition, and a Guest Editor for IEEE TCSVT.
Pinned projects:
1. New innovations: Depth Anything V1 & V2, Point Transformer V3, GPT4Point, UniMODE, AnyDoor, LivePhoto, MimicBrush, Pixel-GS;
2. Highly optimized codebase available for 3D scene understanding: Pointcept (PTv1&PTv2&PTv3&MSC&PPT);
3. Highly optimized codebase available for semantic segmentation: semseg (PSPNet&PSANet).
Depth Anything V2
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng,
Hengshuang Zhao
.
arXiv, 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Demo
]
[
Media
]
Zero-shot Image Editing with Reference Imitation
Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen,
Hengshuang Zhao
.
arXiv, 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Demo
]
[
Media
]
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Zhuoling Li, Xiaogang Xu, Zhenhua Xu, Ser-Nam Lim,
Hengshuang Zhao
.
arXiv, 2024.
[
Project
]
[
Paper
]
[
Demo
]
[
Video
]
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He,
Hengshuang Zhao
.
European Conference on Computer Vision (
ECCV
), 2024.
[
Project
]
[
Paper
]
[
Code
]
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen,
Hengshuang Zhao
.
European Conference on Computer Vision (
ECCV
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu, Kwan-Yee K. Wong,
Hengshuang Zhao
.
European Conference on Computer Vision (
ECCV
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Yali Li, Taichi Liu,
Hengshuang Zhao
†
, Shengjin Wang. (†: corresponding)
European Conference on Computer Vision (
ECCV
), 2024.
[
Paper
]
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Mingkang Zhu, Xi Chen, Zhongdao Wang,
Hengshuang Zhao
, Jiaya Jia.
European Conference on Computer Vision (
ECCV
), 2024.
[Project]
[Paper]
[Code]
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen,
Hengshuang Zhao
†
, Lei Zhu, Joan Lasenby. (†: corresponding)
European Conference on Computer Vision (
ECCV
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou,
Hengshuang Zhao
, Xiu Li, Jiaya Jia.
European Conference on Computer Vision (
ECCV
), 2024.
[
Paper
]
[
Code
]
UniDetector: Towards Universal Object Detection with Heterogeneous Supervision
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba,
Hengshuang Zhao
†
, Shengjin Wang. (†: corresponding)
IEEE Transactions on Pattern Analysis and Machine Intelligence (
TPAMI
), 2024.
[
Paper
]
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li,
Hengshuang Zhao
.
IEEE Robotics and Automation Letters (
RA-L
), 2024.
[
Project
]
[
Paper
]
[Code]
[
Video
]
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Demo
]
[
Media
]
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Demo
]
[
Media
]
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
Oral
Ranked 1st place in the CVPR 2024
Waymo 3D Semantic Segmentation Challenge
.
[
Paper
]
[
Code
]
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Paper
]
[
Code
]
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
Highlight
[
Project
]
[
Paper
]
[
Code
]
UniMODE: Universal Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
Highlight
[
Paper
]
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng,
Hengshuang Zhao
†
, Jiaya Jia. (†: corresponding)
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Paper
]
[
Code
]
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen,
Hengshuang Zhao
, Zhuotao Tian, Jiaya Jia.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Paper
]
[
Code
]
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang,
Hengshuang Zhao
, Tong He, Xihui Liu.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Project
]
[
Paper
]
[
Code
]
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng,
Hengshuang Zhao
, Shuguang Cui, Zhen Li.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang,
Hengshuang Zhao
, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang.
Computer Vision and Pattern Recognition (
CVPR
), 2024.
[
Paper
]
[
Code
]
Influencer Backdoor Attack on Semantic Segmentation
Haoheng Lan, Jindong Gu, Philip Torr,
Hengshuang Zhao
.
International Conference on Learning Representations (
ICLR
), 2024.
Highlight
[
Paper
]
[
Code
]
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi,
Hengshuang Zhao
.
Neural Information Processing Systems (
NeurIPS
), 2023.
[
Paper
]
[
Code
]
Uni3DETR: Unified 3D Detection Transformer
Zhenyu Wang, Yali Li, Xi Chen,
Hengshuang Zhao
†
, Shengjin Wang. (†: corresponding)
Neural Information Processing Systems (
NeurIPS
), 2023.
[
Paper
]
[
Code
]
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao,
Hengshuang Zhao
.
Neural Information Processing Systems (
NeurIPS
), 2023.
[
Paper
]
[
Code
]
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu,
Hengshuang Zhao
.
Neural Information Processing Systems (
NeurIPS
), 2023.
[
Project
]
[
Paper
]
[
Code
]
Open-vocabulary Panoptic Segmentation with Embedding Modulation
Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba,
Hengshuang Zhao
.
International Conference on Computer Vision (
ICCV
), 2023.
[
Project
]
[
Paper
]
[
Code
]
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi,
Hengshuang Zhao
.
International Conference on Computer Vision (
ICCV
), 2023.
[
Paper
]
[
Code
]
SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He,
Hengshuang Zhao
, Xihui Liu.
International Conference on Computer Vision Workshop (
ICCVW
), 2023.
[
Paper
]
[
Code
]
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
Xiaoyang Wu, Xin Wen, Xihui Liu,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2023.
[
Paper
]
[
Code
]
Detecting Everything in the Open World: Towards Universal Object Detection
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba,
Hengshuang Zhao
†
, Shengjin Wang. (†: corresponding)
Computer Vision and Pattern Recognition (
CVPR
), 2023.
[
Paper
]
[
Code
]
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu,
Hengshuang Zhao
.
Neural Information Processing Systems (
NeurIPS
), 2022.
[
Paper
]
[
Code
]
MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
Xiaogang Xu*,
Hengshuang Zhao*
, Vibhav Vineet, Ser-Nam Lim, Antonio Torralba. (*: equal contribution)
European Conference on Computer Vision (
ECCV
), 2022.
[
Paper
]
[
Code
]
FocalClick: Towards Practical Interactive Image Segmentation
Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi,
Hengshuang Zhao
.
Computer Vision and Pattern Recognition (
CVPR
), 2022.
[
Paper
]
[
Code
]
Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision
Yanwei Li,
Hengshuang Zhao
, Xiaojuan Qi, Yukang Chen, Lu Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (
TPAMI
), 2022.
[
Paper
]
[
Code
]
Open World Entity Segmentation
Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu,
Hengshuang Zhao
, Philip Torr, Zhe Lin, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (
TPAMI
), 2022.
[
Project
]
[
Paper
]
[
Code
]
Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang,
Hengshuang Zhao
, Yali Li, Shengjin Wang, Philip Torr, Luca Bertinetto.
Neural Information Processing Systems (
NeurIPS
), 2021.
[
Project
]
[
Paper
]
[
Code
]
Point Transformer
Hengshuang Zhao
, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun.
International Conference on Computer Vision (
ICCV
), 2021.
Oral
[
Paper
]
[
Code
]
Bidirectional Projection Network for Cross Dimension Scene Understanding
Wenbo Hu*,
Hengshuang Zhao*
, Li Jiang, Jiaya Jia, Tien-Tsin Wong. (*: equal contribution)
Computer Vision and Pattern Recognition (
CVPR
), 2021.
Oral
[
Project
]
[
Paper
]
[
Code
]
Fully Convolutional Networks for Panoptic Segmentation
Yanwei Li,
Hengshuang Zhao
, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
Computer Vision and Pattern Recognition (
CVPR
), 2021.
Oral
[
Paper
]
[
Code
]
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng, Jiachen Lu,
Hengshuang Zhao
, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip Torr, Li Zhang.
Computer Vision and Pattern Recognition (
CVPR
), 2021.
[
Project
]
[
Paper
]
[
Code
]
Exploring Self-attention for Image Recognition
Hengshuang Zhao
, Jiaya Jia, Vladlen Koltun.
Computer Vision and Pattern Recognition (
CVPR
), 2020.
[
Paper
]
[
Code
]
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
Li Jiang*,
Hengshuang Zhao*
, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (
CVPR
), 2020.
Oral
[
Paper
]
[
Code
]
PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing
Hengshuang Zhao*
, Li Jiang*, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (
CVPR
), 2019.
[
Paper
]
[
Code
]
[
Video
]
UPSNet: A Unified Panoptic Segmentation Network
Yuwen Xiong*, Renjie Liao*,
Hengshuang Zhao*
, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. (*: equal contribution)
Computer Vision and Pattern Recognition (
CVPR
), 2019.
Oral
[
Paper
]
[
Code
]
PSANet: Point-wise Spatial Attention Network for Scene Parsing
Hengshuang Zhao*
, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia. (*: equal contribution)
European Conference on Computer Vision (
ECCV
), 2018.
Ranked 1st place in the CVPR 2018
WAD Drivable Area Segmentation Challenge
.
[
Project
]
[
Paper
]
[
Caffe
]
[
PyTorch
]
[
Video
]
[
Supp
]
[
Slides in WAD @ CVPR 2018
]
SegStereo: Exploiting Semantic Information for Disparity Estimation
Guorun Yang*,
Hengshuang Zhao*
, Jianping Shi, Zhidong Deng, Jiaya Jia. (*: equal contribution)
European Conference on Computer Vision (
ECCV
), 2018.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
[
Supp
]
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
Hengshuang Zhao
, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia.
European Conference on Computer Vision (
ECCV
), 2018.
[
Project
]
[
Paper
]
[
Code
]
[
Video
]
[
Supp
]
Pyramid Scene Parsing Network
Hengshuang Zhao
, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia.
Computer Vision and Pattern Recognition (
CVPR
), 2017.
Ranked 1st place in the ECCV 2016
ImageNet Scene Parsing Challenge
.
Ranked 1st place in the CVPR 2017
LSUN Semantic Segmentation Challenge
.
[
Project
]
[
Paper
]
[
Caffe
]
[
PyTorch
]
[
Video
]
[
Slides in ILSVRC2016@ECCV2016
]