I am an Assistant Professor in the Department of Computer Science at The University of Hong Kong. Previously, I spent wonderful times as a Postdoctoral Researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of MIT, supervised by Prof. Antonio Torralba, and at the Torr Vision Group of the University of Oxford (beautiful Oxford), supervised by Prof. Philip Torr. I obtained my Ph.D. degree from The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia, and my bachelor's degree from Huazhong University of Science and Technology. During my Ph.D., I spent wonderful times as a Research Intern, working with Dr. Xiaohui Shen, Dr. Zhe Lin, Dr. Kalyan Sunkavalli, and Dr. Brian Price at Adobe (San Jose), Prof. Raquel Urtasun at Uber (Toronto), and Dr. Vladlen Koltun at Intel (Santa Clara).

My general research interests cover the broad area of computer vision, machine learning, and artificial intelligence, with special emphasis on building intelligent visual systems. My research goal is to use artificial intelligence techniques to make machines perceive, understand, imagine, and interact with the surrounding environment, and ultimately to make a strong positive impact on various fields. Our current research interests and focus include: 1. visual scene understanding, perception, reconstruction, representation learning, and multimodal learning; 2. generative modeling and visual content creation, generation, and manipulation (image/video/3D); 3. autonomous driving, embodied AI, robot learning, LLM applications, etc.

Prospective students: I am looking for self-motivated Ph.D. students, postdoctoral researchers, research assistants, and visiting scholars to work together on exciting, cutting-edge computer vision, machine learning, and artificial intelligence projects. If you are interested in working with me, please drop me an email with your resume. Available Ph.D. scholarships and opportunities include the Hong Kong PhD Fellowship Scheme (HKPFS), the HKU Presidential PhD Scholar Programme (HKUPS), and Postgraduate Scholarships (PGS).
  • Our Depth Anything is integrated into Apple Core ML Models for fantastic applications.
  • Our Depth Anything won the CVPR 2024 Best Demo Honorable Mention.
  • Our Point Transformer V3 won the CVPR 2024 Waymo 3D Semantic Segmentation Challenge.
  • Invited talk at the CVPR 2024 Area Chair Workshop.
  • Invited talk at the ECCV 2024 Workshop on Large-scale Video Object Segmentation.
  • Invited talk at the ECCV 2024 Workshop on Synthetic Data for Computer Vision.
  • We are organizing the CVPR 2024 Tutorial on All You Need To Know About Point Cloud Understanding.
  • We are organizing the ICML 2024 Workshop on Multimodal Foundation Models for Embodied Agents.
  • I am recognized as one of the most influential scholars in computer vision by AI 2000 in 2022, 2023, and 2024.
  • I serve as an Area Chair for CVPR 2023, NeurIPS 2023, WACV 2023, CVPR 2024, ECCV 2024, NeurIPS 2024, and ACMMM 2024.
  • I serve as a Senior Program Committee member for AAAI 2023, AAAI 2024, and AAAI 2025.
  • I serve as an Associate Editor for Pattern Recognition and a Guest Editor for IEEE TCSVT.
  • Pinned projects: 1. New innovations: Depth Anything V1 & V2, Point Transformer V3, GPT4Point, UniMODE, AnyDoor, LivePhoto, MimicBrush, Pixel-GS; 2. Highly optimized codebase for 3D scene understanding: Pointcept (PTv1&PTv2&PTv3&MSC&PPT); 3. Highly optimized codebase for semantic segmentation: semseg (PSPNet&PSANet).
  • Depth Anything V2
    Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
    arXiv, 2024. [ Project ] [ Paper ] [ Code ] [ Demo ] [ Media ]
  • Zero-shot Image Editing with Reference Imitation
    Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao.
    arXiv, 2024. [ Project ] [ Paper ] [ Code ] [ Demo ] [ Media ]
  • LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
    Zhuoling Li, Xiaogang Xu, Zhenhua Xu, Ser-Nam Lim, Hengshuang Zhao.
    arXiv, 2024. [ Project ] [ Paper ] [ Demo ] [ Video ]
  • Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
    Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao.
    European Conference on Computer Vision (ECCV), 2024.
    [ Project ] [ Paper ] [ Code ]
  • LivePhoto: Real Image Animation with Text-guided Motion Control
    Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao.
    European Conference on Computer Vision (ECCV), 2024.
    [ Project ] [ Paper ] [ Code ] [ Video ]
  • InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
    Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao.
    European Conference on Computer Vision (ECCV), 2024.
    [ Project ] [ Paper ] [ Code ] [ Video ]
  • OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
    Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
    European Conference on Computer Vision (ECCV), 2024.
    [ Paper ]
  • LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
    Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia.
    European Conference on Computer Vision (ECCV), 2024.
    [ Project ] [ Paper ] [ Code ]
  • OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
    Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby. (†: corresponding)
    European Conference on Computer Vision (ECCV), 2024.
    [ Project ] [ Paper ] [ Code ] [ Video ]
  • Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
    Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia.
    European Conference on Computer Vision (ECCV), 2024.
    [ Paper ] [ Code ]
  • UniDetector: Towards Universal Object Detection with Heterogeneous Supervision
    Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [ Paper ]
  • DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
    Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao.
    IEEE Robotics and Automation Letters (RA-L), 2024. [ Project ] [ Paper ] [ Code ] [ Video ]
  • Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
    Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Project ] [ Paper ] [ Code ] [ Demo ] [ Media ]
  • AnyDoor: Zero-shot Object-level Image Customization
    Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Project ] [ Paper ] [ Code ] [ Demo ] [ Media ]
  • Point Transformer V3: Simpler, Faster, Stronger
    Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. Oral. Ranked 1st place in the CVPR 2024 Waymo 3D Semantic Segmentation Challenge. [ Paper ] [ Code ]
  • Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
    Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Paper ] [ Code ]
  • GPT4Point: A Unified Framework for Point-Language Understanding and Generation
    Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. Highlight [ Project ] [ Paper ] [ Code ]
  • UniMODE: Universal Monocular 3D Object Detection
    Zhuoling Li, Xiaogang Xu, Ser-Nam Lim, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2024. Highlight [ Paper ]
  • GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
    Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia. (†: corresponding)
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Paper ] [ Code ]
  • OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
    Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Paper ] [ Code ]
  • DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
    Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Project ] [ Paper ] [ Code ]
  • Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
    Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Project ] [ Paper ] [ Code ] [ Video ]
  • UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
    Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang.
    Computer Vision and Pattern Recognition (CVPR), 2024. [ Paper ] [ Code ]
  • Influencer Backdoor Attack on Semantic Segmentation
    Haoheng Lan, Jindong Gu, Philip Torr, Hengshuang Zhao.
    International Conference on Learning Representations (ICLR), 2024. Highlight [ Paper ] [ Code ]
  • FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
    Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao.
    Neural Information Processing Systems (NeurIPS), 2023. [ Paper ] [ Code ]
  • Uni3DETR: Unified 3D Detection Transformer
    Zhenyu Wang, Yali Li, Xi Chen, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
    Neural Information Processing Systems (NeurIPS), 2023. [ Paper ] [ Code ]
  • TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
    Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
    Neural Information Processing Systems (NeurIPS), 2023. [ Paper ] [ Code ]
  • CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
    Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu, Hengshuang Zhao.
    Neural Information Processing Systems (NeurIPS), 2023. [ Project ] [ Paper ] [ Code ]
  • Open-vocabulary Panoptic Segmentation with Embedding Modulation
    Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao.
    International Conference on Computer Vision (ICCV), 2023. [ Project ] [ Paper ] [ Code ]
  • Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
    Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao.
    International Conference on Computer Vision (ICCV), 2023. [ Paper ] [ Code ]
  • SAM3D: Segment Anything in 3D Scenes
    Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu.
    International Conference on Computer Vision Workshop (ICCVW), 2023. [ Paper ] [ Code ]
  • Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
    Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2023. [ Paper ] [ Code ]
  • Detecting Everything in the Open World: Towards Universal Object Detection
    Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
    Computer Vision and Pattern Recognition (CVPR), 2023. [ Paper ] [ Code ]
  • Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
    Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao.
    Neural Information Processing Systems (NeurIPS), 2022. [ Paper ] [ Code ]
  • MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
    Xiaogang Xu*, Hengshuang Zhao*, Vibhav Vineet, Ser-Nam Lim, Antonio Torralba. (*: equal contribution)
    European Conference on Computer Vision (ECCV), 2022. [ Paper ] [ Code ]
  • FocalClick: Towards Practical Interactive Image Segmentation
    Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao.
    Computer Vision and Pattern Recognition (CVPR), 2022. [ Paper ] [ Code ]
  • Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision
    Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. [ Paper ] [ Code ]
  • Open World Entity Segmentation
    Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Philip Torr, Zhe Lin, Jiaya Jia.
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. [ Project ] [ Paper ] [ Code ]
  • Do Different Tracking Tasks Require Different Appearance Models?
    Zhongdao Wang, Hengshuang Zhao, Yali Li, Shengjin Wang, Philip Torr, Luca Bertinetto.
    Neural Information Processing Systems (NeurIPS), 2021. [ Project ] [ Paper ] [ Code ]
  • Point Transformer
    Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun.
    International Conference on Computer Vision (ICCV), 2021. Oral [ Paper ] [ Code ]
  • Bidirectional Projection Network for Cross Dimension Scene Understanding
    Wenbo Hu*, Hengshuang Zhao*, Li Jiang, Jiaya Jia, Tien-Tsin Wong. (*: equal contribution)
    Computer Vision and Pattern Recognition (CVPR), 2021. Oral [ Project ] [ Paper ] [ Code ]
  • Fully Convolutional Networks for Panoptic Segmentation
    Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
    Computer Vision and Pattern Recognition (CVPR), 2021. Oral [ Paper ] [ Code ]
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
    Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip Torr, Li Zhang.
    Computer Vision and Pattern Recognition (CVPR), 2021. [ Project ] [ Paper ] [ Code ]
  • Exploring Self-attention for Image Recognition
    Hengshuang Zhao, Jiaya Jia, Vladlen Koltun.
    Computer Vision and Pattern Recognition (CVPR), 2020. [ Paper ] [ Code ]
  • PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
    Li Jiang*, Hengshuang Zhao*, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
    Computer Vision and Pattern Recognition (CVPR), 2020. Oral [ Paper ] [ Code ]
  • PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing
    Hengshuang Zhao*, Li Jiang*, Chi-Wing Fu, and Jiaya Jia. (*: equal contribution)
    Computer Vision and Pattern Recognition (CVPR), 2019. [ Paper ] [ Code ] [ Video ]
  • UPSNet: A Unified Panoptic Segmentation Network
    Yuwen Xiong*, Renjie Liao*, Hengshuang Zhao*, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. (*: equal contribution)
    Computer Vision and Pattern Recognition (CVPR), 2019. Oral [ Paper ] [ Code ]
  • PSANet: Point-wise Spatial Attention Network for Scene Parsing
    Hengshuang Zhao*, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia. (*: equal contribution)
    European Conference on Computer Vision (ECCV), 2018. Ranked 1st place in the CVPR 2018 WAD Drivable Area Segmentation Challenge. [ Project ] [ Paper ] [ Caffe ] [ PyTorch ] [ Video ] [ Supp ] [ Slides in WAD @ CVPR 2018 ]
  • SegStereo: Exploiting Semantic Information for Disparity Estimation
    Guorun Yang*, Hengshuang Zhao*, Jianping Shi, Zhidong Deng, Jiaya Jia. (*: equal contribution)
    European Conference on Computer Vision (ECCV), 2018. [ Project ] [ Paper ] [ Code ] [ Video ] [ Supp ]
  • ICNet for Real-Time Semantic Segmentation on High-Resolution Images
    Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia.
    European Conference on Computer Vision (ECCV), 2018. [ Project ] [ Paper ] [ Code ] [ Video ] [ Supp ]
  • Pyramid Scene Parsing Network
    Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia.
    Computer Vision and Pattern Recognition (CVPR), 2017. Ranked 1st place in the ECCV 2016 ImageNet Scene Parsing Challenge. Ranked 1st place in the CVPR 2017 LSUN Semantic Segmentation Challenge. [ Project ] [ Paper ] [ Caffe ] [ PyTorch ] [ Video ] [ Slides in ILSVRC2016@ECCV2016 ]