Title: Low-level vision processing : new approaches and sensors
Authors: Wang, Zhouxia (王州霞)
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, Z. [王州霞]. (2023). Low-level vision processing : new approaches and sensors. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Low-level vision processing aims to process low-quality vision data, such as images and videos, pixel by pixel to recover their high-quality counterparts. The task is complex because it covers a wide variety of low-quality data and many scenarios. In this thesis, we study low-level vision processing in three kinds of scenarios: scenarios with human faces only, natural scenarios, and an extremely challenging scenario. For each scenario, we carefully design an approach tailored to the properties of the unprocessed data and of the scenario to attain high-quality results.
Our studies of scenarios with human faces only focus mainly on blind face restoration. First, we propose RestoreFormer++ for blind face image restoration. It introduces fully-spatial attention mechanisms to model contextual information and its interplay with priors, producing high-quality face images with both realness and fidelity. Its priors are matched from a learned, reconstruction-oriented high-quality dictionary that is better suited to the face restoration task, leading to rich details in the restored face images. Moreover, it is more robust and generalizes better to real-world degradation because its extending degrading model narrows the synthetic-to-real-world gap. Then, we extend our study to face video restoration. We systematically analyze the potential benefits and difficulties that arise when current face image restoration algorithms are extended to real-world face video restoration, and we provide a viable solution to mitigate these difficulties.
Our study of natural scenarios addresses image deblurring. In this work, we introduce an event-based vision sensor, which can detect per-pixel brightness changes at microsecond resolution. Considering the temporal and spatial complementarity between intensity images captured with a frame-based camera and event data captured with an event camera, we propose to alternately enhance the quality of the intensity image and the event data with a DeblurNet and an EventSRNet.
In addition, we study an extremely challenging scenario, one whose dynamic range is extremely high, and focus on exposure bracketing selection. Exposure bracketing selection aims to predict the exposure times for a sequence of images to be captured in order to attain high dynamic range images. Our proposed exposure bracketing selection network (EBSNet) makes decisions according to the illumination distribution and semantic information extracted from only one auto-exposure preview image, which frees it from restrictions such as the camera response function and the sensor noise model. EBSNet is trained with reinforcement learning and rewarded by a multi-exposure fusion network (MEFNet), which fuses the images captured under the exposure times predicted by EBSNet. Joint training of EBSNet and MEFNet can improve both the accuracy of exposure bracketing selection and the quality of multi-exposure fusion.
We have conducted experiments to evaluate the effectiveness of our proposed approaches. To support the further development of low-level vision processing, we also provide a real-world low-quality face video benchmark and an exposure bracketing selection benchmark.
Degree: Doctor of Philosophy
Subject: Image processing - Data processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/335162
dc.contributor.author: Wang, Zhouxia
dc.contributor.author: 王州霞
dc.date.accessioned: 2023-11-13T07:45:05Z
dc.date.available: 2023-11-13T07:45:05Z
dc.date.issued: 2023
dc.identifier.citation: Wang, Z. [王州霞]. (2023). Low-level vision processing : new approaches and sensors. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/335162
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Image processing - Data processing
dc.title: Low-level vision processing : new approaches and sensors
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044736606203414