I'm pursuing my Ph.D. degree at the University of Hong Kong, Faculty of Dentistry (ranked 2nd in the world), specializing in Medical AI and MLLMs, supervised by Prof. Kuo Feng Hung and Prof. Tsoi, James Kit Hon.
Previously, I worked as a Computer Vision Engineer at Baidu VIS from 2022.07 to 2024.08. I received my M.S. degree from Huazhong University of Science and Technology (HUST, 2022), and my B.S. degree from China University of Mining and Technology (CUMT, 2020).
My research interests span computer vision, self-supervised pre-training, multimodal large language models (MLLMs), and AI4Science.
News
[Jun. 2026] Two papers have been accepted to MICCAI 2026! Many thanks to Jiamin Wu and Ming Hu.
[Feb. 2026] One paper, OralGPT-Omni, has been accepted to CVPR 2026. Congratulations!
[Feb. 2026] One paper, OralGPT-Plus, has been accepted to CVPR 2026. Congratulations!
We present OralAgent, the first dental-specialized AI agent that unifies multimodal reasoning, tool-based decision-making, and knowledge-grounded retrieval within an end-to-end automated framework.
We present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. We also introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis.
We present OralGPT-Plus, an agentic vision-language model designed to perform iterative and symmetry-aware diagnostic reasoning for panoramic dental radiograph analysis.
We introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. We also propose OralGPT, a multimodal vision-language model for panoramic X-ray analysis.
Open-source oral-maxillofacial imaging datasets were identified through electronic databases and dataset platforms. A total of 105 datasets, comprising 437,538 images and 100 intraoral videos from patients across 21 countries, were included.
T-Mamba is the first work to introduce frequency-based features into vision Mamba. Its flexibility allows it to process both 2D and 3D tooth data without separate modules.
We designed FullAnno, a data engine that automatically generates large-scale, high-quality, fine-grained image caption datasets.
This study proposes a novel semi-supervised transformer-based framework for automated tooth segmentation and identification on panoramic radiographs.