Hi, I obtained my master's degree and bachelor's degree from the School of Computing, National University of Singapore (NUS), and Hong Kong Baptist University (HKBU), respectively. I am currently pursuing my PhD at the Hong Kong University of Science and Technology under the supervision of Prof. Xuming HU.

Previously, I gained extensive internship experience in both industry and academia, including NLP Team, ByteDance AI Lab (supervised by Mr. Yang WANG and Dr. Hang LI, director of ByteDance Research), AI Research, Squirrel AI (supervised by Dr. Qingsong WEN, head of AI and chief scientist), LLM Group, Institute for Advanced Algorithms Research (co-supervised by Dr. Zhiyu LI, Dr. Feiyu XIONG, and Prof. Weinan E), Reefknot Investment (co-supervised by Mr. Marc DRAGON, managing director of Reefknot, and Prof. Wei Ngan CHIN, vice dean of SoC, NUS), and University of California, Berkeley (supervised by Dr. Qing ZHU, research scientist at Lawrence Berkeley National Laboratory). I also previously conducted research at CityMind Lab, HKUST(GZ) (led by Prof. Yuxuan LIANG).

My research interests include natural language processing, multimodal representation learning, and data mining applications such as urban computing and recommendation systems. I look forward to any academic collaboration.

📖 Education

  • Now, PhD, Hong Kong University of Science and Technology
  • 2021 - 2023, Master, National University of Singapore
  • 2017 - 2021, Undergraduate, Hong Kong Baptist University (President’s Honour Roll)

📝 Selected Publications

Note: * as Co-first Author; † as Corresponding Author

WWW 2024

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang†

  • First-ever LLM-enhanced framework that integrates the knowledge of textual modality into urban imagery profiling.

The International World Wide Web Conference 2024, Singapore (WWW’24)

Oral Presentation

CIKM 2024

GeoReasoner: Reasoning On Geospatially Grounded Context For Natural Language Understanding

Yibo Yan, Joey Lee†

  • A pipeline integrating linguistic and geospatial information, showcasing the advantages of an LLM-assisted workflow over conventional methods in geo-reasoning tasks.

33rd ACM International Conference on Information and Knowledge Management, Idaho, USA (CIKM’24)

Best Short Paper Award

ICLR Workshop 2025

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

Yibo Yan, Shen Wang, Jiahao Huo, Hang Li, Boyan Li, Jiamin Su, Xiong Gao, Yi-Fan Zhang, Tianlong Xu, Zhendong Chu, Aoxiao Zhong, Kun Wang, Hui Xiong, Philip S. Yu, Xuming Hu†, Qingsong Wen†

  • First benchmark designed to assess MLLMs’ complex reasoning capabilities in multimodal error detection.

Workshop on Reasoning and Planning for LLMs, 13th International Conference on Learning Representations, Singapore (ICLR Workshop’25)

arXiv 2025

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Yibo Yan, Jiamin Su, Jianxiang He, Fangteng Fu, Xu Zheng, Yuanhuiyi Lyu, Kun Wang, Shen Wang, Qingsong Wen, Xuming Hu†

  • First comprehensive analysis of multimodal mathematical reasoning in the era of MLLMs.

Under Review

arXiv 2025

MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

Yibo Yan, Shen Wang, Jiahao Huo, Philip S. Yu, Xuming Hu†, Qingsong Wen†

  • A novel Mixture-of-Math-Agent framework designed specifically to address multimodal error detection.

Under Review

arXiv 2025

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

Yibo Yan, Shen Wang, Jiahao Huo, Jingheng Ye, Zhendong Chu, Xuming Hu†, Philip S. Yu, Carla Gomes, Bart Selman, Qingsong Wen†

  • This position paper argues that MLLMs can significantly advance scientific reasoning across disciplines such as mathematics, physics, chemistry, and biology.

Under Review

InfoFusion 2024

Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

Xingchen Zou*, Yibo Yan*, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang†

  • First comprehensive survey that systematically reviews studies on deep learning-based multimodal and multi-source data fusion models in urban computing.

Information Fusion Journal (IF=15)

arXiv 2025

MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models

Jiahao Huo*, Yibo Yan*, Xu Zheng, Yuanhuiyi Lyu, Xin Zou, Zhihua Wei, Xuming Hu†

  • We propose to reformulate the task of multimodal MU in the era of MLLMs, and develop a novel geometry-constrained gradient descent method MMUnlearner.

Under Review

arXiv 2025

EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models

Jiamin Su*, Yibo Yan*, Fangteng Fu, Han Zhang, Jingheng Ye, Xiang Liu, Jiahao Huo, Huiyu Zhou, Xuming Hu†

  • First multimodal benchmark to evaluate Automated Essay Scoring (AES) capabilities across lexical-, sentence-, and discourse-level traits.

Under Review

ICLR 2025

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu†

  • A causal inference framework that applies structural causal modeling to MLLMs, treating modality priors as a confounder between attention mechanisms and output.

13th International Conference on Learning Representations, Singapore (ICLR’25)

EMNLP 2024

MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu†

  • Investigation of the distribution of domain-specific neurons and the mechanism of how MLLMs process features from diverse domains.

Conference on Empirical Methods in Natural Language Processing 2024, Florida, USA (EMNLP’24)

arXiv 2024

Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

Kening Zheng*, Junkai Chen*, Yibo Yan, Xin Zou, Xuming Hu†

  • A comprehensive benchmark specifically targeting relation hallucinations, consisting of over 20k samples derived from real-world scenarios.

Under Review

arXiv 2024

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Xin Zou*, Yizhou Wang*, Yibo Yan, Sirui Huang, Kening Zheng, Junkai Chen, Chang Tang, Xuming Hu†

  • A novel hallucination mitigation paradigm that requires no external knowledge retrieval or additional fine-tuning.

Under Review

AAAI 2025

UrbanVLP: A Multi-Granularity Vision-Language Pre-Trained Model for Urban Indicator Prediction

Xixuan Hao*, Wei Chen*, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang†

  • First urban region representation learning framework that explores multi-granularity cross-modal alignment.

39th Annual AAAI Conference on Artificial Intelligence, Philadelphia, USA (AAAI’25)

ACM MM 2024

UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation

Siru Zhong, Xixuan Hao, Yibo Yan, Ying Zhang, Yangqiu Song, Yuxuan Liang†

  • First cross-domain framework that integrates the power of LMM and SAM into satellite image-text retrieval.

32nd ACM Multimedia Conference, Melbourne, Australia (ACM MM’24)

💻 Work Experience

  • Jun 2024 - Apr 2025, AI Research, Squirrel AI, Remote.
    • Focus: Multimodal LLM for Education
    • Supervisors: Dr. Shen WANG (Staff Research Scientist) and Dr. Qingsong WEN (Head of AI Research & Chief Scientist)
    • Achievement: Developed benchmarks and algorithms for MLLM-based reasoning in real-life educational scenarios
  • Feb 2024 - May 2024, LLM Group, Institute for Advanced Algorithms Research, Shanghai, Remote.
    • Focus: LLM Hallucination Mitigation
    • Supervisors: Dr. Zhiyu LI (Principal Researcher of LLM Team) and Dr. Feiyu XIONG (Director of LLM Group)
    • Achievement: Mitigated entity-level hallucination in real-life news corpora from Xinhua News Agency
  • Nov 2022 - Jul 2023, AI Lab, ByteDance, Singapore.
    • Focus: NLP (esp. user intent recognition and conversation modelling) and Recommendation Systems
    • Supervisors: Mr. Yang WANG (Leader of Conversation Team) and Dr. Hang LI (Director of ByteDance Research, Fellow of ACM/ACL/IEEE)
    • Achievement: Successfully designed and deployed multiple models in real-life applications such as TikTok Intelligence Customer Service and Douyin E-commerce Platform
  • May 2022 - Sep 2022, Reefknot Investment, Singapore.
    • A joint venture between Temasek and Kuehne+Nagel
    • Focus: Graph Analytics, NLP (esp. entity resolution), Federated Learning
    • Supervisors: Mr. Marc DRAGON (Managing Director of Reefknot) and Prof. Wei Ngan CHIN (Associate Professor and Vice Dean of SoC, NUS)
    • Achievement: Conducted comprehensive tech analysis for target deep-tech start-ups
  • Jul 2020 - Sep 2020, UC Berkeley, Remote.
    • Focus: Causal Modelling for Earth Science
    • Supervisor: Dr. Qing ZHU (Research Scientist at Institute for Data Science)
    • Achievement: Developed a transfer entropy-based climate diagnostic tool for Pearl River Delta

🎖 Honors and Awards

  • 2024-2025, Postgraduate Scholarship, HKUST(GZ)
  • 2024, Best Short Paper Award, CIKM 2024
  • 2023, Silver Medal, OTTO - Multi-Objective Recommender System, Kaggle Competition
  • 2023, Silver Medal, Stable Diffusion - Image to Prompts, Kaggle Competition
  • 2021, Best Undergraduate Thesis (Remote Sensing Track)
  • 2019-2020, 2020-2021, First-class Academic Award, HKBU
  • 2017-2018, 2018-2019, Second-class Academic Award, HKBU