A Novel Multimodal Large Language Model-Based Approach for Urban Flood Detection Using Open-Access Closed Circuit Television in Bandung, Indonesia

Monitoring urban pluvial floods remains a challenge, particularly in dense city environments where drainage overflows are localized and sensor-based systems are often impractical. Physical sensors can be costly, prone to theft, and difficult to maintain in areas with high human activity. To address this, we developed an innovative flood detection framework that utilizes publicly accessible CCTV imagery and large language models (LLMs) to classify flooding conditions directly from images using natural language prompts. The system was tested in Bandung, Indonesia, across 340 CCTV locations over a one-year period. Four multimodal LLMs, ChatGPT-4.1, Gemini 2.5 Pro, Mistral Pixtral, and DeepSeek-VL Janus, were evaluated based on classification accuracy and operational cost. ChatGPT-4.1 achieved the highest overall accuracy at 85%, with higher performance during the daytime (89%) and lower accuracy at night (78%). A cost analysis showed that deploying GPT-4.1 every 15 min across all locations would require approximately USD 59,568 per year. However, using compact models like GPT-4 nano could reduce costs by up to seven times, with minimal loss of accuracy. These results highlight the trade-off between performance and affordability, especially in developing regions. This approach offers a scalable, passive flood monitoring solution that can be integrated into early warning systems. Future improvements may include multi-frame image analysis, automated confidence filtering, and multi-level flood classification for enhanced situational awareness.
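The cost figures in the abstract can be sanity-checked with a short calculation. The sketch below is illustrative only and is not the authors' cost model: it assumes one image per location per 15-minute interval, a 365-day year, and treats the "up to seven times" reduction as a best case. The function and constant names are hypothetical.

```python
# Figures taken from the abstract.
LOCATIONS = 340            # CCTV locations in Bandung
INTERVAL_MIN = 15          # one classification every 15 minutes
ANNUAL_COST_USD = 59_568   # reported annual cost for GPT-4.1
MAX_REDUCTION = 7          # "up to seven times" cheaper with a compact model


def images_per_year(locations: int, interval_min: int, days: int = 365) -> int:
    """Images processed per year, assuming one image per location per interval
    and a 365-day year (both assumptions, not stated in the abstract)."""
    per_day = (24 * 60) // interval_min
    return locations * per_day * days


def implied_cost_per_image(annual_cost: float, n_images: int) -> float:
    """Average cost per image implied by the reported annual total."""
    return annual_cost / n_images


n_images = images_per_year(LOCATIONS, INTERVAL_MIN)
per_image = implied_cost_per_image(ANNUAL_COST_USD, n_images)
# Best-case annual cost if the full sevenfold reduction is realized.
compact_annual = ANNUAL_COST_USD / MAX_REDUCTION

print(f"Images per year:        {n_images:,}")
print(f"Implied cost per image: USD {per_image:.4f}")
print(f"Compact model (best):   USD {compact_annual:,.2f} per year")
```

Under these assumptions the system would process about 11.9 million images per year, which implies roughly USD 0.005 per image; a sevenfold reduction would bring the annual total to about USD 8,500.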

Welcome to the lab.

Both the Chinese and English websites are currently being updated.
The information may not be up to date; please feel free to contact
thsyang@nycu.edu.tw for more details.