Cityscape Segmentation

Overview

This project studied urban-scene semantic segmentation on Cityscapes imagery, using a U-Net baseline and a U-Net enhanced with Convolutional Block Attention Modules (CBAM). The model work was paired with an application layer that accepts an uploaded image, runs inference, renders overlays that can be toggled per class, and derives deterministic scene-analysis output from the predicted mask, so each result can be reviewed as evidence rather than only as a color overlay.
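For reference, the Convolutional Block Attention Module as formulated in the original CBAM paper refines a feature map F of shape C×H×W with channel attention followed by spatial attention. This page does not say where the modules sit inside the U-Net (for example after each encoder block, or only in the decoder), so the block below is the generic definition, not a description of this project's exact configuration.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \in \mathbb{R}^{C \times 1 \times 1} \\
F'      &= M_c(F) \otimes F \\
M_s(F') &= \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F');\, \mathrm{MaxPool}(F')])\big) \in \mathbb{R}^{1 \times H \times W} \\
F''     &= M_s(F') \otimes F'
\end{aligned}
```

Here σ is the sigmoid, the MLP is shared between the average-pooled and max-pooled descriptors (hidden width C/r for a reduction ratio r), the spatial branch pools along the channel axis before a 7×7 convolution, and ⊗ is element-wise multiplication broadcast over the missing dimensions.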
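A class-selectable overlay can be pictured as a per-pixel alpha blend that tints only the classes the user has switched on. The sketch below is a hypothetical pure-Python version: `blend_overlay`, `PALETTE`, and the 0.5 default alpha are illustrative rather than the app's actual code, and it assumes masks store Cityscapes train IDs, using the standard Cityscapes colors for a few of them.

```python
from typing import Dict, List, Set, Tuple

RGB = Tuple[int, int, int]

# Assumption: masks hold Cityscapes train IDs; only a few palette entries are shown.
PALETTE: Dict[int, RGB] = {
    0: (128, 64, 128),   # road
    10: (70, 130, 180),  # sky
    11: (220, 20, 60),   # person
    13: (0, 0, 142),     # car
}


def blend_overlay(
    image: List[List[RGB]],
    mask: List[List[int]],
    selected: Set[int],
    palette: Dict[int, RGB] = PALETTE,
    alpha: float = 0.5,
) -> List[List[RGB]]:
    """Hypothetical helper: tint only pixels whose predicted class is in `selected`."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(image) != len(mask) or any(len(a) != len(b) for a, b in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    out: List[List[RGB]] = []
    for img_row, mask_row in zip(image, mask):
        new_row: List[RGB] = []
        for pixel, cls in zip(img_row, mask_row):
            if cls in selected and cls in palette:
                color = palette[cls]
                # Standard alpha blend: (1 - alpha) * image + alpha * class color.
                pixel = tuple(round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color))
            new_row.append(pixel)
        out.append(new_row)
    return out


# Usage: a 1x2 image where only the road pixel (class 0) is selected.
print(blend_overlay([[(0, 0, 0), (200, 200, 200)]], [[0, 10]], selected={0}))
# [[(64, 32, 64), (200, 200, 200)]]
```

Keeping the original pixel for unselected classes is what lets a user toggle classes on and off without re-running inference: only the blend step is recomputed.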

Key Features

Trained U-Net and CBAM-enhanced U-Net segmentation models on OSC compute resources as part of advanced AI coursework.
Compared local experimental results across U-Net, U-Net plus CBAM, SwinV2B, FCN, YOLOv11, MobileNet V3, and DeepLabV3.
Reported the CBAM-enhanced U-Net as the strongest model in the local comparison, with 0.903 pixel accuracy, 0.686 IoU, and 0.824 frequency-weighted IoU (FWIoU) in the report's results table.
Built a Flask application surface for uploading street-scene images, selecting models, and rendering semantic overlays.
Added explainable analysis derived from the segmentation mask: class coverage, semantic groups, approximate object counts, layout priors, scene tags, warnings, and summary text.
Published the application workflow as a Hugging Face Space for public demonstration.
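The three reported metrics can all be derived from a single confusion matrix. A minimal sketch, assuming the report's IoU is the class-averaged mean IoU, ground-truth pixels labeled 255 are ignored (the usual Cityscapes train-ID convention), and classes absent from both ground truth and prediction are skipped; `confusion_matrix` and `segmentation_metrics` are hypothetical names, not the project's evaluation code.

```python
from typing import Dict, List, Optional, Sequence

IGNORE_INDEX = 255  # assumption: Cityscapes-style "ignore" label in ground truth


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Rows are ground-truth classes, columns are predicted classes."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabeled pixels do not count toward any metric
        if not (0 <= g < num_classes and 0 <= p < num_classes):
            raise ValueError(f"label out of range: gt={g}, pred={p}")
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix has no labeled pixels")
    tp = [cm[i][i] for i in range(n)]
    gt_count = [sum(cm[i]) for i in range(n)]                         # TP + FN
    pred_count = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # TP + FP
    iou: List[Optional[float]] = []
    for i in range(n):
        union = gt_count[i] + pred_count[i] - tp[i]  # TP + FP + FN
        # Classes absent from both gt and prediction are skipped, not scored as 0.
        iou.append(tp[i] / union if union > 0 else None)
    valid = [v for v in iou if v is not None]
    return {
        "pixel_accuracy": sum(tp) / total,
        "mean_iou": sum(valid) / len(valid),
        # FWIoU weights each class's IoU by its share of ground-truth pixels.
        "fw_iou": sum(gt_count[i] / total * v for i, v in enumerate(iou) if v is not None),
    }
```

For ground truth [0, 0, 0, 1] and prediction [0, 0, 1, 1], this gives pixel accuracy 0.75, per-class IoU 2/3 and 1/2, mean IoU 7/12 (about 0.583), and FWIoU 0.75 × 2/3 + 0.25 × 1/2 = 0.625. FWIoU lands above mean IoU here because the better-segmented class also covers more pixels, which is the same pattern the report's 0.824 FWIoU versus 0.686 IoU would suggest for large classes such as road and building.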
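Because the scene analysis is computed from the mask rather than generated freely, every field is reproducible for a given prediction. A minimal sketch of that idea, assuming a rectangular mask of Cityscapes train IDs: it covers class coverage, semantic groups (here the standard Cityscapes categories), approximate object counts via 4-connected components, and rule-based tags and warnings, and it omits layout priors. `analyze_scene`, `count_blobs`, the tag names, and the thresholds are hypothetical, not the app's actual logic.

```python
from collections import Counter, deque
from typing import Dict, List

# Cityscapes 19 train-ID classes in train-ID order (assumption: masks use train IDs).
CLASS_NAMES = [
    "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
    "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
    "truck", "bus", "train", "motorcycle", "bicycle",
]
# Cityscapes category for each class, used here as the "semantic group".
GROUP_OF = {
    "road": "flat", "sidewalk": "flat",
    "building": "construction", "wall": "construction", "fence": "construction",
    "pole": "object", "traffic light": "object", "traffic sign": "object",
    "vegetation": "nature", "terrain": "nature", "sky": "sky",
    "person": "human", "rider": "human",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle", "train": "vehicle",
    "motorcycle": "vehicle", "bicycle": "vehicle",
}
COUNTABLE = {"person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"}


def count_blobs(mask: List[List[int]], cls: int, min_pixels: int = 1) -> int:
    """Approximate instance count: 4-connected regions of `cls` with >= min_pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != cls or seen[r][c]:
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                blobs += 1
    return blobs


def analyze_scene(mask: List[List[int]], min_pixels: int = 1,
                  flat_threshold: float = 0.3) -> Dict[str, object]:
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty mask")
    counts = Counter(v for row in mask for v in row if 0 <= v < len(CLASS_NAMES))
    # Coverage uses all pixels as the denominator, including unknown/ignored ones.
    coverage = {CLASS_NAMES[k]: n / total for k, n in sorted(counts.items())}
    groups: Dict[str, float] = {}
    for name, frac in coverage.items():
        groups[GROUP_OF[name]] = groups.get(GROUP_OF[name], 0.0) + frac
    objects = {CLASS_NAMES[k]: count_blobs(mask, k, min_pixels)
               for k in sorted(counts) if CLASS_NAMES[k] in COUNTABLE}
    # Hypothetical rule-based tags and warnings; thresholds are illustrative.
    tags = []
    if groups.get("flat", 0.0) >= flat_threshold:
        tags.append("drivable-surface")
    if "human" in groups:
        tags.append("pedestrians-present")
    warnings = []
    unknown_pixels = total - sum(counts.values())
    if unknown_pixels:
        warnings.append(f"{unknown_pixels / total:.0%} of pixels have no known class")
    return {"coverage": coverage, "groups": groups, "objects": objects,
            "tags": tags, "warnings": warnings}
```

Counting connected regions is only approximate: two touching cars merge into one blob, and a car split by a pole counts as two, which is why such counts are best presented as approximate rather than as instance detections.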

Evidence

Sample segmentation comparison from the report, showing the input image, ground truth, and model prediction panels.

Training IoU comparison across segmentation models, with the CBAM-enhanced U-Net finishing strongest in the local model study.

Live application interface with class controls, semantic overlay output, and the reasoning panel.

Future-work diagram from the report connecting RGB input, segmentation, depth estimation, and language reasoning toward richer perception and explainable decision support.

Interactive Slide Deck

Cityscape Segmentation Presentation (16 slides, PPTX)

The report explicitly frames the results as implementation-grounded local comparisons rather than official benchmark leaderboard claims. The strongest defensible takeaway is that the CBAM-enhanced U-Net outperformed the compared architectures in this project setup and produced a stronger basis for the explainable scene-analysis layer.

What I Learned

This project connected model architecture, compute operations, evaluation, and deployment. The key lesson was that segmentation quality is only one part of the system: reproducible OSC training, clear metric reporting, model comparison, and explainable post-processing all matter when turning an AI model into a usable application.