Overview
This project studied urban-scene semantic segmentation using Cityscapes imagery and a U-Net baseline enhanced with Convolutional Block Attention Modules (CBAM). The model work was paired with an application layer that accepts an uploaded image, runs inference, renders class-selectable overlays, and produces deterministic scene-analysis output from the predicted mask, so the result can be reviewed as evidence rather than only as a color overlay.
Key Features
- Ran U-Net and CBAM-enhanced U-Net training workflows on OSC resources as part of advanced AI coursework.
- Compared local experimental results across U-Net, CBAM-enhanced U-Net, SwinV2B, FCN, YOLOv11, MobileNetV3, and DeepLabV3.
- Reported the CBAM-enhanced U-Net as the strongest model in the local comparison: the report table lists 0.903 pixel accuracy, 0.686 IoU, and 0.824 frequency-weighted IoU (FWIoU).
- Built a Flask application surface for uploading street-scene images, selecting models, and rendering semantic overlays.
- Added explainable analysis derived from the segmentation mask: class coverage, semantic groups, approximate object counts, layout priors, scene tags, warnings, and summary text.
- Published the application workflow as a Hugging Face Space for public demonstration.
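As an illustration of how mask-derived analysis can stay deterministic, here is a minimal sketch of the class-coverage and semantic-group steps. It is not the project's code: the label ids, class names, group names, and the functions `class_coverage` and `group_coverage` are hypothetical, chosen only to show the idea of computing everything directly from the predicted mask.

```python
from collections import Counter

# Hypothetical label map and grouping, for illustration only.
# The project's actual class ids and semantic groups are not given here.
CLASS_NAMES = {0: "road", 1: "sidewalk", 2: "building",
               3: "vegetation", 4: "car", 5: "person"}
GROUPS = {"road": "ground", "sidewalk": "ground", "building": "structure",
          "vegetation": "nature", "car": "vehicle", "person": "human"}


def class_coverage(mask):
    """Return {class_name: fraction of pixels} for a 2D mask of class ids."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by label id so the output order is the same on every run.
    return {CLASS_NAMES.get(label, f"class_{label}"): counts[label] / total
            for label in sorted(counts)}


def group_coverage(coverage):
    """Sum per-class coverage into coarser semantic groups."""
    groups = {}
    for name, fraction in coverage.items():
        group = GROUPS.get(name, "other")
        groups[group] = groups.get(group, 0.0) + fraction
    return groups


# A tiny 4x4 stand-in for a predicted segmentation mask.
mask = [
    [2, 2, 3, 3],
    [2, 4, 4, 3],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
coverage = class_coverage(mask)
groups = group_coverage(coverage)
print(coverage)
print(groups)
```

Because both functions depend only on the mask, the same prediction always yields the same coverage numbers, which is what makes this kind of output reviewable. Object counts, layout priors, and scene tags would need additional logic (for example connected-component analysis) that this sketch does not attempt.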
Evidence
Interactive Slide Deck
Cityscape Segmentation Presentation
The report explicitly frames the results as implementation-grounded local comparisons rather than official benchmark leaderboard claims. The strongest defensible takeaway is that the CBAM-enhanced U-Net outperformed the compared architectures in this project setup and produced a stronger basis for the explainable scene-analysis layer.
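For readers unfamiliar with the three metrics in the report table, the sketch below shows one common way to compute pixel accuracy, IoU, and frequency-weighted IoU from a confusion matrix. It assumes the reported IoU is a mean over classes, which this write-up does not state, and the function names `confusion_matrix` and `segmentation_metrics` are illustrative rather than taken from the project.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build cm[t][p] = number of pixels with true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return pixel accuracy, mean IoU, and frequency-weighted IoU."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(cm[c][c] for c in range(n))

    ious = []
    fw_iou = 0.0
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                        # ground-truth pixels of class c
        pred = sum(cm[r][c] for r in range(n)) # pixels predicted as class c
        union = gt + pred - tp
        if union == 0:
            continue  # class absent from both labels and predictions
        iou = tp / union
        ious.append(iou)
        fw_iou += (gt / total) * iou           # weight IoU by class frequency

    return {
        "pixel_accuracy": correct / total,
        "mean_iou": sum(ious) / len(ious),
        "fw_iou": fw_iou,
    }


# Flattened toy labels: 8 pixels, 3 classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, 3))
print(metrics)
```

FWIoU weights each class by its share of ground-truth pixels, so on street scenes dominated by large classes such as road and building it typically sits above mean IoU, which is consistent with the ordering of the reported figures.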
What I Learned
This project connected model architecture, compute operations, evaluation, and deployment. The key lesson was that segmentation quality is only one part of the system: reproducible OSC training, clear metric reporting, model comparison, and explainable post-processing all matter when turning an AI model into a usable application.