Week 3 Progress Report #

Executive Summary #

In Week 3, the Kolobok team achieved major technical integration milestones while simultaneously pushing the boundaries of data engineering and model experimentation. The MVP is now operational for tread depth estimation and spike condition classification across both Telegram and web interfaces. Brand recognition, the third major component, is in active R&D, with promising results achieved using GPT-4o-mini.

This week also marked a shift in engineering philosophy — moving from static databases and hand-constructed logic to a more dynamic, learning-based system design. From synthetic dataset generation in Unity to CLAHE-enhanced unwrapping for OCR, we continuously prioritized robustness, real-world performance, and system modularity. Our architecture now accommodates user feedback and model corrections, reflecting a maturing product vision that emphasizes trust, usability, and ML-aided transparency.


Feature Implementation #

End-to-End Flow #

The core pipeline now works across both Telegram and web UI:

  • User submits a photo (tread-side or sidewall)
  • Backend authenticates and processes using deployed ML models
  • Results include tread depth, spike count, and condition (brand OCR pending)
  • Users may correct predictions manually (bot and site)

Only brand/model recognition is pending deployment. The LLM-based OCR pipeline has been developed and tested but is not yet integrated.
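The manual-correction step can be sketched as a simple merge, where any field the user explicitly edits overrides the model's prediction. This is an illustrative sketch; the field names and None-means-untouched convention are assumptions, not the exact bot implementation:

```python
def apply_corrections(prediction: dict, corrections: dict) -> dict:
    """Merge user-supplied corrections over model predictions.

    Any field the user explicitly set (non-None) wins; the rest
    of the model output is kept unchanged.
    """
    merged = dict(prediction)
    merged.update({k: v for k, v in corrections.items() if v is not None})
    return merged

# Example: the user fixes the spike count but trusts the depth estimate
pred = {"tread_depth_mm": 6.1, "spike_count": 92, "condition": "good"}
corr = {"spike_count": 96, "tread_depth_mm": None}
print(apply_corrections(pred, corr))
# {'tread_depth_mm': 6.1, 'spike_count': 96, 'condition': 'good'}
```

Keeping corrections as an overlay rather than mutating the stored prediction also preserves the raw model output for later fine-tuning on feedback.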


OCR Research and Integration #

Investigation and Evaluation #

The team explored and benchmarked six OCR pipelines, focusing on accuracy and preprocessing sensitivity. Models included:

  • Tesseract (open-source OCR engine)
  • MMOCR (OpenMMLab) variants: DBNet++, PSENet, PANet, TextSnake
  • GPT-4o-mini (OpenAI Vision Language Model)

Each was tested with raw images, polar unwrapping, and CLAHE enhancement.
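Polar unwrapping maps the circular sidewall text onto a flat strip that OCR engines handle far better. Below is a pure-NumPy sketch of the idea; the center, radii, and angular resolution are illustrative, and in practice CLAHE (e.g. OpenCV's `createCLAHE`) is applied to the unwrapped strip afterwards:

```python
import numpy as np

def unwrap_sidewall(img: np.ndarray, center: tuple[float, float],
                    r_min: int, r_max: int, n_theta: int = 360) -> np.ndarray:
    """Sample an annulus of `img` into a (r_max - r_min) x n_theta strip.

    Rows correspond to radii, columns to angles, so circular sidewall
    text becomes a straight horizontal line of characters.
    """
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    radii = np.arange(r_min, r_max)[:, None]       # column vector of radii
    xs = (center[0] + radii * np.cos(thetas)).astype(int)
    ys = (center[1] + radii * np.sin(thetas)).astype(int)
    xs = np.clip(xs, 0, img.shape[1] - 1)          # guard against image edges
    ys = np.clip(ys, 0, img.shape[0] - 1)
    return img[ys, xs]                             # nearest-neighbour sampling

# Quick check on a synthetic ring: the ring unrolls into constant rows
img = np.zeros((100, 100), dtype=np.uint8)
yy, xx = np.mgrid[0:100, 0:100]
dist = np.hypot(xx - 50, yy - 50)
img[(dist >= 28) & (dist <= 32)] = 255
strip = unwrap_sidewall(img, (50, 50), 25, 35)
print(strip.shape)  # (10, 360)
```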

OCR Evaluation Results #

| OCR Pipeline | Raw | Unwrapped | Unwrapped + CLAHE |
| --- | --- | --- | --- |
| Tesseract | 4 | 9 | 8 |
| DBNet++ + ABINet | 6 | 10 | 15 |
| PSENet + ABINet | 5 | 11 | 14 |
| TextSnake + ABINet | 7 | 12 | 16 |
| PANet + ABINet | 3 | 8 | 9 |
| GPT-4o-mini | 37 | 45 | 45 |

Values are correct recognitions out of 45 benchmark samples.

GPT-4o-mini achieved perfect accuracy (45/45) on the benchmark using unwrapped CLAHE-enhanced images. It was selected for integration in Week 4.


ML Pipeline Development #

Tread Depth Estimation #

  • Ensemble regression with Swin Transformer, DenseNet, ConvNeXt (in development)
  • Unity-generated dataset used for pretraining and edge case augmentation
  • MAE on test set: ~0.6 mm
  • Augmentations: rotations, crops
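A stacked ensemble of the three backbones can be as simple as a learned weighted average of their per-image depth predictions. The sketch below uses illustrative weights and values; in practice the weights (or a small meta-regressor) are fit on a validation split:

```python
import numpy as np

def ensemble_depth(preds: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine per-model depth predictions (models x images) into one estimate."""
    weights = weights / weights.sum()   # normalize so the output stays in mm scale
    return weights @ preds

# Rows: Swin, DenseNet, ConvNeXt predictions (mm) for three example images
preds = np.array([[6.2, 3.9, 7.8],
                  [5.8, 4.3, 8.1],
                  [6.0, 4.1, 7.9]])
weights = np.array([0.5, 0.25, 0.25])  # illustrative, fit on validation data
print(ensemble_depth(preds, weights))  # ~ [6.05, 4.05, 7.9]
```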

Spike Classification #

  • Binary classifier using ResNet-like CNN
  • Dataset expanded with 6000+ bootstrapped samples
  • Tires without spikes used for hard negatives
  • Final FP + FN on test set: 10

Model Architecture and Configuration #

Regression Model #

  • Loss: MSE + MAE monitoring
  • Optimizer: AdamW
  • LR Scheduler: CosineAnnealingLR
  • Batch size: 16
  • Epochs: 40
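For reference, CosineAnnealingLR decays the learning rate from its initial value down to a floor over the full run; a standalone reproduction of the schedule shape (the rates here are illustrative, not our exact hyperparameters):

```python
import math

def cosine_annealing_lr(epoch: int, t_max: int,
                        eta_max: float = 1e-3, eta_min: float = 0.0) -> float:
    """Learning rate at `epoch` under cosine annealing without restarts:
    eta_min + (eta_max - eta_min) * (1 + cos(pi * epoch / t_max)) / 2
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

# Over 40 epochs the rate falls smoothly: full at epoch 0, half at the midpoint
schedule = [cosine_annealing_lr(e, t_max=40) for e in range(41)]
print(schedule[0], schedule[20], schedule[40])
```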

Spike Classification #

  • Loss: CrossEntropy + hard-negative mining
  • Augmentations: crop, rotate, CLAHE
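Hard-negative mining here means repeatedly feeding the classifier the spike-free samples it is most confidently wrong about. A NumPy sketch of the selection step (the score/label layout and `k` are illustrative):

```python
import numpy as np

def mine_hard_negatives(scores: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k negative samples with the highest 'spike' scores.

    scores: model probability that each sample contains a spike
    labels: ground truth, 0 = no spike, 1 = spike
    """
    neg_idx = np.flatnonzero(labels == 0)
    # Sort the negatives by score, descending, and keep the k hardest
    hardest = neg_idx[np.argsort(scores[neg_idx])[::-1][:k]]
    return hardest

scores = np.array([0.9, 0.2, 0.8, 0.6, 0.1])
labels = np.array([1,   0,   0,   0,   0])
print(mine_hard_negatives(scores, labels, k=2))  # [2 3]
```

These indices are then oversampled (or upweighted in the loss) on the next pass, which is what drove the ~20% false-positive reduction noted below.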

Logging #

  • TensorBoard: MAE trends, misclassified visualizations, histograms
  • Checkpointing: per-epoch with val metrics

Experimental Insights #

| Experiment | Finding | Outcome |
| --- | --- | --- |
| CLAHE vs HE | CLAHE outperformed consistently | Standardized on CLAHE |
| Hard negatives | Reduced false positives by ~20% | Included in training |
| GPT OCR vs MMOCR | GPT superior on real-world samples | Adopted GPT-4o-mini |
| Ensemble vs single model | 0.2 mm better MAE | Final model is a stacked ensemble |
| Unwrapping for OCR | Boosted recognition by 3× | Pipeline requirement |

API, Bot, and Web UI #

Backend (FastAPI) #

  • Auth: Bearer token
  • Endpoints: /analyze/tread, /identify_tire
  • Error codes: 400 (bad image), 401 (auth)

Telegram Bot #

  • Manual correction of predictions
  • Robust state handling

Web Interface #

  • Functional MVP with drag-and-drop upload
  • Connects to same API backend
  • Design aligns with Telegram UX

Data Handling & Privacy #

  • No user data stored
  • Brand/model database replaced by GPT queries
  • Only temporary logs used for diagnostics

Testing and Feedback #

| Issue | Fix |
| --- | --- |
| Spike false positives | Added negative (spike-free) tire images |
| Lighting sensitivity | Unity-generated shadow samples |
| OCR failures | Switched to GPT-4o-mini |

Roadmap #

  • Integrate GPT-based OCR into full pipeline
  • Conduct small user study (10 users)
  • Add admin dashboard for request analysis
  • Finalize dataset with versioning and backups
  • Fine-tune depth model using correction feedback

Lessons Learned #

  • Real-world variance must drive training strategy
  • ML + UX = user trust
  • LLMs simplify pipelines previously requiring deep tuning
  • Good error design prevents user frustration

Team Contributions #

| Team Member | Contributions |
| --- | --- |
| Nikita Menshikov | Wrote the report, set tasks, pitched dataset augmentation techniques, set up labelling for the new spikes dataset |
| Nikita Zagainov | Conducted OCR research for brand and parameter recognition, built the tire segmenter (1, 2, 3) |
| Dmitry Tetkin | Modeled synthetic tires in Unity, expanded training data |
| Vladislav Strelkov | Connected endpoints to the Telegram bot, refined user paths, extended it with functionality for the demo |
| Darya Stepanova | Developed the first version of the site frontend, a basic backend, and the landing page |
| Sergey Aitov | Improved spike labelling by running a pretrained model on it, implemented the spike counter (1, 2) |
| Ekaterina Petrova | Performed research on tire tread depth and implemented a depth-estimation module |

Confirmation of the code’s operability #

We confirm that the code in the main branch:

  • Is in working condition
  • Runs via the method described in README.md