Week 3 Progress Report #
Executive Summary #
In Week 3, the Kolobok team achieved major technical integration milestones while simultaneously pushing the boundaries of data engineering and model experimentation. The MVP is now operational for tread depth estimation and spike condition classification across both Telegram and web interfaces. Brand recognition, the third major component, is in active R&D, with promising results achieved using GPT-4o-mini.
This week also marked a shift in engineering philosophy — moving from static databases and hand-constructed logic to a more dynamic, learning-based system design. From synthetic dataset generation in Unity to CLAHE-enhanced unwrapping for OCR, we continuously prioritized robustness, real-world performance, and system modularity. Our architecture now accommodates user feedback and model corrections, reflecting a maturing product vision that emphasizes trust, usability, and ML-aided transparency.
Feature Implementation #
End-to-End Flow #
The core pipeline now works across both Telegram and web UI:
- User submits a photo (tread-side or sidewall)
- Backend authenticates and processes using deployed ML models
- Results include tread depth, spike count, and condition (brand OCR pending)
- Users may correct predictions manually (bot and site)
Only brand/model recognition is pending deployment. The LLM-based OCR pipeline is developed and tested but not yet integrated.
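To make the contract concrete, here is a minimal client-side sketch of this flow, assuming the Bearer-token auth and `/analyze/tread` endpoint described in the API section below; the host, token, and response fields are illustrative.

```python
import requests

API_URL = "https://kolobok.example/api"  # placeholder host
TOKEN = "user-session-token"             # placeholder Bearer token

def analyze_tread(image_path: str) -> dict:
    """Submit a tread-side photo and return the model's predictions."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{API_URL}/analyze/tread",
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"image": f},
        )
    resp.raise_for_status()  # surfaces 400 (bad image) and 401 (auth) errors
    return resp.json()       # e.g. {"tread_depth_mm": 6.2, "spikes": 96, "condition": "ok"}
```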
OCR Research and Integration #
Investigation and Evaluation #
The team explored and benchmarked six OCR pipelines, focusing on accuracy and preprocessing sensitivity. Models included:
- Tesseract (Google's open-source OCR engine)
- MMOCR (OpenMMLab) variants: DBNet++, PSENet, PANet, TextSnake
- GPT-4o-mini (OpenAI Vision Language Model)
Each was tested with raw images, polar unwrapping, and CLAHE enhancement.
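For reference, a minimal sketch of these two preprocessing steps with OpenCV; the tire centre and radius are hard-coded here for illustration, whereas in the pipeline they come from segmentation.

```python
import cv2

def unwrap_and_enhance(path: str):
    """Polar-unwrap a sidewall photo and apply CLAHE before OCR."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    center = (w // 2, h // 2)  # assumption: tire roughly centred in the frame
    radius = min(center)       # assumption: tire roughly fills the frame
    # Unroll the circular sidewall ring into a rectangular strip
    unwrapped = cv2.warpPolar(img, (1024, 256), center, radius,
                              cv2.WARP_POLAR_LINEAR)
    # CLAHE boosts local contrast on embossed, low-contrast sidewall text
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(unwrapped)
```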
OCR Evaluation Results (correct reads out of 45) #
OCR Pipeline | Raw | Unwrapped | CLAHE |
---|---|---|---|
Tesseract | 4 | 9 | 8 |
DBNet++ + ABINet | 6 | 10 | 15 |
PSENet + ABINet | 5 | 11 | 14 |
TextSnake + ABINet | 7 | 12 | 16 |
PANet + ABINet | 3 | 8 | 9 |
GPT-4o-mini | 37 | 45 | 45 |
GPT-4o-mini achieved perfect accuracy (45/45) on the benchmark using unwrapped, CLAHE-enhanced images, and was selected for integration in Week 4.
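The pending integration amounts to a single vision-model call. Below is a minimal sketch using the official `openai` Python client, assuming a base64-encoded, preprocessed sidewall image; the prompt wording is illustrative.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def read_sidewall(image_path: str) -> str:
    """Ask GPT-4o-mini to transcribe markings from a preprocessed sidewall image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": "Transcribe the brand, model, and size markings on this tire sidewall."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content
```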
ML Pipeline Development #
Tread Depth Estimation #
- Ensemble regression with Swin Transformer, DenseNet, and ConvNeXt (in development; sketched below)
- Unity-generated dataset used for pretraining and edge case augmentation
- MAE on test set: ~0.6 mm
- Augmentations: rotations, crops
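A minimal sketch of the averaging variant of this ensemble, assuming `timm` backbones; the exact model variants and the stacked head (see Experimental Insights) are still being tuned.

```python
import torch
import torch.nn as nn
import timm

class TreadDepthEnsemble(nn.Module):
    """Average the depth predictions of three regression backbones."""

    def __init__(self):
        super().__init__()
        backbones = ["swin_tiny_patch4_window7_224", "densenet121", "convnext_tiny"]
        # num_classes=1 turns each classifier head into a single-value regressor
        self.models = nn.ModuleList(
            timm.create_model(name, pretrained=True, num_classes=1)
            for name in backbones
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        preds = torch.stack([m(x) for m in self.models])  # (3, batch, 1)
        return preds.mean(dim=0)  # simple average; a stacked head is the alternative
```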
Spike Classification #
- Binary classifier using ResNet-like CNN
- Dataset expanded with 6000+ bootstrapped samples
- Tires without spikes used for hard negatives
- Final FP + FN on test set: 10
Model Architecture and Configuration #
Regression Model #
- Loss: MSE, with MAE monitored (configuration sketched below)
- Optimizer: AdamW
- LR Scheduler: CosineAnnealingLR
- Batch size: 16
- Epochs: 40
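The sketch below wires this configuration into a standard PyTorch loop; a single backbone and dummy batches stand in for the ensemble and the real dataset, and the learning rate is an assumption.

```python
import torch
import torch.nn as nn
import timm

EPOCHS, BATCH_SIZE = 40, 16

model = timm.create_model("convnext_tiny", num_classes=1)  # stand-in for the ensemble
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed learning rate
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
mse, mae = nn.MSELoss(), nn.L1Loss()  # MSE is optimized, MAE is only monitored

# Dummy batches standing in for the real tread-photo loader
loader = [(torch.randn(BATCH_SIZE, 3, 224, 224), torch.rand(BATCH_SIZE) * 9.0)
          for _ in range(2)]

for epoch in range(EPOCHS):
    for images, depths in loader:
        optimizer.zero_grad()
        preds = model(images).squeeze(1)
        loss = mse(preds, depths)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine-annealed LR, stepped once per epoch
```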
Spike Classification #
- Loss: CrossEntropy with hard-negative mining (sketched below)
- Augmentations: crop, rotate, CLAHE
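A minimal sketch of how batch-level hard-negative mining can be combined with CrossEntropy, assuming per-sample losses are filtered so only the hardest spike-free samples contribute; the keep ratio is illustrative.

```python
import torch
import torch.nn.functional as F

def hard_negative_ce(logits: torch.Tensor, labels: torch.Tensor,
                     neg_keep_ratio: float = 0.25) -> torch.Tensor:
    """CrossEntropy where only the hardest negatives in the batch contribute."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    pos_loss = per_sample[labels == 1]   # spiked samples: always kept
    neg_loss = per_sample[labels == 0]   # spike-free samples: mined
    if neg_loss.numel() == 0:
        return pos_loss.mean()
    k = max(1, int(neg_keep_ratio * neg_loss.numel()))
    hard_neg, _ = neg_loss.topk(k)       # highest-loss negatives only
    return torch.cat([pos_loss, hard_neg]).mean()
```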
Logging #
- TensorBoard: MAE trends, visualizations of misclassified samples, histograms (sketched below)
- Checkpointing: per-epoch with val metrics
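A sketch of the per-epoch logging and checkpointing, assuming `torch.utils.tensorboard`; tags, paths, and the metric payload are illustrative.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/tread_depth")  # illustrative run directory

def log_epoch(epoch, model, val_mae, errors, misclassified_images=None):
    """Log validation metrics to TensorBoard and save a per-epoch checkpoint."""
    writer.add_scalar("val/mae_mm", val_mae, epoch)             # MAE trend
    writer.add_histogram("val/error_distribution", errors, epoch)
    if misclassified_images is not None:                        # NCHW image batch
        writer.add_images("val/misclassified", misclassified_images, epoch)
    torch.save(
        {"epoch": epoch, "val_mae": val_mae, "state_dict": model.state_dict()},
        f"checkpoints/epoch_{epoch:03d}.pt",  # checkpoint with val metrics embedded
    )
```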
Experimental Insights #
Experiment | Finding | Outcome |
---|---|---|
CLAHE vs HE | CLAHE outperformed consistently | Standardized CLAHE |
Hard negatives | Reduced false positives by ~20% | Included in training |
GPT OCR vs MMOCR | GPT superior on real-world samples | Adopted GPT-4o-mini |
Ensemble vs single model | 0.2 mm better MAE | Final model is stacked ensemble |
Unwrapping for OCR | Boosted recognition by 3× | Pipeline requirement |
API, Bot, and Web UI #
Backend (FastAPI) #
- Auth: Bearer token
- Endpoints: `/analyze/tread`, `/identify_tire` (sketched below)
- Error codes: 400 (bad image), 401 (auth)
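A minimal sketch of this contract, assuming FastAPI's `HTTPBearer` security helper; token validation and the model call are stubbed, and the response fields are illustrative.

```python
from typing import Optional

from fastapi import Depends, FastAPI, HTTPException, UploadFile
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer(auto_error=False)  # we raise 401 ourselves for missing/bad tokens

def check_token(
    creds: Optional[HTTPAuthorizationCredentials] = Depends(bearer),
) -> None:
    if creds is None or creds.credentials != "expected-token":  # stand-in check
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/analyze/tread", dependencies=[Depends(check_token)])
async def analyze_tread(image: UploadFile):
    data = await image.read()
    if not data:  # stand-in for real image validation
        raise HTTPException(status_code=400, detail="Bad image")
    # ...run the tread-depth and spike models on `data` here...
    return {"tread_depth_mm": 6.2, "spikes": 96, "condition": "ok"}
```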
Telegram Bot #
- Manual correction of predictions (flow sketched below)
- Robust state handling
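A sketch of the correction flow's state handling, assuming python-telegram-bot v20's `ConversationHandler`; the command name, state, and messages are illustrative.

```python
from telegram import Update
from telegram.ext import (ApplicationBuilder, CommandHandler, ContextTypes,
                          ConversationHandler, MessageHandler, filters)

AWAITING_CORRECTION = 0  # single conversation state

async def start_correction(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
    await update.message.reply_text("Send the corrected tread depth in mm.")
    return AWAITING_CORRECTION

async def save_correction(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
    # In practice the value would be validated and forwarded to the backend
    context.user_data["corrected_depth"] = update.message.text
    await update.message.reply_text("Thanks, correction recorded.")
    return ConversationHandler.END

app = ApplicationBuilder().token("BOT_TOKEN").build()
app.add_handler(ConversationHandler(
    entry_points=[CommandHandler("correct", start_correction)],
    states={AWAITING_CORRECTION: [MessageHandler(filters.TEXT & ~filters.COMMAND,
                                                 save_correction)]},
    fallbacks=[],
))
app.run_polling()
```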
Web Interface #
- Functional MVP with drag-and-drop upload
- Connects to same API backend
- Design aligns with Telegram UX
Data Handling & Privacy #
- No user data stored
- Brand/model database replaced by GPT queries
- Only temporary logs used for diagnostics
Testing and Feedback #
Issue | Fix |
---|---|
Spike false positives | Added negative tire images |
Lighting sensitivity | Unity-based shadow samples |
OCR failures | Switched to GPT-4o-mini |
Roadmap #
- Integrate GPT-based OCR into full pipeline
- Conduct small user study (10 users)
- Add admin dashboard for request analysis
- Finalize dataset with versioning and backups
- Fine-tune depth model using correction feedback
Lessons Learned #
- Real-world variance must drive training strategy
- ML + UX = user trust
- LLMs simplify pipelines previously requiring deep tuning
- Good error design prevents user frustration
Team Contributions #
Team Member | Contributions |
---|---|
Nikita Menshikov | Wrote the report, set tasks, pitched dataset augmentation techniques, set up labelling for the new spikes dataset |
Nikita Zagainov | Conducted OCR research for brand and parameter recognition, built the tire segmentation model (1, 2, 3) |
Dmitry Tetkin | Modeled synthetic tires in Unity, expanded training data |
Vladislav Strelkov | Connected endpoints to the Telegram bot, refined user paths, and extended the bot's functionality for the demo |
Darya Stepanova | Developed the first version of the site frontend, a basic backend, and the landing page |
Sergey Aitov | Improved spike labelling by running a pretrained model over the dataset, implemented the spike counter (1, 2) |
Ekaterina Petrova | Performed research on tire tread depth and implemented the estimation module |
Confirmation of the code’s operability #
We confirm that the code in the main branch:
- Is in working condition
- Can be run via the method described in README.md