Week #3 #

Description of implemented MVP features and the functional user journey(s). #

  • Reactive drop-down list component - for fans, coaches, and analysts
  • Banner for the next match - for fans
  • Set up state management in the React application
  • Pixel-perfect design implementation - for fans, coaches, and journalists/analysts
  • Tournament page with statistics - for coaches, journalists, and fans
  • Built the database architecture for teams, players, etc.
  • Connected backend methods to the frontend
  • ML model for predicting match results - for coaches, fans, and journalists

Detailed user stories #

User story for coaches #

  • As a football team coach, I want to see basic statistics of any team and a simple match prediction between two selected teams so I can get a quick overview of potential opponents.

User story for football fans #

  • As a football fan, I want to check the upcoming matches and see a simple win probability for any two selected teams so I can discuss possible outcomes with friends.

User story for journalists and analysts #

  • As a sports journalist and analyst, I want to access basic team statistics and a simple match prediction to reference in quick articles or social media posts.

Screenshots/GIFs demonstrating the working MVP. #

Link to Demo

Updated API documentation. #

Link to updated API documentation

ML #

Research Objective #

This section describes the main stages of preparing the training data and designing the models, a neural network and logistic regression, for classifying the outcomes of football matches.

Dataset Overview #

Data Sources Integration #

| Database | Content | Role |
|---|---|---|
| European Soccer Database | Match results, team/player attributes, leagues | Primary source |
| World Soccer Database | Betting odds, complementary statistics | Secondary |
| FIFA Players Dataset | Detailed player attributes, skills, market values | Player data |
| International Football Results | Historical matches (1872-2025) | Historical context |

Dataset Characteristics #

| Metric | Value |
|---|---|
| Total Matches | 25,979 |
| Features (Original) | 331 |
| Features (Cleaned) | 296 |
| Data Completeness | 75.5% |
| Complete Records | 19,603 |

Data Processing Pipeline #

Feature Categories #

| Category | Features | Description |
|---|---|---|
| Match Information | 2 | Tournament stage, season encoding |
| Team Tactical Attributes | 16 | Build-up play, chance creation, defensive metrics |
| Betting Market Data | 3 | Bet365 odds (home/draw/away) |
| Player Statistics | 38 | Overall ratings, technical skills, physical attributes |

Data Cleaning Results #

| Process | Original | After Cleaning |
|---|---|---|
| Features | 331 | 296 |
| Missing columns removed | - | 35 |
| NaN records | 24,217 | 6,376 |
| Complete records | 1,762 | 19,603 |
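
The cleaning step above (dropping sparse columns so that more records become complete) can be sketched in plain Python. This is only an illustration: the `max_missing` threshold, the function names, and the toy rows are assumptions, not the values or code used in the project.

```python
from typing import Optional

Row = dict[str, Optional[float]]


def drop_sparse_columns(rows: list[Row], max_missing: float) -> list[Row]:
    """Drop every column whose share of missing (None) values exceeds max_missing."""
    if not rows:
        return []
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(row[col] is None for row in rows) / n <= max_missing
    ]
    return [{col: row[col] for col in keep} for row in rows]


def count_complete(rows: list[Row]) -> int:
    """Count rows that contain no missing values."""
    return sum(all(value is not None for value in row.values()) for row in rows)


# Toy data: column "c" is mostly empty and makes most rows incomplete.
rows = [
    {"a": 1.0, "b": 2.0, "c": None},
    {"a": 3.0, "b": 4.0, "c": None},
    {"a": 5.0, "b": None, "c": None},
    {"a": 7.0, "b": 8.0, "c": 9.0},
]
cleaned = drop_sparse_columns(rows, max_missing=0.5)  # hypothetical threshold
before = count_complete(rows)
after = count_complete(cleaned)
```

On the toy data the number of complete rows rises from 1 to 3 once the sparse column is removed, which mirrors the jump from 1,762 to 19,603 complete records in the table.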

Machine Learning Models #

Model Architectures #

| Model | Architecture | Key Parameters |
|---|---|---|
| MLP Neural Network | Input → Hidden(32) → Hidden(16) → Output(3) | Dropout: 0.5/0.3, Early stopping |
| Logistic Regression | Grid Search optimization | C: [0.01-100], Solvers: lbfgs/saga |
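
Early stopping, listed among the MLP's key parameters, can be illustrated independently of any deep learning framework. The class below is a minimal sketch: the `EarlyStopping` name, the `min_delta` parameter, and the simulated losses are assumptions, and the default patience of 10 is taken from the training configuration in this report.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Simulated validation losses: improvement stops after the third epoch.
losses = [1.10, 1.05, 1.02] + [1.03] * 20
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In a real training loop the weights from the best epoch would typically be saved and restored after stopping; that part is omitted here.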

Training Configuration #

| Aspect | MLP | Logistic Regression |
|---|---|---|
| Class Weighting | Weighted random sampling | Inverse frequency weights |
| Validation | Early stopping (patience=10) | 8-fold cross-validation |
| Optimization | AdamW, weight decay=1e-4 | Grid search hyperparameters |
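
As an illustration of the inverse frequency weights mentioned for logistic regression, the sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count). Whether the project uses exactly this formula is an assumption, and the label counts are made up for the example.

```python
from collections import Counter


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label distribution: home wins are most common, draws least.
labels = ["home"] * 5 + ["away"] * 3 + ["draw"] * 2
weights = inverse_frequency_weights(labels)
```

Rarer outcomes receive larger weights, so mistakes on draws cost more during training than mistakes on home wins.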

Results and Performance #

Model Performance Comparison #

| Model | Accuracy | Best Hyperparameters |
|---|---|---|
| MLP Neural Network | 41% | Hidden: 32/16, LR: 5e-4 |
| Logistic Regression | 48% | C=0.01, solver='lbfgs' |

Detailed Classification Results #

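| Outcome | MLP Precision | MLP Recall | MLP F1 | LR Precision | LR Recall | LR F1 |
|---|---|---|---|---|---|---|
| Home Win | 0.78 | 0.23 | 0.36 | 0.64 | 0.51 | 0.57 |
| Away Win | 0.49 | 0.54 | 0.51 | 0.49 | 0.56 | 0.52 |
| Draw | 0.28 | 0.59 | 0.38 | 0.30 | 0.36 | 0.33 |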
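
For readers who want to reproduce these metrics, the sketch below shows how per-class precision, recall, and F1 are derived from true and predicted labels. The function names and the toy labels are illustrative assumptions. As a sanity check on the table, the MLP's Home Win precision of 0.78 and recall of 0.23 give an F1 of about 0.36.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(y_true: list[str], y_pred: list[str], label: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy labels, not project data.
y_true = ["home", "home", "home", "away", "away", "draw", "draw", "draw"]
y_pred = ["home", "draw", "draw", "away", "home", "draw", "draw", "away"]
home_precision, home_recall, home_f1 = per_class_metrics(y_true, y_pred, "home")

# Cross-check of the MLP "Home Win" row from the table.
table_f1 = round(f1_from(0.78, 0.23), 2)
```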

Research Limitations and Future Directions #

Current Limitations #

| Area | Limitation | Impact |
|---|---|---|
| Data Quality | Missing player lineups (some matches) | Reduced feature completeness |
| Feature Coverage | No formation/tactical data | Limited tactical analysis |
| Real-time Data | Static historical dataset | No live match integration |

Enhancement Opportunities #

| Category | Improvement |
|---|---|
| Data Sources | API integration for live data, weather conditions |
| Features | Team formations, player injury status, referee statistics |
| Models | Ensemble methods, deep learning architectures |
| Deployment | Real-time prediction pipeline, web interface |

Conclusion #

Logistic regression reached 48% accuracy, compared with 41% for the MLP, and produced more balanced predictions across the three outcomes. The dataset provides a solid foundation for match analytics, but it needs better completeness and a way to update data in real time. Adding streaming data and using ensemble methods may improve model accuracy in the future.

Logistic regression has been integrated into the product for the MVP.
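
Since the user stories ask for a simple win probability, it may help to show how a three-class logistic regression turns raw class scores into probabilities with the softmax function. This is a sketch under assumptions: the outcome names and the scores are hypothetical, and the deployed model's exact output format is not described in this report.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


OUTCOMES = ("home_win", "draw", "away_win")  # hypothetical class order
scores = [0.4, -0.1, 0.2]                    # hypothetical linear scores for one match
probabilities = dict(zip(OUTCOMES, softmax(scores)))
predicted = max(probabilities, key=probabilities.get)
```

Showing all three probabilities rather than only the predicted class gives users a sense of how close the match is expected to be.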


Weekly Commitments #

Individual Contribution of Each Participant #

  • Arina Zimina:

    • Final configuration of the Django project with PostgreSQL. | Link to commit
    • Django REST Framework endpoints. | Add endpoints for GET and POST requests | Link to commit
    • Upload initial data to the DB. | Create and upload initial data into the database | Link to commit
  • Artem Panov:

  • Karina Siniatullina:

    • Tournament page: Add list of team components. | Implementation of the Team model and components for teams that participate in the tournament | Link to commit
    • Tournament page: Add statistics tabs to teams. | Implementation of a statistics panel for each team, with tabs that group the statistics by category | Link to commit
    • Tournament page: Connect the API from the backend. | Connecting to the backend via the API to display the list of teams and their statistics | Link to commit
    • Forecast: Add a tab for the forecast. | Implementation of a panel for predicting the result of a match | Link to commit
    • Forecast: Connect the API for the forecast. | Connecting to the backend via the API to display the match result prediction | Link to commit
  • Egor Sergeev:

    • Report
    • Started working with DeepSeek API for future implementation
  • Egor Agapov:

    • Home page: Next match tab. | Implementation of the upper part of the homepage with key information about the upcoming match. | Link to commit
    • Home page: List of upcoming matches. | Implementation of the lower part of the homepage showing the four nearest upcoming matches. | Link to commit
    • Home page: Connect the API from the backend. | Full connection of the home page to the backend. | Link to commit | Link to another commit

Plan for Next Week #

Sprint Goal #

Testing, CI/CD & Deployment Setup

Frontend #

Testing & Quality Assurance #

  • Unit Tests for React components

    - Write unit tests for the MatchCard component

    - Write unit tests for the TeamCard component

    - Write unit tests for PredictionPanel component

    - Write unit tests for UpcomingMatches component

    - Acceptance Criteria: All critical components are covered by tests written with Jest and React Testing Library

    - Assignee: Karina/Egor A

  • Integration tests with the API

    - Tests to verify correct interaction with the API endpoints

    - Mock data for isolating tests

    - Error handling testing

    - Acceptance Criteria: All API calls are covered by integration tests

    - Assignee: Karina/Egor A

  • E2E tests of the main user journey

    - Configure Cypress for E2E testing

    - Test: home page → team selection → getting a forecast

    - Test: navigation between pages

    - Acceptance Criteria: Basic user scenarios are covered by E2E tests

    - Assignee: Karina/Egor A

  • Setting up test coverage

    - Configure the generation of test coverage reports

    - Integration with CI pipeline

    - Acceptance Criteria: Coverage reports are generated automatically; the target is 60%+ coverage

    - Assignee: Karina/Egor A

Backend #

API Testing & Data Integrity #

  • Unit tests for models

    - Tests for the Team model

    - Tests for the Match model

    - Tests for the Tournament model

    - Acceptance Criteria: All models are covered by unit tests, including validation rules

    - Assignee: Arina

  • Integration tests for the API

    - Tests for all GET endpoints

    - Tests for POST endpoints with validation

    - Testing with a real test database

    - Acceptance Criteria: All API endpoints work correctly with different data scenarios

    - Assignee: Arina

  • Middleware and utils tests

    - Test coverage of auxiliary functions

    - Custom middleware testing

    - Acceptance Criteria: Auxiliary functions are covered by tests with edge cases

    - Assignee: Arina

ML #

Model Validation & Testing #

  • ML Model Validation

    - Implementation of cross-validation to improve the reliability of the model

    - Create a confusion matrix for performance analysis

    - Add precision/recall/F1 metrics

    - Acceptance Criteria: The model is validated with detailed performance metrics

    - Assignee: Artem

  • Tests for ML endpoints

    - Unit tests for prediction endpoint

    - Tests for model loading and initialization

    - Error handling testing for incorrect data

    - Acceptance Criteria: ML API endpoints work stably and handle errors

    - Assignee: Artem

  • Validation of input data

    - Checking the correctness of the data before submitting it to the model

    - Processing of missing values

    - Validation of team IDs and tournament data

    - Acceptance Criteria: Robust input validation prevents model failures

    - Assignee: Artem
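
The input validation item above could look roughly like the following sketch, which checks team IDs and missing feature values before anything reaches the model. All names here (`PredictionRequest`, `validate_request`, the payload keys, and the feature names) are hypothetical and not taken from the project's API.

```python
import math
from dataclasses import dataclass


@dataclass
class PredictionRequest:
    home_team_id: int
    away_team_id: int
    features: dict[str, float]


def validate_request(payload: dict, known_team_ids: set[int], required_features: list[str]) -> PredictionRequest:
    """Validate a raw payload; raise ValueError listing every problem found."""
    errors: list[str] = []
    home = payload.get("home_team_id")
    away = payload.get("away_team_id")
    for name, value in (("home_team_id", home), ("away_team_id", away)):
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append(f"{name} must be an integer")
        elif value not in known_team_ids:
            errors.append(f"{name} {value} is not a known team")
    if isinstance(home, int) and home == away:
        errors.append("home and away teams must differ")

    raw = payload.get("features")
    if not isinstance(raw, dict):
        raw = {}
    features: dict[str, float] = {}
    for name in required_features:
        value = raw.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or math.isnan(value):
            errors.append(f"feature {name!r} is missing or not a number")
        else:
            features[name] = float(value)

    if errors:
        raise ValueError("; ".join(errors))
    return PredictionRequest(home, away, features)


KNOWN_TEAMS = {1, 2, 3}                     # hypothetical team IDs
REQUIRED = ["home_rating", "away_rating"]   # hypothetical feature names
ok = validate_request(
    {"home_team_id": 1, "away_team_id": 2, "features": {"home_rating": 78, "away_rating": 74.5}},
    KNOWN_TEAMS,
    REQUIRED,
)
```

Collecting all errors before raising lets the endpoint return one response that lists every problem, instead of making the client fix them one at a time.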

CI/CD & Deployment #

Automation & Infrastructure #

  • GitHub Actions CI Pipeline

    - Run tests automatically on every push and PR

    - Configuration for frontend and backend tests

    - Set up a test database for CI

    - Acceptance Criteria: All tests run automatically and block merge in case of errors

    - Assignee: Egor S

  • Setting up the Staging Environment

    - Deploying the application on Heroku staging

    - Configuration of environment variables

    - Set up a production-like database

    - Acceptance Criteria: The working application is available at a public URL

    - Assignee: Arina

  • Environment Variables Management

    - Configure environment variables for staging/production

    - Secure storage of API keys and secrets

    - Acceptance Criteria: Confidential data is protected and properly configured

    - Assignee: Arina

  • Continuous Deployment (Bonus)

    - Automatic deployment when merging to main

    - Set up a staging → production pipeline

    - Acceptance Criteria: The CD pipeline deploys automatically after a successful merge

    - Assignee: Artem
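
For the environment variable management item above, one possible approach is to read all required settings at startup and fail fast if any are missing, so a misconfigured staging or production deploy is caught immediately. The setting names and the `load_settings` helper below are assumptions for illustration; the project's real variables may differ.

```python
import os
from typing import Mapping, Optional, Sequence

# Hypothetical setting names; the project's real variables may differ.
REQUIRED_SETTINGS = ("DATABASE_URL", "SECRET_KEY")


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def load_settings(required: Sequence[str], env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, reading os.environ unless a mapping is given."""
    source = os.environ if env is None else env
    missing = [name for name in required if not source.get(name)]
    if missing:
        raise MissingSettingError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in required}


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"DATABASE_URL": "postgres://localhost/staging", "SECRET_KEY": "dev-only-secret"}
settings = load_settings(REQUIRED_SETTINGS, env=demo_env)
```

In the application itself the call would be `load_settings(REQUIRED_SETTINGS)`, with the actual values supplied by the hosting platform's configuration rather than committed to the repository.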

Acceptance criteria for sprint: #

  • Test Coverage: coverage reports are generated automatically, with a target of 60%+

  • CI Pipeline: Automatic tests are performed on every PR

  • Staging Environment: a working application is available at a public URL

  • Documentation: updated documentation on testing and deployment