
Week #2 #

Detailed Requirements Elaboration #

Detailed user stories with acceptance criteria for the MVP #

User story for coaches #

  • As a football team coach, I want to see basic statistics of any team and a simple match prediction between two selected teams so I can get a quick overview of potential opponents.

Acceptance Criteria #

  • On the team list page, the user can select any two teams and get a forecast
  • When a team is expanded, its basic statistics are displayed

User story for football fans #

  • As a football fan, I want to check the upcoming matches and see a simple win probability for any two selected teams so I can discuss possible outcomes with friends.

Acceptance Criteria #

  • The main page displays a list of upcoming matches
  • On the teams page, the user can select two teams and see each team’s win probability as a percentage

User story for journalist and analyst #

  • As a sports journalist and analyst, I want to access basic team statistics and a simple match prediction to reference in quick articles or social media posts.

Acceptance Criteria #

  • The user can select two teams and get a forecast
  • The user can open any team and see its basic statistics

Prioritized Backlog #

Project Specific Progress #

Design #

Layout of the main pages of the website and basic user flow diagrams for key interactions #

Frontend #

Frontend project structure #

frontend/              # React frontend
├── src/               # Source files
│   ├── pages/         # Application pages
│   ├── components/    # Reusable components
│   ├── api/           # API clients and requests
│   ├── styles/        # Global styles and themes
│   ├── utils/         # Helper functions
│   ├── types/         # TypeScript types and interfaces
│   ├── constants/     # Constants and configurations
│   ├── assets/        # Static resources (images, icons)
│   └── index.tsx      # Application entry point
├── public/            # Static files
├── package.json       # Node.js dependencies
├── tsconfig.json      # TypeScript configuration
├── tailwind.config.js # Tailwind CSS configuration
└── postcss.config.js  # PostCSS configuration

Skeleton components based on initial design #

Backend #

Initial API contract #

Initial database schema #

Each team can participate in multiple matches as both the home team (home_team) and the away team (away_team), which creates two one-to-many relationships between Team and Match. Additionally, each team can take part in multiple tournaments, and each tournament can include multiple teams, forming a many-to-many relationship between Team and Tournament.
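
The relationships described above can be sketched as a small relational schema. This is a minimal illustration using Python's built-in `sqlite3` rather than the project's Django models; apart from `home_team` and `away_team`, every table and column name here (`football_match`, `tournament_team`, `name`) is a hypothetical choice, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: only the home_team/away_team roles come from the report.
SCHEMA = """
CREATE TABLE team (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tournament (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Two one-to-many relationships: a team appears in many matches,
-- once through home_team_id and once through away_team_id.
CREATE TABLE football_match (
    id           INTEGER PRIMARY KEY,
    home_team_id INTEGER NOT NULL REFERENCES team(id),
    away_team_id INTEGER NOT NULL REFERENCES team(id),
    CHECK (home_team_id <> away_team_id)
);
-- Junction table for the many-to-many Team <-> Tournament relationship.
CREATE TABLE tournament_team (
    tournament_id INTEGER NOT NULL REFERENCES tournament(id),
    team_id       INTEGER NOT NULL REFERENCES team(id),
    PRIMARY KEY (tournament_id, team_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO team (id, name) VALUES (?, ?)",
        [(1, "Team A"), (2, "Team B"), (3, "Team C")],
    )
    conn.execute("INSERT INTO tournament (id, name) VALUES (1, 'Demo Cup')")
    conn.executemany(
        "INSERT INTO tournament_team (tournament_id, team_id) VALUES (1, ?)",
        [(1,), (2,), (3,)],
    )
    conn.executemany(
        "INSERT INTO football_match (home_team_id, away_team_id) VALUES (?, ?)",
        [(1, 2), (2, 1), (3, 1)],
    )
    return conn


def matches_for_team(conn: sqlite3.Connection, team_id: int) -> tuple:
    """Return (home matches, away matches) for one team."""
    row = conn.execute(
        "SELECT SUM(home_team_id = ?), SUM(away_team_id = ?) FROM football_match",
        (team_id, team_id),
    ).fetchone()
    return row[0], row[1]


conn = build_demo_db()
print(matches_for_team(conn, 1))  # (1, 2): one home match, two away matches
```

In a Django project the same structure would typically be expressed with two `ForeignKey` fields on the match model and a `ManyToManyField` between team and tournament; the SQL form is used here only so the sketch runs without a configured project.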

Implement one or two basic, non-functional (dummy data) endpoints. #

ML #

Research Objective #

This section describes the main stages of preparing the training data and designing the models, including a neural network, for classifying the outcomes of football matches.

Dataset Overview #

Data Sources Integration #

| Database | Content | Records |
| --- | --- | --- |
| European Soccer Database | Match results, team/player attributes, leagues | Primary source |
| World Soccer Database | Betting odds, complementary statistics | Secondary |
| FIFA Players Dataset | Detailed player attributes, skills, market values | Player data |
| International Football Results | Historical matches (1872-2025) | Historical context |

Dataset Characteristics #

| Metric | Value |
| --- | --- |
| Total Matches | 25,979 |
| Features (Original) | 331 |
| Features (Cleaned) | 296 |
| Data Completeness | 75.5% |
| Complete Records | 19,603 |
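
The completeness figure follows directly from the two record counts above. A short check of the arithmetic, using only the numbers reported in this section:

```python
total_matches = 25_979     # Total Matches
complete_records = 19_603  # Complete Records

completeness = complete_records / total_matches
incomplete_records = total_matches - complete_records

print(f"Data completeness: {completeness:.1%}")      # Data completeness: 75.5%
print(f"Incomplete records: {incomplete_records:,}")  # Incomplete records: 6,376
```

The 6,376 incomplete records agree with the number of NaN records reported after cleaning.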

Data Processing Pipeline #

Feature Categories #

| Category | Features | Description |
| --- | --- | --- |
| Match Information | 2 | Tournament stage, season encoding |
| Team Tactical Attributes | 16 | Build-up play, chance creation, defensive metrics |
| Betting Market Data | 3 | Bet365 odds (home/draw/away) |
| Player Statistics | 38 | Overall ratings, technical skills, physical attributes |

Data Cleaning Results #

| Process | Original | After Cleaning |
| --- | --- | --- |
| Features | 331 | 296 |
| Missing columns removed | - | 35 |
| NaN records | 24,217 | 6,376 |
| Complete records | 1,762 | 19,603 |
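
The jump in complete records after removing 35 columns can be illustrated with a toy example: once columns that are mostly empty are dropped, far fewer rows contain a missing value. The sketch below is an assumption about the general approach, not the project's actual pipeline; the `max_missing_ratio` threshold and the column names are hypothetical, since the report does not state the rule used to pick the removed columns.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean(rows, max_missing_ratio=0.5):
    """Drop sparse columns, then count complete and incomplete rows.

    Returns (kept column names, complete rows, number of incomplete rows).
    """
    columns = list(rows[0].keys())
    n_rows = len(rows)
    kept = [
        col for col in columns
        if sum(is_missing(row[col]) for row in rows) / n_rows <= max_missing_ratio
    ]
    reduced = [{col: row[col] for col in kept} for row in rows]
    complete = [row for row in reduced if not any(is_missing(v) for v in row.values())]
    return kept, complete, len(reduced) - len(complete)


# Made-up rows: "sparse_stat" is missing in 3 of 4 rows, "odds_home" in 1 of 4.
rows = [
    {"stage": 1, "odds_home": 1.8, "sparse_stat": None},
    {"stage": 2, "odds_home": None, "sparse_stat": None},
    {"stage": 3, "odds_home": 2.4, "sparse_stat": 0.3},
    {"stage": 4, "odds_home": 3.1, "sparse_stat": float("nan")},
]

kept, complete, n_incomplete = clean(rows)
print(kept)                         # ['stage', 'odds_home']
print(len(complete), n_incomplete)  # 3 1
```

Before the sparse column is dropped only one of the four toy rows is complete; afterwards three are, which mirrors the direction of the change in the table.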

Machine Learning Models #

Model Architectures #

| Model | Architecture | Key Parameters |
| --- | --- | --- |
| MLP Neural Network | Input → Hidden(32) → Hidden(16) → Output(3) | Dropout: 0.5/0.3, Early stopping |
| Logistic Regression | Grid Search optimization | C: [0.01-100], Solvers: lbfgs/saga |
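
To give a sense of how small the MLP is, the number of trainable parameters of fully connected layers can be computed from the layer sizes alone (weights plus biases; dropout adds none). The report does not state the input width, so both values used below are assumptions: 59 is the sum of the feature counts listed under Feature Categories, and 296 is the number of cleaned features.

```python
def mlp_parameter_count(n_inputs: int, hidden=(32, 16), n_outputs: int = 3) -> int:
    """Count weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hypothetical input widths; the actual value depends on the final feature set.
print(mlp_parameter_count(59))   # 2499
print(mlp_parameter_count(296))  # 10083
```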

Training Configuration #

| Aspect | MLP | Logistic Regression |
| --- | --- | --- |
| Class Weighting | Weighted random sampling | Inverse frequency weights |
| Validation | Early stopping (patience=10) | 8-fold cross-validation |
| Optimization | AdamW, weight decay=1e-4 | Grid search hyperparameters |
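
Two of the techniques named in the table can be sketched without any ML library. The weighting formula below, `n_samples / (n_classes * class_count)`, is the one scikit-learn documents for `class_weight="balanced"`; whether the project used exactly this formula is an assumption. The `EarlyStopping` helper is likewise a hypothetical illustration of patience-based stopping, not the project's training code.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Give rarer classes larger weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


class EarlyStopping:
    """Stop training after `patience` epochs without a new best validation loss."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up label distribution: 5 home wins, 3 away wins, 2 draws.
labels = ["home"] * 5 + ["away"] * 3 + ["draw"] * 2
weights = inverse_frequency_weights(labels)
print({cls: round(w, 2) for cls, w in weights.items()})
# {'home': 0.67, 'away': 1.11, 'draw': 1.67}

# Made-up validation losses with a short patience to keep the demo small.
stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.85, 0.9, 0.82, 0.7]
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
print(stopped_at)  # 4: the third epoch in a row without improvement
```

In the demo, training stops before the final loss of 0.7 is ever seen, which shows the trade-off behind the patience setting: a larger value such as the reported 10 tolerates longer plateaus.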

Results and Performance #

Model Performance Comparison #

| Model | Accuracy | Best Hyperparameters |
| --- | --- | --- |
| MLP Neural Network | 41% | Hidden: 32/16, LR: 5e-4 |
| Logistic Regression | 49% | C=0.01, solver='lbfgs' |

Detailed Classification Results #

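| Outcome | MLP Precision | MLP Recall | MLP F1 | LR Precision | LR Recall | LR F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Home Win | 0.78 | 0.23 | 0.36 | 0.64 | 0.51 | 0.57 |
| Away Win | 0.49 | 0.54 | 0.51 | 0.49 | 0.56 | 0.52 |
| Draw | 0.28 | 0.59 | 0.38 | 0.30 | 0.36 | 0.33 |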
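
Each F1 value in the table is the harmonic mean of the corresponding precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 from the reported precision and recall and also derives an unweighted (macro) average of the reported F1 scores per model; the macro average is an added summary, not a figure from the original results.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the classification results.
REPORT = {
    "Home Win": {"MLP": (0.78, 0.23, 0.36), "LR": (0.64, 0.51, 0.57)},
    "Away Win": {"MLP": (0.49, 0.54, 0.51), "LR": (0.49, 0.56, 0.52)},
    "Draw":     {"MLP": (0.28, 0.59, 0.38), "LR": (0.30, 0.36, 0.33)},
}

# Recompute F1 for every cell and compare it with the reported value.
mismatches = [
    (outcome, model)
    for outcome, models in REPORT.items()
    for model, (p, r, reported) in models.items()
    if round(f1(p, r), 2) != reported
]
print(mismatches)  # []

# Unweighted mean of the reported per-class F1 scores.
macro_f1 = {
    model: round(sum(REPORT[outcome][model][2] for outcome in REPORT) / len(REPORT), 2)
    for model in ("MLP", "LR")
}
print(macro_f1)  # {'MLP': 0.42, 'LR': 0.47}
```

The MLP's high home-win precision (0.78) comes with low recall (0.23), so its F1 for that class is well below the logistic regression's.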

Research Limitations and Future Directions #

Current Limitations #

| Area | Limitation | Impact |
| --- | --- | --- |
| Data Quality | Missing player lineups (some matches) | Reduced feature completeness |
| Feature Coverage | No formation/tactical data | Limited tactical analysis |
| Real-time Data | Static historical dataset | No live match integration |

Enhancement Opportunities #

| Category | Improvement |
| --- | --- |
| Data Sources | API integration for live data, weather conditions |
| Features | Team formations, player injury status, referee statistics |
| Models | Ensemble methods, deep learning architectures |
| Deployment | Real-time prediction pipeline, web interface |

Conclusion #

Logistic regression reached 49% accuracy, compared with 41% for the MLP, and produced more balanced predictions across the three outcomes than the neural network. The dataset provides a solid foundation for match analytics, although its completeness (75.5% of records are complete) and the lack of real-time updates still need to be addressed. Adding live data sources and using ensemble methods may improve model accuracy in the future.


Individual Contribution of Each Participant #

  • Arina Zimina:

    • ERD implementation. | Django models created, migration files generated | Link to commit
    • DB design as ERD diagram. | Tournament, Match, and Team entities | Link to ERD
    • (Backend) Links to PRs/Issues for initial endpoints, link to API documentation. | Link to API doc
  • Artem Panov:

  • Karina Siniatullina:

    • Updated/detailed user stories with acceptance criteria.
    • Low-fi prototype (UX). | A simplified, rough version of the design concept, used early in the design process to test ideas, gather feedback, and validate concepts. | Link to Figma.
    • Development of a Figma prototype based on the UI design preprocessing and the low-fi prototype. | A Figma prototype, based on the low-fi prototype, that reflects the user stories. | Link to Figma.
    • (Designers) Wireframes/mockups. | Clickable prototypes of the two main pages of the site | Link to Figma.
    • (Frontend) Links to PRs/Issues for skeleton pages/components. | Basic implemented skeletons of two pages | Link to commit
  • Egor Sergeev:

    • UI design preprocessing. | Choosing a color scheme and fonts for the page and collecting reference designs for the web page. | Link to Figma.
    • Development of a Figma prototype based on the UI design preprocessing and the low-fi prototype. | A Figma prototype, based on the low-fi prototype, that reflects the user stories. | Link to Figma.
    • (Designers) Wireframes/mockups. | Clickable prototypes of the two main pages of the site | Link to Figma.
  • Egor Agapov:

    • Report
    • User flow diagrams | Three main flow diagrams | Link to Figma.
    • Feature selection. | Selection of the most influential features based on football data. | Link to Google Drive
    • Selection of parameters for sorting data in tables. | Link to Google Drive
    • Selection of parameters for representation in tables. | List of parameters to show in tables. | Link to Google Drive
    • Selection of filters in tables. | List of filters for tables. | Link to Google Drive

Weekly Commitments #

Plan for Next Week #

Frontend (React Components and Communication setup with the Backend) #

  • React project with TypeScript and Tailwind CSS integration
    • Home page:
      1. List of upcoming matches
        • Make a tab with the nearest match
        • Make a component for upcoming matches
        • Make a list of upcoming matches
        • Fill the list with custom data
    • Tournament page:
      1. List of teams
        • Make a component for each team
        • Fill the list of teams with custom data
        • Add statistics tabs to teams
        • Add basic statistics
        • Connect the API from the backend
      2. Forecast
        • Add a checkbox to the team component
        • Add a tab for the forecast
        • Connect the API for the forecast

Backend #

  • Finalize the configuration of the Django project with PostgreSQL
  • Django REST Framework endpoints
    • Endpoints for frontend and ML
      • Connect the API for frontend
      • Connect the API for ML
  • Load initial data into the DB

ML #

  • Integration of ML models with the Django apps
    • ML project structure | Create the file structure for the Django ML app within the project
    • Store trained models | .pkl files in the project
    • Create the structure of API endpoints | Create one API endpoint that returns the predicted outcome of a football match
    • Function for loading the models at project start
  • Research of articles on data preparation for training, ensemble approaches, and decision-tree approaches (low priority this week)
    • Summary of data preparation and ensemble methods
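
The ML integration items above (stored `.pkl` models, a loader that runs at project start, and one prediction endpoint) could fit together roughly as follows. This is a framework-free sketch under stated assumptions: `load_models`, `predict_outcome`, the response fields, and the toy "model" (a plain dictionary of softmax weights standing in for a trained estimator) are all hypothetical. In a Django app the loader would typically be called once at startup, for example from `AppConfig.ready()`, and `predict_outcome` would sit behind the endpoint's view.

```python
import math
import pickle
import tempfile
from pathlib import Path

OUTCOMES = ("home_win", "draw", "away_win")
_MODELS = {}  # filled once at startup, then reused by every request


def load_models(model_dir) -> list:
    """Load every .pkl file in model_dir into the in-memory registry."""
    _MODELS.clear()
    for path in sorted(Path(model_dir).glob("*.pkl")):
        with path.open("rb") as fh:
            # Only unpickle files the project itself produced.
            _MODELS[path.stem] = pickle.load(fh)
    return sorted(_MODELS)


def predict_outcome(model_name: str, features: list) -> dict:
    """Return a JSON-serializable payload with one probability per outcome."""
    model = _MODELS[model_name]
    scores = [
        bias + sum(w * x for w, x in zip(weights, features))
        for weights, bias in zip(model["weights"], model["bias"])
    ]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    probabilities = {name: e / total for name, e in zip(OUTCOMES, exps)}
    return {
        "model": model_name,
        "probabilities": probabilities,
        "prediction": max(probabilities, key=probabilities.get),
    }


def demo() -> dict:
    """Write a toy model to disk, load it as at startup, and serve one prediction."""
    toy_model = {
        "weights": [[0.8, -0.2], [0.1, 0.1], [-0.5, 0.6]],  # one row per outcome
        "bias": [0.2, 0.0, -0.1],
    }
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "logreg.pkl").write_bytes(pickle.dumps(toy_model))
        load_models(tmp)
    return predict_outcome("logreg", [1.0, 0.5])


response = demo()
print(response["prediction"])  # home_win
```

Loading the models once and keeping them in memory avoids reading the `.pkl` files on every request; the cost is that updated models only take effect after a restart or an explicit reload.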