
Week #5 #

Feedback #

Sessions #

We conducted three feedback sessions with external users:

| External user | Review | Main problem | Overall score |
|---|---|---|---|
| Dmitriy Mistrikov, Master's student | The solution is very convenient for testing RL, but the emergency stop wire I was shown is very short and needs to be made longer. | Emergency button wire | 4.5/5 |
| Dmitriy Vizitei, Bachelor's student | An interesting device; the only inconvenience is that the position is reported in ticks rather than meters, and the motor effort is given in arbitrary units rather than in newtons, so conversion coefficients have to be added in the program. | Units of measurement | 4.2/5 |
| Yaroslav Gorbunov, student at Tomsk Polytechnic University | Given my lack of experience in robotics, the most difficult and confusing part was determining the connection port, and I still cannot replicate it myself. | Automatic connection | 5/5 |

Analysis #

Key insights and actions:

| Feedback | Priority | Action taken |
|---|---|---|
| Solve the units problem | Medium | Created an issue for this problem |
| A longer wire is needed | Low | A better wire was ordered |
| Automatic connection is needed | Medium | Implemented automatic connection |

Iteration & Refinement #

Implemented features based on feedback #

  • A better (longer) emergency stop wire was ordered
  • Implemented automatic serial connection (see the sketch below)
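
A minimal sketch of how automatic connection can work with pyserial's port scanning; the VID/PID values and the `find_port` helper are illustrative assumptions, not the library's actual implementation:

```python
# Hypothetical sketch of automatic port detection with pyserial.
# VID/PID values and the find_port helper are assumptions for illustration.
import serial
import serial.tools.list_ports

CONTROLLER_VID = 0x0483  # assumed USB vendor ID of the motor controller
CONTROLLER_PID = 0x5740  # assumed USB product ID

def find_port() -> str:
    """Scan serial ports and return the device path of the controller."""
    for port in serial.tools.list_ports.comports():
        if port.vid == CONTROLLER_VID and port.pid == CONTROLLER_PID:
            return port.device
    raise RuntimeError("CartPole controller not found on any serial port")

if __name__ == "__main__":
    with serial.Serial(find_port(), baudrate=115200, timeout=1.0) as conn:
        print(f"Connected on {conn.port}")
```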

Performance & Stability #

The main measure of our solution’s performance is the communication frequency between the controller and the library, which is also considered the sampling frequency for the entire system.

The current frequency is 100 Hz.
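
As a rough illustration, the effective sampling frequency can be estimated by timing consecutive state reads over the serial link. The helper below is a sketch, assuming a connected `CartPole` device exposing the `get_joint_state()` method described in the API Reference; it is not part of the library:

```python
# Sketch: estimate the effective sampling (communication) frequency.
# Assumes a connected CartPole device with get_joint_state(); the helper
# itself is illustrative and not part of the library.
import time

def estimate_frequency(device, n_samples: int = 1000) -> float:
    """Time n_samples consecutive state reads and return the average rate in Hz."""
    start = time.perf_counter()
    for _ in range(n_samples):
        device.get_joint_state()
    elapsed = time.perf_counter() - start
    return n_samples / elapsed

# On the current setup this should report roughly 100 Hz.
```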

There are several main approaches for optimizing the solution (although this is not necessary at the moment):

  • Simplifying the transmitted data

  • Replacing the controller with a more powerful one

  • Simplifying the communication concept (moving the control function directly to the controller)

We use standard Python libraries for development and for performance and stability:

  • PyTorch – For implementing and training the DQN agent
  • NumPy – For efficient numerical operations
  • Matplotlib (WIP) – Planned for visualizing training performance
  • serial / pyserial – For hardware communication with the motor controller

Basic logging and exception handling are implemented. Training is currently stable for hardware episodes up to 300+ iterations.
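
The pattern around serial I/O is roughly the following; the `safe_read` wrapper below is an illustrative sketch, not the library's actual code:

```python
# Illustrative pattern for logging and exception handling around serial I/O.
# The safe_read helper is an assumption for illustration, not library code.
import logging
from typing import Optional

import serial

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("cartpole")

def safe_read(connection: serial.Serial) -> Optional[bytes]:
    """Read one line from the controller, logging serial errors instead of crashing."""
    try:
        line = connection.readline()
        logger.debug("Received %d bytes", len(line))
        return line
    except serial.SerialException as exc:
        logger.error("Serial communication failed: %s", exc)
        return None
```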

Documentation #

All documentation is hosted at:
📘 https://iu-capstone-project-2025.github.io/total_control/

The documentation contains:

  • Firmware Documentation – Auto-generated documentation for a deep understanding of how the firmware works
  • API Reference – Auto-generated documentation for interacting with the system

API Reference Structure: #

  • LabDevice
    Base class for communication with lab hardware over a serial interface. Provides methods for:

    • connect() / disconnect() for managing the serial connection
    • Context manager support (__enter__ / __exit__)
  • CartPole (inherits from LabDevice)
    Interface for controlling the Cart-Pole hardware:

    • get_joint_state() – Returns position and velocity as a string
    • set_joint_efforts(effort) – Applies a force to the cart (input: int or str)
    • start_experiment() – Begins data flow and operation mode
    • stop_experiment() – Gracefully stops the experiment
    • get_state() – [WIP: internal state snapshot method]

This API is actively used by the Reinforcement Learning integration and will be further expanded to include reward shaping and safety checks.
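
A minimal usage sketch built from the methods listed above; the import path, constructor arguments, and the exact format of the returned state string are assumptions, so consult the generated API reference for the real signatures:

```python
# Illustrative use of the CartPole API via its context manager support.
# The import path and constructor arguments are assumptions.
from total_control import CartPole

with CartPole() as device:            # __enter__ connects, __exit__ disconnects
    device.start_experiment()         # begin data flow and operation mode
    state = device.get_joint_state()  # position and velocity as a string
    device.set_joint_efforts(200)     # apply a motor effort (int or str)
    device.stop_experiment()          # gracefully stop the experiment
```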

This approach was chosen because auto-generated documentation is the industry standard and makes it easy to keep the reference in sync with properly documented code.

ML Model Refinement #

We developed and integrated a Deep Q-Network (DQN) agent into the real CartPole system using the CartPole Python API. The RL agent observes position and velocity of the cart and applies motor effort as an action.

Improvements made:

  • Switched to a real-time environment using the get_joint_state() and set_joint_efforts() methods
  • Refactored the training loop to work asynchronously with hardware delays and serial communication
  • Normalized the input state and clipped motor outputs to avoid unsafe operation (see the sketch below)
  • Adjusted the reward structure to favor longer balance time and penalize large accelerations
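
A condensed sketch of a hardware-backed environment step with state normalization and effort clipping; the scaling constants, discrete action set, and state-string parsing are illustrative assumptions rather than the values used in training:

```python
# Illustrative hardware-backed step with state normalization and effort clipping.
# All constants and the state-string format are assumptions.
import numpy as np

MAX_POSITION = 10_000.0   # assumed position range in encoder ticks
MAX_VELOCITY = 5_000.0    # assumed velocity range in ticks per second
MAX_EFFORT = 500          # assumed safe motor effort limit
ACTIONS = (-300, 0, 300)  # assumed discrete efforts the DQN can choose from

def step(device, action_index: int) -> np.ndarray:
    """Apply the chosen effort (clipped to a safe range) and return the normalized state."""
    effort = int(np.clip(ACTIONS[action_index], -MAX_EFFORT, MAX_EFFORT))
    device.set_joint_efforts(effort)
    position, velocity = map(float, device.get_joint_state().split())
    return np.array([position / MAX_POSITION, velocity / MAX_VELOCITY],
                    dtype=np.float32)
```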

Planned refinements:

  • Add reward shaping for smoother convergence
  • Implement checkpoint saving/loading during training (see the sketch after this list)
  • Tune hyperparameters: learning rate, epsilon decay, and discount factor
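
For the planned checkpoint saving/loading, the standard PyTorch pattern would look roughly like this; the file name and stored fields are assumptions:

```python
# Sketch of DQN checkpoint saving/loading with PyTorch; field names are assumptions.
import torch

def save_checkpoint(policy_net, optimizer, episode, path="dqn_checkpoint.pt"):
    """Persist network weights, optimizer state, and the current episode number."""
    torch.save({
        "episode": episode,
        "model_state": policy_net.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(policy_net, optimizer, path="dqn_checkpoint.pt"):
    """Restore a saved checkpoint and return the episode to resume from."""
    checkpoint = torch.load(path)
    policy_net.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["episode"]
```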

Weekly commitments #

Individual contribution of each participant #

Anastasia - Library development

Evgenii - Board design and printing

Artyom - Add firmware auto-generated documentation

Petr - Report writing, RL R&D

Marat - Port scan feature

Plan for Next Week #

  • Test RL
  • Test control
  • Fix code style and bugs
  • Implement final features

Confirmation of the code’s operability #

We confirm that the code in the main branch:

  • Is in working condition.
  • Runs via docker-compose (or another alternative described in the README.md).