Week5

Week #5 #

Feedback #

Sessions #

User 1: Network Administrator at a small business

  • Feedback:

    Wanted real-time alerting instead of batch processing.

    Liked the file upload simplicity.

    Said the probability score was hard to interpret and suggested labels such as “Low”, “Medium”, and “High Threat”.

User 2: Cybersecurity Analyst (Freelancer)

  • Feedback:

    Liked the simplicity of the web interface.

    Wanted support for macOS system logs or Apache logs (not just HDFS/BGL).

    Mentioned the need for visual analytics.

User 3: IT Support Technician (Kazan Federal University)

  • Feedback:

    Wanted command-line automation examples in the documentation.

    Found the color scheme poorly chosen, which made the text hard to read at times.

Analyze #

  • The highest-priority issue is support for macOS system log files.
  • The medium-priority issues are visual analytics and the poorly chosen colors.
  • The lowest-priority issues are interpretation of the probability score, command-line automation examples in the documentation, and real-time alerting support.
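
One possible way to address the score-interpretation feedback is to map the probability to the labels User 1 suggested. The sketch below is illustrative only: the function name `threat_label` and the cutoffs 0.33 and 0.66 are assumptions, not values taken from the project.

```python
def threat_label(probability: float,
                 low_cutoff: float = 0.33,
                 high_cutoff: float = 0.66) -> str:
    """Map an anomaly probability in [0, 1] to a coarse label.

    The cutoffs are hypothetical defaults and would need tuning
    against real predictions.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cutoff:
        return "Low"
    if probability < high_cutoff:
        return "Medium"
    return "High Threat"


for p in (0.10, 0.50, 0.90):
    print(p, threat_label(p))
```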

Iteration & Refinement #

Implemented features based on feedback #

  • Added support for macOS system log files.
  • Changed the color scheme and user interface.

Performance & Stability #

| Metric | Description | Current Status |
|---|---|---|
| Inference Time | Time from file upload to prediction | ~2 s for an average log file (~300 lines) |
| Accuracy - F1 Score | ML model performance (on HDFS and BGL logs) | Accuracy (HDFS: 98%, BGL: >99%), Precision (HDFS: 88%, BGL: >99%), Recall (HDFS: 36%, BGL: >99%), F1 Score (HDFS: 51%, BGL: >99%) |
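
As a quick consistency check, the HDFS F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The helper name `f1_score` below is chosen for illustration and is not the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported HDFS values: precision 88%, recall 36%.
hdfs_f1 = f1_score(0.88, 0.36)
print(f"HDFS F1: {hdfs_f1:.0%}")  # prints "HDFS F1: 51%"
```

The low recall dominates the result, which is why HDFS shows 98% accuracy but only a 51% F1 score.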

Documentation #

  • The README.md file explains the idea behind the application, the problem it solves, project setup, dependencies, and usage.

ML Model Refinement #

  • This week we implemented a model for macOS system log files. Until Friday we preprocessed the data and created a parser for raw log files. We then trained an unsupervised Isolation Forest model and obtained the distribution of anomaly scores across the log lines in the training data. Based on this distribution, we chose a threshold of 0 for classifying a line as an anomaly. In the coming weeks we will add one or two models for other log file types (Windows/Hadoop) and improve HDFS model performance (we need at least 70% for accuracy, precision, recall, and F1 score).
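
The thresholding step can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes scikit-learn's `IsolationForest`, uses synthetic two-dimensional features in place of parsed log lines, and assumes that scores below the threshold of 0 mark anomalies (the convention of scikit-learn's `decision_function`).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for numeric features extracted from parsed log lines.
rng = np.random.default_rng(0)
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
odd_rows = np.array([[8.0, 8.0], [-9.0, 7.5]])  # deliberately far from the cluster
features = np.vstack([normal_rows, odd_rows])

# Unsupervised training: no labels are passed to fit().
model = IsolationForest(random_state=0).fit(features)

# decision_function returns one score per row; lower means more anomalous.
scores = model.decision_function(features)

# Assumed rule: a row is an anomaly when its score falls below 0.
threshold = 0.0
is_anomaly = scores < threshold
print(f"flagged {is_anomaly.sum()} of {len(features)} rows")
```

With this convention, a threshold of 0 reproduces what `model.predict` returns (`-1` for anomalies), so inspecting a histogram of `scores` is a reasonable way to judge whether a different cutoff would separate the data better.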

Weekly commitments #

Individual contribution of each participant #

Paramon:

Bulat:

Plan for Next Week #

Next week we plan to:

  • Make a new model and parser for Windows logs.
  • Test the system for any potential bugs.
  • Improve the web user interface.
  • Improve HDFS model performance.

Confirmation of the code’s operability #

We confirm that the code in the main branch:

  • Is in working condition.
  • Runs via docker-compose (or an alternative described in README.md).