Week #2 #

Week 2 - Choosing the Tech Stack, Designing the Architecture #

Tech Stack Selection #

Main AI pipeline: Python, with PyTorch and NumPy as the main libraries.

Frontend:

React - JavaScript library for building web user interfaces;
CSS - Web page formatting and design;
JavaScript - Dynamic behavior of web pages.

Backend:

FastAPI - API generation;
Node.js - Connection between database and backend;
Firebase - Main Database.

Architecture Design #

Component Breakdown: Let’s start with the core of our solution, the AI pipeline:

Diffusion model (text-to-image);
Multi-view diffusion model (image-to-multiple images);
Neural Radiance Field (NeRF) model (multiple images-to-3D).

Finally, the whole application pipeline will look like this:

User text is sent to the server;
Prompt is processed by the AI pipeline;
The 3D model is visualized on the main page;
User may download the model from the server.
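
The three AI stages and the request flow above can be sketched as a chain of interchangeable callables. This is only a minimal illustration, not the project's implementation: `TextTo3DPipeline`, `Mesh`, and the `fake_*` stage functions are hypothetical names, and the dummy stages stand in for the real diffusion, multi-view diffusion, and NeRF models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an image tensor (the real pipeline would use PyTorch tensors).
Image = List[List[float]]


@dataclass
class Mesh:
    """Hypothetical stand-in for the 3D result produced by the NeRF stage."""
    num_source_views: int
    prompt: str


@dataclass
class TextTo3DPipeline:
    """Chains the three stages: text -> image -> multiple views -> 3D."""
    text_to_image: Callable[[str], Image]            # diffusion model
    image_to_views: Callable[[Image], List[Image]]   # multi-view diffusion model
    views_to_3d: Callable[[List[Image], str], Mesh]  # NeRF model

    def run(self, prompt: str) -> Mesh:
        if not prompt.strip():
            raise ValueError("prompt must not be empty")
        image = self.text_to_image(prompt)
        views = self.image_to_views(image)
        return self.views_to_3d(views, prompt)


# Dummy stages so the sketch runs without any trained model.
def fake_text_to_image(prompt: str) -> Image:
    return [[0.0] * 4 for _ in range(4)]


def fake_image_to_views(image: Image) -> List[Image]:
    return [image for _ in range(6)]


def fake_views_to_3d(views: List[Image], prompt: str) -> Mesh:
    return Mesh(num_source_views=len(views), prompt=prompt)


pipeline = TextTo3DPipeline(fake_text_to_image, fake_image_to_views, fake_views_to_3d)
mesh = pipeline.run("a red chair")
print(mesh)
```

Keeping each stage behind a plain callable would make it possible to swap one model (for example, NeRF for TensoRF) without touching the rest of the flow.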

Data Management: The chosen database: Firebase. We will store any necessary user information here.
User Interface (UI) Design: The main page of our application (initial version):
Integration and APIs: As of now, we are not planning to use any external APIs in our application.
Scalability and Performance: NeRF is no longer a “state-of-the-art” algorithm, because a growing number of alternatives (e.g. TensoRF) work faster. The right choice of core database and data flow should let us handle an increasing number of users and growing data volumes.
Security and Privacy: To start, we plan to use OAuth2 because it is easy to implement with FastAPI. Additional security measures will be integrated to protect our data and our users’ data.
Error Handling and Resilience: We will implement logging of all user actions, the prompts written by users, and the 3D meshes generated by our application.
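
One possible shape for this logging is one JSON line per event, written with Python's standard `logging` module. The sketch below is an assumption about how it could look: the logger name `text_to_3d.audit`, the helper names `build_event` and `log_event`, the event fields, and the example mesh path are all hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; the real application may organize loggers differently.
logger = logging.getLogger("text_to_3d.audit")


def build_event(user_id: str, action: str, **details) -> str:
    """Serialize one audit event (user action, prompt, or mesh) as a JSON line."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "action": action,
        "details": details,
    }
    return json.dumps(event, sort_keys=True)


def log_event(user_id: str, action: str, **details) -> str:
    """Write the event to the audit logger and return the logged line."""
    line = build_event(user_id, action, **details)
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
log_event("user-42", "prompt_submitted", prompt="a red chair")
log_event("user-42", "mesh_generated", mesh_path="meshes/example.obj")
```

Structured lines like these are easy to filter by user or action later, which helps when tracing a failed generation back to the prompt that caused it.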
Deployment and DevOps: We will use Git for version control and set up CI/CD for our text-to-3D deep learning application. This streamlined DevOps approach should keep development and deployment efficient.

Week 2 questionnaire: #

Tech Stack Resources: To gain more insight into how the model architectures in our AI pipeline work, we are using arXiv, which hosts the research papers we need. These are the main research papers we will use to build our AI pipeline:

Mentorship Support: We already have the contact of a person who has worked with the NeRF architecture before. If there are any technical questions related to the core pipeline, we will consider meeting (offline or online) with this person.
Exploring Alternative Resources: As we specified earlier, research papers are our main source of information. However, we may also need the documentation for our technology stack. Here are some links to the most used documentation:

Identifying Knowledge Gaps: The primary knowledge gaps identified by our team include:

Insufficient expertise in comprehending the mathematical components of scientific research;
Limited practical experience in implementing the architectural design appropriately;
Inadequate understanding of the operational principles of 3D models.

Engaging with the Tech Community: As of now, we have not engaged with any broader tech community to seek guidance and learn from professionals experienced in our tech stack. If we need to expand our knowledge by meeting with specific groups of people, we will outline it in our future reports.
Learning Objectives: During this week we were focused on studying the following concepts:

Neural Radiance Fields;
Tensorial Radiance Fields;
Multi-Layer Perceptron (MLP) and spherical harmonics in neural rendering of 3D scenes;
Basics of Diffusion models;
Basics of ray tracing.
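
As an illustration of how radiance fields and ray-based rendering connect, here is a minimal NumPy sketch of the discrete volume-rendering step described in the NeRF paper: samples along one ray are blended using their densities. The function name `composite_ray` and the sample values are made up for the example, and this is not the project's rendering code.

```python
import numpy as np


def composite_ray(densities, colors, deltas):
    """Blend N samples along one ray into a single RGB value.

    densities: (N,) volume density sigma_i at each sample
    colors:    (N, 3) RGB color c_i at each sample
    deltas:    (N,) distance between adjacent samples
    """
    densities = np.asarray(densities, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Contribution of each sample to the final pixel
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


# Made-up ray: empty space, then a dense green surface, then a blue sample behind it.
densities = [0.0, 1000.0, 5.0]
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
deltas = [0.1, 0.1, 0.1]

rgb, weights = composite_ray(densities, colors, deltas)
print(rgb)  # close to pure green: the dense sample hides what is behind it
```

In a full NeRF, the densities and colors would come from the MLP evaluated at each sample point rather than from fixed arrays.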

Sharing Knowledge with Peers: We adopted daily meetings, a practice taken from the Scrum methodology.
Leveraging AI: Every member of our team uses different LLMs (e.g. ChatGPT, YandexGPT) to close knowledge gaps that arise during the development process.

Tech Stack and Team Allocation #

Team Member	Track	Responsibilities
Arthur Gubaidullin	ML-engineer	Discovery of new research papers, reports
Amir Bikineyev	ML-engineer	Work with Multi-View Diffusion models
Makar Brednikov	ML-engineer	Diffusion model configuration
Dmitry Dydalin	ML-engineer	Work with Neural Rendering models
Nikita Borisov	ML-engineer	MLOps
Leonid Novikov	Backend	API creation
Denis Nesterov	Fullstack developer	Application design, Backend development

Weekly Progress Report #

During Week #2 we formulated our main AI pipeline, defined the scope of the models and architectures that will be used in our project, and drew the initial “sketch” of the primary user interface.

Challenges & Solutions #

The main challenge that we faced during the second week was understanding the main concepts behind all the models in our AI pipeline.

To gain sufficient knowledge, we read several research papers covering these models.

Conclusions & Next Steps #

The most important goal for the next (third) week is to have a working UI and backend for our application. Additionally, we plan to connect some models to form the initial (real) version of our AI pipeline.