Bring Your Own Model Guide

These are the steps for developers to write and modify code for machine learning models to be trained, tested, and validated according to Matrice.ai guidelines.

Step 0: Load a Sample Dataset, Run the Model, and Visualize Predictions (Recommended step to have a basic understanding)

To get started, load a sample dataset, run a model (e.g., ResNet), and visualize the predictions. The complete code for these tasks using ResNet is available in a Jupyter notebook. Click here to view the notebook.

Tips: Use Google Colab or Kaggle if you do not have a GPU on your local machine.

Loading the Dataset

First, you need to load a sample dataset. Here is an example of how to load the CIFAR-10 dataset using PyTorch:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
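
To make the `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` step concrete: `ToTensor()` scales pixel values to [0, 1], and normalizing with mean 0.5 and standard deviation 0.5 maps them to [-1, 1]. The sketch below reproduces that arithmetic in plain Python (no PyTorch needed). The helper names are illustrative and are not part of torchvision.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel formula used by Normalize: (x - mean) / std."""
    return (value - mean) / std

def unnormalize(value, mean=0.5, std=0.5):
    """Invert normalize(): x * std + mean (the same as x / 2 + 0.5 here)."""
    return value * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]            # values as produced by ToTensor()
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(n) for n in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The inverse formula is the same `img / 2 + 0.5` that the visualization code uses to bring images back into a displayable range.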

Running the Model

Next, you need to run the ResNet model on the loaded dataset:

import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

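# torchvision >= 0.13 uses the weights argument; older versions use pretrained=True
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)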
net.fc = nn.Linear(net.fc.in_features, 10)  # CIFAR-10 has 10 classes

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:
            print(f'[Epoch: {epoch + 1}, Iter: {i + 1}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

Tips: Keep the number of epochs low when experimenting to save time.
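
As a rough guide to how much work one epoch involves and how often the loop above prints, the following sketch works through the arithmetic. It assumes the standard CIFAR-10 training split of 50,000 images and the `batch_size=4` used in the loader; the variable names are illustrative.

```python
import math

train_images = 50_000   # size of the standard CIFAR-10 training split
batch_size = 4          # matches the DataLoader above
log_every = 2000        # the loop prints when i % 2000 == 1999

iterations_per_epoch = math.ceil(train_images / batch_size)
logs_per_epoch = iterations_per_epoch // log_every

print(iterations_per_epoch)  # 12500
print(logs_per_epoch)        # 6
```

Raising `batch_size` lowers the iteration count per epoch. If an epoch has fewer than 2000 iterations, the loop above never reaches its print statement, so the logging interval would need to be lowered as well.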

Visualizing Predictions

Finally, you can visualize the predictions made by the model:

import matplotlib.pyplot as plt
import numpy as np

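dataiter = iter(testloader)
images, labels = next(dataiter)  # use built-in next(); .next() is not available in current PyTorch

net.eval()  # inference mode for layers such as batch norm
with torch.no_grad():  # gradients are not needed for prediction
    outputs = net(images)
_, predicted = torch.max(outputs, 1)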

# function to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.savefig('predictions.png')  # save before show(); the file name is only an example
    plt.show()

# show images
imshow(torchvision.utils.make_grid(images))
# class names in label-index order, taken from the dataset object
classes = testset.classes
# print labels
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]}' for j in range(4)))
print('Predicted: ', ' '.join(f'{classes[predicted[j]]}' for j in range(4)))

Warning: Ensure that you have installed `matplotlib` and other required libraries before running the visualization code.
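
For readers new to PyTorch: `torch.max(outputs, 1)` returns, for each image in the batch, the largest score and the index at which it occurs, and that index is the predicted class. The plain-Python sketch below shows the same per-row argmax on made-up scores; `argmax_per_row` is an illustrative helper, not a PyTorch function.

```python
def argmax_per_row(scores):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Hypothetical scores for a batch of 2 images and 3 classes.
outputs = [
    [0.1, 2.3, -0.7],
    [1.5, 0.2, 0.9],
]
predicted = argmax_per_row(outputs)
print(predicted)  # [1, 0]
```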

Step 1: Creating a `license.txt`

Before deploying your model, it’s essential to determine the appropriate license for your work. A license.txt file should be included in your project directory, outlining the licensing terms. Here are some common types of licenses you can use:

Open Source Licenses:
- MIT License: Permissive and allows for reuse, even in proprietary software.
- Apache 2.0 License: Similar to MIT, but with an explicit grant of patent rights.
- GPL License: Requires that any derivative work also be open source under the same license.
Tips: It is recommended to use open-source models as they allow for broader distribution and usage.
Commercial or Paid Licenses:
- Proprietary License: Restricts usage and distribution unless a license fee is paid.
- Custom Paid License: Created for specific use cases, usually involving a contractual agreement with the model creator.
Warning: If you use a commercial or paid model, you must have a valid license agreement. Ensure you document this in your `license.txt` file.

How to Create a `license.txt`

Identify the License: Choose the license type that fits your use case (open-source, commercial, or custom).
Write the License: Include the complete text of the chosen license in a file named license.txt.
Include Proper Attribution: If you are using a pre-trained model or any third-party libraries, make sure to include the appropriate attributions as required by the license.
Update Your Documentation: Mention the licensing terms in your project’s README file or documentation.

Here’s an example of a basic license.txt for an MIT license:

MIT License

Copyright (c) [Year] [Your Name or Company]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
...

Once you have created the license.txt file, place it in the root of your project directory.
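
A quick way to confirm the file is in place before moving on is a small check like the one below. It is only a sketch: the function name and the decision to reject empty files are assumptions, not Matrice.ai requirements.

```python
from pathlib import Path

def has_license(project_root):
    """Return True if a non-empty license.txt exists in project_root."""
    license_path = Path(project_root) / "license.txt"
    return license_path.is_file() and license_path.stat().st_size > 0

# Check the current directory, assuming the script runs from the project root.
print(has_license("."))
```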

Step 2: Create the Config Files

The BYOM integration process includes the following JSON file templates:

family_info.json: Defines the family information for the models.
train-config.json: Configuration information for training models.
export-config.json: Configuration information for exporting models.

These templates ensure all required fields are correctly filled according to specified formats and guidelines. Developers can modify these config JSON files according to their requirements for model training, evaluation, and export.

Tips: Make a copy of the original config files before making modifications to ensure you have a backup in case of errors.
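
One way to follow this tip is to copy each config before editing it and to confirm that the edited file still parses as JSON. The sketch below uses only the standard library. The helper name and the `.bak` suffix are assumptions, and it does not validate any Matrice-specific fields.

```python
import json
import shutil
from pathlib import Path

def backup_and_load(config_path):
    """Copy config_path to <name>.bak (first call only) and return the parsed JSON."""
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    if not backup_path.exists():
        shutil.copy2(config_path, backup_path)  # preserve the original version
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)  # raises json.JSONDecodeError on invalid JSON
```

For example, `backup_and_load("model_configs/train-config.json")` would leave a `train-config.json.bak` next to the file, assuming the configs live in a `model_configs` folder.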

For detailed instructions, please refer to the Config Files Guide file.

Step 3: Modify the Code for Matrice Integration

For the Matrice integration, you need to follow several steps to ensure your model is compatible with the platform. Detailed instructions are provided in the BYOM.md file. Here is an overview of the important steps:

Tips: Running the code in Docker is recommended because it avoids package conflicts. If Docker is not available, create a virtual environment instead.

Overview of Important Steps

Getting the actionTracker and job parameters
- Create an actionTracker and get all the configuration parameters for training your model using the action ID.
- Update the action status and get the required parameters for training.
Loading Dataset and Saving the Class Mapping
- Prepare the dataset in a standard format supported by the platform.
- Load the dataset and update the index-to-label mapping for later use.
Creating Model from Scratch or Checkpoint
- Create the model using the correct checkpoint if one exists, otherwise from scratch.
- Modify the last layer to match the number of classes in the dataset.
Model Training and Epoch Logging
- Create the training method using parameters from the model_config.
- Log epoch results using valid metrics for each epoch.
Saving the Best Model
- Save the model and its complete state using the actionTracker.
- Save necessary models for running evaluation, exporting, and deployment.
Running Evaluation Using Best Model
- Evaluate the best model on test and validation sets.
- Save the evaluation results.
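
The "Model Training and Epoch Logging" and "Saving the Best Model" steps can be pictured with the framework-free sketch below. It only tracks which epoch produced the best validation metric. The metric values are made up, and the actual logging and saving calls on the actionTracker are left as comments because their signatures are not given in this overview.

```python
def best_epoch(val_accuracies):
    """Return (epoch_number, accuracy) for the highest validation accuracy."""
    if not val_accuracies:
        raise ValueError("No epochs were recorded")
    best_idx, best_acc = 0, val_accuracies[0]
    for idx, acc in enumerate(val_accuracies):
        # Log this epoch's metrics here (platform-specific call not shown).
        if acc > best_acc:
            best_idx, best_acc = idx, acc
            # Save the model checkpoint here (platform-specific call not shown).
    return best_idx + 1, best_acc  # epochs are reported starting from 1

# Hypothetical per-epoch validation accuracies.
history = [0.61, 0.68, 0.66, 0.71, 0.70]
print(best_epoch(history))  # (4, 0.71)
```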

Error: Ensure that the number of classes in the last layer matches the number of classes in your dataset to avoid dimension mismatch errors.
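
A minimal sketch of the class-mapping step and the check described in the note above, assuming the class names are available as a list. The helper names are hypothetical, and the platform's own mechanism for storing the mapping may differ.

```python
import json

def build_index_to_label(class_names):
    """Map each class index (as a string key, as JSON requires) to its label."""
    return {str(i): name for i, name in enumerate(class_names)}

def check_output_size(num_model_outputs, index_to_label):
    """Raise if the final layer's output size differs from the class count."""
    if num_model_outputs != len(index_to_label):
        raise ValueError(
            f"Model has {num_model_outputs} outputs but the dataset has "
            f"{len(index_to_label)} classes"
        )

classes = ["airplane", "automobile", "bird"]  # illustrative subset of labels
mapping = build_index_to_label(classes)
check_output_size(3, mapping)                 # passes silently
print(json.dumps(mapping))  # {"0": "airplane", "1": "automobile", "2": "bird"}
```

Running the check right after building the model surfaces a mismatch before training starts rather than at the first loss computation.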

For detailed instructions, please refer to the Code Modifications Guide file.

Step 4: Test Using Testing Action Tracker

Testing with Testing Action Tracker

The code changes and configuration updates can now be tested using the TestingActionTracker. This tracker works locally without an action ID, so you can test the training, evaluation, and export code before deploying it to the development server. To initiate testing, the TestingActionTracker requires the model family path and model info path of the configs that were created, plus the action type (train, eval, deploy, or export).

Tips: Ensure all configurations and code modifications are correctly done before testing with the `TestingActionTracker` to avoid repetitive debugging.

Guidelines to use TestingActionTracker:

Before running the tests with the TestingActionTracker, create the model configs as specified in the steps above. It is recommended that the configs be placed in a folder called model_configs in your project directory.

You also need a Python file set up with the following code for easy testing. Changes may be required depending on the folder path and the type of action to be performed. This sample code contains all the run commands; you can copy and run it, changing only the folder path and run commands if required.

To test your configurations, run the following commands in the file depending on the action type:

run("train.py", model_family_info_path, model_info_path, train_config_path)

run("eval.py", model_family_info_path, model_info_path, "eval")

run("deploy.py", model_family_info_path, model_info_path, "deploy")

run("export.py", model_family_info_path, model_info_path, export_config_path)
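
The four commands follow one pattern: the script name, the two shared paths, and a final argument that is either a config path (train, export) or the action name (eval, deploy). The sketch below builds those argument tuples from an action type. `build_run_args` is a hypothetical helper for illustration only; the actual `run` function comes from the testing code and is not reproduced here.

```python
def build_run_args(action, family_info_path, model_info_path,
                   train_config_path=None, export_config_path=None):
    """Return the argument tuple for run(...) matching the commands above."""
    last_arg = {
        "train": train_config_path,
        "eval": "eval",
        "deploy": "deploy",
        "export": export_config_path,
    }
    if action not in last_arg:
        raise ValueError(f"Unknown action type: {action!r}")
    if last_arg[action] is None:
        raise ValueError(f"A config path is required for action {action!r}")
    return (f"{action}.py", family_info_path, model_info_path, last_arg[action])

# Placeholder paths; substitute the locations of your own config files.
args = build_run_args("eval", "family_info.json", "model_info.json")
print(args)  # ('eval.py', 'family_info.json', 'model_info.json', 'eval')
```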

Why is this a required step?

This approach ensures that all code changes function correctly and meet the necessary requirements before being pushed to the development server.

For detailed examples and implementation, refer to the Testing Action Tracker Code.

To see the complete code and how to run it, refer to the Testing code.

Step 5: Add the Config Files Using BYOM

This step outlines how to add newly created configuration files for any model to the Matrice platform using BYOM. The process involves adding the model family info, train config, export config, and code base, then running the test cases and integrating the actions. For detailed instructions, please refer to the Add Models Guide file.

Summary

By following these steps, you will successfully add newly created config files for any model to the Matrice platform. This process ensures that your models are properly configured and available for deployment and use on the platform.

Follow the steps outlined above to integrate your model with Matrice and ensure smooth deployment and usage on the platform.