Overview
One problem with getting started with ML these days is how daunting some of the ‘interesting’ projects can be. The models are large, often too large for regular computers. This is a two-part blog post on how one might work with a larger model in a manageable way. I’m on a MacBook Pro M2 with 16GB of RAM, which admittedly is quite powerful, but this method should be repeatable on most hardware. It should also serve as something of an introduction to ML, especially if you have prior programming experience but have yet to apply it to ML.
What this two-part post series will cover:
- Formatting and loading data for training an ML model
- The general process of training an image detection ML model (using fine-tuning)
- How to use the model for inference/prediction after it’s trained on your PC
- How to deploy the model to a website so that anyone can use it easily
We will be using resnet18 as the example model in this demonstration, as well as PyTorch.
All the code for this and future parts can be found here.
Getting your PyTorch environment set up
We will assume you have installed python and virtualenv on your computer.
- Set up your virtual environment
Note your virtualenv setup might differ based on your operating system:
$ virtualenv venv
$ source venv/bin/activate
- Install dependencies
$ (venv) pip install torch torchvision torchaudio
$ (venv) pip install matplotlib numpy onnx
We will be using matplotlib for visualizations, and numpy to help make the inference output easier to parse.
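Before moving on, it can be worth a quick, purely optional sanity check that everything imports cleanly from inside the activated venv (version numbers will vary):

import torch
import torchvision

print(torch.__version__)          # verify torch imports cleanly
print(torchvision.__version__)
print(torch.cuda.is_available())  # False is normal on a Mac; CPU training still works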
Getting the data
This is always the hardest part of building or training an ML system. In this case we will be using a freely available dataset found on Kaggle here. This dataset has seven different classes to recognize: bike, car, cat, dog, human, horse, flower.
Organizing our workspace
We start by organizing the data folder into a structure like this:
├── data
│ ├── bike
│ ├── cars
│ ├── cats
│ ├── dogs
│ ├── flowers
│ ├── horses
│ ├── humans
├── inference
│ ├── bike
│ ├── cars
│ ├── cats
│ ├── dogs
│ ├── flowers
│ ├── horses
│ ├── humans
├── train.py
├── inference.py
├── labels.py
├── export.py
├── venv/
The above dataset will unzip into data/. I then manually selected 5-10 images from each folder and placed them in a new inference/ directory, maintaining the label names in sub-directories (bike, cars, cats, etc.). This will make sense later, but we use the inference/ data to test our inference script and qualitatively validate the fine-tuned output.
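If you’d rather script that selection than do it by hand, a rough sketch might look like this (the 5-images-per-class count is arbitrary, and the data/ and inference/ paths match the tree above):

import os
import random
import shutil

# Move a small random sample of images per class from data/ into inference/
for label in os.listdir("data"):
    src = os.path.join("data", label)
    if not os.path.isdir(src):
        continue
    dst = os.path.join("inference", label)
    os.makedirs(dst, exist_ok=True)
    for name in random.sample(os.listdir(src), 5):
        shutil.move(os.path.join(src, name), os.path.join(dst, name))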
Finally, I created four Python files at the root level: train.py, inference.py, labels.py, and export.py.
train.py
This is responsible for the actual training/fine-tuning of our resnet18 model. Here we will:
- Create the model and replace the output layer with one compatible with our image classification task
- Split our dataset from above into a train and a test set
- Pass our entire train set through the resnet18 model, calculating the loss and updating the final layer gradients to optimize our network
- Pass our entire test set through the resnet18 model, calculating the test loss to validate for over/under-fitting
- Repeat steps 3 and 4 for as many ‘epochs’ as you’d like; in this blog we do it 15 times
inference.py
We use this to demonstrate how to get results from our trained model, and qualitatively observe those results.
labels.py
This is a slim file just used to keep track of the mapping between the label string and index number.
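As a sketch of what that might contain (the LABELS name here is just an example): ImageFolder, which we use below, assigns class indices alphabetically by folder name, so a list whose positions match those indices is all we need.

# labels.py
# Index i here corresponds to output index i of the model,
# because ImageFolder sorts the class folders alphabetically
LABELS = ["bike", "cars", "cats", "dogs", "flowers", "horses", "humans"]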
export.py
This is another slim file whose sole purpose is to export the PyTorch model to a format called ONNX. Having it in ONNX format allows us to host it on the internet and run it in a browser context using onnxruntime-web.
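A minimal sketch of what that export might look like, assuming the model was saved to best.pt as we do at the end of this post (the file names and the 224x224 dummy input are assumptions; using the exported file is covered in Part 2):

# export.py
import torch

# Load the fine-tuned model saved by train.py
# (newer PyTorch versions may require torch.load("best.pt", weights_only=False))
model = torch.load("best.pt")
model.eval()

# ONNX export works by tracing the model with a dummy input of the expected shape
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "best.onnx",
    input_names=["input"],
    output_names=["output"],
)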
Finetuning the model
Now that we have our directory structure, dependencies and high-level setup, we can dive into the actual code.
There are typically two ways to fine-tune a model. You start with a pre-trained model in both cases, but you either train all the weights, or just the last layer.
The benefit to training just the last layer is that it is much faster and (usually) easier on consumer grade hardware.
In the example here we will be training just the last layer of the resnet18 model. Normally it has 1000 output classes; we are going to reduce that to just 7, the same ones from the dataset.
Step 1: Replace the pretrained model’s last layer with one compatible with our problem
from torchvision import models
from torchvision.models import ResNet18_Weights
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)
print(model)
# This gives:
'''
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2),
... [omitted middle content for clarity] ...
(fc): Linear(in_features=512, out_features=1000, bias=True)
)
'''
So we can see the last layer is: (fc): Linear(in_features=512, out_features=1000, bias=True).
We can replace the last layer in the model like so:
import torch.nn as nn

# Get the input size of the existing final layer
input_size = model.fc.in_features

# Pass input_size as well as 7 for the 7 classes we will be detecting
model.fc = nn.Linear(input_size, 7)
There we have it, a modified resnet18 model. This process is repeatable for most ML models, though some might require different considerations, in particular if they have multiple output layers.
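One detail worth spelling out, since the optimizer later receives a variable called params_to_update: to train only the last layer, the pretrained weights must be frozen before the swap above, so the new layer is the only thing left with gradients enabled. A sketch of how that might look:

# Freeze every pretrained weight
for param in model.parameters():
    param.requires_grad = False

# Replacing model.fc afterwards gives the new layer requires_grad=True
model.fc = nn.Linear(input_size, 7)

# Hand the optimizer only the trainable (new) parameters
params_to_update = [p for p in model.parameters() if p.requires_grad]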
Step 2: Load the data into a train and test set
PyTorch has some wonderful utilities for transforming and loading data; in this case we are using them for images. We also use train_frac=0.8 to determine how much of our dataset is used for training, and how much for testing.
Important here is the transformation pipeline we create below. We perform random crops and flips as well as normalization. The normalization means and standard deviations were borrowed from ImageNet. In practice you might be better off calculating them for your own image dataset, but that’s out of scope for this blog. Here’s more information on that if you want to read further: link.
import math
import random

import torch
from torchvision import datasets, transforms

data_dir = "data"   # the directory tree from above
train_frac = 0.8    # 80% of images go to the train set
batch_size = 16     # assumption: pick whatever fits your hardware

# Augment with random crops/flips, then normalize with ImageNet statistics
data_transforms = transforms.Compose([
    # resnet18 expects 224x224 inputs (note: this is the image size,
    # not the input_size variable from Step 1, which is a feature count)
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

image_dataset = datasets.ImageFolder(data_dir, transform=data_transforms)
dataset_size = len(image_dataset.imgs)
print("{0} images in dataset".format(dataset_size))

# Shuffle the indices, then take the first train_frac of them for training
indices = list(range(dataset_size))
split = int(math.floor(train_frac * dataset_size))
random.shuffle(indices)
test_indices, train_indices = indices[split:], indices[:split]
print("{0} images in training dataset".format(len(train_indices)))
print("{0} images in test dataset".format(len(test_indices)))

test_set = torch.utils.data.Subset(image_dataset, test_indices)
train_set = torch.utils.data.Subset(image_dataset, train_indices)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)

dataloaders_dict = {
    "train": train_loader,
    "test": test_loader
}
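A quick way to convince yourself the loaders work before training (purely a sanity check; the shape assumes the 224x224 crop above and your chosen batch_size):

# Pull one batch and inspect it
inputs, labels = next(iter(dataloaders_dict["train"]))
print(inputs.shape)  # torch.Size([batch_size, 3, 224, 224])
print(labels[:8])    # integer class indices assigned by ImageFolder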
Great, we have our data ready for training, now what?
Step 3: Create our training loop, updating only the final layer parameters
The whole code is available in the git repo, but for blog purposes it has been separated into more discrete training and inference portions.
Note that throughout we use the following criterion and optimizer (params_to_update is the list of trainable parameters we built when freezing the pretrained weights):
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
Now onto the main training loop:
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in dataloaders["train"]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Reset gradients from the previous batch
        optimizer.zero_grad()

        with torch.set_grad_enabled(True):
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)

            # Backpropagate and update the final layer's weights
            loss.backward()
            optimizer.step()
Now for the code that runs validation against our test set. We use its performance to decide whether to save the new model weights as the ‘best ever’ or disregard them.
import copy

# Track the best accuracy seen across all epochs
best_acc = 0.0
best_model_wts = None

for epoch in range(num_epochs):
    # ...training code from above goes here...

    model.eval()
    running_loss = 0.0
    running_corrects = 0

    for inputs, labels in dataloaders["test"]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # No gradient bookkeeping needed during evaluation
        with torch.set_grad_enabled(False):
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)

        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)

    epoch_loss = running_loss / len(dataloaders["test"].dataset)
    epoch_acc = running_corrects.double() / len(dataloaders["test"].dataset)

    # We want to save the model if it's the best ever performing one
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())
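One step the snippet above implies but doesn’t show: once all epochs finish, load the best weights back into the model before saving, otherwise you’d keep whatever the final epoch happened to produce.

# Restore the best-performing weights once training finishes
if best_model_wts is not None:
    model.load_state_dict(best_model_wts)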
Step 4: Run the full training and validation loop
Now that we have everything more or less in place, we can run the training and validation loop for as many epochs as we’d like, saving the best model result. In half-pseudo half-python it might look like this:
(Don’t forget: the full working code is in the git repo linked above.)
# We want to use the GPU if it's available
# (on Apple Silicon you could also check torch.backends.mps.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# get_model, get_data and train wrap the snippets from the steps above
model = get_model(num_classes=7)
# Send the model to the right device
model = model.to(device)

dataloaders = get_data()

# Setup the loss and optimizer; params_to_update holds only the final layer's parameters
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

# Train and evaluate (15 epochs, per the plan above)
num_epochs = 15
model, _ = train(
    model, dataloaders, criterion,
    optimizer, num_epochs=num_epochs
)

# Save the highest performing one (measured against the test data set)
torch.save(model, "best.pt")
Thanks for reading! A reminder that the focus of this series is deploying to the web, not fine-tuning, supervised learning, or anything in between; we are just setting the stage here. The next post (Part 2) covers how to deploy and use the model in a web or nodejs context (with a demo!), along with inference and getting useful results.