Overview
One problem with getting started with ML these days is how daunting some of the ‘interesting’ projects can be. The models are large, often too large for regular computers. This is a two-part blog post on how one might work with a larger model in a manageable way. I’m on a MacBook Pro M2 with 16GB of RAM, which admittedly is quite powerful, but this method should be repeatable on most hardware. It should also serve as something of an introduction to ML, especially if you have prior programming experience but have yet to apply it to ML.
What this two-part post series will cover:
- Formatting and loading data for training an ML model
- The general process of training an image detection ML model (using fine-tuning)
- How to use the model for inference/prediction after it’s trained on your PC
- How to deploy the model to a website so that anyone can use it easily
We will be using resnet18 as the example model in this demonstration, as well as PyTorch.
All the code for this and future parts can be found here.
Getting your PyTorch environment set up
We will assume you have installed python and virtualenv on your computer.
- Set up your virtual environment
Note your virtualenv setup might differ based on your operating system:
$ virtualenv venv
$ source venv/bin/activate
- Install dependencies
$ (venv) pip install torch torchvision torchaudio
$ (venv) pip install matplotlib numpy onnx
We will be using matplotlib for visualizations, and numpy to help make the inference output easier to parse.
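Before moving on, it can be worth a quick, purely optional sanity check that everything imports cleanly from inside the activated venv (version numbers will vary):

import torch
import torchvision

print(torch.__version__)          # verify torch imports cleanly
print(torchvision.__version__)
print(torch.cuda.is_available())  # False is normal on a Mac; CPU training still works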
Getting the data
This is always the hardest part of building or training an ML system. In this case we will be using a freely available dataset found on Kaggle here. This dataset has seven different classes to recognize: bike, car, cat, dog, human, horse, flower.
Organizing our workspace
We start by organizing the data folder into a structure like this:
├── data
│ ├── bike
│ ├── cars
│ ├── cats
│ ├── dogs
│ ├── flowers
│ ├── horses
│ ├── humans
├── inference
│ ├── bike
│ ├── cars
│ ├── cats
│ ├── dogs
│ ├── flowers
│ ├── horses
│ ├── humans
├── train.py
├── inference.py
├── labels.py
├── export.py
├── venv/
The above dataset will unzip into data/. I then manually selected 5-10 images from each folder and placed them in a new inference/ directory, maintaining the label names in sub-directories (bike, cars, cats, etc.). This will make sense later, but we use the inference/ data to test our inference script and qualitatively validate the fine-tuned output.
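If you’d rather script that selection than do it by hand, a rough sketch might look like this (the 5-images-per-class count is arbitrary, and the data/ and inference/ paths match the tree above):

import os
import random
import shutil

# Move a small random sample of images per class from data/ into inference/
for label in os.listdir("data"):
    src = os.path.join("data", label)
    if not os.path.isdir(src):
        continue
    dst = os.path.join("inference", label)
    os.makedirs(dst, exist_ok=True)
    for name in random.sample(os.listdir(src), 5):
        shutil.move(os.path.join(src, name), os.path.join(dst, name))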
Finally, I created four Python files at the root level: train.py, inference.py, labels.py, and export.py.
train.py
This is responsible for the actual training/fine-tuning of our resnet18 model. Here we will:
- Create the model and replace the output layer with one compatible with our image classification task
- Split our dataset from above into a train and a test set
- Pass our entire train set through the resnet18 model, calculating the loss and updating the final layer gradients to optimize our network
- Pass our entire test set through the resnet18 model, calculating the test loss to validate for over/under-fitting
- Repeat steps 3 and 4 for as many ‘epochs’ as you’d like; in this blog we do it 15 times
inference.py
We use this to demonstrate how to get results from our trained model, and qualitatively observe those results.
labels.py
This is a slim file just used to keep track of the mapping between the label string and index number.
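As a sketch of what that might contain (the LABELS name here is just an example): ImageFolder, which we use below, assigns class indices alphabetically by folder name, so a list whose positions match those indices is all we need.

# labels.py
# Index i here corresponds to output index i of the model,
# because ImageFolder sorts the class folders alphabetically
LABELS = ["bike", "cars", "cats", "dogs", "flowers", "horses", "humans"]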
export.py
This is another slim file whose sole purpose is to export the PyTorch model to a format called ONNX. Having it in ONNX format allows us to host it on the internet and run it in a browser context using onnxruntime-web.
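A minimal sketch of what that export might look like, assuming the model was saved to best.pt as we do at the end of this post (the file names and the 224x224 dummy input are assumptions; using the exported file is covered in Part 2):

# export.py
import torch

# Load the fine-tuned model saved by train.py
# (newer PyTorch versions may require torch.load("best.pt", weights_only=False))
model = torch.load("best.pt")
model.eval()

# ONNX export works by tracing the model with a dummy input of the expected shape
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "best.onnx",
    input_names=["input"],
    output_names=["output"],
)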
Finetuning the model
Now that we have our directory structure, dependencies and high-level setup, we can dive into the actual code.
There are typically two ways to fine-tune a model. You start with a pre-trained model in both cases, but you either train all the weights, or just the last layer.
The benefit to training just the last layer is that it is much faster and (usually) easier on consumer grade hardware.
In the example here we will be training just the last layer of the resnet18 model. Normally it has 1000 output classes; we are going to reduce that to just 7, the same ones from the dataset.
Step 1: Replace the pretrained model’s last layer with one compatible with our problem
from torchvision import models
from torchvision.models import ResNet18_Weights
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)
print(model)
# This gives:
'''
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2),
... [omitted middle content for clarity] ...
(fc): Linear(in_features=512, out_features=1000, bias=True)
)
'''
So we can see the last layer is: (fc): Linear(in_features=512, out_features=1000, bias=True).
We can replace the last layer in the model like so:
import torch.nn as nn

# Get the input size of the existing final layer
input_size = model.fc.in_features

# Pass input_size as well as 7 for the 7 classes we will be detecting
model.fc = nn.Linear(input_size, 7)
There we have it, a modified resnet18 model. This process is repeatable for most ML models, though some might require different considerations, in particular if they have multiple output layers.
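One detail worth spelling out, since the optimizer later receives a variable called params_to_update: to train only the last layer, the pretrained weights must be frozen before the swap above, so the new layer is the only thing left with gradients enabled. A sketch of how that might look:

# Freeze every pretrained weight
for param in model.parameters():
    param.requires_grad = False

# Replacing model.fc afterwards gives the new layer requires_grad=True
model.fc = nn.Linear(input_size, 7)

# Hand the optimizer only the trainable (new) parameters
params_to_update = [p for p in model.parameters() if p.requires_grad]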
Step 2: Load the data into a train and test set
PyTorch has some wonderful utilities for transforming and loading data; in this case we are using them for images. We also use train_frac=0.8 to determine how much of our dataset is used for training, and how much for testing.
Important here is the transformation pipeline we create below. We perform random crops and flips as well as normalization. The normalization means and standard deviations were borrowed from ImageNet. In practice you might be better off calculating them for your own image dataset, but that’s out of scope for this blog. Here’s more information on that if you want to read further: link.
import math
import random

import torch
from torchvision import datasets, transforms

data_dir = "data"   # the directory tree from above
train_frac = 0.8    # 80% of images go to the train set
batch_size = 16     # assumption: pick whatever fits your hardware

# Augment with random crops/flips, then normalize with ImageNet statistics
data_transforms = transforms.Compose([
    # resnet18 expects 224x224 inputs (note: this is the image size,
    # not the input_size variable from Step 1, which is a feature count)
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

image_dataset = datasets.ImageFolder(data_dir, transform=data_transforms)
dataset_size = len(image_dataset.imgs)
print("{0} images in dataset".format(dataset_size))

# Shuffle the indices, then take the first train_frac of them for training
indices = list(range(dataset_size))
split = int(math.floor(train_frac * dataset_size))
random.shuffle(indices)
test_indices, train_indices = indices[split:], indices[:split]
print("{0} images in training dataset".format(len(train_indices)))
print("{0} images in test dataset".format(len(test_indices)))

test_set = torch.utils.data.Subset(image_dataset, test_indices)
train_set = torch.utils.data.Subset(image_dataset, train_indices)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)

dataloaders_dict = {
    "train": train_loader,
    "test": test_loader
}
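A quick way to convince yourself the loaders work before training (purely a sanity check; the shape assumes the 224x224 crop above and your chosen batch_size):

# Pull one batch and inspect it
inputs, labels = next(iter(dataloaders_dict["train"]))
print(inputs.shape)  # torch.Size([batch_size, 3, 224, 224])
print(labels[:8])    # integer class indices assigned by ImageFolder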
Great, we have our data ready for training, now what?
Step 3: Create our training loop, updating only the final layer parameters
The whole code is available in the git repo, but for blog purposes it has been separated into more discrete training and inference portions.
Note that throughout we use the following criterion and optimizer (params_to_update is the list of trainable parameters we built when freezing the pretrained weights):
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
Now onto the main training loop:
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in dataloaders["train"]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Reset gradients from the previous batch
        optimizer.zero_grad()

        with torch.set_grad_enabled(True):
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)

            # Backpropagate and update the final layer's weights
            loss.backward()
            optimizer.step()
Now for the code that runs validation against our test set. We use its performance to decide whether to save the new model weights as the ‘best ever’ or disregard them.
import copy

# Track the best accuracy seen across all epochs
best_acc = 0.0
best_model_wts = None

for epoch in range(num_epochs):
    # ...training code from above goes here...

    model.eval()
    running_loss = 0.0
    running_corrects = 0

    for inputs, labels in dataloaders["test"]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # No gradient bookkeeping needed during evaluation
        with torch.set_grad_enabled(False):
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)

        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)

    epoch_loss = running_loss / len(dataloaders["test"].dataset)
    epoch_acc = running_corrects.double() / len(dataloaders["test"].dataset)

    # We want to save the model if it's the best ever performing one
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())
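One step the snippet above implies but doesn’t show: once all epochs finish, load the best weights back into the model before saving, otherwise you’d keep whatever the final epoch happened to produce.

# Restore the best-performing weights once training finishes
if best_model_wts is not None:
    model.load_state_dict(best_model_wts)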
Step 4: Run the full training and validation loop
Now that we have everything more or less in place, we can run the training and validation loop for as many epochs as we’d like, saving the best model result. In half-pseudo half-python it might look like this:
(Don’t forget: the full working code is in the git repo linked above.)
# We want to use the GPU if it's available
# (on Apple Silicon you could also check torch.backends.mps.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# get_model, get_data and train wrap the snippets from the steps above
model = get_model(num_classes=7)
# Send the model to the right device
model = model.to(device)

dataloaders = get_data()

# Setup the loss and optimizer; params_to_update holds only the final layer's parameters
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

# Train and evaluate (15 epochs, per the plan above)
num_epochs = 15
model, _ = train(
    model, dataloaders, criterion,
    optimizer, num_epochs=num_epochs
)

# Save the highest performing one (measured against the test data set)
torch.save(model, "best.pt")
Thanks for reading! A reminder that the focus of this series is deploying to the web, not fine-tuning, supervised learning, or anything in between; we are just setting the stage here. The next post (Part 2) covers how to deploy and use the model in a web or nodejs context (with a demo!), along with inference and getting useful results.