Part 2: Deep Learning FPGA Acceleration with Python - 'Inference'

Published on May 06, 2023 by Dominik Kaukinen

Overview

In this post we build out our hardware acceleration library to handle training for an MLP model on the Fisher Iris dataset. We will go through each layer to determine what modifications are needed for training and gradient calculations.

The biggest difficulty I foresee is the softmax layer. Its exponentials produce large numbers and its normalization requires a division, and neither works well in our hardware-constrained environment.

See our last post for details on inference.

Designing the Ops for Inference

Linear Layer

ReLU Layer

Softmax Layer

I spent a lot of time here trying out different techniques. Ultimately, at a high level, we want this layer to:

  1. Turn a set of numbers into a probability distribution.
  2. Produce only positive values.
  3. Produce values that sum to 1 (or to some other known value, such as 255 in our case).

Recall that a traditional softmax function might look like this:


    import numpy as np

    def softmax(X):
        # Subtracting the max avoids overflow in exp; the result is unchanged.
        exps = np.exp(X - np.max(X))
        return exps / np.sum(exps)

I'm going to replace the exponential with its order-2 Taylor series expansion (https://en.wikipedia.org/wiki/Taylor_series), which gives:


    def softmax(X):
        # exp(x) is replaced by its 2nd-order Taylor polynomial 1 + x + x**2 / 2.
        terms = [1 + x + 0.5 * x**2 for x in X]
        s = sum(terms)
        return [t / s for t in terms]
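
Written out as formulas, the code above swaps each exponential for its second-order Taylor polynomial around 0. Completing the square also shows why every term, and therefore every output, stays positive:

```latex
e^{x} \approx 1 + x + \frac{x^{2}}{2}
\qquad\Longrightarrow\qquad
\operatorname{softmax}(\mathbf{x})_i \approx
\frac{1 + x_i + \tfrac{1}{2}x_i^{2}}{\sum_{j}\left(1 + x_j + \tfrac{1}{2}x_j^{2}\right)}

1 + x + \frac{x^{2}}{2} = \frac{(x + 1)^{2} + 1}{2} \;\ge\; \frac{1}{2}
```

One caveat is worth checking against real logits. The quadratic is only increasing for $x \ge -1$, so for inputs below $-1$ the approximation no longer preserves the ordering of the exact softmax. For example, $x = -3$ maps to 2.5 while $x = -1$ maps to 0.5.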

What we get is a function that is much easier to implement in hardware and still has the properties we want: every value is positive, and the values sum to 1 (or to 255 once scaled for 8-bit math). Apart from the final division, it needs only additions and multiplications, since squaring is just a multiplication.
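
Here is a minimal sketch of an 8-bit flavoured version that shows the "sum to 255" scaling concretely. It assumes the logits have already been quantized to small integers. The quantization scheme is an assumption and is not something this post specifies, and `taylor_softmax_u8` is a hypothetical name. Doubling each term to `x*x + 2*x + 2` removes the 0.5 factor, so all the arithmetic stays in integers and the only division left is the one by the sum.

```python
def taylor_softmax_u8(xs):
    """Second-order Taylor softmax scaled so the outputs sum to (about) 255.

    Hypothetical helper: assumes `xs` are already-quantized integer logits.
    """
    # 2 * (1 + x + x**2 / 2) == x*x + 2*x + 2 == (x + 1)**2 + 1, so every
    # term is an integer >= 1 and the sum below can never be zero.
    terms = [x * x + 2 * x + 2 for x in xs]
    total = sum(terms)
    # The doubling cancels in the ratio. Floor division means the outputs
    # can sum to slightly less than 255 (by at most len(xs) - 1).
    return [(255 * t) // total for t in terms]


print(taylor_softmax_u8([2, 0, -1]))  # [196, 39, 19]
```

In this example the outputs sum to 254 rather than 255 because of the floor division. If an exact total matters, the leftover remainder could be added to the largest entry.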

The division is trickier. We can approximate it with a lookup table, or we can perform it with bit operations. I'm going to try the lookup table first and see how it goes.
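
Both routes can be sketched in a few lines. The sketch assumes the denominator is an integer between 1 and some known maximum, which is what you get from integer-valued Taylor terms. `MAX_SUM` and `RECIP_SHIFT` are hypothetical values that would need sizing for the actual hardware, and the function names are illustrative. Each table entry stores 255/s in fixed point, so the division becomes one table read, one multiply and one shift. `shift_subtract_divide` shows the bit-operator route as classic restoring division. It is exact but takes one iteration per bit.

```python
RECIP_SHIFT = 16  # hypothetical number of fractional bits in each table entry
MAX_SUM = 1024    # hypothetical largest denominator the table has to cover

# RECIP_TABLE[s] ~= (255 / s) * 2**RECIP_SHIFT, rounded to nearest.
# Index 0 is a placeholder; the denominator is assumed to be >= 1.
RECIP_TABLE = [0] + [
    ((255 << RECIP_SHIFT) + s // 2) // s for s in range(1, MAX_SUM + 1)
]


def lut_scale_divide(t, total):
    """Approximate (255 * t) // total with a table read, a multiply and a shift."""
    return (t * RECIP_TABLE[total]) >> RECIP_SHIFT


def shift_subtract_divide(n, d, bits=16):
    """Exact n // d using only shifts, compares and subtracts (restoring division).

    Assumes 0 <= n < 2**bits and d > 0.
    """
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((n >> i) & 1)
        if remainder >= d:
            remainder -= d
            quotient |= 1 << i
    return quotient


# Example: numerators 10, 2 and 1, which sum to 13.
print([lut_scale_divide(t, 13) for t in (10, 2, 1)])              # [196, 39, 19]
print([shift_subtract_divide(255 * t, 13) for t in (10, 2, 1)])  # [196, 39, 19]
```

The table version matches exact division here, but because each entry is rounded it can be off by one for some inputs. A table of `MAX_SUM` entries of roughly 24 bits each may also be too large for a small FPGA, so the sizes above are placeholders rather than recommendations.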

Training an Equivalent Model with PyTorch

Since the focus of these posts isn't PyTorch, we'll be quick about this.

The model code used was:


    import torch.nn as nn
    import torch.nn.functional as F

    class IrisNet(nn.Module):
        # 4 input features -> 30 -> 30 -> 3 classes
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(4, 30)
            self.fc2 = nn.Linear(30, 30)
            self.fc3 = nn.Linear(30, 3)
            self.softmax = nn.Softmax(dim=1)

        def forward(self, X):
            X = F.relu(self.fc1(X))
            X = F.relu(self.fc2(X))  # fc2 is applied once, followed by ReLU
            X = self.fc3(X)
            X = self.softmax(X)

            return X

We trained it on 80% of the Fisher Iris dataset for 1000 epochs and got the following results on the test set:


    # Accuracy  0.9833333333333333
    # Precision 0.9844961240310077
    # Recall    0.983739837398374

Unlike in the last post on inference, we will use these metrics to compare our hardware-accelerated model with the PyTorch model.
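
To compare the models, the same three metrics have to be computed from the hardware model's predictions. The post doesn't say how precision and recall were averaged across the three classes. This dependency-free sketch therefore assumes macro averaging, which is the unweighted mean over classes. `classification_metrics` is a hypothetical helper, and class labels are assumed to be the integers 0 to 2.

```python
def classification_metrics(y_true, y_pred, num_classes=3):
    """Accuracy plus macro-averaged precision and recall (averaging is assumed)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in range(num_classes):
        true_pos = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        actual = sum(t == c for t, _ in pairs)
        # An empty denominator is scored as 0.0 (one common convention).
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)
    return accuracy, sum(precisions) / num_classes, sum(recalls) / num_classes


# roughly (0.833, 0.889, 0.833)
print(classification_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))
```

When all three classes appear in the data, this should agree with scikit-learn's `precision_score` and `recall_score` called with `average="macro"`.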

The Model Rebuilt in Our Library

Comparing the Results

Next Steps

We will be building out the training functionality for our library in the next post. This will include the backpropagation algorithm and the training loop. We will also be adding the ability to save and load models.
