Getting Started

The math behind Gradient Descent and Backpropagation

Enghin Omer
Published in TDS Archive
15 min read · Nov 5, 2020
Image by author

Gradient Descent

Fig.1: f(x, y)=x²+y² (Image by author)
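Fig.1 shows the bowl-shaped surface that gradient descent walks down. As a concrete worked step on this surface, using the standard update rule: the gradient of f(x, y) = x² + y² is

\nabla f(x, y) = (2x, 2y), \qquad (x, y) \leftarrow (x, y) - \eta \, \nabla f(x, y)

so one step from (1, 1) with learning rate η = 0.1 moves to (1, 1) - 0.1·(2, 2) = (0.8, 0.8), strictly downhill toward the minimum at the origin.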

Directional derivative

The partial derivative of C with respect to x is written ∂C/∂x.
The directional derivative of C in the direction of a unit vector v is D_v C = ∇C · v.
Fig.2 Steepest descent direction (Image by author)
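Choosing the step against the gradient makes the directional derivative as negative as possible, which is exactly the steepest-descent direction shown in Fig.2:

\Delta v = -\eta \, \nabla C \quad \Rightarrow \quad \Delta C \approx \nabla C \cdot \Delta v = -\eta \, \lVert \nabla C \rVert^2 \leq 0

so every sufficiently small step in this direction reduces the cost C.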

Batch gradient descent
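In batch gradient descent the cost is an average over all n training examples, so every single update requires a full pass over the training set:

C = \frac{1}{n} \sum_{x} C_x, \qquad w \leftarrow w - \eta \, \nabla C(w)

This is exact but expensive when n is large, which motivates the stochastic variant below.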

Stochastic Gradient Descent
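Stochastic (mini-batch) gradient descent instead estimates the full gradient from a small random mini-batch of m examples. This is the update the Java code below implements; note the eta / miniBatch.numExamples() scaling in updateMiniBatch:

\nabla C \approx \frac{1}{m} \sum_{j=1}^{m} \nabla C_{x_j}, \qquad w \leftarrow w - \frac{\eta}{m} \sum_{j=1}^{m} \nabla C_{x_j}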

Neural Networks

Fig.3 Sigmoid function (Image by author)
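The sigmoid in Fig.3 is the logistic function; its derivative, needed later in backpropagation, has a convenient closed form:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z) \, (1 - \sigma(z))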
Fig.4 Neural network (Image by author)
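For a network like the one in Fig.4, the feedforward computation proceeds layer by layer, matching the feedforward loop in the code below:

z^{l} = w^{l} a^{l-1} + b^{l}, \qquad a^{l} = \sigma(z^{l})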

Backpropagation
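The derivation reduces to four standard equations (in the notation popularized by Nielsen's Neural Networks and Deep Learning, which the code below mirrors closely): the output-layer error, the recursion carrying the error backwards, and the two gradient formulas:

\delta^{L} = \nabla_a C \odot \sigma'(z^{L})
\delta^{l} = \big( (w^{l+1})^{T} \delta^{l+1} \big) \odot \sigma'(z^{l})
\partial C / \partial b^{l}_{j} = \delta^{l}_{j}
\partial C / \partial w^{l}_{jk} = a^{l-1}_{k} \, \delta^{l}_{j}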

Cross-entropy

Cross-entropy cost function (Image by author)
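For sigmoid output neurons the cross-entropy cost is

C = -\frac{1}{n} \sum_{x} \big[ y \ln a + (1 - y) \ln(1 - a) \big]

and its appeal is that the σ'(z) factor cancels in the output-layer error, giving δ^L = a^L - y, so learning does not slow down when an output neuron saturates.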

Code Example
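The excerpt below omits the surrounding class skeleton. A minimal sketch of the imports and fields the methods rely on (the field names follow the excerpt's usage; everything else is an assumption):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

public class Network {

    private final int[] layerSizes;                           // nodes per layer
    private final int layers;                                 // number of layers
    private final List<INDArray> biases = new ArrayList<>();  // one column vector per non-input layer
    private final List<INDArray> weights = new ArrayList<>(); // one matrix per pair of adjacent layers

The methods that follow live inside this class.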

/**
 * The constructor takes as input an array of integers
 * representing the number of nodes in each layer.
 */
public Network(int[] layerSizes) {
    this.layerSizes = Arrays.copyOf(layerSizes, layerSizes.length);
    this.layers = layerSizes.length;

    // initialise biases: one Gaussian column vector per non-input layer
    for (int i = 1; i < layerSizes.length; i++) {
        biases.add(Nd4j.randn(layerSizes[i], 1));
    }

    // initialise weights: a layerSizes[i] x layerSizes[i-1] Gaussian matrix per layer
    for (int i = 1; i < layerSizes.length; i++) {
        weights.add(Nd4j.randn(layerSizes[i], layerSizes[i - 1]));
    }
}
/**
 * Performs mini-batch gradient descent to train the network. If test data is
 * provided, the performance of the network is printed after each epoch.
 *
 * @param trainingData data used to train the network
 * @param epochs       number of training epochs
 * @param batchSize    the size of each mini-batch
 * @param eta          the learning rate
 * @param testData     data used to evaluate the network (may be null)
 */
public void SGD(DataSet trainingData, int epochs, int batchSize, double eta, DataSet testData) {
    int testSize = 0;
    if (testData != null) {
        testSize = testData.numExamples();
    }
    int trainingSize = trainingData.numExamples();
    for (int i = 0; i < epochs; i++) {
        trainingData.shuffle();
        for (int j = 0; j < trainingSize; j += batchSize) {
            // getRange's upper bound is exclusive, so clamp it to trainingSize
            DataSet miniBatch = trainingData
                    .getRange(j, Math.min(j + batchSize, trainingSize));
            this.updateMiniBatch(miniBatch, eta);
        }
        if (testData != null) {
            System.out.printf("Epoch %d: %d / %d%n", i, this.evaluate(testData), testSize);
        }
    }
}

/**
 * Updates the weights and biases of the network by applying backpropagation
 * to a single mini-batch.
 *
 * @param miniBatch the mini-batch used to train the network
 * @param eta       the learning rate
 */
public void updateMiniBatch(DataSet miniBatch, double eta) {
    // accumulated gradients over the whole mini-batch, indexed by layer
    INDArray[] gradientBatchB = new INDArray[layers];
    INDArray[] gradientBatchW = new INDArray[layers];
    for (int i = 0; i < this.biases.size(); i++) {
        gradientBatchB[i + 1] = Nd4j.zeros(this.biases.get(i).shape());
    }
    for (int i = 0; i < this.weights.size(); i++) {
        gradientBatchW[i + 1] = Nd4j.zeros(this.weights.get(i).shape());
    }
    // sum the per-example gradients returned by backpropagation
    List<INDArray[]> result;
    for (DataSet example : miniBatch) {
        result = this.backpropagation(example.getFeatures(), example.getLabels());
        for (int i = 1; i < layers; i++) {
            gradientBatchB[i] = gradientBatchB[i].add(result.get(0)[i]);
            gradientBatchW[i] = gradientBatchW[i].add(result.get(1)[i]);
        }
    }
    // gradient-descent step: average the gradients and scale by the learning rate
    for (int i = 0; i < this.biases.size(); i++) {
        INDArray b = this.biases.get(i)
                .sub(gradientBatchB[i + 1].mul(eta / miniBatch.numExamples()));
        this.biases.set(i, b);
        INDArray w = this.weights.get(i)
                .sub(gradientBatchW[i + 1].mul(eta / miniBatch.numExamples()));
        this.weights.set(i, w);
    }
}
/**
 * Runs one forward pass and one backward pass for a single training example,
 * returning the per-layer gradients of the cost with respect to the biases
 * (index 0 of the result) and the weights (index 1).
 */
public List<INDArray[]> backpropagation(INDArray x, INDArray y) {
    INDArray[] gradientB = new INDArray[layers];
    INDArray[] gradientW = new INDArray[layers];

    // feedforward: store every weighted input z and activation
    INDArray activation = x;
    INDArray[] activations = new INDArray[layers];
    INDArray[] zs = new INDArray[layers];
    activations[0] = x;
    INDArray z;
    for (int i = 1; i < layers; i++) {
        z = this.weights.get(i - 1).mmul(activation).add(this.biases.get(i - 1));
        zs[i] = z;
        activation = sigmoid(z);
        activations[i] = activation;
    }

    // backward pass: output-layer error, then propagate it back layer by layer
    INDArray sp;
    INDArray delta = costDerivative(activations[layers - 1], y)
            .mul(sigmoidPrime(zs[layers - 1]));
    gradientB[layers - 1] = delta;
    gradientW[layers - 1] = delta.mmul(activations[layers - 2].transpose());
    for (int i = 2; i < layers; i++) {
        z = zs[layers - i];
        sp = sigmoidPrime(z);
        delta = (this.weights.get(layers - i).transpose().mmul(delta)).mul(sp);
        gradientB[layers - i] = delta;
        gradientW[layers - i] = delta.mmul(activations[layers - i - 1].transpose());
    }
    return Arrays.asList(gradientB, gradientW);
}
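The helpers referenced above (sigmoid, sigmoidPrime, costDerivative, evaluate) are not part of the excerpt. A minimal sketch of the first three, assuming ND4J's Transforms API and the quadratic cost whose derivative with respect to the activation is a - y:

// requires: import org.nd4j.linalg.ops.transforms.Transforms;

private INDArray sigmoid(INDArray z) {
    return Transforms.sigmoid(z);        // element-wise 1 / (1 + e^(-z))
}

private INDArray sigmoidPrime(INDArray z) {
    INDArray s = Transforms.sigmoid(z);
    return s.mul(s.rsub(1.0));           // sigma(z) * (1 - sigma(z)); rsub computes 1 - s
}

private INDArray costDerivative(INDArray outputActivations, INDArray y) {
    return outputActivations.sub(y);     // dC/da for the quadratic cost (an assumption)
}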
