DL = Deep Learning

NN = Neural Network

P = Philosophy

Gradient descent

DL: An optimization algorithm used to train an NN. It repeatedly adjusts the weights by a small step in the direction that reduces the loss.
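
As a rough illustration of the idea, here is a minimal sketch of gradient descent on a single weight. The toy loss `(w - 3)^2`, the function names, and the learning rate are assumptions chosen for the example, not part of any particular library.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def toy_grad(w):
    # Gradient of the assumed toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, start=0.0)
print(round(w_final, 4))  # prints 3.0
```

Each step only moves a little, but because every step points downhill, the weight ends up at the minimum.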

P: You can reach the place you want one step at a time, as long as each step is in the right direction.

Keep weights small

DL: Small weights, often encouraged through regularization, help an NN adapt to new data more easily.
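
One common way to keep weights small is an L2 penalty (often called weight decay). The sketch below assumes that is the technique meant here. It fits a single weight `w` in `y ≈ w * x` on made-up data, and the `train` function and its parameters are hypothetical names for this example only.

```python
def train(xs, ys, weight_decay, learning_rate=0.05, steps=500):
    """Fit y ≈ w * x by gradient descent on MSE + weight_decay * w^2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad_mse = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of the L2 penalty, which pulls w toward zero.
        grad_penalty = 2.0 * weight_decay * w
        w -= learning_rate * (grad_mse + grad_penalty)
    return w


# Made-up data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = train(xs, ys, weight_decay=0.0)
w_decayed = train(xs, ys, weight_decay=1.0)
print(round(w_plain, 3), round(w_decayed, 3))  # prints 2.0 1.647
```

The penalty trades a little accuracy on the training data for a smaller weight.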

P: Keep your identity small so that you can adjust easily.

Local and Global Minima

DL: A local minimum means your NN has learned something, but the solution is not optimal. We always aim for the global minimum.

P: Your views and opinions are not perfect. They are probably just your perspective.

Randomize Weights

DL: Random initial weights help you explore the loss landscape and make it less likely that training gets stuck in a poor local minimum.
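
A small sketch of why the starting point matters. It uses random restarts on a one-dimensional toy function rather than real NN weight initialization. The function `w^4 - 3w^2 + w` is an assumption picked because it has one local and one global minimum, and all names are illustrative.

```python
import random


def loss(w):
    # Assumed toy loss with a local minimum near w = 1.13
    # and a global minimum near w = -1.30.
    return w**4 - 3.0 * w**2 + w


def grad(w):
    return 4.0 * w**3 - 6.0 * w + 1.0


def descend(start, learning_rate=0.01, steps=2000):
    """Plain gradient descent from a given starting point."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# A single fixed start on the right-hand side ends in the local minimum.
stuck_w = descend(2.0)

# Several random starts: keep whichever end point has the lowest loss.
rng = random.Random(0)
starts = [rng.uniform(-2.0, 2.0) for _ in range(10)]
best_w = min((descend(s) for s in starts), key=loss)

print(round(stuck_w, 2), round(best_w, 2))
```

With enough random starts, at least one is likely to fall into the basin of the better minimum, but nothing guarantees it.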

P: Put yourself in random situations and learn from them.

Exponentially Weighted Average

DL: A running average in which the accumulated past values carry more weight than the single latest value, while the influence of older values fades exponentially.
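
The usual update rule is `v_t = beta * v_(t-1) + (1 - beta) * x_t`. The sketch below assumes `beta = 0.9` and a starting value of 0 with no bias correction. The list of outcomes is made up.

```python
def exponentially_weighted_average(values, beta=0.9):
    """Return the running averages v_t = beta * v_(t-1) + (1 - beta) * x_t."""
    v = 0.0  # assumed starting value, no bias correction
    averages = []
    for x in values:
        v = beta * v + (1.0 - beta) * x
        averages.append(v)
    return averages


# Twenty good outcomes (1.0) followed by a single bad one (0.0).
outcomes = [1.0] * 20 + [0.0]
averages = exponentially_weighted_average(outcomes)

print(round(averages[19], 3))  # prints 0.878
print(round(averages[20], 3))  # prints 0.791
```

The single bad outcome lowers the average only slightly, because the latest value carries a weight of just `1 - beta`.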

P: Do you change your path completely if one outcome goes wrong? What about the last n outcomes that went right?


DL: Used to check how well an NN has trained. Further training is adjusted accordingly.

P: Use metrics to see what’s working and not working. Adjust plans accordingly.

Test Data and Testing

DL: Evaluates the NN on data it did not see during training.
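
A minimal sketch of holding out test data, assuming a simple random split. The helper `train_test_split_simple`, the 20% test fraction, and the fixed seed are illustrative choices, not a specific library's API.

```python
import random


def train_test_split_simple(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train_set, test_set = train_test_split_simple(data)
print(len(train_set), len(test_set))  # prints 8 2
```

The held-out samples are never used for training, so performance on them reflects how the model handles data it has not seen.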

P: Test your assumptions and beliefs to see whether they are true.

Overfitting

DL: The NN has memorized only certain things completely and cannot handle variations of unseen things.
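
This kind of memorization, commonly called overfitting, can be caricatured with a "model" that is nothing more than a lookup table. The sketch below is a deliberately extreme, made-up example. The data and the names `memorizer` and `mean_squared_error` are assumptions for illustration.

```python
def memorizer(train_pairs):
    """A 'model' that only memorizes: exact lookup, and a default guess of 0.0 otherwise."""
    table = dict(train_pairs)
    return lambda x: table.get(x, 0.0)


def mean_squared_error(model, pairs):
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)


# Made-up data following y = 2x; the test inputs fall between the training inputs.
train_pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
test_pairs = [(1.5, 3.0), (2.5, 5.0)]

model = memorizer(train_pairs)
train_error = mean_squared_error(model, train_pairs)
test_error = mean_squared_error(model, test_pairs)
print(train_error, test_error)  # prints 0.0 17.0
```

The training error is zero, yet the model fails on inputs that differ only slightly from what it memorized.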

P: What you don’t yet know is more important than what you already know.