Correlation between Deep Learning Concepts and Philosophy
DL = Deep Learning
NN = Neural Network
P = Philosophy
Gradient descent
DL: The optimization algorithm used to train an NN.
P: You can reach the place you want, one step in the right direction at a time.
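To make this concrete, here is a minimal sketch in plain Python: gradient descent on the toy function f(x) = (x - 3)^2, where the function, starting point, and learning rate are all my own illustrative choices.

```python
# Minimize f(x) = (x - 3)**2; its gradient is 2 * (x - 3).
x = 0.0     # starting point
lr = 0.1    # learning rate: how big each step is

for step in range(50):
    grad = 2 * (x - 3)  # direction of steepest ascent at the current point
    x -= lr * grad      # take one small step in the opposite (downhill) direction

print(x)  # ~3.0: we reached the minimum, one step at a time
```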
Keep weights small
DL: Helps the NN learn new data faster.
P: Keep your identity small so that you can adjust easily.
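One common way to keep weights small is weight decay, which shrinks every weight slightly toward zero on each update. A rough sketch, where the function name, weights, gradients, and constants are all made up for illustration:

```python
# One SGD step with weight decay: besides following the gradient,
# every weight is also nudged slightly toward zero.
def sgd_step(weights, grads, lr=0.01, weight_decay=1e-4):
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [0.9, -1.5, 2.3]   # toy weights
grads = [0.1, -0.2, 0.05]    # toy gradients
weights = sgd_step(weights, grads)
print(weights)
```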
Local and Global Minima
DL: A local minimum means your NN has learned something, but not the best it could. We always aim for the global minimum.
P: Your views and opinions are not perfect. They are probably just your perspective.
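A toy illustration: the function f(x) = x^4 - 3x^2 + x has a local minimum near x ≈ 1.13 and a global minimum near x ≈ -1.30, and plain gradient descent lands in one or the other depending entirely on where it starts (the function and step size are my own choices):

```python
def f_prime(x):
    return 4 * x**3 - 6 * x + 1   # derivative of f(x) = x**4 - 3*x**2 + x

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

print(descend(2.0))    # ~1.13: stuck in the local minimum
print(descend(-2.0))   # ~-1.30: found the global minimum
```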
Randomize Weights
DL: Helps the NN explore the loss landscape and avoid getting stuck in a local minimum.
P: Put yourself in random situations to learn from them.
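Continuing the toy example from the previous sketch: restarting the descent from several random points (a stand-in for random weight initialization) makes it very likely that at least one run escapes the local minimum.

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

# Ten random starting points; keep the result with the lowest f value.
starts = [random.uniform(-3, 3) for _ in range(10)]
best = min((descend(x0) for x0 in starts), key=f)
print(best)   # almost certainly ~-1.30, the global minimum
```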
Exponentially Weighted Average
DL: A running average in which the accumulated past values carry more weight than the single latest value.
P: Do you change your path completely if one outcome goes wrong? What about the last n outcomes that were right?
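In code the update is just v = beta * v + (1 - beta) * x; with beta = 0.9 the accumulated history keeps 90% of the weight and the newest observation gets only 10%. The numbers below are made up:

```python
# Exponentially weighted average: history dominates any single new value.
def ewa(values, beta=0.9):
    v = 0.0
    for x in values:
        v = beta * v + (1 - beta) * x
    return v

history = [1] * 20 + [-1]   # twenty good outcomes, then one bad one
print(ewa(history))         # ~0.69: one wrong outcome barely bends the path
```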
Metrics
DL: Used to check how well an NN has trained; further training is adjusted accordingly.
P: Use metrics to see what’s working and not working. Adjust plans accordingly.
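As a tiny example, here is accuracy, one of the simplest metrics, plus a hypothetical adjustment based on it; the predictions, labels, and threshold are made up:

```python
# Accuracy: the fraction of predictions that match the true labels.
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

val_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])   # 0.75
if val_acc < 0.9:
    print("metric says: keep training / adjust the plan")
```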
Test Data and Testing
DL: Tests the NN on unseen data.
P: Test your assumptions and beliefs to see whether they are true.
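The standard way to get unseen data is to hold some out before training ever starts. A minimal sketch of a train/test split (the helper name and fractions are my own):

```python
import random

# Shuffle, then hold out a fraction of the data the model will never train on.
def train_test_split(data, test_fraction=0.2, seed=42):
    data = data[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

train, test = train_test_split(list(range(10)))
print(train, test)   # 8 items to learn from, 2 kept unseen for testing
```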
Overfitting
DL: The NN has memorized certain things completely and cannot handle variations of things it has not seen.
P: What you don’t yet know is more important than what you already know.
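The most extreme overfitter is a lookup table: perfect on what it memorized and clueless about everything else. A deliberately silly sketch (the data and "model" are made up):

```python
# A "model" that literally memorizes its training examples.
train_data = {0.0: 0.0, 0.5: 0.5, 1.0: 1.0}   # samples of the trend y = x

def memorizer(x):
    return train_data.get(x)   # perfect on seen inputs, clueless otherwise

print(memorizer(0.5))    # 0.5  -- looks like a flawless model
print(memorizer(0.75))   # None -- cannot handle even a tiny unseen variation
```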
Here are some more from my good friend Jayesh.
Batch size
DL: The number of training examples processed before the model's weights are updated.
P: Try to understand different perspectives at any given time to avoid biases.
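A minimal sketch of cutting a dataset into mini-batches, with one hypothetical model update per batch; the data and batch size are made up:

```python
# Yield the dataset in chunks; the model would be updated once per chunk.
def batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

for batch in batches(list(range(10)), batch_size=4):
    print(batch)   # [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
```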
Regularisation
DL: Adds a penalty for complexity (e.g., large weights) to the loss, to reduce overfitting.
P: Penalize overconfidence to reach the right place. If we think of large weights as ego, we come to the famous Stoic maxim that ego is the enemy: unchecked, it can lead to dead neurons, i.e., hampered learning and growth.
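A sketch of L2 regularisation, where the penalty grows with the squared size of the weights, so the optimizer pays for overconfident (large) parameters; the losses, weights, and lambda below are made up:

```python
# L2-regularised loss: base loss plus a penalty on large weights.
def regularised_loss(base_loss, weights, lam=0.01):
    return base_loss + lam * sum(w * w for w in weights)

print(regularised_loss(0.5, [0.1, -0.2]))    # 0.5005: small weights, tiny penalty
print(regularised_loss(0.5, [10.0, -20.0]))  # 5.5: large weights, heavy penalty
```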