Correlation between Deep Learning Concepts and Philosophy
DL = Deep Learning
NN = Neural Network
P = Philosophy
Gradient descent
DL: The optimization algorithm used to train an NN.
P: You can reach the place you want, one step in the right direction at a time.
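To make this concrete, here is a minimal sketch in plain Python: gradient descent on the toy function f(x) = (x - 3)^2, where the function, starting point, and learning rate are all my own illustrative choices.

```python
# Minimize f(x) = (x - 3)**2; its gradient is 2 * (x - 3).
x = 0.0     # starting point
lr = 0.1    # learning rate: how big each step is

for step in range(50):
    grad = 2 * (x - 3)  # direction of steepest ascent at the current point
    x -= lr * grad      # take one small step in the opposite (downhill) direction

print(x)  # ~3.0: we reached the minimum, one step at a time
```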
Keep weights small
DL: Helps the NN learn new data faster.
P: Keep your identity small so that you can adjust easily.
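One common way to keep weights small is weight decay, which shrinks every weight slightly toward zero on each update. A rough sketch, where the function name, weights, gradients, and constants are all made up for illustration:

```python
# One SGD step with weight decay: besides following the gradient,
# every weight is also nudged slightly toward zero.
def sgd_step(weights, grads, lr=0.01, weight_decay=1e-4):
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [0.9, -1.5, 2.3]   # toy weights
grads = [0.1, -0.2, 0.05]    # toy gradients
weights = sgd_step(weights, grads)
print(weights)
```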
Local and Global Minima
DL: A local minimum means your NN has learned something, but not the best it could. We always aim for the global minimum.
P: Your views and opinions are not perfect. They are probably just your perspective.
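A toy illustration: the function f(x) = x^4 - 3x^2 + x has a local minimum near x ≈ 1.13 and a global minimum near x ≈ -1.30, and plain gradient descent lands in one or the other depending entirely on where it starts (the function and step size are my own choices):

```python
def f_prime(x):
    return 4 * x**3 - 6 * x + 1   # derivative of f(x) = x**4 - 3*x**2 + x

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

print(descend(2.0))    # ~1.13: stuck in the local minimum
print(descend(-2.0))   # ~-1.30: found the global minimum
```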
Randomize Weights
DL: Helps the NN explore the loss landscape and avoid getting stuck in a local minimum.
P: Put yourself in random situations to learn from them.
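Continuing the toy example from the previous sketch: restarting the descent from several random points (a stand-in for random weight initialization) makes it very likely that at least one run escapes the local minimum.

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

# Ten random starting points; keep the result with the lowest f value.
starts = [random.uniform(-3, 3) for _ in range(10)]
best = min((descend(x0) for x0 in starts), key=f)
print(best)   # almost certainly ~-1.30, the global minimum
```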
Exponentially Weighted Average
DL: A running average in which the accumulated past values carry more weight than the single latest value.
P: Do you change your path completely if one outcome goes wrong? What about the last n outcomes that were right?
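In code the update is just v = beta * v + (1 - beta) * x; with beta = 0.9 the accumulated history keeps 90% of the weight and the newest observation gets only 10%. The numbers below are made up:

```python
# Exponentially weighted average: history dominates any single new value.
def ewa(values, beta=0.9):
    v = 0.0
    for x in values:
        v = beta * v + (1 - beta) * x
    return v

history = [1] * 20 + [-1]   # twenty good outcomes, then one bad one
print(ewa(history))         # ~0.69: one wrong outcome barely bends the path
```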
Metrics
DL: Used to check how well an NN has trained; further training is adjusted accordingly.
P: Use metrics to see what’s working and not working. Adjust plans accordingly.
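As a tiny example, here is accuracy, one of the simplest metrics, plus a hypothetical adjustment based on it; the predictions, labels, and threshold are made up:

```python
# Accuracy: the fraction of predictions that match the true labels.
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

val_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])   # 0.75
if val_acc < 0.9:
    print("metric says: keep training / adjust the plan")
```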
Test Data and Testing
DL: Tests the NN on unseen data.
P: Test your assumptions and beliefs to see whether they are true.
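The standard way to get unseen data is to hold some out before training ever starts. A minimal sketch of a train/test split (the helper name and fractions are my own):

```python
import random

# Shuffle, then hold out a fraction of the data the model will never train on.
def train_test_split(data, test_fraction=0.2, seed=42):
    data = data[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

train, test = train_test_split(list(range(10)))
print(train, test)   # 8 items to learn from, 2 kept unseen for testing
```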
Overfitting
DL: The NN has memorized certain things completely and cannot handle variations of things it has not seen.
P: What you don’t yet know is more important than what you already know.
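The most extreme overfitter is a lookup table: perfect on what it memorized and clueless about everything else. A deliberately silly sketch (the data and "model" are made up):

```python
# A "model" that literally memorizes its training examples.
train_data = {0.0: 0.0, 0.5: 0.5, 1.0: 1.0}   # samples of the trend y = x

def memorizer(x):
    return train_data.get(x)   # perfect on seen inputs, clueless otherwise

print(memorizer(0.5))    # 0.5  -- looks like a flawless model
print(memorizer(0.75))   # None -- cannot handle even a tiny unseen variation
```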
Here are some more from my good friend Jayesh.
Batch size
DL: The number of training examples processed before the model's weights are updated.
P: Try to understand different perspectives at any given time to avoid biases.
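A minimal sketch of cutting a dataset into mini-batches, with one hypothetical model update per batch; the data and batch size are made up:

```python
# Yield the dataset in chunks; the model would be updated once per chunk.
def batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

for batch in batches(list(range(10)), batch_size=4):
    print(batch)   # [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
```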
Regularisation
DL: Adds a penalty for complexity (e.g., large weights) to the loss, to reduce overfitting.
P: Penalize overconfidence to reach the right place. If we think of large weights as ego, we come to the famous Stoic maxim that ego is the enemy: unchecked, it can lead to dead neurons, i.e., hampered learning and growth.
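A sketch of L2 regularisation, where the penalty grows with the squared size of the weights, so the optimizer pays for overconfident (large) parameters; the losses, weights, and lambda below are made up:

```python
# L2-regularised loss: base loss plus a penalty on large weights.
def regularised_loss(base_loss, weights, lam=0.01):
    return base_loss + lam * sum(w * w for w in weights)

print(regularised_loss(0.5, [0.1, -0.2]))    # 0.5005: small weights, tiny penalty
print(regularised_loss(0.5, [10.0, -20.0]))  # 5.5: large weights, heavy penalty
```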