Notes from the brilliant book *Deep Learning with R*, which you can buy -> https://www.manning.com/books/deep-learning-with-r
Deep learning is a specific subfield of machine learning, which has an emphasis on learning successive layers of increasingly meaningful representations; the deep aspect refers to this idea of successive layers of representations. The number of layers that contribute to a model of the data is called the depth of the model.
To understand and define deep learning, we first need an idea of what machine-learning algorithms do.
- A concise definition of artificial intelligence would be: the effort to automate intellectual tasks normally performed by humans.
- Machine learning arises from this question: could a computer go beyond "what we know how to order it to perform" and learn on its own how to perform a specified task?
- To perform machine learning, we need:
- Input data points
- Examples of the expected output
- A way to measure whether the algorithm is doing a good job; this measurement is used as a feedback signal to adjust the way the algorithm works, and this adjustment step is what we call learning
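The three ingredients above can be seen in miniature in a toy sketch (the book itself uses R and Keras; this is a hypothetical Python example with a made-up task of learning y = 2x):

```python
# Toy illustration of the machine-learning loop: input data points,
# examples of the expected output, and a measurement of performance
# (the loss) used as a feedback signal to adjust the model.
inputs = [1.0, 2.0, 3.0, 4.0]   # input data points
targets = [2.0, 4.0, 6.0, 8.0]  # expected outputs (the rule is y = 2x)

w = 0.0    # the entire "model": a single adjustable weight
lr = 0.05  # how far to adjust on each feedback signal

for step in range(200):
    # Measure whether the algorithm is doing a good job (mean squared error).
    loss = sum((w * x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)
    # Use the measurement as feedback: nudge w in the direction that lowers it.
    grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, targets)) / len(inputs)
    w -= lr * grad

print(round(w, 3))  # converges to 2.0, the rule hidden in the examples
```

The point is the shape of the loop, not the model: measure, feed back, adjust, repeat.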
Therefore, the central problem in machine learning and deep learning is to meaningfully transform data; in other words, to learn useful representations of the input data at hand - representations that get us closer to the expected output.
All machine-learning algorithms consist of automatically finding such transformations that turn data into more useful representations for a given task. These operations can be coordinate changes, or linear projections (which may destroy information), translations, nonlinear operations, etc.
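To make the idea of a useful representation concrete, here is a deliberately tiny, hypothetical Python example: classifying 2D points by which side of the line y = x they fall on. In the raw (x, y) coordinates the rule couples both inputs; after a simple coordinate change (a linear projection), the task reduces to checking the sign of one number:

```python
# A representation change that makes the task easier: project each
# point (x, y) onto the single coordinate u = y - x.
points = [(1.0, 3.0), (2.0, 0.5), (0.0, -1.0), (-2.0, 1.0)]

def new_representation(p):
    x, y = p
    return y - x  # a linear projection of the input

# In the new representation, "above the line y = x" is just "u > 0".
labels = ["above" if new_representation(p) > 0 else "below" for p in points]
print(labels)  # ['above', 'below', 'below', 'above']
```

Here the transformation was chosen by hand; a machine-learning algorithm searches for such transformations automatically, guided by the feedback signal.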
The primary reason deep learning took off so quickly is that it offered better performance on many problems. In addition, it makes problem-solving much easier because it completely automates feature engineering.
- Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature engineering is fundamental to the application of machine learning, and is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning. https://en.wikipedia.org/wiki/Feature_engineering
In deep learning, these layered representations are almost always learned via models called neural networks, structured in literal layers stacked one after the other.
Deep learning is transformative because it allows a model to learn all layers of representations jointly. With joint feature learning, whenever the model adjusts one of its internal features, all other features that depend on it automatically adapt to the change, without human intervention. This allows complex, abstract representations to be learned by breaking them down into a long series of intermediate layers.
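Joint learning can be sketched with a minimal, hypothetical example: two stacked "layers" (here, just two scalar weights composed as w2 · (w1 · x)) trained together. The same error signal propagates backwards through both, so when one weight changes, the feedback for the other immediately reflects it:

```python
# Two stacked layers trained jointly: the feedback signal adjusts
# both weights at once, so each layer adapts to changes in the other.
inputs = [1.0, 2.0, 3.0]
targets = [6.0, 12.0, 18.0]  # target mapping: y = 6x

w1, w2 = 1.0, 1.0  # layer 1 and layer 2
lr = 0.01

for step in range(500):
    for x, y in zip(inputs, targets):
        h = w1 * x        # first layer's representation of the input
        pred = w2 * h     # second layer builds on that representation
        err = pred - y
        # Backpropagation: one error signal updates both layers.
        g2 = err * h      # gradient for the second layer
        g1 = err * w2 * x # gradient for the first, flowing through the second
        w2 -= lr * g2
        w1 -= lr * g1

print(round(w1 * w2, 2))  # the composed model converges to 6.0
```

Note that the gradient for w1 depends on the current value of w2: neither layer is learned in isolation, which is exactly the contrast with pipelines of independently hand-engineered stages.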
In summary, deep learning has several properties that justify its status as an AI revolution.
- Simplicity - Deep learning removes the need for feature engineering, replacing complex, brittle, engineering-heavy pipelines with simple, end-to-end trainable models that are typically built using only five or six different tensor operations
- Scalability - Deep learning is highly amenable to parallelisation on GPUs or TPUs, so it can take full advantage of Moore's law. In addition, deep-learning models are trained by iterating over small batches of data, allowing them to be trained on datasets of arbitrary size.
- Versatility and reusability - Unlike many prior machine-learning approaches, deep-learning models can be trained on additional data without restarting from scratch, making them viable for continuous online learning - an important property for very large production models. Furthermore, trained deep-learning models are repurposable and thus reusable: for instance, it's possible to take a deep-learning model trained for image classification and use it for video processing. This allows us to reinvest previous work into increasingly complex and powerful models.
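The mini-batch iteration behind the scalability point is simple to sketch (a hypothetical Python example; real frameworks stream batches from disk the same way):

```python
# Mini-batch iteration: the model only ever sees a small slice of the
# data at a time, so the full dataset can be arbitrarily large.
dataset = list(range(10))  # stand-in for a large dataset
batch_size = 4

batches = [dataset[i:i + batch_size]
           for i in range(0, len(dataset), batch_size)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# A training loop would compute the loss and update weights once per
# batch, rather than once per full pass over the data.
```

Because memory usage depends on the batch size rather than the dataset size, the same loop works whether the dataset has ten examples or ten billion.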
Understanding deep learning requires familiarity with many simple mathematical concepts, such as tensors, tensor operations, differentiation, gradient descent, and others.
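Each of those building blocks is small on its own; a hypothetical Python sketch of the first three (a tensor, a tensor operation, and a derivative, the quantity gradient descent follows):

```python
# Tensors and tensor operations in miniature, using plain lists.
W = [[1.0, 2.0],
     [3.0, 4.0]]     # a rank-2 tensor (matrix)
x = [1.0, -1.0]      # a rank-1 tensor (vector)

def matvec(W, x):
    """A tensor operation: multiply a matrix by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def derivative(f, t, eps=1e-6):
    """Numerical differentiation: the slope of f at t."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)

print(matvec(W, x))                                  # [-1.0, -1.0]
print(round(derivative(lambda t: t ** 2, 3.0), 3))   # 6.0
```

Gradient descent is then nothing more than repeatedly stepping against such derivatives, computed for every weight in the network at once.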