An Alchemist’s Notes on Deep Learning#

I have recently had the opportunity to spend a lot of time learning, with the excuse of pursuing a PhD. These Alchemist’s Notes are a byproduct of that process. Each page contains notes and ideas related broadly to deep learning, generative modelling, and practical engineering. I have been writing notes like these since 2016 on my personal website, but this site should be a more put-together version.

Who is this site for? For you, I hope. The ideal reader has at least an undergraduate-level understanding of machine learning, and is comfortable with Python. The rest you can figure out as you go.

What do the contents look like? The main goal of these notes is to provide definitions and examples. I have found these to be the most critical bits to convey when introducing new concepts. Each page gives a brief overview and an example implementation, then answers common questions. Wherever possible, we will use concrete code examples. This website is compiled from a set of Jupyter notebooks, so you can play through the code on every page (click the ‘Colab’ button at the top right).
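As a taste of the format, here is a minimal sketch of the kind of snippet that appears throughout. The toy loss function and array shapes here are invented for illustration, and it assumes JAX is installed:

```python
import jax
import jax.numpy as jnp

# A toy loss: mean squared error of a linear model.
def mse(params, x, y):
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

# jax.grad transforms the loss into a function that returns
# its gradient with respect to the first argument (params).
grad_fn = jax.grad(mse)

params = jnp.zeros(3)
x = jnp.ones((4, 3))
y = jnp.ones(4)
print(grad_fn(params, x, y))  # gradient w.r.t. params, shape (3,)
```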

What’s with ‘Alchemy’ in the name? In deep learning, we have not arrived at a unifying theory. What we do have are snippets of evidence and intuitions, insights from mathematical foundations, and a rich body of literature and open-source code. Yet it is an open question how all these ideas should come together. Deep learning is still in the alchemical age, and even well-tested techniques should be seen as a reference guide rather than a ground-truth solution. Perhaps you will come to your own conclusions.

These notes are not finished. I plan to update them continuously, but I make no guarantees that the content will stay on track. If you see any issues or have suggestions, send me a message at kvfrans@berkeley.edu, or submit an issue to the repo.

The Misc section contains pages that are rougher and less refined. View them as experimental notes rather than thorough explanations. Don’t blame me if things are messy there.

More Resources#

This guide takes inspiration and information from many sources. Listed here are some great additional resources for learning.

Planned Table of Contents#

  • About

  • Numerical Toolkit (JAX)

    • Overview of JAX

    • Training neural nets with Flax

    • Logging

    • Checkpointing

  • Training Neural Networks

    • Neural Networks

    • Backpropagation

    • Optimizers

  • Building Blocks

    • Linear Layers

    • Activations

    • Convolutional Layers

    • Recurrent Networks

    • Attention

    • Residual Layers

    • The Transformer

  • Models

    • Classifiers (overfitting)

    • Variational Auto-Encoders

    • Generative Adversarial Networks

    • Diffusion Models

    • Autoregressive Models (LLMs)

    • Representation Learning

  • Practical Scaling

    • Structure of the GPU

    • Dataloading

    • Multi-GPU Parallelism

    • Mixed Precision

  • Reinforcement Learning (Maybe?)

    • Markov Decision Processes

    • Policy Gradients

    • Temporal Difference Learning

    • Offline RL

    • Goal-Conditioned RL

    • Model-Based RL

    • RL for LLMs

  • Maths

  • Probabilistic Models

    • Matrices and Tensors