Artificial neural networks are parameterized models for prediction, typically made up of multiple layers of parameterized linear functions interleaved with nonlinear activation functions.
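As a minimal sketch (not the book's code), the following NumPy example builds a small network from parameterized linear functions and ReLU activations; the layer sizes and random parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(z):
        return np.maximum(0.0, z)

    def init_layer(n_in, n_out):
        # the weights and biases are the learnable parameters of each linear function
        return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

    W1, b1 = init_layer(4, 8)     # first linear layer (illustrative sizes)
    W2, b2 = init_layer(8, 8)     # second linear layer
    W3, b3 = init_layer(8, 1)     # output layer (linear, for a real-valued prediction)

    def predict(x):
        h1 = relu(x @ W1 + b1)    # linear function, then nonlinear activation
        h2 = relu(h1 @ W2 + b2)
        return h2 @ W3 + b3       # linear output

    print(predict(np.ones(4)))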
The output is typically a linear function for a real-valued prediction, a sigmoid for a Boolean prediction, or a softmax for a categorical prediction. Other outputs, such as sequences or structured predictions, use specialized methods.
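The three common output functions can be written directly; the sketch below is illustrative, with made-up scores.

    import numpy as np

    def linear_output(score):            # real-valued prediction
        return score

    def sigmoid(score):                  # probability of a Boolean being true
        return 1.0 / (1.0 + np.exp(-score))

    def softmax(scores):                 # distribution over categories
        e = np.exp(scores - np.max(scores))   # subtract max for numerical stability
        return e / e.sum()

    print(sigmoid(0.7))
    print(softmax(np.array([2.0, 1.0, 0.1])))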
Neural networks that use ReLU for all hidden units define piecewise linear functions if they have a linear output, or piecewise linear separators if they have a sigmoid output.
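The following toy example (with arbitrary weights) illustrates this: a one-input ReLU network with a linear output has a constant slope between the points where hidden units switch on or off.

    import numpy as np

    w = np.array([1.0, -2.0, 0.5])    # hidden weights (illustrative values)
    b = np.array([0.0, 1.0, -1.0])    # hidden biases
    v = np.array([1.0, 0.5, -1.0])    # output weights

    def f(x):
        return v @ np.maximum(0.0, w * x + b)

    # Each hidden unit contributes a "kink" where w*x + b crosses 0,
    # here at x = 0.0, 0.5, and 2.0; between kinks the slope is constant.
    for x0, x1 in [(-2.0, -1.0), (0.1, 0.2), (1.0, 1.5), (3.0, 4.0)]:
        print((f(x1) - f(x0)) / (x1 - x0))   # constant slope within each piece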
Backpropagation can be used to train the parameters of functions that are differentiable (almost everywhere).
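As a sketch of what backpropagation computes, the following applies the chain rule by hand to a one-hidden-layer network with squared error; the sizes, data, and parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=3)
    y = 1.5                                # target value
    W1, b1 = rng.normal(size=(3, 4)) * 0.1, np.zeros(4)
    W2, b2 = rng.normal(size=4) * 0.1, 0.0

    # forward pass, storing the intermediate values needed for the backward pass
    z1 = x @ W1 + b1
    h1 = np.maximum(0.0, z1)               # ReLU: differentiable almost everywhere
    yhat = h1 @ W2 + b2
    loss = 0.5 * (yhat - y) ** 2

    # backward pass: propagate the error derivative layer by layer (chain rule)
    d_yhat = yhat - y                      # dloss/dyhat
    d_W2 = d_yhat * h1
    d_b2 = d_yhat
    d_h1 = d_yhat * W2                     # push the error back through the output layer
    d_z1 = d_h1 * (z1 > 0)                 # ReLU gradient: 1 where z1 > 0, else 0
    d_W1 = np.outer(x, d_z1)
    d_b1 = d_z1
    print(loss, d_W1.shape, d_W2.shape)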
Gradient descent trains the parameters by taking steps proportional to the negative of the gradient; many variants improve on the basic algorithm by adapting the step size and adding momentum.
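A sketch of the update rule on a simple quadratic loss; the loss, step size, and momentum constant are illustrative assumptions.

    import numpy as np

    def grad(w):                     # gradient of f(w) = (w - 3)^2
        return 2.0 * (w - 3.0)

    w = 0.0
    velocity = 0.0
    step_size = 0.1                  # a.k.a. learning rate
    momentum = 0.9

    for t in range(100):
        velocity = momentum * velocity - step_size * grad(w)
        w = w + velocity             # step proportional to the negative gradient
    print(w)                         # approaches the minimizer 3.0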
Convolutional neural networks apply learnable filters to multiple positions on a grid.
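A naive sketch of the operation follows (deep learning libraries actually compute cross-correlation, as here, and the filter weights would be learned rather than fixed as in this illustration).

    import numpy as np

    def conv2d(image, kernel):
        # apply the same small filter at every position of the input grid
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25.0).reshape(5, 5)
    kernel = np.array([[1.0, 0.0],         # in a CNN these weights are learned
                       [0.0, -1.0]])
    print(conv2d(image, kernel))            # 4x4 feature map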
Recurrent neural networks can be used for sequences. An LSTM is a type of RNN that solves the vanishing gradients problem.
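A single LSTM cell step can be sketched as follows: gates control what is written to and read from the memory cell, whose additive updates help gradients flow across long sequences. Sizes and random parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n_in, n_hid = 3, 4

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # one weight matrix and bias per gate (forget, input, output) and the candidate value
    params = {name: (rng.normal(0, 0.1, (n_in + n_hid, n_hid)), np.zeros(n_hid))
              for name in ["forget", "input", "output", "cand"]}

    def lstm_step(x, h, c):
        xh = np.concatenate([x, h])
        f = sigmoid(xh @ params["forget"][0] + params["forget"][1])
        i = sigmoid(xh @ params["input"][0] + params["input"][1])
        o = sigmoid(xh @ params["output"][0] + params["output"][1])
        g = np.tanh(xh @ params["cand"][0] + params["cand"][1])
        c = f * c + i * g            # memory cell: additive update helps gradients
        h = o * np.tanh(c)           # hidden state passed to the next time step
        return h, c

    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.normal(size=(5, n_in)):     # process a length-5 sequence
        h, c = lstm_step(x, h, c)
    print(h)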
Attention for text is used to compute the expected embedding of words based on their relationship with other words. Attention is also used for speech, vision, and other tasks.
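A sketch of scaled dot-product attention: each word's output is a weighted average (an expectation) of value vectors, with weights given by how well its query matches the other words' keys. The dimensions are illustrative assumptions.

    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        d = Q.shape[-1]
        weights = softmax(Q @ K.T / np.sqrt(d))   # one distribution per word
        return weights @ V                        # expected embedding per word

    rng = np.random.default_rng(3)
    E = rng.normal(size=(6, 8))                   # 6 words, 8-dimensional embeddings
    print(attention(E, E, E).shape)               # (6, 8): one new embedding per word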
Transformers, built from layers of linear transformations and attention, are the workhorse of modern language processing, computer vision, and biology.
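A single-head transformer encoder layer can be sketched as self-attention followed by a position-wise feedforward network, each wrapped in a residual connection and layer normalization; all sizes and parameters below are illustrative assumptions, not a production implementation.

    import numpy as np

    rng = np.random.default_rng(4)
    d_model, d_ff, n_words = 8, 16, 6

    def layer_norm(X):
        mu = X.mean(axis=-1, keepdims=True)
        sd = X.std(axis=-1, keepdims=True)
        return (X - mu) / (sd + 1e-6)

    def softmax(S):
        e = np.exp(S - S.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    Wq, Wk, Wv = (rng.normal(0, 0.1, (d_model, d_model)) for _ in range(3))
    W1, W2 = rng.normal(0, 0.1, (d_model, d_ff)), rng.normal(0, 0.1, (d_ff, d_model))

    def transformer_layer(X):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_model)) @ V    # self-attention
        X = layer_norm(X + A)                          # residual + normalization
        F = np.maximum(0.0, X @ W1) @ W2               # feedforward sublayer
        return layer_norm(X + F)

    X = rng.normal(size=(n_words, d_model))            # embeddings for 6 words
    print(transformer_layer(X).shape)                  # (6, 8)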
Neural networks are used for generative AI: the generation of images, text, code, molecules, and other structured outputs.
Neural networks are very successful for applications where there are large training sets, or where training data can be generated from a model.
It can be dangerous to make decisions based on data of dubious quality; large quantity and high quality are difficult to achieve together.