8.8 Review

  • Artificial neural networks are parameterized models for prediction, typically made of multiple layers of parameterized linear functions interleaved with nonlinear activation functions.

  • The output is typically a linear function for a real-valued prediction, a sigmoid for a Boolean prediction, or a softmax for a categorical prediction. Other outputs, such as sequences or structured predictions, use specialized methods. (The first sketch following this list shows these three output functions.)

  • Neural networks that use ReLU for all hidden units define piecewise linear functions if they have a linear output, or piecewise linear separators if they have a sigmoid output; the second sketch following this list illustrates the piecewise linear case.

  • Backpropagation can be used to train the parameters of functions that are differentiable almost everywhere.

  • Gradient descent trains by taking steps proportional to the negative of the gradient; many variants improve on the basic algorithm by adapting the step size and adding momentum. (The third sketch following this list combines backpropagation and gradient descent with momentum for a small network.)

  • Convolutional neural networks apply the same learnable filters at multiple positions on a grid, such as an image; the fourth sketch following this list applies one filter across a small image.

  • Recurrent neural networks can be used for sequences. An LSTM is a type of RNN designed to mitigate the vanishing gradient problem (see the fifth sketch following this list).

  • For text, attention computes a new embedding for each word as an expectation over the embeddings of other words, weighted by how strongly the words are related. Attention is also used for speech, vision, and other tasks. (The last sketch following this list implements self-attention.)

  • Transformers, built from layers of attention and linear transformations, are the workhorse of modern language processing, computer vision, and biology.

  • Neural networks underlie generative AI, producing images, text, code, molecules, and other structured output.

  • Neural networks are very successful for applications where there are large training sets, or where training data can be generated from a model.

  • It can be dangerous to make decisions based on data of dubious quality; large quantity and high quality are difficult to achieve together.
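
The sketches that follow illustrate several of the points above. They use Python with NumPy, and all function names, network sizes, weights, and hyperparameter values are illustrative choices rather than code from the chapter. The first sketch shows the three standard output functions: linear for a real-valued prediction, sigmoid for a Boolean prediction, and softmax for a categorical prediction.

    import numpy as np

    def linear_output(z):
        # Real-valued prediction: the activation is returned unchanged.
        return z

    def sigmoid_output(z):
        # Boolean prediction: squash the activation to a probability in (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def softmax_output(z):
        # Categorical prediction: one probability per class, summing to 1.
        z = z - np.max(z)      # subtract the maximum for numerical stability
        e = np.exp(z)
        return e / np.sum(e)

    z = np.array([2.0, -1.0, 0.5])
    print(linear_output(z[0]), sigmoid_output(z[0]))   # real-valued and Boolean outputs
    print(softmax_output(z))                           # distribution over three classes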
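
The second sketch evaluates a network with one ReLU hidden layer and a linear output at several inputs. With the arbitrary weights below it computes a piecewise linear function of its input; the linear pieces meet where hidden units switch between off and on.

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    # Three ReLU hidden units and a linear output; the weights are arbitrary.
    W1 = np.array([[1.0], [-1.0], [0.5]]); b1 = np.array([0.0, 1.0, -0.5])
    W2 = np.array([1.0, 2.0, -1.0]);       b2 = 0.3

    def f(x):
        h = relu(W1 @ np.array([x]) + b1)   # each hidden unit is on or off
        return W2 @ h + b2                  # linear output

    for x in np.linspace(-3.0, 3.0, 7):
        print(round(x, 1), round(float(f(x)), 3))   # linear pieces joined at kinks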
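
The third sketch trains a one-hidden-layer regression network on toy data: the gradients are computed by backpropagation and the parameters are updated by gradient descent with momentum. The learning rate, momentum parameter, data, and network size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data (the target function is an arbitrary example).
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X) + 0.1 * rng.normal(size=X.shape)

    # One ReLU hidden layer, linear output.
    W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]   # momentum buffers

    lr, beta = 0.05, 0.9                            # step size and momentum

    for step in range(500):
        # Forward pass.
        h_pre = X @ W1 + b1
        h = np.maximum(0.0, h_pre)                  # ReLU hidden layer
        pred = h @ W2 + b2
        err = pred - y
        loss = np.mean(err ** 2)

        # Backward pass: backpropagate the mean squared error.
        d_pred = 2 * err / len(X)
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_h = d_pred @ W2.T
        d_h_pre = d_h * (h_pre > 0)                 # derivative of ReLU
        dW1 = X.T @ d_h_pre
        db1 = d_h_pre.sum(axis=0)

        # Gradient descent with momentum: step against the gradient.
        for p, v, g in zip(params, velocity, [dW1, db1, dW2, db2]):
            v *= beta
            v -= lr * g
            p += v

    print(f"final training loss: {loss:.4f}")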
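
The fourth sketch applies one filter at every valid position of a small image, the basic operation of a convolutional layer. Here the filter is written by hand; in a convolutional neural network the filter weights are learned.

    import numpy as np

    def conv2d(image, kernel):
        # Slide one filter over every valid position of a 2-D grid
        # (no padding, stride 1); the same weights are reused everywhere.
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(36, dtype=float).reshape(6, 6)
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # hand-made vertical-edge filter
    print(conv2d(image, edge_filter))                # a 4 x 4 grid of filter responses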
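
The fifth sketch steps an LSTM cell over a short sequence. The forget, input, and output gates control what is written to and read from the cell state, and the additive cell-state update is what lets gradients flow over long sequences; the random weights are stand-ins for learned parameters.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, U, b):
        # One LSTM step: x is the input, h the hidden state, c the cell state.
        n = h.shape[0]
        z = W @ x + U @ h + b          # all four gate pre-activations at once
        f = sigmoid(z[:n])             # forget gate
        i = sigmoid(z[n:2 * n])        # input gate
        o = sigmoid(z[2 * n:3 * n])    # output gate
        g = np.tanh(z[3 * n:])         # candidate values
        c_new = f * c + i * g          # additive cell-state update
        h_new = o * np.tanh(c_new)
        return h_new, c_new

    rng = np.random.default_rng(1)
    d_in, d_hid = 3, 4
    W = rng.normal(size=(4 * d_hid, d_in))
    U = rng.normal(size=(4 * d_hid, d_hid))
    b = np.zeros(4 * d_hid)

    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for x in rng.normal(size=(5, d_in)):   # run the cell over a length-5 sequence
        h, c = lstm_step(x, h, c, W, U, b)
    print(h)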
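
The last sketch implements scaled dot-product self-attention: each word's new embedding is an expectation over value vectors, weighted by how well its query matches every key. Stacking such attention layers with position-wise linear transformations, residual connections, and normalization gives the layers of a transformer. The embeddings and projection matrices below are random stand-ins for learned parameters.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Each output row is an expectation over the rows of V, weighted by
        # how well the corresponding query matches each key.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # pairwise query-key compatibility
        weights = softmax(scores, axis=-1)   # each row sums to 1
        return weights @ V

    rng = np.random.default_rng(2)
    n_words, d_model = 5, 8
    E = rng.normal(size=(n_words, d_model))  # toy word embeddings

    # In self-attention, queries, keys, and values are linear transformations
    # of the same embeddings.
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    out = attention(E @ Wq, E @ Wk, E @ Wv)
    print(out.shape)                         # (5, 8): one new embedding per word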