B.2 Deep Learning

Table B.2 gives the defaults for two common Python-based deep learning frameworks: Keras [Chollet, 2021], a high-level interface to TensorFlow, and PyTorch. For documentation, see https://keras.io and https://pytorch.org.

                     This Book             Keras                        PyTorch
Algorithm   Page     Name                  Name            Default      Name           Default
----------------------------------------------------------------------------------------------
Dense       8.3      Dense                 Dense                        Linear
                     nₒ                    units                        out_features
                     nᵢ                    (implicit)                   in_features
update      8.3      update                SGD                          SGD
                     η                     learning_rate   0.01         lr
momentum    8.2.1    α                     momentum        0            momentum       0
RMS-Prop    8.2.2                          RMSprop                      RMSprop
                     η                     learning_rate   0.001        lr             0.01
                     ρ                     rho             0.9          alpha          0.99
                     ϵ                     epsilon         10⁻⁷         eps            10⁻⁸
Adam        8.2.3                          Adam                         Adam
                     η                     learning_rate   0.001        lr             0.001
                     β1                    beta_1          0.9          betas[0]       0.9
                     β2                    beta_2          0.999        betas[1]       0.999
                     ϵ                     epsilon         10⁻⁷         eps            10⁻⁸
Dropout     8.7      Dropout               Dropout                      Dropout
                     rate                  rate                         p              0.5
2D Conv     8.9      Conv2D                Conv2D                       Conv2d
                     k                     kernel_size                  kernel_size
                     # output channels     filters                      out_channels
                     # input channels      (implicit)                   in_channels

Table B.2: Hyperparameters for two deep learning packages
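
As an illustration of how the names in Table B.2 appear in code, the following is a minimal sketch (not from the book; the channel counts and kernel size are arbitrary) constructing a 2D convolutional layer in each framework:

    from tensorflow import keras
    import torch.nn as nn

    # Keras: the number of input channels is implicit
    keras_conv = keras.layers.Conv2D(filters=16, kernel_size=3)

    # PyTorch: in_channels must be given explicitly
    torch_conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)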

In Keras and PyTorch, the optimizers are specified separately from the layers. The optimizer corresponding to the update method of Figure 8.9 is SGD (stochastic gradient descent). In both frameworks, momentum is a parameter of SGD.
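
For example, the following sketch (the momentum value 0.9 is arbitrary, not a default) constructs SGD with momentum in each framework, using the parameter names of Table B.2:

    from tensorflow import keras
    import torch

    # Keras: construct the optimizer, then pass it to model.compile
    keras_opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

    # PyTorch: the optimizer is constructed with the parameters it will update
    torch_model = torch.nn.Linear(4, 2)   # a stand-in model
    torch_opt = torch.optim.SGD(torch_model.parameters(), lr=0.01, momentum=0.9)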

In Keras, the number of input features is implicit: it matches the output size of the lower layer that the layer is connected to.
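
For example, in the following sketch (layer sizes arbitrary), the Keras model never specifies nᵢ, while the PyTorch model gives in_features explicitly:

    from tensorflow import keras
    import torch.nn as nn

    # Keras: n_i of each Dense layer is inferred when the model is built
    keras_model = keras.Sequential([
        keras.layers.Dense(units=64, activation="relu"),
        keras.layers.Dense(units=10),
    ])

    # PyTorch: in_features must be given for each Linear layer
    torch_model = nn.Sequential(
        nn.Linear(in_features=784, out_features=64),
        nn.ReLU(),
        nn.Linear(in_features=64, out_features=10),
    )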

Our definition of RMS-Prop follows the original and Keras. In PyTorch, the RMS-Prop update has ϵ outside of the square root (line 5 of the update method for RMS-Prop on page 5), similar to Adam.
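
In symbols, with r the rolling average of the squared gradient and g the gradient (a sketch of the distinction, using the book's symbols but not its exact notation), the two forms of the update are:

    r ← ρ·r + (1 − ρ)·g²                 (both definitions)
    w ← w − η·g / √(r + ϵ)               (original and Keras: ϵ inside the square root)
    w ← w − η·g / (√r + ϵ)               (PyTorch: ϵ outside, as in Adam)

With the small default ϵ values in Table B.2, the two forms agree closely when r is much larger than ϵ, and differ mainly when r is near zero.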