Index – Deep Reinforcement Learning in Action


SYMBOL

@ operator
0-tensor (scalar)
1D Ising model
2D Ising model

A

absolute (nonrelational) attention models
action_repeats parameter
actions
  calculating probability of
  reinforcing
actions array
action-value function, 2nd
actor-critic methods
  advantages of
  combining value function with policy function
  distributed training
  N-step actor-critic
actor-critic model, 2nd
actor_loss function
ad placements with bandits
  actions
  contextual bandits
  rewards
  states
AddBackward function
additive attention
add_spots function
add_to_replay function
adjacency matrices
advantage actor-critic
afn parameter

agents
  defined
  interacting with environment
  multi-agent reinforcement learning
    1D Ising model
    2D Ising model
    mean field Q-learning
    mixed cooperative-competitive games
    neighborhood Q-learning
AGI (artificial general intelligence)
AlphaGo algorithm, 2nd, 3rd, 4th
AlphaZero algorithm
Amazon SageMaker
arange function
argmax function, 2nd
artificial general intelligence (AGI)
assigning credit
Atari, 2nd, 3rd, 4th, 5th
Atari Freeway
attention
  double Q-learning
  implementing self-attention for MNIST
    Einstein notation
    relational modules
    tensor contractions
    training relational modules
    transformed MNIST
  machine learning interpretability with biases
    equivariance
    invariance
  multi-head attention
  relational reasoning with
    attention models
    relational reasoning
    self-attention models
  training
    curriculum learning
    maximum entropy learning
  visualizing attention weights, 2nd
attention models
automatic differentiation
AvgPool operation

B

backpropagation, 2nd, 3rd, 4th
backward() method, 2nd

bandits
  contextual
  contextual, solving
  multi-arm bandit, solving
    epsilon-greedy strategy
    exploitation
    exploration
    softmax selection policy
  optimizing ad placements with
    actions
    rewards
    states
batch normalization (BatchNorm)
Bayesian framework
Bayesian inference
bell-curve distribution, 2nd
Bellman equation
biases, 2nd
  equivariance
  invariance
bimodal distribution
Boltzmann distribution
bootstrapping, 2nd, 3rd
Box2D
Breakout game, 2nd
breeding population

C

CartPole, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th

catastrophic forgetting
  overview of
  preventing
category theory
center of mass
central processing units (CPUs), 2nd
choose_arm(...) method
CNNs (convolutional neural networks), 2nd, 3rd, 4th, 5th
coding, predictive
collections library
commutativity
computational graphs
ContextBandit class, 2nd
contextual bandits, 2nd
continuous distribution
continuous probability distribution
contractions
control tasks
convolutional filters
convolutional neural networks (CNNs), 2nd, 3rd, 4th, 5th
cooperative-competitive games
CPUs (central processing units), 2nd
credit, assigning
crossing
CUDA-enabled GPU
cum_rewards list
curiosity module
curiosity.
    See ICM.
curriculum learning

D

DA2C (distributed advantage actor-critic), 2nd, 3rd
data, simulated
data-efficient algorithms
DDQN (double deep Q-network)
deductive reasoning
Deep Blue, IBM, 2nd
deep neural networks
deep reinforcement learning (DRL), 2nd, 3rd
DeepMind, 2nd, 3rd, 4th, 5th
degenerate distribution, 2nd, 3rd
degenerate probability distribution, 2nd
deque, 2nd, 3rd
.detach() method
detach() method, 2nd
deterministic policy gradient (DPG)
differentiation
direct teaching method
discount factor
discrete distribution, 2nd, 3rd
display method
Dist-DQN (distributional deep Q-network), 2nd, 3rd, 4th
  implementing
  on simulated data
dist_dqn function
distributed advantage actor-critic (DA2C), 2nd, 3rd
distributed computing
distributed training
distribution of probabilities
distributional Bellman equation
distributional deep Q-network.
    See Dist-DQN.
distributional Q-learning
  Bellman equation
  comparing probability distributions
  Dist-DQN on simulated data
  implementing Dist-DQN
  probabilities
    distribution in Python
    expectation
    posteriors
    priors
    variance
  to play Atari Freeway
  weaknesses of
done variable, 2nd, 3rd, 4th
dot product, 2nd
double Q-learning
downscale_obs function
DP (dynamic programming)
DPG (deterministic policy gradient)
DQN (deep Q-networks)
  improving stability with target networks
  preventing catastrophic forgetting
  Q function
  Q-learning
    building networks
    discount factor
    Gridworld
    Gridworld game engine
    hyperparameters
    neural networks as Q function
    overview of
  relational.
    See also Dist-DQN.
DRL (deep reinforcement learning), 2nd, 3rd
dynamic programming (DP)
dynamics, inverse

E

edge matrices
edges
efficiency of scaling
Einops package, 2nd
Einstein notation
ELU (exponential linear unit)
empowerment
encoder function
encoder model
entropy learning
enumerate function
env object
env variable
environment
environment, agents interacting with
env.step(action) method
episodic games, 2nd
epsilon-greedy policy, 2nd, 3rd, 4th, 5th, 6th
epsilon-greedy strategy
equivariance
ES (evolutionary strategies), 2nd
evaluate_population function
evolutionary algorithms
  as scalable alternative
    communicating between nodes
    parallel vs. serial processing
    scaling
    scaling efficiency
    scaling gradient-based approaches
    scaling linearly
  genetic algorithm for CartPole
  in practice
  in theory
  pros and cons of
    exploration
    sample intensive
    simulators
  reinforcement learning with
evolutionary strategies (ES), 2nd
expectation
expectation value
expected value
expected-value Q-learning
experience replay
  preventing catastrophic forgetting with
  prioritized
exploitation, 2nd

exploration
  curiosity-driven
    alternative intrinsic reward mechanisms
    ICM
    inverse dynamics prediction
    predictive coding for sparse rewards
    preprocessing
    setting up policy function
    setting up Q-networks
    setting up Super Mario Bros
  of evolutionary algorithms
  of multi-arm bandits
  of policy function
exponential linear unit (ELU)
extrinsic reward, 2nd

F

feature engineering
field of view (FOV), 2nd, 3rd
fixed-size memory buffer
flattened parameter vectors
for loop, 2nd, 3rd
forward-model prediction error
forward-prediction model
FOV (field of view), 2nd, 3rd
frames_per_state parameter
frameworks
Freeway game, 2nd, 3rd, 4th
future rewards

G

GANs (generative adversarial networks), 2nd
gated recurrent unit (GRU), 2nd
Gaussian distribution, 2nd, 3rd
generative adversarial networks (GANs), 2nd
generative models
genetic algorithms, 2nd
gen_params function
.get() method
get_action function
get_batch(...) method
get_best_arm function, 2nd
get_coords function
get_mean_field function
get_neighbors function
get_reward function
get_reward_2d function
get_state() method
get_substate function
get_target_dist function
GNNs (graph neural networks), 2nd
Go game, 2nd, 3rd
goal decomposition
Google Cloud
Google Colab
GPUs (graphics processing units), 2nd
gradient-based approaches
gradient-based optimization
gradient-free algorithm
Grand Theft Auto game
graph neural networks (GNNs)2nd
graphs
Gridworld game, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, 12th
GRU (gated recurrent unit), 2nd
GWagent.att_map

H

hierarchical reinforcement learning
hyperparameters, 2nd, 3rd, 4th
hyperthreading

I

ICM (intrinsic curiosity module), 2nd
IL-Q (independent Q-learning), 2nd
indirect teaching method
inductive biases
inductive reasoning
infer_acts function, 2nd
info parameter
info variable
init_grid function
inner product, 2nd
instabilities
interpretability with biases
  equivariance
  invariance
intrinsic curiosity module (ICM), 2nd
intrinsic rewards, 2nd, 3rd
invariance
inverse dynamics
inverse model

Ising model
  1D
  2D

J

.join() method
Jupyter Notebook

K

k-dimensional vector
key-query multiplication
keys
KL (Kullback-Leibler) divergence
k-tensor

L

LayerNorm layers
learned distributions

learning
  curriculum learning
  instabilities in
  maximum entropy learning
learning rate
likelihood ratio (LR)
linear scaling
linspace function
load_state_dict method
log probabilities
logarithms, 2nd
log_softmax function, 2nd, 3rd
long short-term memory (LSTM) network, 2nd
lookup table
loss function, 2nd, 3rd
lossfn function
LR (likelihood ratio)
LSTM (long short-term memory) network, 2nd

M

MAgent game, 2nd
makeMove method, 2nd
Markov decision processes
  building networks with PyTorch
    automatic differentiation
    building models
  Markov property
  optimizing ad placements with bandits
    actions
    contextual bandits
    rewards
    states
  policy function
  solving contextual bandits
  solving multi-arm bandits
    epsilon-greedy strategy
    exploitation
    exploration
    softmax selection policy
  string diagrams
  value function
MARL (multi-agent reinforcement learning), 2nd, 3rd
mass, center of
master node
matplotlib scatter plot
matrix multiplication
max_episode_len parameter
maximum entropy learning
maxlen attribute, 2nd
MaxPool operation
MCTS (Monte Carlo tree search)
m-dimensional vector, 2nd
MDP (Markov decision process), 2nd, 3rd, 4th, 5th
mean field approximation
mean field Q-learning
mean squared error (MSE), 2nd
measure theory
memory tables
message passing
MF-Q (mean field Q-learning), 2nd
MHDPA (multi-head dot product attention)
MI (mutual information)
MiniGrid library, 2nd
MiniGrid-DoorKey environment
min_progress parameter
mixed cooperative-competitive games

MNIST
  self-attention for
    Einstein notation
    relational modules
    tensor contractions
    training relational modules
  transformed
model-based planning
model-free learning

models
  building
  training
    backpropagating
    calculating future rewards
    calculating probability of actions
    loss function
Monte Carlo methods, 2nd, 3rd
Monte Carlo tree search (MCTS)
mp.cpu_count() function
MSE (mean squared error), 2nd
MuJoCo
multi-agent reinforcement learning
  1D Ising model
  2D Ising model
  mean field Q-learning
  mixed cooperative-competitive games
  neighborhood Q-learning
multi-agent reinforcement learning (MARL), 2nd, 3rd

multi-arm bandits
  epsilon-greedy strategy
  exploitation
  exploration
  softmax selection policy
  solving
multi-head attention
multi-head dot product attention (MHDPA)
multiprocessing library
multithreading
mutate function
mutation
mutual information (MI)
MXNet

N

n-armed bandit algorithm, 2nd
natural language processing (NLP)
n-dimensional vector
neighborhood Q-learning

networks
  building
  building with PyTorch
    automatic differentiation
    building models
neural network layer
neural networks
  as policy function
  as Q function
  defined
  policy function using
    exploration
    stochastic policy gradient, 2nd.
    See policy networks; target networks.
next_generation function
NLP (natural language processing)
nn module, 2nd
node matrices
node-feature dimension

nodes
  communicating between
  graph neural networks and
  overview
nonparametric models
nonrelational (absolute) attention models
nonstationary problems
NO-OP (no-operation or do nothing), 2nd, 3rd, 4th
normal distribution
NP (noun phrase)
NPCs (non-player characters)
np.expand_dims(...) function
np.random.choice function
N-step actor-critic
N-step learning
num_iter parameter

O

objectives, defining
obs array
ologs (ontological logs)
one-hot vectors, 2nd, 3rd, 4th
online training
OpenAI Gym, 2nd, 3rd, 4th
  API for
  CartPole
optimal policy
options framework
outcomes array
outer product operation

P

parallel computations
parallel processing
parameter vector
parametric methods
parametric models, 2nd, 3rd
params dictionary
partial observability
PDF (probability density function)
PE (prediction errors), 2nd, 3rd
planning, model-based
PMF (probability mass function)
policies
policy function
  neural networks as
  optimal policy
  setting up
  using neural networks
    exploration
    stochastic policy gradient
  with value function
policy gradient function, 2nd
policy gradient methods
  OpenAI Gym
    CartPole
    OpenAI Gym API
  policy function using neural networks
    exploration
    neural networks as policy function
    stochastic policy gradient
  policy gradient algorithm
    assigning credit
    defining objectives
    log probabilities
    reinforcing actions
  REINFORCE algorithm
    agents interacting with environment
    creating policy networks
    full training loop
    training models
policy networks, 2nd, 3rd
posterior probability distribution
posteriors
PPO (proximal policy optimization), 2nd
Pr(A) function
pred array
prediction errors (PE), 2nd, 3rd
predictive coding
prepare_images function, 2nd
prepare_initial_state(...) function
prepare_multi_state function
prepare_state function
preprocessing
prior probability distribution
prioritized replay
priors
probabilities
  distribution of
    comparing
    computing expected value from
    in Python
  expectation
  of actions
  posteriors
  priors
  variance
probability density
probability density function (PDF)
probability mass function (PMF)
probability theory
probs array, 2nd
proximal policy optimization.
    See PPO.
Python Gym library
Python, representing probability distribution in
PyTorch
  automatic differentiation
  building models

Q

Q function, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, 12th
q vector
Q-learning, 2nd
  building networks
  discount factor
  double
  Gridworld
  Gridworld game engine
  hyperparameters
  mean field
  neighborhood
  neural networks as Q function
  overview of
  weaknesses of
Q-networks
quantile regression
queries, 2nd

R

random module
random parameter vectors
random variable
rearrange function, 2nd
recombine function
recombining
record array, 2nd
rectified linear units (ReLU), 2nd, 3rd, 4th, 5th, 6th, 7th
recurrent neural networks (RNNs), 2nd, 3rd, 4th
reduce function
regression
regularized models
REINFORCE algorithm, 2nd, 3rd
  agents interacting with environment
  creating policy networks
  full training loop
  training models
    backpropagating
    calculating future rewards
    calculating probability of actions
    loss function
reinforcement learning, 2nd
  deep reinforcement learning, 2nd
  dynamic programming versus Monte Carlo
  framework for
  future of
  hierarchical
  multi-agent
    1D Ising model
    2D Ising model
    mean field Q-learning
    mixed cooperative-competitive games
    neighborhood Q-learning
  string diagrams
  uses for
reinforcement of actions, 2nd
relational block
relational DQN
relational models
  double Q-learning
  machine learning interpretability with biases
    equivariance
    invariance
  relational DQN
  relational reasoning with attention
    attention models
    relational reasoning
    self-attention models
  self-attention for MNIST
    Einstein notation
    relational modules
    tensor contractions
    training relational modules
    transformed MNIST
  training
    curriculum learning
    maximum entropy learning
relational modules, 2nd
relational reasoning, 2nd
Rel-DQN
ReLU (rectified linear units), 2nd, 3rd, 4th, 5th, 6th, 7th
replay buffers
requires_grad attribute
requires_grad=True argument
reset method
reshape method
resize function
return
rewards, 2nd, 3rd, 4th
  future
  intrinsic
  sparse
RGBA value
RL algorithms, 2nd, 3rd, 4th, 5th
RNNs (recurrent neural networks), 2nd, 3rd, 4th
run_episode function, 2nd

S

s vector
SageMaker, Amazon
sample method
sample space
SAMs (self-attention models), 2nd, 3rd, 4th, 5th
scalar (0-tensor)

scaling
  evolutionary algorithms
    communicating between nodes
    efficiency of
    parallel vs. serial processing
  gradient-based approaches
  linearly

self-attention
  for MNIST
    Einstein notation
    relational modules
    tensor contractions
    training relational modules
    transformed MNIST
  models
self-attention models (SAMs), 2nd, 3rd, 4th, 5th
SequenceMatcher module
serial processing
serially running programs
shared indices
sigma squared
simulated data
simulators
soft attention
softmax function, 2nd, 3rd, 4th, 5th
softmax selection policy
sparse rewards
spin property
square function, 2nd
squeeze(...) method
stability
.start() method
state spaces, 2nd, 3rd
state vector
state1_hat tensor
state2_hat tensor
state-action values
state_dict() method
states, 2nd, 3rd, 4th
states array
state-value function
stationary environments
stationary problem
step() method, 2nd, 3rd
stochastic policy gradient
string diagrams, 2nd
subspace Q-learning
Super Mario Bros game
supervised learning

T

tanh function, 2nd
target function
target networks
team_step function
temperature parameter
tensor contractions
tensor order
TensorBoard
TensorFlow
tensors
terminal state
.terminate() method
test_model function
theta vector
torch optimizer
torch.einsum function
torch.no_grad() method
total return
tournament-style selection
train function, 2nd
training
  curriculum learning
  distributed training
  maximum entropy learning
  models
    backpropagating
    calculating future rewards
    calculating probability of actions
    loss function
  relational modules
training loop for REINFORCE
training procedure
traits
transformed MNIST
transition probability
tree-search algorithm

U

unpacked_params function
unsqueeze(...) method
.unsqueeze(dim=) method
update_dist function, 2nd
update_dist(z, reward) function
update_params function, 2nd
update_rate function
update_replay function
use_extrinsic=False function
use_extrinsic=True function
utility function

V

value distribution
value function2nd3rd4th
value-matrix multiplication
values
vanilla gradient descent, 2nd
variance, 2nd
visualizing attention weights, 2nd
VP (verb phrase)

W

weighted average
weighted log-likelihood ratio
while loop, 2nd, 3rd
wiring diagrams
worker function, 2nd
worker nodes

X

x_pos key

Y

y_correct variable
y.detach() method