DQN to Double DQN | Jared Young

Overview

This artifact compares the original DQN breakthrough with the Double DQN correction. DQN shows that convolutional Q-networks can learn control directly from raw Atari frames, while Double DQN addresses the way DQN can overestimate action values when action selection and action evaluation use the same noisy estimates.

Key Features

Explains the Q-learning objective, Bellman target, bootstrapping, and Atari control setting.
Breaks down DQN architecture, frame stacking, replay memory, reward clipping, and pixel-based representation learning.
Shows why the DQN max operator can create overoptimistic value estimates.
Explains Double DQN's target correction by separating action selection and action evaluation.
Compares the two papers as a progression from feasibility to calibration and reliability.

Evidence

DQN to Double DQN slide deck title slide — Slide deck preview introducing DQN, Double DQN, and the Atari benchmark domain.

First page preview of the DQN to Double DQN report — Technical report with equations, method comparison, empirical evidence, and conclusion.

Interactive Slide Deck

DQN to Double DQN Slides

Download Slides PDF

Slide 1 of 13

The main technical takeaway is that deep reinforcement learning progress came from both better visual representations and better target design. DQN made pixel-based value learning practical; Double DQN made the same value learning more trustworthy.