Overview
This artifact compares the original DQN breakthrough with the Double DQN correction. DQN shows that convolutional Q-networks can learn control directly from raw Atari frames, while Double DQN addresses the way DQN can overestimate action values when action selection and action evaluation use the same noisy estimates.
Key Features
- Explains the Q-learning objective, Bellman target, bootstrapping, and Atari control setting.
- Breaks down DQN architecture, frame stacking, replay memory, reward clipping, and pixel-based representation learning.
- Shows why the DQN max operator can create overoptimistic value estimates.
- Explains Double DQN's target correction by separating action selection and action evaluation.
- Compares the two papers as a progression from feasibility to calibration and reliability.
Evidence
Interactive Slide Deck
DQN to Double DQN Slides
The main technical takeaway is that deep reinforcement learning progress came from both better visual representations and better target design. DQN made pixel-based value learning practical; Double DQN made the same value learning more trustworthy.