
Pong Learn

Started December 10, 2025 · Updated April 28, 2026
Learning simulation

A browser simulation that turns Pong into a visible reinforcement-learning lesson: the paddle tries actions, receives rewards, updates memory, asks the viewer to predict the next policy choice, and then freezes the learned behavior for inspection.

Product: Reinforcement learning lab
Status: Educational browser lab
Stack: JavaScript, Canvas, Q-learning, Node tests
View code
PongLearn lab showing a timed run, frozen policy demo metrics, current lesson, mastery path, and active recall check
The latest interface turns the dashboard into a small learning loop: watch the update, predict what memory should choose, then compare against the strongest visible value.
Q-learning · Visible memory · DeepMind-inspired
Live training loop

The card animation comes from the actual PongLearn demo

The learner paddle moves, the ball state changes, and the side panel shows hit rate, exploration, learned states, and action values in the same moment.

Animated PongLearn demo
The GIF is useful on the project card because motion is the clearest way to communicate that the agent is learning through repeated play.

Pong Learn is now a browser lab for reinforcement learning. Instead of adding quiz questions to Pong, the project shows a paddle learning from experience while the interface explains what changed inside the policy.

The point is not to make Pong harder. The point is to make learning visible.

The left paddle is the learning agent. It observes a simplified game state, chooses up, hold, or down, receives a reward from the rally, and updates a stored action value. Over time, the hit rate, exploration level, learned states, memory tiles, Q-value cards, and policy map make that process visible.
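The stored-value update described above can be sketched as tabular Q-learning. This is a minimal illustration, not PongLearn's actual source; the function names, state encoding, and the learning-rate and discount constants are assumptions.

```javascript
// Minimal tabular Q-learning sketch (illustrative names, not PongLearn's code).
const ACTIONS = ["up", "hold", "down"];
const ALPHA = 0.1; // learning rate: how far each update moves the stored value
const GAMMA = 0.9; // discount: how much estimated future reward counts

const qTable = new Map(); // state key -> { up, hold, down } action values

function getValues(state) {
  if (!qTable.has(state)) qTable.set(state, { up: 0, hold: 0, down: 0 });
  return qTable.get(state);
}

// One remembered value changes per rally step: the classic Q-learning update.
function update(state, action, reward, nextState) {
  const q = getValues(state);
  const next = getValues(nextState);
  const bestNext = Math.max(next.up, next.hold, next.down);
  q[action] += ALPHA * (reward + GAMMA * bestNext - q[action]);
}
```

Because the table starts at zero, a single rewarded step nudges exactly one value toward the reward, which is what the memory tiles and Q-value cards make visible.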

The app starts with an educational walkthrough, then runs a timed training session. A viewer can choose a 30, 60, or 120 second run and watch the same loop repeat at a readable pace: observe the state, act, receive feedback, and update one remembered value.
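The act step inside that loop balances exploration against acting from memory. A common way to do this is an epsilon-greedy choice; the sketch below assumes that approach, and the names are illustrative rather than taken from the project.

```javascript
// Sketch of one epsilon-greedy action choice (illustrative, not PongLearn's code).
const ACTIONS = ["up", "hold", "down"];

function chooseAction(values, epsilon) {
  if (Math.random() < epsilon) {
    // Explore: try a random action to discover rewards memory doesn't know yet.
    return ACTIONS[Math.floor(Math.random() * ACTIONS.length)];
  }
  // Exploit: act from memory by taking the action with the strongest stored value.
  return ACTIONS.reduce((best, a) => (values[a] > values[best] ? a : best));
}
```

The exploration level shown in the side panel corresponds to this epsilon: high early in a run, lower as the session progresses.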

Every rally becomes one small lesson: observe, act, receive reward, update memory, then test whether the policy makes sense.

The latest version adds an active recall check. Instead of only watching the dashboard, the viewer predicts whether memory should choose up, hold, or down, then compares that answer with the strongest visible Q-value. That small prompt turns the interface from passive observation into a retrieval-practice loop.
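The recall check reduces to comparing the viewer's prediction against the argmax of the visible action values. A minimal sketch, with hypothetical helper names:

```javascript
// Active recall check sketch: does the viewer's prediction match memory?
// (Illustrative helper names, not PongLearn's actual API.)
function strongestAction(values) {
  // Pick the action whose stored Q-value is highest.
  return Object.entries(values)
    .reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];
}

function checkPrediction(prediction, values) {
  const answer = strongestAction(values);
  return { answer, correct: prediction === answer };
}
```

The prompt only needs the currently displayed values, so the check stays honest: the "right" answer is whatever the visible memory actually says, not a hidden oracle.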

The training view also has a practical rhythm. The lesson card explains one reward update in plain language, the Next Lesson control jumps to the next useful feedback moment, accelerated practice compresses less interesting repetition, and the demo mode freezes learning so the user can see how the trained policy performs without more exploration.
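Freezing the policy for demo mode amounts to two switches: zero the exploration rate and stop writing updates, so every step becomes a greedy read of the table. A sketch under those assumptions (the field names are illustrative):

```javascript
// Demo-mode sketch: freeze learning so the trained policy can be inspected.
// (Illustrative shape, not PongLearn's actual agent object.)
function freezeForDemo(agent) {
  return {
    ...agent,
    epsilon: 0,      // no random exploration: always act from the learned policy
    learning: false, // no Q-value updates: memory stays fixed during the demo
  };
}

const trainingAgent = { epsilon: 0.2, learning: true };
const demoAgent = freezeForDemo(trainingAgent);
```

Returning a copy rather than mutating the agent keeps the training state intact, so the viewer can switch back from the demo and resume the run.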

PongLearn borrows the teaching principle from DeepMind-style reinforcement learning, then makes it small enough to inspect in a browser.

That framing matters because reinforcement learning is often explained with either abstract equations or finished demos. Pong Learn sits in the middle. It keeps the environment simple enough to understand while still showing the mechanics that make learning from reward work.

The project is inspired by the learning loop made famous by DeepMind's AlphaGo work: estimate the value of actions, predict future outcomes, improve from repeated experience, and then act from the improved policy. Pong Learn is not a recreation of AlphaGo. It uses tabular Q-learning instead of neural networks so the process is inspectable.