Reinforcement Learning
RL from first principles to deep policy gradients. Covers Gymnasium environments, Q-learning, DQN, PPO, SAC, multi-agent systems, and applying RL to real-world problems.
Fundamentals: Topics 1–10
- What is RL
- Agent and Environment
- Markov Decision Processes
- States and Actions
- Gymnasium Setup
- Exploration vs Exploitation
- Q-Learning
- SARSA
- Monte Carlo Methods
- First RL Program
Start Fundamentals →
Intermediate: Topics 1–10
- Deep Q-Networks
- Experience Replay
- Target Networks
- Policy Gradient
- Actor-Critic Methods
- Advantage Functions
- PPO Overview
- Reward Shaping
- Observation Preprocessing
- Training Stability
Start Intermediate →
Advanced: Topics 1–10
- PPO in Depth
- Soft Actor-Critic
- Multi-Agent RL
- Hierarchical RL
- Model-Based RL
- RLHF
- Offline RL
- Curriculum Learning
- Sim-to-Real Transfer
- Evaluating RL Agents
Start Advanced →
Applied: Topics 1–10
- RL in Production
- Environment Design
- Reward Engineering
- Scaling RL Training
- RL for Recommendations
- RL for Robotics
- RL Safety & Alignment
- Evaluation & Benchmarking
- Debugging RL Systems
- RL Infrastructure
Start Applied →