CSCA 5912: Deep Reinforcement Learning: From Theory to Practice

Get a head start on program admission

Preview this course in the non-credit experience today!
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Course Type: MS-AI Breadth, MS-CS Elective

Specialization: Reinforcement Learning

Instructor: Dr. Ashutosh Trivedi, Associate Professor of Computer Science

Prior knowledge needed: TBD

Learning Outcomes

  • Explain how neural-network-based function approximation extends reinforcement learning beyond finite tabular settings.
  • Implement and evaluate value-based deep reinforcement learning algorithms, including Deep Q-Networks and stabilizing techniques such as replay buffers and target networks.
  • Derive and implement policy-gradient methods, including REINFORCE, baselines, and advantage-based updates.
  • Explain and analyze actor-critic methods that combine policy optimization with value estimation.
  • Compare deep reinforcement learning algorithms in terms of stability, scalability, sample efficiency, and suitability for different decision-making tasks.

Course Grading Policy

Assessment | Percentage of Grade | AI Usage Policy
Quizzes (5) | 50% (10% each) | Conditional
Final Exam | 50% | Conditional

Course Content

Duration: TBD

This module introduces function approximation as the transition point from tabular reinforcement learning to deep reinforcement learning. The central message is that deep RL is not merely supervised learning applied to RL data: the targets are noisy, bootstrapped, policy-dependent, and often moving as the parameters change.
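
To make the bootstrapped, moving target concrete, here is a minimal Python sketch of semi-gradient TD(0) with a linear value function. The 5-state random-walk task, the two-feature map `phi`, and the function names are illustrative assumptions, not course material. The target is computed from the current weights, so it shifts whenever they change, and the update deliberately treats it as a constant.

```python
import random


def phi(s):
    # Hypothetical features: a bias term plus the state index scaled to [0, 1].
    return [1.0, s / 4.0]


def value(w, s):
    # Linear value estimate v(s; w) = w . phi(s)
    return sum(wi * xi for wi, xi in zip(w, phi(s)))


def semi_gradient_td0(episodes=10000, alpha=0.002, gamma=1.0, seed=0):
    """Semi-gradient TD(0) on a 5-state random walk (states 0..4).

    Every episode starts in state 2 and moves left or right with equal
    probability. Stepping past state 0 ends the episode with reward 0;
    stepping past state 4 ends it with reward 1.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        s = 2
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next < 0 or s_next > 4
            r = 1.0 if s_next > 4 else 0.0
            # Bootstrapped target built from the *current* weights: it moves
            # whenever w changes, and no gradient flows through it.
            target = r if terminal else r + gamma * value(w, s_next)
            delta = target - value(w, s)
            # For a linear model, the gradient of v(s; w) with respect to w is phi(s).
            w = [wi + alpha * delta * xi for wi, xi in zip(w, phi(s))]
            if terminal:
                break
            s = s_next
    return w
```

For this walk the true values are (s + 1)/6, which these linear features can represent exactly, so after `w = semi_gradient_td0()` the estimates `[value(w, s) for s in range(5)]` should land close to 1/6, 2/6, ..., 5/6.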

Duration: TBD

This module develops value-based deep reinforcement learning as bootstrapped regression. Learners study fitted value iteration, examine why function approximation can break the contraction arguments that guarantee convergence in the tabular setting, and then study DQN and its stabilizers: replay buffers, target networks, double DQN, dueling networks, and prioritized replay.
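
The sketch below illustrates two of these stabilizers in plain Python: a uniform replay buffer and the regression targets for standard versus double DQN. The Q-networks are stand-in callables (`q_online`, `q_target`) that return a list of action values for a state; these names, the `Transition` record, and the buffer interface are hypothetical, and no network or training loop is shown.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size FIFO store of transitions. Sampling uniformly breaks the
    temporal correlation between consecutive updates."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_target(batch, q_target, gamma):
    """Standard DQN target: r + gamma * max_a Q_target(s', a)."""
    return [t.reward + (0.0 if t.done else gamma * max(q_target(t.next_state)))
            for t in batch]


def double_dqn_target(batch, q_online, q_target, gamma):
    """Double DQN: the online network selects a*, the target network evaluates it."""
    targets = []
    for t in batch:
        if t.done:
            targets.append(t.reward)
            continue
        q_next = q_online(t.next_state)
        a_star = max(range(len(q_next)), key=q_next.__getitem__)
        targets.append(t.reward + gamma * q_target(t.next_state)[a_star])
    return targets
```

With `q_online = lambda s: [1.0, 3.0]`, `q_target = lambda s: [2.0, 0.5]`, reward 1, and gamma 0.5, the standard target for a non-terminal transition is 1 + 0.5 * 2.0 = 2.0, while the double DQN target is 1 + 0.5 * 0.5 = 1.25. Decoupling action selection from evaluation avoids always taking the max over the target network's possibly overestimated values.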

Duration: TBD

This module introduces direct policy optimization. The main idea is to optimize a parameterized policy by estimating the gradient of expected return from sampled trajectories. Learners derive the likelihood-ratio estimator, see how causality (reward-to-go) and baselines reduce the variance of that estimator, implement REINFORCE, and then move to actor-critic methods.
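
The likelihood-ratio estimator, reward-to-go, and a baseline fit in a short Python sketch. It assumes a hypothetical toy task: a state-free softmax policy over two actions acts for three steps, and action 1 earns reward 1. The baseline is a running average of reward-to-go built from earlier batches only, so it does not depend on the current actions. The names `reward_to_go` and `reinforce` are illustrative.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_to_go(rewards, gamma):
    """G_t = sum_{k >= t} gamma**(k - t) * r_k.

    Causality: action a_t is credited only with rewards from step t onward.
    """
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def reinforce(iterations=300, batch=16, horizon=3, lr=0.1, gamma=1.0, seed=0):
    """REINFORCE with reward-to-go and a running per-step baseline (toy task)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]          # logits of the state-free policy
    baseline = [0.0] * horizon  # running mean of G_t from earlier batches
    for _ in range(iterations):
        probs = softmax(theta)
        trajs = []
        for _ in range(batch):
            actions = [0 if rng.random() < probs[0] else 1 for _ in range(horizon)]
            rewards = [float(a) for a in actions]  # action 1 pays 1, action 0 pays 0
            trajs.append((actions, reward_to_go(rewards, gamma)))
        grad = [0.0, 0.0]
        for actions, g in trajs:
            for t, a in enumerate(actions):
                advantage = g[t] - baseline[t]
                for j in range(2):
                    # d/dtheta_j log softmax(theta)[a] = 1[j == a] - probs[j]
                    grad[j] += ((1.0 if j == a else 0.0) - probs[j]) * advantage
        theta = [th + lr * gj / batch for th, gj in zip(theta, grad)]  # gradient ascent
        for t in range(horizon):
            batch_mean = sum(g[t] for _, g in trajs) / batch
            baseline[t] = 0.9 * baseline[t] + 0.1 * batch_mean
    return softmax(theta)
```

`reinforce()` returns the final action probabilities; on this task the probability of action 1 should approach 1.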

Duration: TBD

This module surveys modern deep reinforcement learning algorithms through the lens of stability, exploration, and continuous control. Learners study PPO as a conservative policy-gradient method, DDPG as a deterministic actor-critic method for continuous control, and SAC as an entropy-regularized actor-critic method.
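
PPO's clipping can be written as a small function. This sketch evaluates the clipped surrogate objective, the batch mean of min(ρA, clip(ρ, 1 - ε, 1 + ε)A) with ρ = π_new(a|s) / π_old(a|s), from per-sample log-probabilities and advantages. The function name and the default ε = 0.2 are assumptions. In practice the objective is maximized by gradient ascent inside an autodiff framework.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the min makes the objective pessimistic: moving the ratio
        # outside [1 - eps, 1 + eps] in the favorable direction earns nothing extra.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, raising the ratio above 1 + ε earns no extra objective, so `ppo_clip_objective([math.log(1.5)], [0.0], [1.0])` evaluates to 1.2 rather than 1.5 (up to floating-point rounding). This caps how far a single update can move the policy.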

Duration: TBD

This module has two lessons. The first focuses on stable policy updates through trust-region ideas and PPO clipping. The second focuses on continuous control and entropy-regularized learning through DDPG and SAC.
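
Two ingredients from the continuous-control lesson can be sketched in a few lines: the Polyak (soft) target-network update that DDPG and SAC both use, and SAC's entropy-regularized Bellman target. For brevity the target is computed for a discrete action set, where the expectation over π can be taken exactly. This is a simplifying assumption: SAC for continuous control samples actions from the policy and also takes the minimum over two Q-networks, both omitted here. Function names and default hyperparameters are illustrative.

```python
import math


def soft_q_target(reward, done, next_q, next_probs, gamma=0.99, alpha=0.2):
    """Entropy-regularized (soft) Bellman target for a discrete action set.

    V(s') = sum_a pi(a|s') * (Q_targ(s', a) - alpha * log pi(a|s'))
    y     = r + gamma * V(s')   (just r if the episode ended)
    """
    if done:
        return reward
    soft_v = sum(p * (q - alpha * math.log(p))
                 for p, q in zip(next_probs, next_q) if p > 0.0)
    return reward + gamma * soft_v


def polyak_update(target_params, online_params, tau=0.005):
    """Soft target update: the target weights slowly track the online weights."""
    return [tau * w + (1.0 - tau) * wt for wt, w in zip(target_params, online_params)]
```

Setting `alpha=0` recovers the ordinary expected-Q target, while a larger `alpha` adds more weight to the policy's entropy and so favors more stochastic policies.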

Notes

  • Cross-listed Courses: Courses offered under two or more programs. They are considered equivalent when evaluating progress toward degree requirements, and you may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.