ECE 493 - Reinforcement Learning






NOTE: For the latest version of this course see

  • Contact:
  • Office: E5 4114
  • Office Hour: Regular online office hours on Skype for Business

Teaching Assistants:




The field of Artificial Intelligence intersects many areas of knowledge that engineers can use to build robust, dynamic systems in a world filled with large amounts of data, yet also containing uncertainty and hidden information. In this course we start from first principles and develop the concepts and skills needed to build systems that can reason, learn and make decisions using probabilistic reasoning. This begins with reviewing concepts from Bayesian probability and learning how to perform probabilistic inference via approximate sampling methods such as Markov Chain Monte Carlo (MCMC). We will then look at how to use a probabilistic approach to optimize decisions based on data collected through experimentation or interaction with the environment. A basic form of this, often used for A/B testing, is Thompson Sampling for solving Multi-Armed Bandit (MAB) problems, where probability distributions and decision making are combined in the simplest way.
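To make the Thompson Sampling idea concrete, here is a minimal sketch of Beta-Bernoulli Thompson Sampling, assuming each arm (for example, each page variant in an A/B test) returns a binary reward such as click / no click. The function name `thompson_sampling` and the arm probabilities in the usage example are hypothetical values for a simulated environment, not taken from course material.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated multi-armed bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns (number of pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1), i.e. uniform, prior on each arm's success probability.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior and act greedily on the samples.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate the environment's binary response for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Bayesian update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    # Hypothetical click-through rates for three variants.
    pulls, reward = thompson_sampling([0.1, 0.2, 0.5], n_rounds=2000)
    print(pulls, reward)
```

Sampling from the posterior, rather than always picking the arm with the highest mean estimate, is what balances exploration and exploitation: arms with uncertain posteriors still occasionally produce the largest sample and get tried.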

Reinforcement Learning (RL) is a much more general framework for decision making in which agents learn how to act by interacting with their environment, without any prior knowledge of how the world works or what outcomes are possible. We will explore the classic definitions and algorithms for RL and see how the field has been revolutionized in recent years through the use of Deep Learning. Recently, impressive AI systems have combined these concepts, sometimes together with Monte-Carlo Tree Search, to learn to play video games (such as StarCraft) and board games (such as Go and chess), in some cases from scratch. More practical applications of these methods are found regularly in areas such as customer behaviour modelling, traffic control, automatic server configuration, autonomous driving and robotics.

Required Background

The course will use concepts from ECE 203 and ECE 307 on Bayesian Probability and Statistics; these will be reviewed, but familiarity with them will help significantly. All other concepts needed for the course will be introduced directly. Examples, assignments and projects will depend on programming ability in Python.

Learning Objectives

This course complements other AI courses in ECE by focussing on methods for representing and reasoning about uncertain knowledge for the purposes of analysis and decision making. At each stage of the course we will look at relevant applications of the methods being discussed.

For example, in 2016 the AI program “AlphaGo” defeated world-class human players of the game Go for the first time. This system relies on many different methods to enable reasoning, probabilistic inference, planning and decision optimization. In this course we will build up the fundamental knowledge about these components and how they combine to make such systems possible.

  • Explain, evaluate and implement Reinforcement Learning algorithms for given problem descriptions 

Weekly Schedule

  • Prof. office hour: Wednesdays 3-4pm (one-on-one or group, in LEARN Virtual Classroom and via Piazza Live Q&A)
  • Live Weekly Wrapup: Fridays 9:30am-10:30am (LEARN virtual classroom, or Webex if needed)
  • Schedule for Assignments and Quizzes: To be determined.


  1. Motivation and Context: Importance of reasoning and decision making about uncertainty
  2. Probabilistic Modelling (Bayesian vs Frequentist approaches, conditional probability rules, Bayes' rule, expectation, variance, etc.)
  3. Methods of approximate inference: marginal, Maximum a posteriori (MAP), Markov Chain Monte Carlo (MCMC) estimation
  4. Identifying generalization error, risk, regret 
  5. Probabilistic Graphical Models (PGMs) (Bayesian Networks, Markov Random Fields, Conditional Random Fields) 
  6. Probabilistic programming as an alternative approach to PGMs 
  7. Causation vs Correlation: how to model probabilistic causal relationships, relation to decision making 
  8. Bayesian Optimization (Upper Confidence Bounds, Multi-armed bandits, Thompson Sampling)
  9. Decision making under uncertainty: Markov Decision Processes (MDPs), Influence Diagrams, Multi-armed bandits (MAB), Monte-Carlo Tree Search
  10. Basics of Neural Networks (training, back-propagation, gradient descent, regularization methods)
  11. Deep Learning (training methods, relevant architectures for Reinforcement Learning, fully connected feed forward networks)
  12. Reinforcement Learning (RL) (theory, Bellman equations, Value/Policy Iteration, TD methods, Q-learning, SARSA, policy gradients, actor-critic methods)
  13. Function approximation for RL (classic methods, Deep Learning)
  14. Deep RL: Deep Q-Networks (DQN), A3C, A2C, …
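As a small illustration of the tabular methods in topic 12, the following is a sketch of Q-learning with an epsilon-greedy behaviour policy. The five-state chain environment, the `step` and `greedy` helpers, and all hyperparameter values are hypothetical choices made for illustration; this is not assignment code.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment (an assumption for illustration)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action (none at terminal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


if __name__ == "__main__":
    q = q_learning()
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print(policy)  # greedy action for each non-terminal state
```

Because the update bootstraps from the maximum over next actions, Q-learning learns the value of the greedy policy even while behaving epsilon-greedily; SARSA, also listed above, would instead bootstrap from the action actually taken next.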

Grade Breakdown (TBD)


Assignments will be programming based, with some theory questions included as part of the report. Programming will be in Python, first using plain Python and later using TensorFlow or Keras. The assignments will build up to more complex simulated decision-making domains as the term progresses, using the more powerful algorithms we cover in class.

  • Assignment 1: 15% 
  • Assignment 2: 15% 
  • Assignment 3: 15% 
  • Assignment 4: 15% 

Quizzes and Take Home Exams

  • 40%: several smaller quizzes spread across the term and a take-home exam at the end, consisting of online multiple-choice/short-answer questions from a test bank.


General University of Waterloo Guidelines:

Academic Integrity: In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. Check for more information.

Grievance: A student who believes that a decision affecting some aspect of his/her university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4, Policies/policy70.htm. When in doubt, please contact the department's administrative assistant, who will provide further assistance.

Discipline: A student is expected to know what constitutes academic integrity—check http: // to avoid committing an academic offence, and to take responsibility for his/her actions. A student who is unsure whether an action constitutes an offence, or who needs help in learning how to avoid offences (e.g., plagiarism, cheating) or about rules for group work/collaboration should seek guidance from the course instructor, academic advisor, or the undergraduate Associate Dean. For information on categories of offences and types of penalties, students should refer to Policy 71, Student Discipline. For typical penalties, check Guidelines for the Assessment of Penalties.

Appeals: A decision made or penalty imposed under Policy 70 (Student Petitions and Grievances) (other than a petition) or Policy 71 (Student Discipline) may be appealed if there is a ground. A student who believes he/she has a ground for an appeal should refer to Policy 72 (Student Appeals).

Note for Students with Disabilities: The Office for Persons with Disabilities (OPD), located in Needles Hall, Room 1132, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with the OPD at the beginning of each academic term.