Data Mining

CSC 503/SENG 474, Spring 2023

Lectures: Tuesdays, Wednesdays, and Fridays 11:30am - 12:20pm, MAC D288
Instructor: Nishant Mehta
TAs: Jonas Buro (<lastname>,
        Andrea Nguyen (t<lastname>
        Quan Nguyen (manhquan233 [at],

Labs: Fridays in ELW B215

Nishant's office hours: Wednesdays, 3pm - 5pm


           **Information about the Project**

What this course is about
This course is an introduction to Data Mining/Machine Learning, a sub-field of artificial intelligence concerned with how algorithms can use experience to improve their performance on tasks. This course will introduce you to many foundational machine learning methods and give you both a theoretical grounding and ample practical experience in implementing and using these methods on real data.
The objective of this course is to give students a foundation in machine learning, including important problems like classification, regression, clustering, and dimension reduction. The emphasis will be on understanding the design of various machine learning methods, learning how to use them in practice, and learning principled ways to evaluate their performance. The (optional) labs will complement the lecture topics by offering practical experience in experimenting with machine learning methods. The assignments will revolve around implementing machine learning algorithms and analyzing their results on data, with most of the emphasis on the analysis. Assignments might also involve some theoretical component (especially for graduate students).

In the schedule below, any information about future lectures is just a rough guide and might change.

Readings are required unless marked as optional. The lectures supplement the readings; to do well in this course (and learn machine learning), you should do the readings and attend the lectures. Optional readings are often more advanced, so please do not be frustrated if you have trouble understanding them.

| Date | Topics | Lectures and Assignments | Reading |
|---|---|---|---|
| 1/10 | Introduction | Lecture 1: slides | (Mitchell) Chapter 1; (Murphy) Chapter 1 (optional) |
| 1/11 | Decision Trees I | Lectures 2–3: slides | (Mitchell) Chapter 3 |
| 1/13 | Decision Trees II | | |
| 1/17 | Decision Trees and Random Forests | Lecture 4: slides | Random Forests chapter of ESL (optional) - reading guide |
| 1/18 | Neural Networks I: Intro | Lectures 5–9: slides | (Mitchell) Chapter 4; (Murphy) Chapter 13 (optional) - reading guide |
| 1/20 | Neural Networks II: Linear separators | | |
| 1/24 | Neural Networks III: Perceptron, Gradient descent, SGD | | |
| 1/25 | Neural Networks IV: Sigmoid units, Multi-layer networks, Backprop | | |
| 1/27 | Neural Networks V: Dealing with overfitting | | |
| 1/31 | SVMs I: Large margin separation | Lectures 10–11: slides | SVM tutorial - reading guide |
| 2/1 | SVMs II: Soft-margin SVM | | Andrew Ng's SVM lecture notes (optional) |
| 2/3 | Learning with kernels; Probability Review | Lecture 12: probability slides | |
| 2/7 | Maximum Likelihood Estimation | | Estimating Probabilities: MLE and MAP; (Murphy) Chapter 4 (optional) - reading guide |
| 2/8 | MAP Estimation (including MDL) | Lecture 14: slides | |
| 2/10 | Midterm | | |
| 2/14 | Naive Bayes | Lectures 15–16: slides | Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression; (Murphy) Chapters 9 and 10 (optional) - reading guide; (Mitchell) Section 6.6: MDL Principle |
| 2/15 | Logistic Regression | | |
| 2/17 | Evaluating the performance of hypotheses and Model selection | Lecture 17: slides | (Mitchell) Chapter 5 |
| | Reading Break | | |
| 2/28 | No class because of "snow" | | (Mitchell) Chapter 7 (up to and including Section 7.4.3) |
| 3/1 | Learning Theory: PAC Learning | Lecture 18: slides/notes | |
| 3/3 | Learning Theory: Agnostic Learning | Lecture 19: slides/notes | |
| 3/5 | Learning Theory: VC Dimension (make-up lecture) | Lecture 20: slides/notes, video (Spring 2021) | |
| 3/7 | Instance-based Learning: k-NN and recommender systems | Lecture 21: slides | (Mitchell) Chapter 8 (Sections 8.1 and 8.2) |
| 3/8 | Instance-based Learning continued | | |
| 3/10 | Clustering I: K-means problem | Lectures 23–24: slides | |
| 3/14 | Clustering II: Hierarchical clustering | | |
| 3/15 | Gaussian mixture models and EM | Lectures 25–26: slides/notes (from Spring 2022) | (Murphy, 2012) Chapter 11 - reading guide |
| 3/17 | Gaussian mixture models and EM continued | | Jupyter notebook for EM |
| 3/21 | Dimension Reduction/Feature Transformation: PCA I | Lectures 27–28: slides/notes | Jonathon Shlens's PCA tutorial (Sections I through V) |
| 3/22 | Dimension Reduction/Feature Transformation: PCA II | | Jupyter notebook for Eigenfaces |
| 3/24 | Dimension Reduction/Feature Transformation: ICA | Lecture 29: slides | Jupyter notebook on statistical independence |
| 3/28 | Boosting I | Lectures 30–31: slides | Boosting survey (optional reading) |
| 3/29 | Boosting II | | |
| 3/31 | Fairness and Machine Learning | Lecture 32: slides | |
| 4/4 | Project Presentations (in class) | | |
| 4/5 | Project Presentations (in class) | | |