Data Mining

CSC 503/SENG 474, Spring 2025

Lectures: Tuesdays, Wednesdays, and Fridays 11:30am - 12:20pm, BWC A104
Instructor: Nishant Mehta
TAs: Ali Mortazavi (<firstname>themorty@gmail), Mohamed Mouhajir

Labs: Wednesdays and Thursdays in ELW B215

Nishant's office hours: Wednesdays and Thursdays, 4pm - 5pm

Textbooks:
(Mitchell) Tom Mitchell, Machine Learning. McGraw-Hill, 1997.
(Murphy) Kevin Murphy, Probabilistic Machine Learning: An Introduction. MIT Press, 2022.

Information about the Project


What this course is about
This course is an introduction to Data Mining/Machine Learning, a sub-field of artificial intelligence concerned with how algorithms can use experience to improve their performance on tasks. It will introduce you to many foundational machine learning methods and give you both a theoretical grounding and ample practical experience in implementing and using these methods on real data.

The objective of this course is to give students a foundation in machine learning, including important problems like classification, regression, clustering, and dimensionality reduction. The emphasis will be on understanding the design of various machine learning methods, learning how to use them in practice, and learning principled ways to evaluate their performance. The (optional) labs will complement the lecture topics by offering practical experience in experimenting with machine learning methods. The assignments will revolve around implementing machine learning algorithms and analyzing their results on data, with most of the emphasis on the analysis. Assignments might also involve a theoretical component (especially for graduate students).
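To make this concrete, here is a minimal sketch (in Python) of the kind of train-and-evaluate workflow described above: fit a decision tree classifier and estimate its accuracy with cross-validation and a held-out test set. The use of scikit-learn and the iris dataset here is an illustrative assumption, not part of the course materials; in the assignments you will generally implement the algorithms yourself.

    # Minimal sketch of a train-and-evaluate workflow.
    # NOTE: scikit-learn and the iris dataset are illustrative assumptions,
    # not part of the course materials.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Hold out a test set so the final accuracy estimate is unaffected
    # by choices made during model selection.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)

    # 5-fold cross-validation on the training data gives a principled
    # performance estimate before the test set is touched.
    cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
    print(f"cross-validation accuracy: {cv_scores.mean():.3f}")

    # Fit on all the training data and report held-out test accuracy.
    clf.fit(X_train, y_train)
    print(f"test accuracy: {clf.score(X_test, y_test):.3f}")

Decision trees and evaluation/model selection are exactly the topics of the first few lectures in the schedule below.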

In the schedule below, any information about future lectures is just a rough guide and might change.

Readings are required unless marked as optional; in many cases, readings are optional because they are more advanced. The lectures supplement the readings, and to do well in this course (and to learn machine learning) you should do the readings and attend the lectures. You are always welcome to ask the instructor questions about the reading material, either via the discussion forum (Ed Discussion; see Brightspace for the signup link), in office hours, or by email if needed.

Lectures
Date | Topics | Lecture Slides/Notes | Reading
1/7 | Introduction | Lecture 1: slides | (Mitchell) Chapter 1; (Murphy) Chapter 1 (optional)
1/8 | Decision Trees I | Lecture 2: slides | (Mitchell) Chapter 3
1/10 | Decision Trees II
1/14 | Random Forests | Lecture 4: slides | Random Forests chapter of ESL (optional) - reading guide
1/15 | Evaluation and Model Selection | Lecture 5: slides | (Mitchell) Chapter 5
1/17 | Boosting | Lecture 6: slides | Boosting book - Chapter 1 (optional)
1/21 | Neural Networks I: Intro | Lectures 7–11: slides | (Mitchell) Chapter 4; (Murphy) Chapter 13 (optional) - reading guide
1/22 | Neural Networks II: Linear separators
1/24 | Neural Networks III: Perceptron, gradient descent, SGD
1/28 | Neural Networks IV: Sigmoid units, multi-layer networks, backprop
1/29 | Neural Networks V: Reducing overfitting
1/31 | SVMs I: Large margin separation, soft-margin SVM, learning with kernels | Lectures 12–14: slides | SVM tutorial - reading guide; Andrew Ng's SVM lecture notes (optional)
2/4 | Snow day ☃ - no class!
2/5 | SVMs II: Soft-margin SVM, learning with kernels
2/7 | Midterm
2/11 | Learning with kernels
2/12 | Probability Review; Maximum Likelihood Estimation | Lecture 15: notes (Spring 2021) | Estimating Probabilities: MLE and MAP; (Murphy) Chapter 4 (optional) - reading guide
2/14 | MAP Estimation (including MDL) | Lecture 16: slides | (Mitchell) Section 6.6: MDL Principle
Reading Break
2/25 | Naive Bayes | | Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression; (Murphy) Chapters 9 and 10 (optional) - reading guide