CSCA 5632: Unsupervised Algorithms in Machine Learning

Get a head start on program admission

Preview this course in the non-credit experience today!
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Cross-listed with DTSA 5510

Important Update: Machine Learning Specialization Changes

We are excited to inform you that the current Machine Learning: Theory and Hands-On Practice with Python Specialization (taught by Professor Geena Kim) is being retired and will be replaced with a new and improved version (to be taught by Professor Daniel Acuna) that reflects the latest advancements in the field. The last opportunity to sign up for the current version is now November 28, 2025. The new version will be available in Spring 1, 2026.

Course Type: Breadth (MS-CS) Pathway|Breadth (MS-AI)
Specialization: Machine Learning: Theory & Hands-On Practice with Python
Instructor: Dr. Geena Kim, Adjunct Professor of Computer Science
Prior knowledge needed:
  • Programming languages: Basic to intermediate experience with Python and Jupyter Notebook
  • Math: Basic knowledge of Probability and Statistics, Linear Algebra
  • Technical requirements: Windows, Mac, or Linux; Jupyter Notebook

Learning Outcomes

  • Explain what unsupervised learning is, and list methods used in unsupervised learning.
  • List and explain algorithms for various matrix factorization methods, and for what each is used.

Course Grading Policy

Assignment | Percentage of Grade | AI Usage Policy
Week 1
Week 1 Quiz | 6% | Limited
Week 1 Peer Review: PCA | 10% | Limited
Week 2
Week 2 Quiz | 6% | Limited
Week 2 Peer Review: Clustering | 10% | Limited
Week 3
Week 3 Quiz | 6% | Limited
Week 3 Programming Assignment: Recommender Systems | 10% | Limited
Week 3 Peer Review: Recommender Systems | 2% | Limited
Week 4
Week 4 Quiz | 5% | Limited
Week 4 Peer Review: NLP Disaster Tweets Kaggle Mini-Project | 15% | Limited
Week 5
CSCA 5632 Unsupervised Algorithms in Machine Learning Final Project | 30% | Limited
Total | 100% |

Course Content

Duration: 8 hours

Now that you have a solid foundation in Supervised Learning, we shift our attention to uncovering hidden structure in unlabeled data. We will start with an introduction to Unsupervised Learning. In this course, the models no longer have labels to learn from; they need to make sense of the data from the observations themselves. This week we are diving into Principal Component Analysis (PCA), a foundational dimension reduction technique. The topic might not seem easy at first, and there is undoubtedly some math involved in this section. However, PCA can be grasped conceptually, perhaps more readily than anticipated. In the Supervised Learning course, we struggled with the Curse of Dimensionality. This week, we will see how PCA can reduce the number of dimensions and improve classification/regression tasks. You will have reading, a quiz, and a Jupyter lab/Peer Review to implement the PCA algorithm. It's only the first week of the course, but we want to remind you that in Week 5, you will turn in a final Unsupervised Learning project on a topic of your choice. If you are joining us from the Supervised Learning course, the project will follow a rubric and workflow similar to that course's final project. Since this course will move fast, it would be a good idea to look at the final project rubric (and upcoming course topics) this week and spend some time choosing a dataset and project topic.
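To make the idea of dimension reduction concrete, here is a minimal sketch of PCA computed through the singular value decomposition. It is an illustration only, not the course's lab solution; the function name `pca` and the toy dataset are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    # PCA assumes centered data, so subtract the per-feature mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each direction.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio

# Hypothetical toy data: 3 features, where the third nearly duplicates the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x1, x2, x1 + 0.01 * rng.normal(size=200)])

Z, components, ratio = pca(X, n_components=2)
print(Z.shape)      # reduced data: 200 rows, 2 columns
print(ratio.sum())  # fraction of total variance kept by 2 components
```

Because the third feature is almost redundant, two components should retain nearly all of the variance in this toy dataset.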

Duration: 7 hours

This week, we are working with clustering, one of the most popular unsupervised learning methods. Last week, we used PCA to find a low-dimensional representation of data. Clustering, on the other hand, finds subgroups among observations. We can use it to build a meaningful intuition of the data structure or as part of a procedure like Cluster-then-predict. Clustering has many applications, ranging from customer segmentation and advertising in marketing, to identifying similar movies/music, to genomics research and disease subtype discovery. We will focus mainly on K-means clustering and hierarchical clustering, considering the benefits and disadvantages of both and the choice of metrics like distance or linkage. We have reading, a quiz, and a Jupyter notebook lab/Peer Review this week. Make sure that you are working on your final project this week. To stay on track, finalize your project topic and complete any EDA and preprocessing so that next week, you can focus on the central part of the project and your unsupervised learning models.
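As a rough illustration of how K-means finds subgroups, the sketch below implements the standard assign-then-update loop (Lloyd's algorithm) with Euclidean distance. The function name `kmeans`, its parameters, and the six-point dataset are assumptions made for this example, not course material.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two small, well-separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

K-means is sensitive to initialization on harder datasets, so in practice it is common to run it from several random starts and keep the result with the lowest within-cluster distance.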

Duration: 7 hours

This week we are working with Recommender Systems. Websites like Netflix, Amazon, and YouTube surface personalized recommendations for movies, items, or videos. This week, we explore the strategies Recommendation Engines use to predict what users will like. We will consider popularity, content-based, and collaborative filtering approaches, and which similarity metrics to use. Recommender Systems come with challenges, such as the time complexity of operations and sparse data. This week is relatively math dense. You will have a quiz in which you will work through different similarity metric calculations. Give yourself time for this week's Jupyter notebook lab and consider performant implementations. The Peer Review section this week is short. Since this course is dense, please make sure that you are working on your final project, which is due in Week 5.
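For a sense of what a similarity metric calculation looks like, here is a small sketch of cosine and Jaccard similarity between users' rating vectors. The users, ratings, and function names are hypothetical, and treating an unrated item as 0 in the cosine calculation is a simplifying assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def jaccard_similarity(a, b):
    """Overlap between the sets of items each user rated (nonzero entries)."""
    rated_a, rated_b = a > 0, b > 0
    union = np.logical_or(rated_a, rated_b).sum()
    return float(np.logical_and(rated_a, rated_b).sum() / union) if union > 0 else 0.0

# Hypothetical ratings for four items; 0 means "not rated".
alice = np.array([5, 3, 0, 1])
bob = np.array([4, 0, 0, 1])
carol = np.array([1, 1, 0, 5])

print(cosine_similarity(alice, bob))    # higher: similar tastes
print(cosine_similarity(alice, carol))  # lower: opposite preferences
print(jaccard_similarity(alice, bob))   # 2 shared items out of 3 rated overall
```

In a neighborhood-based collaborative filter, scores like these would be used to weight other users' ratings when predicting a rating a user has not yet given.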

Duration: 13 hours

We are already at the last week of course material! Get ready for another dense math week. Last week, we learned about Recommendation Systems. We used a Neighborhood Method of Collaborative Filtering, utilizing similarity measures. Latent Factor Models, including the popular Matrix Factorization (MF), can also be used for Collaborative Filtering. A 1999 publication in Nature made Non-negative Matrix Factorization extremely popular. MF has many applications, including image analysis, text mining/topic modeling, recommender systems, audio signal separation, analytical chemistry, and gene expression analysis. This week, we focus on Singular Value Decomposition, Non-negative Matrix Factorization, and Approximation methods. We have reading, a quiz, and a Kaggle mini-project utilizing matrix factorization to categorize news articles. Next week is the due date for your final course project. Keep running experiments and working on the primary analysis for your final project. Ideally, you would finish experimenting and iterating on your models this week so that next week, you can focus on preparing your final project deliverables.
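As an illustration of Non-negative Matrix Factorization, the sketch below approximates a non-negative matrix V as a product W H using multiplicative updates for the squared (Frobenius) reconstruction error. It is a minimal sketch under stated assumptions, not the course's implementation; the function name `nmf`, its parameters, and the toy matrix are made up for this example.

```python
import numpy as np

def nmf(V, rank, n_iters=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random non-negative starting factors.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iters):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy matrix, e.g. 4 documents by 5 terms with two latent "topics".
V = np.array([[3, 2, 0, 0, 1],
              [4, 3, 0, 1, 0],
              [0, 0, 5, 4, 0],
              [0, 1, 4, 5, 0]], dtype=float)

W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error
print(W.argmax(axis=1))           # dominant latent factor for each row
```

For topic modeling, each row of H can be read as a topic's weights over terms, and each row of W as a document's weights over topics.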

Duration: 6.25 hours

Final Exam Format: Peer reviewed project

For the final peer-reviewed project, you will identify an Unsupervised Learning problem and perform EDA and model analysis on it. The project has 140 total points. The instructions include a summary of the criteria you will use to guide your submission and to review others' submissions. You will submit three deliverables.

Notes

  • Cross-listed Courses: Courses that are offered under two or more programs. Considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.