Data Mining

Group Project - Formal Proposal

When is it due? Monday, October 16th at 11:59pm.

How many points is it worth? It's worth 16% of your project grade (or 4% of your final grade).

The full instructions will be posted to Brightspace soon, but they will be approximately the same as the ones below.

You'll be submitting a (concise!) 3- to 4-page formal proposal (single spaced), with the following four sections:

  1. The problem: You should concisely describe the problem you plan to tackle, including an explanation of why this problem is important and why your project is interesting (the latter two are likely related, so your explanation might cover both together). As part of this description, describe the datasets you will use (or, if datasets don't exist, explain how you plan to collect or generate data). Be clear about the risks in obtaining data (might you fail to obtain it?), and describe backup data sources if there is a nontrivial chance that your originally desired data is unobtainable.
  2. Goals: How will you measure success? In this course, this should almost always be something quantifiable. Talk about the different results your project might yield. This is of course speculative, but it's a good exercise; in particular, it might help you work backwards from your end goals to figure out a good "forward direction" plan.
  3. Plan: Describe the types of experiments you'll do and estimate when you will complete them. Mention what you will do to ensure success (check out certain papers/books, talk to certain people who are knowledgeable, etc.); as much as you can, be specific about what or whom you might consult. Grad students might rely on faculty mentors/advisors for expert advice. Undergrads might also contact faculty members (i.e., professors) or other researchers for advice.
  4. Task breakdown: Describe the tasks that each individual group member will perform. Remember that each student is expected to be involved with some machine learning component of the project. So, for example, one person's entire job cannot be "data collection" (unless there is some sophisticated machine learning happening just for the data collection itself).
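
As an illustration of the kind of quantifiable success measure that section 2 asks for, here is a minimal sketch that compares a model's accuracy against a majority-class baseline. The labels, the predictions, and the helper names (accuracy, majority_baseline) are all hypothetical and chosen only for this example; your own project may call for a different metric entirely.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for all n test examples."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


# Hypothetical toy data, for illustration only.
y_train = ["spam", "ham", "ham", "ham", "spam", "ham"]
y_test = ["ham", "spam", "ham", "ham", "spam"]
model_pred = ["ham", "spam", "ham", "spam", "spam"]  # assumed model output

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
print(f"improvement:       {model_acc - baseline_acc:.2f}")
```

Stating a goal relative to a simple baseline (for example, "beat the majority-class baseline by a stated margin") tends to make success easier to judge than an absolute number alone.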