Machine Learning: Foundations (2010/11)

Final Project due 3.3.2013

REMARK: To submit the project, create a small web site with the document and pointers to any relevant material (data, code, etc.). You probably do not want to include your student ID; your name is enough.

For the final project you need to select a project that would utilize what you learned about machine learning.

There is great flexibility in the topic of the project, but you first need to get it approved!

The possible projects fall into a few general types.

The project can be mainly empirical, involving experimentation (hopefully with real data) with the methodologies (algorithms) introduced in the course.

The project can be theoretical in nature. This again has two distinct flavors. It can be a summary: reading a few related papers and writing a critical summary of them (a critical summary means that you try to point out weaknesses in the model or results, and what the goal should be). Or it can be a research project, which usually involves defining an interesting open problem and trying to solve it (hopefully successfully); at the very least, you should be able to explain what you tried and why it failed (or where you got stuck).

If you have an idea that you think is reasonable but does not fall into any of these categories, simply ask whether it can be a project.

At the end of the project you will need to write a report (ideally 7-10 pages, and no more than 12) that summarizes what you did in the project.

The project can be done in pairs.

REMEMBER: You need to first get your project approved!

The deadline for the project is Dec 23, 2012.

In the following I will be more precise, and try to give you pointers for each type of project.

Empirical data project

For this kind of project you will need two elements. First, you need the data that you are interested in, and second, you need to define what you would like to do (learn) with the data.

DATA:

The best kind of data is any data that you have access to and you are interested in analyzing.

(If this is part of your graduate studies, even better.)

Here are a few more standard pointers to open data sets:

Delve Datasets

UC Irvine Machine Learning Repository

ICDAR

MNIST, and procedures to handle it: Hinton's webpage

Also, you can try to think “out of the box”:

Yahoo! Finance has a large variety of financial data about stock prices.

Google Trends has information about search queries.

Finally, the web has a huge variety of user reports on products and services.

LEARNING:

Try to propose what you would like to do with the data. You can try to simply learn a given task, try to learn something about a learning algorithm, or try to compose two (or more) learning algorithms.

Here are a few examples:

TRIPADVISOR: try to reconstruct their proprietary Popularity Index algorithm. (see http://www.tripadvisor.nl/pages/owner_faq.html)

SENTIMENT ANALYSIS: Given user reviews, determine whether each review is positive, negative or neutral. (see http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)

SPAM Analysis: http://csmining.org/index.php/data.html
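To make the sentiment analysis example concrete, here is a minimal sketch of one possible baseline: a bag-of-words naive Bayes classifier with add-one smoothing. The choice of naive Bayes, the function names (tokenize, train_naive_bayes, predict) and the tiny training set are all assumptions made for illustration; they are not taken from the linked pages, and a real project would use a proper data set and evaluation.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

def train_naive_bayes(labeled_reviews):
    """Collect per-label word counts and label counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_reviews:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy reviews, for illustration only (not real data).
TRAIN = [
    ("great hotel, friendly staff", "positive"),
    ("loved the room and the great view", "positive"),
    ("terrible service, dirty room", "negative"),
    ("awful food and rude staff", "negative"),
    ("the hotel is near the station", "neutral"),
    ("check-in is at noon", "neutral"),
]
model = train_naive_bayes(TRAIN)
print(predict(model, "great view"))
print(predict(model, "dirty and awful"))
```

A project along these lines would replace the toy reviews with real labeled data and report accuracy on a held-out test set.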

Empirical algorithm project

Select one algorithm, and some works that show how to optimize the performance of the algorithm.

Do an empirical evaluation of the proposed methodologies for setting the parameters, and of their influence on the performance.

You still need to decide in advance on which data sets you will perform the empirical evaluation.
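As an illustration of evaluating the influence of a parameter, the sketch below estimates the accuracy of a k-nearest-neighbor classifier by cross-validation for several values of k. The choice of k-NN, the helper names (knn_predict, cross_val_accuracy) and the synthetic data with one deliberately mislabeled point are assumptions for the example only; your project should use the algorithm, parameter-setting methods and data sets you got approved.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (squared Euclidean distance)."""
    neighbors = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5):
    """Fraction of points classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in range(n_folds):
        test = data[fold::n_folds]  # every n_folds-th point, starting at index fold
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)

# Synthetic data (an assumption for this sketch): two well-separated clusters,
# with the label of one point flipped to simulate label noise.
DATA = [((i, 0), 0) for i in range(10)] + [((50 + i, 0), 1) for i in range(10)]
DATA[4] = (DATA[4][0], 1)  # mislabeled point inside the class-0 cluster

candidates = [1, 3, 5]
scores = {k: cross_val_accuracy(DATA, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
for k in candidates:
    print(f"k={k}: cross-validated accuracy {scores[k]:.2f}")
print("selected k:", best_k)
```

On this toy data a larger k smooths over the mislabeled point, which is the kind of effect an empirical algorithm project would measure systematically on real data sets.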

Summary project

You first need to find a number of related papers (2-3) on a single topic that interests you.

A good starting place is surveys and tutorials that explain the area and have many references.

Another place to find papers on machine learning is in the conferences (COLT (more theoretical), ICML (more empirical) or NIPS) or journals (JMLR or Machine Learning Journal).

Here are a few leads on some of the topics:

Boosting: page, tutorial, tutorial2

Semi-supervised learning: survey, tutorial

Active learning: Sanjoy Dasgupta and tutorial

Online Learning and Regret Minimization: Survey, tutorial, tutorial2, Elad Hazan, Nicolo Cesa-Bianchi

Domain Adaptation: tutorial and Mehryar Mohri

Structured Prediction: tutorial

Agnostic Learning: tutorial

Research project

Basically, this is very similar to the summary project in its planning and preparation. The main difference is that you might select only a single paper, and that your goal would be to identify a research project related to that work. In the proposal stage, try to give a general outline of what you would like to do.

Multi-Class Labels

An overview can be found in Chapter 8 of:

Foundations of Machine Learning, Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar; MIT Press, 2012

The slides are available here