Recitations: Thursdays
Contact Info:


Email 
Phone 
Office 
OfficeHours 
Instructor 
benny AT cs.tau.ac.il 
6405977 
Schreiber 223 
eappointment 

Instructor 
ruppin AT
cs.tau.ac.il 
6406528 
Schreiber 118 
eappointment 

Teaching Assistant 
shlomito AT post.tau.ac.il 
6405369 
Schreiber 19מ 
eappointment 
Course Outline
The Human Genome Project has completed the sequencing of the human DNA. Other genome projects of various organisms (bacteria, fungi, worms, rice, wheat, flies, chicken, mice and primates) are either completed or in the process of advanced sequencing. New biotechnologies produce huge amounts of biological data, whose manual analysis not possible. These advances have pushed for a new scientific discipline of bioinformatics, which integrates computer science, statistics, mathematics, and various biological fields. In this introductory course we concentrate on the algorithmic aspects of various computational tasks with biological origins. We will explore their computational complexity, and ways to solve them even if they are computationally intractable, such as approximation algorithms and heuristics.
Course topics include comparisons and alignments of biological sequences; Suffix trees and their uses for analyzing very long sequences (like complete genomes); Reconstruction of evolutionary trees using various optimization criteria, such as maximum parsimony, maximum likelihood, and distance criteria; Physical mapping of genomes; Genome alignment; Probabilistic models, such as PSSM (Position Specific Scoring Matrix) and HMM(Hidden Markov Models); HMM algorithms (Viterby, Expectation Maximization);Biological applications of the probabilistic models (e.g. gene prediction, protein classification, and sequence alignment); Analyzing biological networks; DNA microarrays and their uses; Clustering gene expression data; Classification using SVM (Support Vector Machines), and applications to clinical diagnosis.
Final exam: Closed book except for 4 doublesided pages of your choice.
Tantative Course Plan (subject to changes):
Lecture 
Date 
Subject 
Reference(s) 
Recitation 
1 

Introduction and administrativia. 
Kanehisa, ch. 3 Durbin et. al.,
ch. 2. 
______ 
2 
9/11 
Pairwise Sequence Alignment: Linear space algorithm; FASTA and BLAST heuristics; PAM and BLOSSUM 
Durbin et. al., 2.5, 2.7 

3 
16/11 
Genome Alignment (no slides) Introduction to suffix trees and suffix
arrays: Dan Geiger (pdf), FernndezBaca slides
(ppt). 
Gusfield, ch. 15 Gusfield, ch. 5,6,7 

4 
23/11 
Durbin et al., 7.1, 7.2 Gusfield,
ch. 17. 
Multiple Sequence Alignment: Family
Representations and


5 
30/11 
Reconstructing phylogenetic (evolutionary) trees: 
Cancelled 

6 
7/12 
Maximum likelihood estimate (MLE); ML inference of trees. ML and AML properties.
Large, small, and tiny versions. Hardness results. 

Solutions to assignment 1 (recitation in Schreiber 309) 
7 
14/12 
1) Maximum likelihood (ML and AML): Felsenstein "tiny ML" algorithm, Pupko et. al "tiny AML" algorithm. 

Weighted parsimony. Factor 2 approximation to parsimony. TSP: Approximation and simulated annealing heuristic. 
8 
21/12 
Durbin et al. , ch. 3 
ML vs. MP. ML examples. 

9 
28/12 
Hidden Markov
models (HMM): Parameter estimation using the EM algorithm. ML vs MAP (maximum apostriori probability); 
Profile HMM to identify protein families. 

10 
4/1/2006 
Introduction to biological networks 1 

Solutions:
Assignment 2 
11 
11/1 

Pair HMM 

12 
19/1 
Intro to DNA chips and an animation;


Network Motives 
13 
26/1 
Pluto's Introduction to Machine Learning. Support vector machines and kernels. SVMs for string problems and protein classification. Disease diagnosis by SVM on gene expression data. Linear separability. 
Solutions: Assignment 3. Linear programming. 

14 
3/2 
. 

Exam examples 
The course is "officially'' opened to 3rd year
students of the bioinformatics track, to 3rd year and graduate Computer Science
students. The algorithms course (“efficiency of computations”) is a
prerequisite, and the computability course (“computational models”) should at
least be taken concurrently with this course. Students from other disciplines
are welcome to take the course, but are encouraged to first contact the
instructor or teaching assistant. Some background in algorithms, probability,
and programming is required.(In the previous times
this course was given by this instructor, a number of life science students
completed it successfully).
Administratrivia
BTW this is also the case
regarding the lectures: If you can understand the material, answer the home
assignments (on your own) and pass the exam in flying colors without attending
classes, then surely windsurfing or boogyboarding is
a better use of your time.)
Homework submission in singles or pairs (no triplets, quartets, quintets etc.). There will be a total of 56 problem sets, involving both "dry" assignments and "wet" ones. The wet parts will require understanding and running existing software, but not writing any code. Homework should be done independently. External sources (books, journal articles, web pages) can be used but should be clearly quoted. Research done at the Technion has shown a significant correlation between independent work and final performance.
Assignment
1 (due
Assignment
2 (due
Assignment
3 (due
Assignment 4 (not there yet)
Books
Other Online Courses
Lior
Pachter course: Algebraic Statistics for
Computational Biology, UC Berkeley, Fall 2004.
Ron Shamir course notes: Computational Genomics,
TelAviv UniversityFall2003/4
Martin Tompa's course notes: Computational Biology, (CSE 527),
University of
Nir
Friedman's course slides: Computational Molecular Biology the
Karp, Ruzzo and Tompa's course notes:
Algorithms in Molecular Biology
Other Interesting Resources (not course related)