Tel Aviv UniversitySchool of Computer Science

Fall 2004-2005


Computational Genomics

  0382.3102

http://www.cs.tau.ac.il/~bchor/CG05/comp-genom.html

Benny Chor  and Amos Tanay

Lectures: Wednsdays 10:00-13:00

Recitations: Sundays 15:00-16:00

 

Instructions for Moed A exam, scheduled to Jan 25th, 2005.

 


Contact Info:
 

 

 

Email

Phone

Office

OfficeHours

Instructor

Prof. Benny Chor

benny@cs.tau.ac.il

640-5977

Schreiber  223

By  e-appointment 

Teaching Assistant and Solicitor 
     (J & A Tanai, Q.C.s)


Adv. Amos Tanay


amos@post.tau.ac.il


640-5394


Schreiber  011


By  e-appointment 


 
 
 

Course Outline

The Human Genome Project has completed the sequencing of over 95 % of human DNA.  Other genome projects of various organisms (bacteria, fungi, worms, rice, wheat, flies, mice and primates)are either completed or in the process of advanced sequencing.  Newbio-technologies produce huge amounts of biological data, whose manual analysisis not possible.  These advances have pushed for a new scientific disciplineof bioinformatics, which integrates computer science, statistics, mathematics,and various biological fields.  In this  introductory course we concentrate on the algorithmic aspects of various computationaltasks with biological origins.

Prerequisites

The course is opened to 3rd year students of the bioinformatics track and to 3rd year and M.Sc.Computer Science students. The algorithms course (“efficiency of computations”)is a prerequisite, and the computability course (“computational models”) shouldat least be taken concurrently with this course. Students from other disciplinesare encouraged to contact the instructor or teaching assistant. A backgroundin algorithms, probability, and programming is required. (In the previoustime this course was given by this instructor, a number of life science studentscompleted it successfully).
 
 
Administratrivia

 

  • This year, in addition to a 3 hours weekly lecture, there will also be a 1 hour weekly recitation, given by the teaching assistant. Attending this recitation is absolutely optional, but is strongly encouraged. (BTW this is also the case regarding the lectures: If you can understand the material, answer the home assignments and pass the exam in flying colors without attending classes, then surely windsurfing or boogy-boarding is a better use of your time.)
  • Grades  in the course will be based on five-six problem sets (20-25% of total grade)and a final exam (75-80%).  

Homework submission in singles or pairs (no triplets, quartets, quintets etc.). There will be a total of 5-6 problem sets, involving both "dry" assignments and "wet" ones. The wet parts will require understanding and running existing software, but not writing any code. Homework should be done independently. External sources (books, journal articles, web pages) can be used but should be clearly quoted.

 

 

Tantative Course Plan (subject to changes):

Lecture
___________

Date
______

 Subject
______________________________________

 Reference(s) 
__________________

Recitation

________________

             1

20/10/04

 Introduction and administrativia.
 Pairwise Sequence Alignment: Global alignment.

 Kanehisa, ch. 3 
 Gusfield, chapter 11.

 Durbin et. al., chapter 2.

 

Biological background

             2

27/10

Pairwise Sequence Alignment: Linear space algorithm.Local Alignment; FASTA and BLAST heuristics;

Durbin et. al., 2.5, 2.7

Gusfield, ch. 11

Dynamic programming

Examples. Affine gap penalties;

            3 

 3/11

PAM and BLOSSUM. Multiple sequence alignment:

NP hardness of MSA (Isaac Elias slides)

 Gusfield, ch. 15 
 

Dynamic programming

Examples

            4

 10/11

Introduction to suffix trees and suffix arrays.
Biological and non-biological applications
Dan Geiger slides (pdf). Fernández-Baca slides (ppt).

Gusfield, ch. 5,6,7 

 

RNA folding

           5           

          17/11

More suffix trees and applications.

 

 Durbin et al., 7.1, 7.2

 Gusfield, ch. 17.

Multiple alignment

            6

         24/11

Intro to phylogenetic trees reconstruction.

 

Suffix trees. Entropy

           7

          1/12

Reconstructing phylogenetic (evolutionary) trees: Character based methods: Maximum parsimony (MP).         NP hardness of MP.

 

 

Entropy vs. linear correlation.

Phylogenetic analysis:

Algorithms for small MP and weighted MP

           8

       8/12

Maximum likelihood (ML). Felsenstein "tiny ML" algorithm. ML properties, and comparison to MP.

Approximations and heuristics for hard computational problems: Solving traveling salesman problem (TSP) using simulated annealing heuristic.

Dimacs 2001 TSP challenge page

Outline of Koonin's COGs.

MP with short and long edges.

Ford Dollitle's lateral transfer cartoon.

           9

              

      15/12 

Intro to physical mapping and radiation hybrid (RH) mapping. Producing RH maps via reduction to TSP. Held-Karp: Bounding TSP via MST.

Levitation without meditation.

RHO  (Ben-Dor et.al.)

 

RHO server

Approximation

algorithms: examples

(Hanuka break)

         10

      22/12

Hidden Markov models (HMM):

Examples; Forward-backwards

 Rabiner's Tutorial
Durbin et al. , ch. 3

Biological background

          11

       28/12

Hidden Markov models (HMM):

Viterbi's algorithm; EM.

Durbin et al. , ch. 3

HMMs: Alignment

         12

    5/1/2005

Hidden Markov Models: Bioinfo applications

(lecture by Amos Tanay)

R. Durbin et al. , ch. 4

Profile HMMs

         

         13

         12/1

DNA chips: Animation; Analysis: Clustering and biclustering. (portions of ref1, ref2, ref3).

Disease diagnosis by machine learning and support vector machines:

Pluto's Introduction to Machine Learning.

Schàlkopf-Smola,1.1,1.2

Bayesian Nets

         14

          19/1

Guest lectures by Rotem Sorek: Computation at the service of biological research - Alternative splicing as a case study, and by Natan Nelson:

On the complex architecture of  oxygenic photosynthesis.

 

Exam examples

 

Problem Sets

General Submission Guidelines
 

         Problem set 1

         Problem set 2

         Problem set 3

        Problem set 4

Books

  • Biological Sequence Analysis,R. Durbin et al. , Cambridge University Press, 1998.
  • Algorithms on Strings, Trees, andSequences: Computer Science and Computational Biology, D. Gusfield, Cambridge University Press, 1997.
  • Post-genome Informatics, M. Kanehisa , Oxford University Press, 2000.
  • Introduction to Computational MolecularBiology, J. Meidanis and J. Setubal, Brooks/Cole Pub Co., 1997.
  • Computational Molecular Biology : An Algorithmic Approach, P. Pevzner, MIT Press, 2000.
  • Introduction to Bioinformatics Algorithims,Neil C. Jones, Pavel A. Pevzner, MIT Press, 2004.
  •  Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Bernhard Schölkopf and Alex Smola, MIT Press, 2002.    Available on-line: Preface, Section 1.1, Section 1.2, and more.

Biology Background Books

  • The Cartoon Guide to Genetics, by Larry Gonick and Mark Wheelis, Harper Perennial, 1991 (I have not read this book, but I was told it is gives a good popular overview. Some students complained the book spends way too much time before getting down to relevant business).


 

Other On-line Courses

Lior Pachter course: AlgebraicStatistics for Computational Biology, UC Berkeley, Fall 2004.
Ron Shamir course notes: Computational Genomics,
Tel-Aviv UniversityFall 2003/4

Martin Tompa's course notes: Computational Biology, (CSE 527), University of Washington, Seattle, Winter 2000.

Nir Friedman's course slides: ComputationalMolecular Biology the Hebrew University, Jerusalem, Fall 1999.
Karp, Ruzzoand Tompa's course notes: Algorithms in Molecular Biology University of Washington, Seattle, Winter 1998. 

 

Other Interesting On-line Resources