Tel Aviv UniversitySchool of Computer Science

Fall 2004-2005

Computational Genomics


Benny Chor  and Amos Tanay

Lectures: Wednsdays 10:00-13:00

Recitations: Sundays 15:00-16:00


Instructions for Moed A exam, scheduled to Jan 25th, 2005.


Contact Info:








Prof. Benny Chor


Schreiber  223

By  e-appointment 

Teaching Assistant and Solicitor 
     (J & A Tanai, Q.C.s)

Adv. Amos Tanay


Schreiber  011

By  e-appointment 


Course Outline

The Human Genome Project has completed the sequencing of over 95 % of human DNA.  Other genome projects of various organisms (bacteria, fungi, worms, rice, wheat, flies, mice and primates)are either completed or in the process of advanced sequencing.  Newbio-technologies produce huge amounts of biological data, whose manual analysisis not possible.  These advances have pushed for a new scientific disciplineof bioinformatics, which integrates computer science, statistics, mathematics,and various biological fields.  In this  introductory course we concentrate on the algorithmic aspects of various computationaltasks with biological origins.


The course is opened to 3rd year students of the bioinformatics track and to 3rd year and M.Sc.Computer Science students. The algorithms course (“efficiency of computations”)is a prerequisite, and the computability course (“computational models”) shouldat least be taken concurrently with this course. Students from other disciplinesare encouraged to contact the instructor or teaching assistant. A backgroundin algorithms, probability, and programming is required. (In the previoustime this course was given by this instructor, a number of life science studentscompleted it successfully).


  • This year, in addition to a 3 hours weekly lecture, there will also be a 1 hour weekly recitation, given by the teaching assistant. Attending this recitation is absolutely optional, but is strongly encouraged. (BTW this is also the case regarding the lectures: If you can understand the material, answer the home assignments and pass the exam in flying colors without attending classes, then surely windsurfing or boogy-boarding is a better use of your time.)
  • Grades  in the course will be based on five-six problem sets (20-25% of total grade)and a final exam (75-80%).  

Homework submission in singles or pairs (no triplets, quartets, quintets etc.). There will be a total of 5-6 problem sets, involving both "dry" assignments and "wet" ones. The wet parts will require understanding and running existing software, but not writing any code. Homework should be done independently. External sources (books, journal articles, web pages) can be used but should be clearly quoted.



Tantative Course Plan (subject to changes):









 Introduction and administrativia.
 Pairwise Sequence Alignment: Global alignment.

 Kanehisa, ch. 3 
 Gusfield, chapter 11.

 Durbin et. al., chapter 2.


Biological background



Pairwise Sequence Alignment: Linear space algorithm.Local Alignment; FASTA and BLAST heuristics;

Durbin et. al., 2.5, 2.7

Gusfield, ch. 11

Dynamic programming

Examples. Affine gap penalties;



PAM and BLOSSUM. Multiple sequence alignment:

NP hardness of MSA (Isaac Elias slides)

 Gusfield, ch. 15 

Dynamic programming




Introduction to suffix trees and suffix arrays.
Biological and non-biological applications
Dan Geiger slides (pdf). Fernández-Baca slides (ppt).

Gusfield, ch. 5,6,7 


RNA folding



More suffix trees and applications.


 Durbin et al., 7.1, 7.2

 Gusfield, ch. 17.

Multiple alignment



Intro to phylogenetic trees reconstruction.


Suffix trees. Entropy



Reconstructing phylogenetic (evolutionary) trees: Character based methods: Maximum parsimony (MP).         NP hardness of MP.



Entropy vs. linear correlation.

Phylogenetic analysis:

Algorithms for small MP and weighted MP



Maximum likelihood (ML). Felsenstein "tiny ML" algorithm. ML properties, and comparison to MP.

Approximations and heuristics for hard computational problems: Solving traveling salesman problem (TSP) using simulated annealing heuristic.

Dimacs 2001 TSP challenge page

Outline of Koonin's COGs.

MP with short and long edges.

Ford Dollitle's lateral transfer cartoon.




Intro to physical mapping and radiation hybrid (RH) mapping. Producing RH maps via reduction to TSP. Held-Karp: Bounding TSP via MST.

Levitation without meditation.

RHO  (Ben-Dor


RHO server


algorithms: examples

(Hanuka break)



Hidden Markov models (HMM):

Examples; Forward-backwards

 Rabiner's Tutorial
Durbin et al. , ch. 3

Biological background



Hidden Markov models (HMM):

Viterbi's algorithm; EM.

Durbin et al. , ch. 3

HMMs: Alignment



Hidden Markov Models: Bioinfo applications

(lecture by Amos Tanay)

R. Durbin et al. , ch. 4

Profile HMMs




DNA chips: Animation; Analysis: Clustering and biclustering. (portions of ref1, ref2, ref3).

Disease diagnosis by machine learning and support vector machines:

Pluto's Introduction to Machine Learning.


Bayesian Nets



Guest lectures by Rotem Sorek: Computation at the service of biological research - Alternative splicing as a case study, and by Natan Nelson:

On the complex architecture of  oxygenic photosynthesis.


Exam examples


Problem Sets

General Submission Guidelines

         Problem set 1

         Problem set 2

         Problem set 3

        Problem set 4


  • Biological Sequence Analysis,R. Durbin et al. , Cambridge University Press, 1998.
  • Algorithms on Strings, Trees, andSequences: Computer Science and Computational Biology, D. Gusfield, Cambridge University Press, 1997.
  • Post-genome Informatics, M. Kanehisa , Oxford University Press, 2000.
  • Introduction to Computational MolecularBiology, J. Meidanis and J. Setubal, Brooks/Cole Pub Co., 1997.
  • Computational Molecular Biology : An Algorithmic Approach, P. Pevzner, MIT Press, 2000.
  • Introduction to Bioinformatics Algorithims,Neil C. Jones, Pavel A. Pevzner, MIT Press, 2004.
  •  Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Bernhard Schölkopf and Alex Smola, MIT Press, 2002.    Available on-line: Preface, Section 1.1, Section 1.2, and more.

Biology Background Books

  • The Cartoon Guide to Genetics, by Larry Gonick and Mark Wheelis, Harper Perennial, 1991 (I have not read this book, but I was told it is gives a good popular overview. Some students complained the book spends way too much time before getting down to relevant business).


Other On-line Courses

Lior Pachter course: AlgebraicStatistics for Computational Biology, UC Berkeley, Fall 2004.
Ron Shamir course notes: Computational Genomics,
Tel-Aviv UniversityFall 2003/4

Martin Tompa's course notes: Computational Biology, (CSE 527), University of Washington, Seattle, Winter 2000.

Nir Friedman's course slides: ComputationalMolecular Biology the Hebrew University, Jerusalem, Fall 1999.
Karp, Ruzzoand Tompa's course notes: Algorithms in Molecular Biology University of Washington, Seattle, Winter 1998. 


Other Interesting On-line Resources