Instructor : Prof. Benny Chor (benny@cs.tau.ac.il), Schreiber, room 223.
Teaching Assistants : Tamir Tuller (tamirtul@post.tau.ac.il), Schreiber, room M20 (basement), and Giora Unger (giorau@post.tau.ac.il), Schreiber, room 012.
Where : Schreiber, Room 008.
When : Mondays
Students of the bioinformatics track will have higher priority in registering for this workshop.
The workshop will provide hands-on experience in implementing various optimization techniques, developing statistical tools for analyzing biological data, and using public biological databases. Most (but not all) projects have both research and implementation aspects. We offer projects in various current subjects in computational biology, including analysis of gene expression data and DNA microarrays, analysis of gene structure, understanding genomic pathways, construction of phylogenetic trees and networks, etc.
The workshop also includes four lectures on various topics in software engineering, given by the Computer Science system staff. These will cover subjects relevant to any medium- to large-scale software project. Students are expected to use these tools and techniques in their final projects and documentation. Attending these four lectures is mandatory.
An integral part of the workshop is presenting an outline of the project, as well as the completed project, to all participants (not just the course staff). It is mandatory to physically attend class for the entirety of these parts of the workshop, which are expected to take place on March 29th (outline presentations) and during the last three meetings (final project presentations). Naturally, your projects should be completed by then, and submitted no later than the last day of the semester.
Projects will be performed in groups consisting of at most two students
(if the number of students is odd, one triplet may be allowed).
In the first two weeks, each group must choose two preferred
projects (a
blue one and a red
one) and send them to Giora (giorau@post.tau.ac.il).
We will try to match choices with availability, but in case of
collisions,
assignments will be based on the time at which the request reaches
Giora's
mailbox.
March 1 : Lecture: Introduction to BioInformatics. Administratrivia.
March 8 : Lecture: Projects' Overview
March 15 : Project selection completed.
March 15 : Lecture on CVS by Alon Shalita from the CS system staff (slides, summary)
March 22 : Lecture on parallel processing and Condor by Edward Aronovich from the CS system staff (slides)
March 29 : Outlines of all planned projects presented by students in class (10 minutes per group)
April 19 : Lecture on databases and MySQL by Alon Shalita from the CS system staff
May 5 : Lecture on databases, MySQL, and packaging by Alon Shalita from the CS system staff
May 31 : Highly recommended (free, registration required): Annual Israeli BioInformatics day in the Crowne Plaza Hotel, Jerusalem
June 7 : Presentation of completed projects in class (20 minutes per group). Will take place 11-13 and 14-16.
June 11 : Last date for projects submission, including a detailed presentation of the projects to the course staff.
Giora Unger
Yaara Azaria, Maya Mograbi, Daniela Raijman : Extending linear separability tests from pairs to triplets and quadruples
Eran Ophir, Amir Segall : MAD MEX motif finding
Udi Altshuler, Elad Mazor : Extending linear separability tests from SVM to additional classifiers

Tamir Tuller
Tal Peled, Uri Zonnens : Simultaneous identification of duplication and lateral transfer
Igor Ulitsky, Dudu Burstein : A phylogenetic tree construction algorithm for complete genomes
Tal Tamir, Lior Gad : Gene prediction by spectral rotation measure and LZW compression algorithm
Erez Makabi, Eyal Megran : Constructing Bayes nets
1. Simultaneous identification of duplication and lateral transfer
Overview
This project is based on recent work by M. Hallett, J. Lagergren and A. Tofigh. The work introduces a combinatorial model that incorporates duplication events as well as lateral gene transfer events. The goal is to explain the differences between a gene tree and a species tree. In the project you will implement two algorithms from the article and apply them to synthetic and biological data.
References
M. Hallett, J. Lagergren and A. Tofigh. Simultaneous identification of duplications and lateral transfer, 2004.
What to do
1.
2. Implementing two algorithms from the article.
3. Applying the methods to synthetic and biological data.
2. Using Hadamard transform and the LogDeterminant for construction of phylogenetic trees.
Overview
Spectral analysis of sequence and distance data is a Hadamard-transform-based method for phylogenetic tree construction. The LogDeterminant transform provides distances that allow the correct tree to be recovered consistently when sequences differ markedly in nucleotide frequencies. In this project you will implement and analyze these two methods for phylogenetic tree construction.
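To make the LogDeterminant idea concrete, here is a minimal Python sketch of a LogDet-style distance between two aligned DNA sequences. It assumes one common normalized form, d = -(1/4) ln( det F / sqrt(det Px det Py) ), where F is the 4x4 matrix of joint nucleotide frequencies and Px, Py are diagonal matrices holding each sequence's base frequencies; the references may use a different scaling or correction. All function names are illustrative and this is not the partial code that Tamir will provide.

```python
import math

NUCLEOTIDES = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """Return the 4x4 matrix F, where F[i][j] is the fraction of aligned
    sites showing nucleotide i in seq_x and nucleotide j in seq_y."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = [[0.0] * 4 for _ in range(4)]
    used = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
            used += 1
    if used == 0:
        raise ValueError("no comparable sites")
    return [[c / used for c in row] for row in counts]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def logdet_distance(seq_x, seq_y):
    """LogDet-style distance (assumed normalized form, see text above)."""
    f = divergence_matrix(seq_x, seq_y)
    det_f = determinant(f)
    freq_x = [sum(row) for row in f]                             # row sums
    freq_y = [sum(f[i][j] for i in range(4)) for j in range(4)]  # column sums
    if det_f <= 0 or min(freq_x + freq_y) <= 0:
        raise ValueError("LogDet distance is undefined for this pair")
    log_norm = 0.5 * sum(math.log(p) for p in freq_x + freq_y)
    return -(math.log(det_f) - log_norm) / 4
```

With this normalization, two identical sequences that contain all four bases are at distance zero. The distance is undefined when det F is not positive, which can happen for short or very divergent sequences, so real data needs a policy for such pairs.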
References
M. Steel, P.J. Lockhart and D. Penny. Confidence in evolutionary trees from biological sequence data. Nature, 364, 440–442, 1993.
M.D. Hendy. Spectral analysis of phylogenetic data. 1993.
M.D. Hendy, D. Penny, M.A. Steel. A discrete Fourier analysis for evolutionary trees. 1993.
Additional (optional) References
P.J. Lockhart, M.A. Steel, M.D. Hendy and D. Penny. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol., 11, 605–612, 1994.
M. Steel, D. Huson and P.J. Lockhart. Invariable sites models and their use in phylogeny reconstruction. Syst. Biol., 49, 225–232, 2000.
D. Penny, M. Hasegawa, P.J. Waddell, M.D. Hendy. Mammalian evolution: timing and implications from using the LogDeterminant Transform for proteins of differing amino acid composition. 1999.
What to do
1. Reading and understanding the references.
2. Implementing two methods for reconstructing phylogenetic trees; partial code will be given by Tamir.
3. Applying the methods to synthetic and biological data, analyzing the results, and checking the number of maximum points reached by those methods on biological data.
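The spectral method in step 2 is built on the Hadamard transform. The sketch below is a plain fast Walsh-Hadamard transform in Python, assuming the unnormalized Sylvester (natural) ordering and an input whose length is a power of two. How the transform is combined with sequence or distance spectra is defined in the references and is not reproduced here; the function names are illustrative only.

```python
def hadamard_transform(values):
    """Fast Walsh-Hadamard transform, unnormalized, Sylvester ordering.

    Runs in O(n log n) for an input of length n, where n is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def inverse_hadamard_transform(values):
    """Inverse transform: H is its own inverse up to a factor of 1/n."""
    n = len(values)
    return [v / n for v in hadamard_transform(values)]
```

For example, `hadamard_transform([1, 0, 1, 0, 0, 1, 1, 0])` returns `[4, 2, 0, -2, 0, 2, 0, 2]`, and applying `inverse_hadamard_transform` to that result recovers the input.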
3. A phylogenetic tree construction algorithm for complete genomes.
Overview
In this project you will implement a new LZW-based algorithm for generating a phylogenetic tree. This algorithm differs from other known methods in two main points: it is not restricted to any nucleotide substitution model, and it can be applied to complete genomes.
References
Http://www.Datacompression.com/vq.html.
More material will be given by Tamir.
What to do
1. Implementing some LZW-based algorithms for finding distances between genomes.
2. Using Matlab or any other public software to generate a phylogenetic tree from the distance matrix found in stage 1.
3. Downloading complete genomes from NCBI, then applying and checking the method on those genomes.
4. More precise details will be given by Tamir.
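As a rough illustration of stage 1, the sketch below counts the codes that LZW emits for a sequence and plugs that count into the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). This is only one plausible compression-based distance, chosen here as an assumption; the actual algorithm for the project is in the material Tamir will give. The function names are hypothetical.

```python
def lzw_code_count(sequence):
    """Number of codes LZW emits for `sequence`, used as a proxy for its
    compressed size. The dictionary starts with the single symbols that
    occur in the sequence."""
    dictionary = set(sequence)
    codes = 0
    phrase = ""
    for symbol in sequence:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate          # keep extending the current match
        else:
            codes += 1                  # emit the code for `phrase`
            dictionary.add(candidate)   # learn the new, longer phrase
            phrase = symbol
    if phrase:
        codes += 1                      # emit the final pending phrase
    return codes

def lzw_distance(x, y):
    """Normalized compression distance with LZW code counts as C(.)."""
    cx, cy = lzw_code_count(x), lzw_code_count(y)
    if max(cx, cy) == 0:
        return 0.0                      # both sequences are empty
    cxy = lzw_code_count(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def distance_matrix(genomes):
    """Pairwise distances for a list of sequences, as a list of lists."""
    n = len(genomes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = lzw_distance(genomes[i], genomes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Note that this distance is not exactly symmetric in general, because C(xy) and C(yx) can differ; `distance_matrix` simply uses the value for i < j in both positions. The resulting matrix is the kind of input that stage 2 passes to tree-building software.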
4. Gene prediction by spectral rotation measure and LZW compression algorithm.
Overview
One of the major problems in bioinformatics is gene finding. In this project you will write gene prediction software. Your algorithm will combine two methods: one is based on the FFT and the other on the LZW compression algorithm.
References
D. Kotlar, Y. Lavner. Gene prediction by spectral rotation (SR) measure: a new method for identifying protein-coding regions. Genome Research 13:1930–1937, 2003.
Http://www.Datacompression.com/vq.html.
What to do
1. Implement an LZW-based algorithm for gene prediction.
2. Implement a combined LZW/FFT-based algorithm for gene prediction using a neural network, SVD, or any other learning method (after discussing it with Tamir). The code for the FFT method will be given by the instructor.
3. Check the performance of the method on several genomes and compare it to other methods.
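For orientation, the Fourier side of such methods usually starts from the period-3 signal that protein-coding regions tend to show. The Python sketch below computes the classic period-3 spectral power in sliding windows: for each base it evaluates the DFT of the base's indicator sequence at frequency 1/3 and sums the squared magnitudes. This is a simpler measure than the spectral rotation (SR) measure in the reference and is not the FFT code the instructor will provide; the window and step sizes are arbitrary assumptions.

```python
import cmath

# Twiddle factors for the DFT at frequency 1/3; only i mod 3 matters.
_W = cmath.exp(-2j * cmath.pi / 3)
_TWIDDLES = (1, _W, _W * _W)

def period3_power(dna):
    """Sum over A, C, G, T of |X_b(1/3)|^2 / N, where X_b is the DFT of the
    0/1 indicator sequence of base b and N is the sequence length."""
    n = len(dna)
    if n == 0:
        return 0.0
    dna = dna.upper()
    total = 0.0
    for base in "ACGT":
        coefficient = sum(_TWIDDLES[i % 3]
                          for i, ch in enumerate(dna) if ch == base)
        total += abs(coefficient) ** 2
    return total / n

def sliding_period3(dna, window=351, step=3):
    """Return (start, power) pairs for windows along the sequence.
    `window` and `step` are illustrative defaults, not tuned values."""
    return [(start, period3_power(dna[start:start + window]))
            for start in range(0, max(len(dna) - window, 0) + 1, step)]
```

A perfectly codon-periodic string such as `"ATG" * 50` scores 50.0, while `"ACGT" * 30`, which has period 4, scores essentially zero. Windows with high values are candidates for coding regions, and these values could serve as one input feature to the learning method in step 2.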
5. Constructing Bayes nets.
Overview
DNA hybridization arrays simultaneously measure the expression levels of thousands of genes. These measurements provide a “snapshot” of the transcription levels within the cell. One method for discovering interactions between genes from multiple expression measurements is based on Bayesian networks. A Bayesian network is a graph-based model of a joint multivariate probability distribution that captures properties of conditional dependence between variables. In this project you will write code for learning a Bayesian network according to a known algorithm, and you will check it on synthetic and real data.
Remark: If needed, this project can be expanded to two projects, so that two groups may work (independently) on them.
References
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods, 1999.
N. Friedman, M. Linial, I. Nachman, D. Pe’er. Using Bayesian networks to analyze expression data, 2000.
What to do
1. Write code for a known algorithm for inferring Bayesian networks.
2. Test the performance of the algorithm on synthetic data, which you will sample from a known Bayesian network, and on real biological data.
3. Improve the performance of the algorithm on biological data (for more details please contact Tamir).
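Step 2 requires synthetic data drawn from a known Bayesian network. One standard way to do this is ancestral (forward) sampling: visit the variables in topological order and draw each one from its conditional distribution given the parents already sampled. The Python sketch below assumes binary variables and a small, made-up four-node network; the structure and all probabilities are hypothetical and serve only as an example.

```python
import random

# Hypothetical toy network: A -> B, A -> C, (B, C) -> D.
# Each node maps to (parents, table), where table[parent_values] = P(node = 1).
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.8}),
    "C": (("A",), {(0,): 0.5, (1,): 0.1}),
    "D": (("B", "C"), {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.95}),
}

def topological_order(network):
    """Order the nodes so that every parent precedes its children."""
    order, placed = [], set()
    while len(order) < len(network):
        progressed = False
        for node, (parents, _) in network.items():
            if node not in placed and all(p in placed for p in parents):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("network contains a cycle or an unknown parent")
    return order

def sample(network, n, seed=None):
    """Draw n joint samples by ancestral sampling; returns a list of dicts."""
    rng = random.Random(seed)
    order = topological_order(network)
    rows = []
    for _ in range(n):
        row = {}
        for node in order:
            parents, table = network[node]
            p_one = table[tuple(row[p] for p in parents)]
            row[node] = 1 if rng.random() < p_one else 0
        rows.append(row)
    return rows
```

Calling `sample(NETWORK, 1000, seed=0)` yields 1000 rows such as `{"A": ..., "B": ..., "C": ..., "D": ...}`. Because the generating structure is known, the network recovered by the learning algorithm can be compared against it edge by edge.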