Tel Aviv University School of Computer Science

Fall 2010-11

Workshop in Computer Science

0368-3500-22

http://www.cs.tau.ac.il/~rshamir/workshop/10/

Workshop instructor: Prof. Ron Shamir
Lab Instructor: Yaron Orenstein (yaronore AT post.tau.ac.il)

Workshop: Tuesdays 14-16; Lab: TBA

 

 

Downloads: Training files, Fixed training files, Test files, Math functions, Motif logo, ComparePwms.java

First assignment: Research report, due 23/11/10.

Workshop Topic: The workshop will deal with the design, analysis and development of efficient algorithms for finding sequence motifs in Protein Binding Microarray (PBM) data. The motivation comes from identifying regulatory motifs in DNA, an important topic that has been under intensive research for over ten years. As part of the project, the software developed will be applied to real biological data.

 

Sequence                           Signal
CATGTAAGAGTTGACTCTGGTCTGTTCTAAT    28926
TTGCTCATCAGAGTCGCGTAACAGGCTTTC     1457
TCCAGTTTAGGTGGCGCCCGGAACCCTTAA     12972
……                                 ……
CATGTAGCCCTTAACTGTGACTAAAGCCCC     33755

This is a simplified example of a PBM dataset. The dataset consists of a list of ~41000 sequences, each of length 35 over the four-letter DNA alphabet. Each possible 10-mer appears exactly once in the set. For each 35-mer, six measured values are provided; the “signal” value indicates how strongly the motif matches the sequence, as measured experimentally.
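The numbers in this description can be checked with a little arithmetic: a 35-mer contains 35 - 10 + 1 = 26 overlapping 10-mers, and there are 4^10 distinct 10-mers. The following is a minimal Java sketch of that check, not part of the workshop materials; the class name `KmerSketch` and its methods are hypothetical, and the sequence count of 41000 is the approximate figure quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class KmerSketch {

    /** Number of overlapping windows of length k in a sequence of the given length. */
    static int kmersPerSequence(int seqLength, int k) {
        return Math.max(0, seqLength - k + 1);
    }

    /** Number of distinct k-mers over the four-letter DNA alphabet, i.e. 4^k. */
    static long distinctKmers(int k) {
        return 1L << (2 * k); // 4^k == 2^(2k)
    }

    /** Lists all overlapping k-mers of a sequence, from left to right. */
    static List<String> kmers(String seq, int k) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            out.add(seq.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        int perSeq = kmersPerSequence(35, 10); // 26 windows per 35-mer
        long distinct = distinctKmers(10);     // 1048576 distinct 10-mers
        long approxTotal = 41000L * perSeq;    // 1066000, using the approximate count
        System.out.println("10-mers per 35-mer: " + perSeq);
        System.out.println("distinct 10-mers:   " + distinct);
        System.out.println("approx. windows:    " + approxTotal);
    }
}
```

With roughly 41000 sequences, the total number of 10-mer windows is of the same order as 4^10, which is consistent with a design in which every 10-mer is covered; the exact figures depend on the exact number of sequences.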
 

A simplified example of a motif: a motif is a sequence of length 6-12 that is typically degenerate, meaning that alternative letters may occur in some positions. The motif shown is of length 8, and its 3rd and 7th positions are degenerate. Occurrences of this motif are marked in red in two of the PBM sequences above.
In more general motif models, each of the four letters has a specified probability in each position.
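Both motif models can be sketched in a few lines of Java. The code below is an illustration under stated assumptions, not the workshop's reference implementation: the class `MotifSketch`, its method names, and the example motif and sequence in `main` are all hypothetical (the motif is not the one pictured on this page). A degenerate motif is represented as an array of strings, one per position, each listing the letters allowed there; the probabilistic model is represented as a matrix with one row per position and one column per letter in the order A, C, G, T. Sequences are assumed to contain only uppercase A, C, G and T.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class MotifSketch {

    /**
     * Returns the 0-based start positions at which the degenerate motif occurs.
     * motif[i] holds the letters allowed at position i, e.g. "AC" for "A or C".
     */
    static List<Integer> findOccurrences(String seq, String[] motif) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + motif.length <= seq.length(); start++) {
            boolean match = true;
            for (int i = 0; i < motif.length && match; i++) {
                match = motif[i].indexOf(seq.charAt(start + i)) >= 0;
            }
            if (match) {
                hits.add(start);
            }
        }
        return hits;
    }

    /**
     * Probability of a window under a position-specific letter model.
     * probs[i][b] is the probability of letter b (A=0, C=1, G=2, T=3) at position i.
     * Assumes the window has at least probs.length letters, all in "ACGT".
     */
    static double windowProbability(String window, double[][] probs) {
        double p = 1.0;
        for (int i = 0; i < probs.length; i++) {
            p *= probs[i]["ACGT".indexOf(window.charAt(i))];
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up motif of length 8 with degenerate 3rd and 7th positions.
        String[] motif = {"T", "G", "AC", "C", "T", "C", "AG", "T"};
        String seq = "AATGACTCGTAA"; // made-up sequence
        System.out.println("occurrences at: " + findOccurrences(seq, motif));

        // Made-up two-position probabilistic model.
        double[][] probs = {{0.5, 0.5, 0.0, 0.0}, {0.1, 0.2, 0.3, 0.4}};
        System.out.println("P(AT) = " + windowProbability("AT", probs));
    }
}
```

The degenerate motif is a special case of the probabilistic model in which each position assigns equal probability to its allowed letters and zero to the rest. In practice, products of many small probabilities are usually replaced by sums of logarithms to avoid underflow.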

 
Prerequisites: The workshop is open to all 3rd year students in computer science. No biological background is assumed. If the workshop is oversubscribed, preference will be given to students in the bioinformatics track. Knowledge of Java is required.

Format: The work will be done by pairs of students or individually. We shall have 2-3 introductory meetings at the beginning of the semester to provide the necessary background. Then groups will be formed and each group will start the design phase of its project. After individual meetings with the groups and confirmation of the design, the implementation will start. Towards the end of the semester, joint meetings of all participants will take place, in which each group will present its project. After the completion of the project, each group will meet with the instructors to demonstrate the software and evaluate its performance, and will also submit the results of its algorithm on the test data.

Consultation meetings of single groups with the instructors will be carried out throughout the semester as needed.

Students will be given training datasets with known solutions, for training and practice, as well as test datasets. The same datasets will be given to all groups. The performance of all algorithms will be measured on the test datasets. In addition, in the final meeting with the instructors, an additional dataset will be given for online query testing.

Software: The algorithms will be implemented in Java and tested on Linux.

Grading:

 

Slides: background and project plan