Tel Aviv University School of Computer Science

Fall 2007-8

Workshop in Computer Science

0368-3500-07

http://www.cs.tau.ac.il/~rshamir/workshop/07/

Workshop instructor: Prof. Ron Shamir
Lab Instructor: Chaim Linhart (chaiml AT post.tau.ac.il)

Workshop: Sundays 16-18; Lab: Tuesdays 14-17

 

 

Downloads: Sample input files, Math functions, Motif logo

First assignment: Design document, due 24/2/08.

Workshop Topic: The workshop will deal with the design, analysis and development of efficient algorithms for finding pairs of recurring sequence patterns in a text. The motivation comes from identifying regulatory motifs in DNA, a topic that has been under intensive research for the last ten years. As part of the project, the software developed will be applied to simulated and real biological data.


In this example, two sequence patterns (the blue and the green motifs) are marked in eight different sequences. For the biologically knowledgeable, each line shows a fragment of a promoter of a human G2+M cell cycle gene (the gene names are listed on the left). Each such sequence is more than 1000 letters long. The blue motif (CCAAT) is the NF-Y recognition site, and the green motif [G/A]TTT[G/A]AA is the CHR recognition site. It is clear from the picture that these motifs tend to co-occur in close proximity in the same target genes, and indeed the transcription factors that they represent together regulate the expression of G2+M cell cycle genes.
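To make the idea of co-occurring motifs concrete, here is a minimal Java sketch that locates the two motifs named above in a single sequence and counts the pairs that lie close together. It is only an illustration, not the algorithm the workshop asks for: the class and method names, the example sequence, and the distance threshold are all hypothetical, the motif [G/A]TTT[G/A]AA is written as the regular expression [GA]TTT[GA]AA, and only the given strand is scanned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MotifPairSketch {

    // Returns the start positions of all matches of a motif (given as a
    // regular expression) in seq, including overlapping matches.
    public static List<Integer> findOccurrences(String regex, String seq) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(seq);
        int from = 0;
        while (from <= seq.length() && m.find(from)) {
            positions.add(m.start());
            from = m.start() + 1; // advance by one so overlapping matches are kept
        }
        return positions;
    }

    // Counts pairs (occurrence of motif A, occurrence of motif B) whose start
    // positions are at most maxDistance apart. The threshold is an assumption
    // made for this sketch; the source does not define "close proximity".
    public static int countClosePairs(String seq, String regexA, String regexB,
                                      int maxDistance) {
        List<Integer> a = findOccurrences(regexA, seq);
        List<Integer> b = findOccurrences(regexB, seq);
        int count = 0;
        for (int i : a) {
            for (int j : b) {
                if (Math.abs(i - j) <= maxDistance) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sequence for illustration, not real promoter data.
        String seq = "TTGACCAATCGGTTTGAAACGT";
        String nfy = "CCAAT";          // NF-Y recognition site
        String chr = "[GA]TTT[GA]AA";  // CHR recognition site
        System.out.println("NF-Y positions: " + findOccurrences(nfy, seq));
        System.out.println("CHR positions:  " + findOccurrences(chr, seq));
        System.out.println("Close pairs:    " + countClosePairs(seq, nfy, chr, 10));
    }
}
```

The nested loop is quadratic in the number of occurrences, which is fine for an illustration. The workshop's actual task is harder: the motifs are not known in advance and must be discovered efficiently across many long sequences.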

 
Prerequisites: The workshop is open to all 3rd year students in computer science. No biological background is assumed. Preference will be given to students in the bioinformatics track. Knowledge of Java is required.

Format: The work will be done in pairs or individually. We shall have several introductory meetings at the beginning of the semester to provide the necessary background. Then groups will be formed and each group will start the design phase of its project. After individual meetings of the groups with the instructors and confirmation of the design, the implementation will start. Towards the end of the semester, joint meetings of all participants will take place, in which each group will present its project. After completing the project, each group will meet with the instructors to demonstrate the software.

Consultation meetings of individual groups with the instructors will be held throughout the semester as needed.

The test datasets of the project will be the same for all groups, and the performance of all algorithms will be measured on those data. A bonus will be given to the program that produces the correct results most efficiently. In addition, each group will be given separate real biological datasets of various types and organisms, and will be required to explore them and report on its findings.

Software: The algorithms will be implemented in Java and tested on Linux.

Grading: 30% for the design, 40% for the implementation, 20% for the performance and experimental results, 10% for the final presentation, plus a 10% bonus to the group with the best performance.

 

Slides: background and project plan