

Tel Aviv University -- Blavatnik School of Computer Science


Algorithms for Big Data Analysis in Biology and Medicine

Prof. Ron Shamir

TA: Nimrod Rappoport

Fall 2017

Tuesday 16-19 Location tbd


We shall describe and analyze algorithmic and statistical methods for modern large-scale data. The methods are generic, but we motivate them and also demonstrate their applications on biomedical data. The course combines topics and ideas from genomics*, precision medicine**, machine learning and big data*.

Curriculum: (tentative)


        Clustering: graph formulations, k-means, SOM, hierarchical, CLICK, Newman's alg, Consensus, FPF, K-Boost, PCA

        Biclustering: ISA, Samba, Bimax

        Statistical toolbox: Enrichment analysis: GO, TANGO, GSEA, KM plots, LogRank, Cox, ROC, PR curves

        Classification: Introduction, dimension reduction, KNN, SVM, Regression, feature selection BHASIC

        Biological networks: Matisse, Cezanne, Network propagation

        Drugs and personalized medicine

        Integrated analysis: Paradigm, iCluster, CoC, Hotnet, SNF, spectral methods


Audience: The course is open to graduate and undergraduate students. Students in the MSc and BSc bioinformatics tracks can take this as a core course.

Prerequisites: Statistics for CS and Algorithms. No background in biology, machine learning or bioinformatics is required. The basic background in biology will be given in the first meetings.

Requirements: (tentative, subject to change)


        (70%) Homework assignments involving theory and implementation (can be done in pairs)

        (30%) final exam


        (60%) Homework assignments

        (30%) final exam

        (15%) scribe

Course material:

        Lecture notes of my course on Gene Expression analysis (covering about half of the material)

        Scribe instructions (tbd)

        Speaker instructions (tbd)


To be added later.

Some background:

*Genomics and Big Data: Biotechnology today makes it possible to measure many aspects of cellular life on the scale of the whole genome: DNA, RNA, proteins, interactions and many more. A typical single measurement ('profile') can produce 10^4-10^5 values. A typical medical study can generate multiple profiles for each of 100-1000 patients. Advanced computational methods are being developed to analyze such data, combining algorithms, machine learning, graph theory and statistics.

**Precision Medicine: The combination of cheap and accessible biotechnology, advanced computation and big data is expected to change medical practice: rather than one-size-fits-all treatment and drug prescription, care will be tailored to the particular properties of a group of individuals - or even to a single individual. These properties can be based on the patients' genomes (via DNA deep sequencing), their metagenomes (skin, gut and other microbial community genomes, also measured by deep sequencing), their lifestyle (monitored online by wearable devices) and their medical history (available as electronic medical records). Large projects have been initiated with this vision. For example, the US Precision Medicine Initiative, the Genomics England 100,000 Genomes Project, Denmark's GenomeDenmark platform, and commercial projects (e.g. 23andMe, and Regeneron and Geisinger) are collecting genetic and clinical data from hundreds of thousands of patients. Determining the best treatment based on these data raises major computational challenges, and we shall study some of them.


Contact info: email: rshamir AT tau dot ac dot il; phone: 640-5383; office: Schreiber 014; office hours by appointment


picture credits:

          Time magazine