Tel Aviv University -- Blavatnik School of Computer Science
Algorithms for Big Data Analysis in Biology and Medicine
(Hebrew title, translated: Algorithms for Big Data Analysis in the Life Sciences and Medicine)
Prof. Ron Shamir
TA: Nimrod Rappoport
Tuesdays 16:00-19:00; location TBD
We shall describe and analyze algorithmic and statistical methods for modern large-scale data. The methods are generic, but we motivate them with biomedical data and demonstrate their application to it. The course combines topics and ideas from genomics*, precision medicine**, machine learning and big data*.
· Clustering: graph formulations, k-means, SOM, hierarchical clustering, CLICK, Newman's algorithm, consensus clustering, FPF, K-Boost, PCA
· Biclustering: ISA, Samba, Bimax
· Statistical toolbox: enrichment analysis (GO, TANGO, GSEA), survival analysis (Kaplan-Meier plots, log-rank test, Cox regression), ROC and PR curves
· Classification: introduction, dimension reduction, KNN, SVM, regression, feature selection, BHASIC
· Biological networks: Matisse, Cezanne, Network propagation
· Drugs and personalized medicine
· Integrated analysis: Paradigm, iCluster, CoC, Hotnet, SNF, spectral methods
Audience: The course is open to graduate and undergraduate students. Students in the MSc and BSc bioinformatics tracks can take it as a core course.
Prerequisites: Statistics for CS and Algorithms. No background in biology, machine learning or bioinformatics is required; the basic biology background will be given in the first meetings.
Requirements: (tentative – subject to change)
· (70%) Homework assignments involving theory and implementation (can be done in pairs)
· (30%) final exam
· (60%) Homework assignments
· (30%) final exam
· (15%) scribe
· Lecture notes from my course on gene expression analysis (covering about half of the material)
· Scribe instructions (tbd)
· Speaker instructions (tbd)
To be added later.
*Genomics and Big Data: Biotechnology today makes it possible to measure many aspects of cellular life on a genome-wide scale: DNA, RNA, proteins, interactions and much more. A typical single measurement ('profile') can produce 10^4-10^5 values. A typical medical study can generate multiple profiles for each of 100-1000 patients. Advanced computational methods that combine algorithms, machine learning, graph theory and statistics are being developed to analyze such data.
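To make the scale above concrete, here is a back-of-the-envelope sketch of the size of a patients-by-values data matrix. It assumes, as a simplification, a single profile per patient and 8 bytes (one float64) per value; the text notes that studies often collect several profiles per patient, so real data sets can be several times larger. The helper name `matrix_size` is hypothetical.

```python
# Back-of-the-envelope size of a patients x values matrix.
# Assumptions (not from the course text): one profile per patient,
# and each value stored as an 8-byte float64.
BYTES_PER_VALUE = 8


def matrix_size(values_per_profile: int, n_patients: int) -> tuple[int, float]:
    """Return (number of entries, approximate size in megabytes)."""
    entries = values_per_profile * n_patients
    megabytes = entries * BYTES_PER_VALUE / 1e6
    return entries, megabytes


# Sweep the ranges quoted in the text: 10^4-10^5 values, 100-1000 patients.
for values in (10**4, 10**5):
    for patients in (100, 1000):
        entries, mb = matrix_size(values, patients)
        print(f"{values:>7,} values x {patients:>5,} patients = "
              f"{entries:>12,} entries (~{mb:,.0f} MB)")
```

Under these assumptions the range spans from 10^6 entries (about 8 MB) to 10^8 entries (about 800 MB) for a single profile type, before multiple profiles per patient are counted.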
**Precision Medicine: The combination of cheap and accessible biotechnology, advanced computation and big data is expected to change medical practice: rather than one-size-fits-all treatment and drug prescription, care will be tailored to the particular properties of a group of individuals, or even of a single individual. These properties can be based on the patients' genomes (via deep DNA sequencing), their metagenomes (the genomes of skin, gut and other microbial communities, also measured by deep sequencing), their lifestyle (monitored online by wearable devices) and their medical history (available as electronic medical records). Large projects have been launched with this vision. For example, the US Precision Medicine Initiative, the Genomics England 100,000 Genomes Project, Denmark's GenomeDenmark platform and commercial projects (e.g., 23andMe and the Regeneron-Geisinger collaboration) are collecting genetic and clinical data from hundreds of thousands of patients. Determining the best treatment from these data raises major computational challenges, and we shall study some of them.
Contact info: email: rshamir AT tau dot ac dot il; phone: 640-5383; office: Schreiber 014; office hours: by appointment