Algorithms for Big Data in BioMed

Tel Aviv University -- Blavatnik School of Computer Science

0368-4190-01

Algorithms for Big Data Analysis in Biology and Medicine

(Hebrew course title: Algorithms for Big Data Analysis in the Life Sciences and Medicine)

http://www.cs.tau.ac.il/~rshamir/abdbm

Prof. Ron Shamir

TA: Nimrod Rappoport

Fall 2017

Tuesday 16-19 Dan David 003

We shall describe and analyze algorithmic and statistical methods for modern large-scale data. The methods are generic, but we motivate them by biomedical problems and demonstrate their applications on biomedical data. The course combines topics and ideas from genomics*, precision medicine**, machine learning and big data*.

Curriculum: (tentative)

· Introduction

· Statistical toolbox: enrichment analysis (GO, TANGO, GSEA), KM plots, LogRank, Cox, ROC and PR curves

· Motif finding: PRIMA, MEME, Amadeus, DREME

· Clustering: graph formulations, k-means, SOM, hierarchical, CLICK, Newman's alg, Consensus, FPF, K-Boost, PCA

· Biclustering: ISA, Samba, Bimax

· Classification: Introduction, dimension reduction, KNN, SVM, Regression, feature selection BHASIC

· Biological networks: Matisse, Cezanne, Network propagation

· Drugs and personalized medicine

· Integrated analysis: Paradigm, iCluster, CoC, Hotnet, SNF, spectral methods

Administration:

Audience: The course is open to graduate and undergraduate students. Students in the MSc and BSc bioinformatics tracks can take it as a core course.

Prerequisites: Statistics for CS and Algorithms. No background in biology, machine learning or bioinformatics is required. The basic biology background will be given in the first meetings.

Requirements:

Non-Scribers:

· (70%) Homework assignments involving theory and implementation (can be done in pairs)

· (30%) final exam

Scribers:

· (60%) Homework assignments

· (30%) final exam

· (15%) scribe

Course material:

· Lecture notes of my course on Gene Expression analysis (covering about half of the material).

· Class presentations and new scribes will be added during the semester.

· Scribe instructions

· Note: Homework assignments are available on Moodle.

Plan (tentative):

Lec.	Date	Topic	Scribe
1	24/10	Introduction	-
2	31/10	Statistical toolbox	Kathy Razmadze
3	7/11	Motif discovery	-
4	14/11	Clustering 1	-
5	21/11	Clustering 2	Tomer Wolfson
6	28/11	Biclustering	-
7	5/12	Classification 1	-
8	12/12	Classification 2	-
9	19/12	Integration 1 (Nimrod Rappoport)	Shahar Segal
10	26/12	Integration 2	Dan Coster
11	2/1	Systems genetics (Prof. Irit Gat-Viks)	Itay Levy
12	9/1	Biological networks / EMRs	Itay Harel, Omri Lifshitz
13	16/1	Drug targets (Prof. Roded Sharan)	David Pellow

Some background:

*Genomics and Big Data: Biotechnology today makes it possible to measure many aspects of cellular life on the scale of the whole genome: the DNA, the RNA, proteins, interactions and many more. A typical single measurement ('profile') can produce 10⁴-10⁵ values. A typical medical study can generate multiple profiles for each of 100-1000 patients. Advanced computational methods are being developed to analyze such data, combining algorithms, machine learning, graph theory and statistics.
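
To give a rough sense of the scale quoted above, here is a minimal Python sketch that turns the profile and cohort sizes into a data-matrix shape and an approximate memory footprint. The function names, the choice of one profile per patient, and the 8 bytes per value (double precision, dense storage) are illustrative assumptions, not figures from the course.

```python
def study_matrix_shape(n_patients, values_per_profile, profiles_per_patient=1):
    """Return (rows, columns) of a patients-by-features data matrix.

    profiles_per_patient=1 is an assumption; real studies may have several.
    """
    rows = n_patients * profiles_per_patient
    return rows, values_per_profile


def dense_size_megabytes(rows, cols, bytes_per_value=8):
    """Approximate size of a dense matrix, assuming 8-byte floating-point values."""
    return rows * cols * bytes_per_value / 1e6


# Lower and upper ends of the ranges quoted in the text.
small = study_matrix_shape(100, 10**4)
large = study_matrix_shape(1000, 10**5)

print(small, dense_size_megabytes(*small))  # (100, 10000) 8.0
print(large, dense_size_megabytes(*large))  # (1000, 100000) 800.0
```

Under these assumptions the raw matrix is modest in size, but the number of measured values per patient exceeds the number of patients by one to three orders of magnitude.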

**Precision Medicine: The combination of cheap and accessible biotechnology, advanced computation and big data is expected to change medical practice: rather than one-size-fits-all treatment and drug prescription, care will be tailored to the particular properties of a group of individuals, or even to a single individual. These properties can be based on the patients' genomes (via DNA deep sequencing), their metagenomes (skin, gut and other microbial community genomes, also measured by deep sequencing), their lifestyle (monitored online by wearable devices) and their medical history (available as electronic medical records). Large projects have been initiated with this vision. For example, the US Precision Medicine Initiative, the Genomics England 100,000 Genomes Project, Denmark's GenomeDenmark platform, and commercial projects (e.g. 23andMe, and Regeneron and Geisinger) are collecting genetic and clinical data from hundreds of thousands of patients. Determining the best treatment based on these data raises major computational challenges, and we shall study some of them.

Contact info: email: rshamir AT tau dot ac dot il; phone: 640-5383; office: Schreiber 014; office hours – by appointment

picture credits:

· http://bioinformaticsreview.com/20151005/biominer-intro/

· Time magazine

· https://www.linkedin.com/pulse/20140923215637-5241481-artificial-intelligence-to-deliver-personalised-medicine

· https://www.whitehouse.gov/blog/2015/01/30/precision-medicine-initiative-data-driven-treatments-unique-your-own-body