Seminar – Advanced Topics in Web Data Management

 

Tova Milo, Daniel Deutch, 2016/17

 

Meetings: Wednesdays 16-18, Kaplun 324

 


Seminar Information

 

The seminar focuses on managing, analyzing, sharing, and integrating data and applications on the web. Areas of interest include crowdsourcing, data exploration, Big Data, probabilistic data, and data provenance. We shall read recent papers in these areas, focusing on several specific issues, and then explore possible future directions. A tentative list of papers appears below.

 

Schedule: Sem A

 

 

Slava 16.11

Extracting Databases from Dark Data with DeepDive

 

Chai & Shevach 23.11

Goods: Organizing Google's Datasets

 

Tomer Wolfson 30.11

Semantic SPARQL Similarity Search Over RDF Knowledge Graphs

 

Efrat 7.12

RDF Graph Alignment with Bisimulation

 

 

Amir+Yuval 14.12

Big Data Analytics with Datalog Queries on Spark

 

 

Oded+Eyal 21.12

Query From Examples: An Iterative, Data-Driven Approach to Query Construction

 

 

Brit+Ahmad 28.12

SLING: A Near-Optimal Index Structure for SimRank

 

 

Yehonatan + Tomer H. 4.1

ActiveClean: Interactive Data Cleaning For Statistical Modeling

 

 

Ariel+Amit 11.1

Top-k Relevant Semantic Place Retrieval on Spatial RDF Data 

 

 

NO SEMINAR 18.1

 

Ori 25.1

CLAMShell: Speeding up Crowds for Low-latency Data Labeling

 

 

Schedule: Sem B

 

29.3: Yuval+Slava+Ahmad

 

5.4+12.4+19.4: NO SEMINAR (PASSOVER BREAK + ICDE)

 

26.4: Brit+Yizhak+Ori

 

3.5: Tomer H. + Shevah + Eyal

 

10.5: Nave + Tomer W. + Ariel

 

17.5: NO SEMINAR (SIGMOD)

 

24.5: Special Guest: Prof. Susan Davidson (UPenn)

 

7.6: Oded+Amit+Chai

 

14.6: Amir + special presentations by children groups

 

21.6: Efrat+Yehonatan

 

 

 

Papers

 

 

Crowdsourcing


Towards Globally Optimal Crowdsourcing Quality Management: The
Uniform Worker Setting. Akash Das Sarma; Aditya Parameswaran;
Jennifer Widom. SIGMOD 2016

CLAMShell: Speeding up Crowds for Low-latency Data Labeling. Daniel
Haas; Jiannan Wang; Eugene Wu; Michael Franklin. VLDB 2016


Data discovery and extraction


Goods: Organizing Google's Datasets. Sudip Roy; Neoklis Polyzotis;
Natasha Noy; Steven Whang; Christopher Olston; Alon Halevy; Flip
Korn. SIGMOD 2016

Extracting Databases from Dark Data with DeepDive. Ce Zhang; Jaeho
Shin; Christopher Re; Michael Cafarella; Feng Niu. SIGMOD 2016

Estimating the Impact of Unknown Unknowns on Aggregate Query
Results. Yeounoh Chung; Tim Kraska; Carsten Binnig; Michael Lind
Mortensen. SIGMOD 2016



Data Cleaning

PrivateClean: Data Cleaning and Differential Privacy. Sanjay
Krishnan; Jiannan Wang; Tim Kraska; Ken Goldberg; Michael Franklin.
SIGMOD 2016

Combining Quantitative and Logical Data Cleaning. Nataliya
Prokoshyna; Jaroslaw Szlichta; Fei Chiang; Renee Miller; Divesh
Srivastava. VLDB 2016

Temporal Rules Discovery for Web Data Cleaning. Ziawasch Abedjan;
Cuneyt Akcora; Mourad Ouzzani; Paolo Papotti; Michael Stonebraker.
VLDB 2016

ActiveClean: Interactive Data Cleaning For Statistical Modeling.
Sanjay Krishnan; Jiannan Wang; Eugene Wu; Michael Franklin; Ken
Goldberg. VLDB 2016


Big Data and Cloud


SparkR: Scaling R Programs with Spark. Shivaram Venkataraman;
Zongheng Yang; Davies Liu; Eric Liang; Hossein Falaki; Xiangrui
Meng; Reynold Xin; Ali Ghodsi; Michael Franklin; Ion Stoica; Matei
Zaharia. SIGMOD 2016

Big Data Analytics with Datalog Queries on Spark. Alexander
Shkapsky; Mohan Yang; Matteo Interlandi; Hsuan Chiu; Tyson Condie;
Carlo Zaniolo. SIGMOD 2016

S2RDF: RDF Querying with SPARQL on Spark. Alexander Schätzle; Martin
Przyjaciel-Zablocki; Simon Skilevic; Georg Lausen. VLDB 2016




Data exploration


FluxQuery: An Execution Framework for Highly Interactive Query
Workloads. Roee Ebenstein; Niranjan Kamat; Arnab Nandi. SIGMOD 2016

Query From Examples: An Iterative, Data-Driven Approach to Query
Construction. Hao Li; Chee-Yong Chan; David Maier. VLDB 2016



Similarity, relevance, semantics


SLING: A Near-Optimal Index Structure for SimRank. Boyu Tian; Xiaokui
Xiao. SIGMOD 2016

Top-k Relevant Semantic Place Retrieval on Spatial RDF Data. Jieming
Shi; Dingming Wu; Nikos Mamoulis. SIGMOD 2016


RDF Graph Alignment with Bisimulation. Peter Buneman; Sławek
Staworko. VLDB 2016


Semantic SPARQL Similarity Search Over RDF Knowledge Graphs. Weiguo
Zheng; Lei Zou; Wei Peng; Xifeng Yan; Shaoxu Song; Dongyan Zhao.
VLDB 2016