Seminar – Advanced Topics in Web Data Management

 

Tova Milo, Daniel Deutch, 2018/19

 

Meetings: Tuesdays 10-12

 


Seminar Information

 

The seminar focuses on managing, analyzing, sharing, and integrating data and applications on the web. Areas of interest include

crowdsourcing, data exploration, Big Data, probabilistic data and data provenance. We shall read recent

papers in this area, focusing on several specific issues, and then explore possible future directions. A tentative list of

papers is enclosed.

 

 

Schedule (Sem B)

 

19/3: EDBT Rehearsals: Slava, Brit, Amit

 

2/4: ICDE Rehearsals: Yuval, Naama, Tomer, Amit

 

30/4: Tomer, Slava, Naama

 

21/5: Shay, Kathy

 

28/5: Gefen, Ori

 

11/6: Uri, Yuval

 

 

Schedule (Sem A)

 

 

4/11  Ori and Tomer W. Subjective Knowledge Base Construction Powered By Crowdsourcing and
Knowledge Base. Hao Xin, Rui Meng, Lei Chen, SIGMOD'18

 

11/11  Slava and Shay RC-index: diversifying answers to range queries, Yue Wang, Alexandra Meliou, Gerome Miklau, VLDB'18

 

18/11  Amit and Tomer H. The Case for Learned Index Structures, Tim Kraska, Alex Beutel, Ed
Chi, Jeff Dean, Neoklis Polyzotis, SIGMOD'18

 

25/11 No meeting

 

2/12  Gefen and Dvir Scalable Semantic Querying of Text, Xiaolan Wang, Aaron Feng, Behzad Golshan, Alon Halevy,

George Mihaila, Hidekazu Oiwa, Wang-Chiew Tan, VLDB'18

 

9/12 Hanuka

 

16/12 Talk by Raymond Ng

 

23/12  Naama and Nave Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?, VLDB '18

 

30/12 Yuval and Shevach Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability

X Liang, S Shetty, D Tosh, C Kamhoua, Socc 2017

 

6/1  Kathy and Rony Navigating the Data Lake with Datamaran: Automatically Extracting Structure from Log Datasets, Yihan Gao, Silu Huang, Aditya Parameswaran, SIGMOD'18

 

 

 

Papers

 

 

Advanced query processing

The Case for Learned Index Structures, Tim Kraska, Alex Beutel, Ed
Chi, Jeff Dean, Neoklis Polyzotis, SIGMOD'18



FastQRE: Fast Query Reverse Engineering, Dmitri Kalashnikov, Laks
V.S. Lakshmanan, Divesh Srivastava, SIGMOD'18


Navigating the Data Lake with Datamaran: Automatically Extracting
Structure from Log Datasets, Yihan Gao, Silu Huang, Aditya
Parameswaran, SIGMOD'18

Bias in OLAP Queries: Detection, Explanation, and Removal, Babak
Salimi, Johannes Gehrke, Dan Suciu, SIGMOD'18


The Vadalog System: Datalog-based Reasoning for Knowledge Graphs.
Luigi Bellomarini, Emanuel Sallinger, Georg Gottlob, VLDB'18


LevelHeaded: A Unified Engine for Business Intelligence and Linear
Algebra Querying, Christopher Aberger, Andrew Lamb, Kunle Olukotun,
Christopher Re, ICDE'18




Cleaning, dependencies, and entity resolution


Explaining Repaired Data with CFDs, Joeri Rammelaere, Floris Geerts,
VLDB'18

Efficient Discovery of Approximate Dependencies, Sebastian Kruse,
Felix, VLDB'18

Parallel Reasoning of Graph Functional Dependencies, Wenfei Fan,
Xueli Liu, Yingjie Cao, ICDE'18

Discovering Graph Functional Dependencies, Wenfei Fan, Chunming Hu,
Xueli Liu, Ping Lu, SIGMOD'18

Entity Matching with Active Monotone Classification, Yufei Tao,
PODS'18 (best paper)



Semantics


Scalable Semantic Querying of Text, Xiaolan Wang, Aaron Feng, Behzad
Golshan, Alon Halevy, George Mihaila, Hidekazu Oiwa, Wang-Chiew Tan,
VLDB'18


Seeping Semantics: Linking Datasets using Word Embeddings for Data
Discovery. Raul Castro Fernandez, Essam Mansour, Abdulhakim Qahtan,
Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Michael
Stonebraker, Nan Tang, ICDE'18



Crowdsourcing


Task Relevance and Diversity as Worker Motivation in Crowdsourcing,
Julien Pilourdault, Sihem Amer-Yahia, Senjuti Basu Roy, Dongwon Lee,
ICDE'18

Knowledge Base Enhancement via Data Facts and Crowdsourcing, Linnan
Jiang, Lei Chen, Zhao Chen, ICDE'18

Incentive-Based Entity Collection using Crowdsourcing, Chengliang
Chai, Ju Fan, Guoliang Li, ICDE'18

Worker Recommendation for Crowdsourced Q&A Services: A Triple-Factor
Aware Approach, Zheng Liu, Lei Chen, VLDB'18

Subjective Knowledge Base Construction Powered By Crowdsourcing and
Knowledge Base. Hao Xin, Rui Meng, Lei Chen, SIGMOD'18

 

Provenance

 

You say 'what', I hear 'where' and 'why': (mis-)interpreting SQL to derive fine-grained provenance

Muller, Dietrich, Grust, VLDB 2018

 

DfAnalyzer: runtime dataflow analysis of scientific applications using provenance

V Silva, D De Oliveira, P Valduriez, VLDB 2018

 

Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability

X Liang, S Shetty, D Tosh, C Kamhoua, VLDB 2017