Seminar – Managing Information on the Web

Seminar – Advanced Topics in Web Data Management

Tova Milo, Daniel Deutch, 2015/16

Meetings: Tuesdays 18-20, Kaplun 324

Seminar Information

The seminar focuses on managing, analyzing, sharing, and integrating data and applications on the web. Areas of interest include crowdsourcing, data exploration, Big Data, probabilistic data, and data provenance. We shall read recent papers in these areas, focusing on several specific issues, and then explore possible future directions. A tentative list of papers appears below.

Schedule (Sem. B)

15.3 Slava, Nave

29.3 Yizhak

5.4 Ahmad, Amit

3.5 Yuval, Amir

10.5 Brit, Shevah

24.5 Efrat, Yonatan, Matan

7.6 Chai, Eyal, Elian

Schedule (Sem. A)

27.10 Slava, "Argonaut: Macrotask Crowdsourcing for Complex Data Processing"

3.11 Tomer, "TransactiveDB: Tapping into Collective Human Memories"

10.11 Amit, "Efficient Top-K SimRank-based Similarity Join"

24.11 Yizhak, "Preference-aware Integration of Temporal Data"

1.12 Ahmad, "Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning"

8.12 Amir, "Association Rules with Graph Patterns"

15.12 Nave, "Linearized and Single-Pass Belief Propagation"

22.12 Yehonatan, "Incremental Knowledge Base Construction Using DeepDive"

29.12 Brit, "The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing"

Matan, "Worker Skill Estimation in Team-Based Tasks"

12.1 Elian, "Spark SQL: Relational Data Processing in Spark"

Shevah, "JetScope: Reliable and Interactive Analytics at Cloud Scale", "Rethinking Data-Intensive Science Using Scalable Analytics Systems"

Papers

CROWDSOURCING

Argonaut: Macrotask Crowdsourcing for Complex Data Processing
Daniel Haas, Jason Ansel, Lydia Gu, Adam Marcus, VLDB 2015 (Industrial track)

TransactiveDB: Tapping into Collective Human Memories
Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini,
Karl Aberer, Philippe Cudré-Mauroux, VLDB 2015

Worker Skill Estimation in Team-Based Tasks
Habibur Rahman, Saravanan Thirumuruganathan, Senjuti Basu Roy, Sihem
Amer-Yahia, Gautam Das, VLDB 2015

Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning
Barzan Mozafari, Purna Sarkar, Michael Franklin, Michael Jordan, Sam
Madden, VLDB 2015

Hear the Whole Story: Towards the Diversity of Opinion in Crowdsourcing Markets
Ting Wu, Lei Chen, Pan Hui, Chen Zhang, Weikai Li, VLDB 2015

The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing
Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Ida Mele, Matteo
Riondato, SIGMOD 2015

GRAPH PROCESSING

Association Rules with Graph Patterns
Wenfei Fan, Xin Wang, Yinghui Wu, Jingbo Xu, VLDB 2015

Efficient Top-K SimRank-based Similarity Join
Wenbo Tao, Minghe Yu, Guoliang Li, VLDB 2015

Efficient Enumeration of Maximal k-Plexes
Devora Berlowitz, Sara Cohen, Benny Kimelfeld, SIGMOD 2015

PROVENANCE

Dynamic Provenance for SPARQL Updates
Harry Halpin, James Cheney, ISWC 2014

Linearized and Single-Pass Belief Propagation
Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos, VLDB 2015

Answering Why-not Questions on Reverse Top-k Queries
Yunjun Gao, Qing Liu, Gang Chen, Baihua Zheng, Linlin Zhou, VLDB 2015

INFORMATION INTEGRATION

Preference-aware Integration of Temporal Data
Bogdan Alexe, Mary Roth, Wang-Chiew Tan, VLDB 2015

Enriching Data Imputation with Extensive Similarity Neighbors
Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang, VLDB 2015

Incremental Knowledge Base Construction Using DeepDive
Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher
Ré, VLDB 2015

A Declarative Framework for Linking Entities
Douglas Burdick, Ronald Fagin, Phokion Kolaitis, Lucian Popa, Wang-Chiew
Tan, ICDT 2015 (Best paper award)

JetScope: Reliable and Interactive Analytics at Cloud Scale
Eric Boutin, Paul Brett, Xiaoyu Chen, Jaliya Ekanayake, Tao Guan,
Anna Korsun, Zhicheng Yin, Nan Zhang, Jingren Zhou, VLDB 2015

Rethinking Data-Intensive Science Using Scalable Analytics Systems
Frank Austin Nothaft, Matt Massie, Timothy Danford, Zhao Zhang, Uri Laserson,
Carl Yeksigian, Jey Kottalam, Arun Ahuja, Jeff Hammerbacher, Michael
Linderman, Michael J. Franklin, Anthony D. Joseph, David A.
Patterson, SIGMOD 2015 (Industrial)

MISCELLANEOUS

Spark SQL: Relational Data Processing in Spark
Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley,
Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei
Zaharia, SIGMOD 2015

Mining Subjective Properties on the Web
Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul
Gupta, SIGMOD 2015

Making Queries Tractable on Big Data with Preprocessing
Wenfei Fan, Floris Geerts, Frank Neven, VLDB 2013, PVLDB 6(9): 685-696 (2013)