Seminar – Advanced Topics in Web Data Management

 

Tova Milo, Daniel Deutch, 2015/16

 

Meetings: Tuesdays 18-20, Kaplun 324

 


Seminar Information

 

The seminar focuses on managing, analyzing, sharing, and integrating data and applications on the web. Areas of interest include crowdsourcing, data exploration, Big Data, probabilistic data, and data provenance. We shall read recent papers in these areas, focusing on several specific issues, and then explore possible future directions. A tentative list of papers appears below.

 

Schedule (Sem. B)

 

15.3 Slava, Nave

 

29.3 Yizhak

 

5.4 Ahmad, Amit

 

3.5 Yuval, Amir

 

10.5 Brit, Shevah

 

24.5 Efrat, Yonatan, Matan

 

7.6 Chai, Eyal, Elian

 

Schedule (Sem. A)

 

27.10 Slava, "Argonaut: Macrotask Crowdsourcing for Complex Data Processing"

 

3.11 Tomer, "TransactiveDB: Tapping into Collective Human Memories"

 

10.11 Amit,  "Efficient Top-K SimRank-based Similarity Join"

 

24.11 Yizhak, "Preference-aware Integration of Temporal Data"

 

1.12 Ahmad, "Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning"

 

8.12   Amir, "Association Rules with Graph Patterns"

 

15.12  Nave, "Linearized and Single-Pass Belief Propagation"

 

22.12  Yehonatan, "Incremental Knowledge Base Construction Using DeepDive"

 

29.12   Brit, "The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing"

             Matan,  "Worker Skill Estimation in Team-Based Tasks"

 

12.1    Elian, "Spark SQL: Relational Data Processing in Spark"

            Shevah, "JetScope: Reliable and Interactive Analytics at Cloud Scale", "Rethinking Data-Intensive Science Using Scalable Analytics Systems"

 

 

 

Papers

 

 

CROWDSOURCING

 Argonaut: Macrotask Crowdsourcing for Complex Data Processing
 
Daniel Haas, Jason Ansel, Lydia Gu, Adam Marcus, VLDB 2015 (Industrial track)

 TransactiveDB: Tapping into Collective Human Memories
 
Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini,
 Karl Aberer, Philippe Cudré-Mauroux, VLDB 2015

 Worker Skill Estimation in Team-Based Tasks
 Habibur Rahman, Saravanan Thirumuruganathan, Senjuti Basu Roy, Sihem
 Amer-Yahia, Gautam Das, VLDB 2015

 Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning
 
Barzan Mozafari, Purna Sarkar, Michael Franklin, Michael Jordan, Sam
 Madden, VLDB 2015

 Hear the Whole Story: Towards the Diversity of Opinion in Crowdsourcing Markets
 
Ting Wu, Lei Chen, Pan Hui, Chen Zhang, Weikai Li, VLDB 2015

  The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing
  Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Ida Mele, Matteo
  Riondato, SIGMOD 2015

 

 


GRAPH PROCESSING

Association Rules with Graph Patterns
 Wenfei Fan, Xin Wang, Yinghui Wu, Jingbo Xu, VLDB 2015

 Efficient Top-K SimRank-based Similarity Join
 Wenbo Tao, Minghe Yu, Guoliang Li, VLDB 2015

  Efficient Enumeration of Maximal k-Plexes
  Devora Berlowitz, Sara Cohen, Benny Kimelfeld, SIGMOD 2015

 

PROVENANCE

 

Dynamic Provenance for SPARQL Updates

Harry Halpin and James Cheney, ISWC 2014

 

Linearized and Single-Pass Belief Propagation

Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos, VLDB 2015

 

Answering Why-not Questions on Reverse Top-k Queries

Yunjun Gao, Qing Liu, Gang Chen, Baihua Zheng, Linlin Zhou, VLDB 2015




INFORMATION INTEGRATION

Preference-aware Integration of Temporal Data
 Bogdan Alexe, Mary Roth, Wang-Chiew Tan, VLDB 2015

 Enriching Data Imputation with Extensive Similarity Neighbors
 
Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang, VLDB 2015

 Incremental Knowledge Base Construction Using DeepDive
 
Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher
 Ré, VLDB 2015

 A Declarative Framework for Linking Entities
 
Douglas Burdick, Ronald Fagin, Phokion Kolaitis, Lucian Popa, Wang-Chiew
 Tan, ICDT 2015 (Best paper award)

 

  JetScope: Reliable and Interactive Analytics at Cloud Scale
  Eric Boutin, Paul Brett, Xiaoyu Chen, Jaliya Ekanayake, Tao Guan,
  Anna Korsun, Zhicheng Yin, Nan Zhang, Jingren Zhou, VLDB 2015

   Rethinking Data-Intensive Science Using Scalable Analytics Systems
   Frank Austin Nothaft, Matt Massie, Timothy Danford, Zhao Zhang, Uri Laserson,
   Carl Yeksigian, Jey Kottalam, Arun Ahuja, Jeff Hammerbacher, Michael
   Linderman, Michael J. Franklin, Anthony D. Joseph, David A.
   Patterson, SIGMOD 2015 (Industrial)

 

 

MISCELLANEOUS

 

    Spark SQL: Relational Data Processing in Spark
   
Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley,
    Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei
    Zaharia, SIGMOD 2015

     Mining Subjective Properties on the Web
    Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul
    Gupta, SIGMOD 2015

 

   Making Queries Tractable on Big Data with Preprocessing

Wenfei Fan, Floris Geerts, Frank Neven, VLDB 2013  

PVLDB 6(9): 685-696 (2013)