My research focuses on web data management (in social networks, web applications, forums, etc.), database applications, and the use of provenance for user interactions. I am currently working on the translation of provenance to natural language and the translation of provenance to queries.
See also DBLP
Provenance for Natural Language Queries (Best Paper Award)
Daniel Deutch, Nave Frost, Amir Gilad, PVLDB 10(5), 2017
Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL query structure. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization, with novel desiderata relevant to the NL case, and one that is based on summarization. We have implemented our solution in an end-to-end system supporting questions, answers and provenance, all expressed in NL. Our experiments, including a user study, indicate the quality of our solution and its scalability.
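As a rough illustration of what provenance factorization means, the sketch below takes provenance written as a sum of products of tuple annotations and greedily pulls out the most frequent annotation. This is a toy under stated assumptions, not the method of the paper: the function names `term` and `factorize`, the annotation names, and the greedy strategy are all hypothetical, and the NL-specific desiderata mentioned in the abstract are not modeled.

```python
from collections import Counter


def term(monomial):
    """Render one product of annotations, e.g. {'a', 'b'} -> 'a*b'."""
    return "*".join(sorted(monomial)) if monomial else "1"


def factorize(monomials):
    """Greedily factor a sum of products of tuple annotations.

    monomials: iterable of sets of annotation names; each set is one product.
    Returns the factorized expression as a string.
    """
    # Sort for deterministic output; duplicates are kept on purpose.
    monomials = sorted((frozenset(m) for m in monomials), key=sorted)
    if not monomials:
        return "0"
    counts = Counter(v for m in monomials for v in m)
    # Nothing is shared between products: emit the plain sum.
    if not counts or max(counts.values()) < 2:
        return " + ".join(term(m) for m in monomials)
    # Most frequent annotation first; ties are broken alphabetically.
    var = min(counts, key=lambda v: (-counts[v], v))
    with_var = [m - {var} for m in monomials if var in m]
    without = [m for m in monomials if var not in m]
    factored = f"{var}*({factorize(with_var)})"
    if without:
        return factored + " + " + factorize(without)
    return factored


# Hypothetical annotations: a*b + a*c + d becomes a*(b + c) + d.
print(factorize([{"a", "b"}, {"a", "c"}, {"d"}]))
```

In an NL setting, the shared factor would correspond to a phrase stated once instead of being repeated for every derivation.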
NLProv: Natural Language Provenance
Daniel Deutch, Nave Frost, Amir Gilad, PVLDB 9(13), 2016
We propose to present NLProv: an end-to-end Natural Language (NL) interface for database queries. Previous work has focused on interfaces for specifying NL questions, which are then compiled into queries in a formal language (e.g. SQL). We build upon this work, but focus on presenting a detailed form of the answers in Natural Language. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only which are the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL question structure. Furthermore, since provenance information is typically large, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization with novel desiderata relevant to the NL case, and one that is based on summarization.
QPlain: Query By Explanation
Daniel Deutch, Amir Gilad, ICDE, 2016
To assist non-specialists in formulating database queries, multiple frameworks that automatically infer queries from a set of input and output examples have been proposed. While highly useful, a shortcoming of the approach is that if users can only provide a small set of examples, many inherently different queries may qualify. We observe that additional information about the examples, in the form of their explanations, is useful in significantly focusing the set of qualifying queries. We propose to demonstrate QPlain, a system that learns conjunctive queries from examples and their explanations. We capture explanations of different levels of granularity and detail, by leveraging recently developed models for data provenance. Explanations are fed through an intuitive interface, are compiled to the appropriate provenance model, and are then used to derive proposed queries. We will demonstrate that it is feasible for non-specialists to provide examples with meaningful explanations, and that the presence of such explanations results in a much more focused set of queries which better match user intentions.
Selective Provenance for Datalog Programs Using Top-K Queries
Daniel Deutch, Amir Gilad, Yuval Moskovitch, PVLDB 8(12), 2015
Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (how-provenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) efficient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
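To make the notion of how-provenance as a set of derivation trees concrete, here is a minimal sketch for a two-rule reachability program. It enumerates every derivation tree of a fact and only then keeps the k best, which is exactly the naive approach the paper's algorithm is designed to avoid; it is shown only to illustrate the objects involved. The edge facts, their weights, the sum-of-weights ranking, and the names `derivations` and `top_k` are all assumptions made for this example.

```python
import heapq

# Hypothetical weighted database facts edge(X, Y); weights are illustrative.
# The relation is assumed acyclic so that the set of trees is finite.
EDGES = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 5, ("c", "d"): 2}


def derivations(x, z):
    """Yield (weight, tree) for each derivation tree of path(x, z) under

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- edge(X, Y), path(Y, Z).

    The weight of a tree is the sum of the weights of the facts it uses.
    """
    # First rule: a single edge fact derives the path directly.
    if (x, z) in EDGES:
        yield EDGES[(x, z)], ("path", x, z, [("edge", x, z)])
    # Second rule: an edge out of x followed by a derivation of path(y, z).
    for (u, y), w in EDGES.items():
        if u == x:
            for sub_weight, subtree in derivations(y, z):
                yield w + sub_weight, ("path", x, z, [("edge", x, y), subtree])


def top_k(x, z, k):
    """Return the k lowest-weight derivation trees of path(x, z)."""
    return heapq.nsmallest(k, derivations(x, z), key=lambda pair: pair[0])


for weight, tree in top_k("a", "d", 2):
    print(weight, tree)
```

Here path(a, d) has two derivation trees, one through b and c and one through the direct edge from a to c, and the ranking orders them by the total weight of the facts they use.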
Towards web-scale how-provenance
Daniel Deutch, Amir Gilad, Yuval Moskovitch, ICDE Workshops, 2015
The annotation of data with meta-data, and its propagation through data-intensive computation in a way that follows the transformations that the data undergoes (“how-provenance”), has many applications, including explanation of the computation results, assessing their trustworthiness and proving their correctness, evaluation in presence of incomplete or probabilistic information, view maintenance, etc. As data gets bigger, its transformations become more complex, and both are being relegated to the cloud, the role of provenance in these applications is even more crucial. But at the same time, the overhead incurred due to provenance computation, in terms of time, space and communication, may limit the scalability of how-provenance management systems. We envision an approach for addressing this complex problem, through allowing selective tracking of how-provenance, where the selection criteria are partly based on the meta-data itself. We illustrate use-cases in the web context, and highlight some challenges in this respect.
selP: Selective tracking and presentation of data provenance
Daniel Deutch, Amir Gilad, Yuval Moskovitch, ICDE, 2015
Highly expressive declarative languages, such as Datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such Datalog programs, and the large volume of data that they process, call for the tracking and presentation of data provenance. Provenance information is crucial for explaining and justifying the Datalog program results. However, the size of full provenance information is in many cases too large (and its concise representations are too complex) to allow its presentation to the user. To this end, we propose a demonstration of selP, a system that allows the selective presentation of provenance, based on user-specified top-k queries. We will demonstrate the usefulness of selP using a real-life program and data, in the context of Information Extraction.