I am a Ph.D. candidate in the Database Group of the Computer Science School at Tel Aviv University, under the supervision of Prof. Daniel Deutch. Since August 2018, I have been a visiting scholar at Duke University, hosted by Prof. Sudeepa Roy.
I have won the 2017 VLDB Best Paper Award, the 2018 SIGMOD Research Highlight Award, and the 2019 Google Ph.D. Fellowship in Structured Data and Database Management.
I completed both my B.Sc. (a double major in Mathematics and Computer Science) and my M.Sc. in Computer Science (Magna Cum Laude) at Tel Aviv University.

Research

My research focuses on database applications and the use of provenance for user interaction. I am currently working on translating provenance into natural language and translating provenance into queries.

Publications

See also DBLP


  • Reverse-Engineering Conjunctive Queries from Provenance Examples
    Daniel Deutch, Amir Gilad, EDBT, 2019
    Abstract
    The provenance of a query result details relevant parts of the input data as well as the computation leading to each output tuple. Multiple lines of work have studied the tracking and presentation of provenance, showing its effectiveness in explaining and justifying query results. The willingness of application owners to share provenance information for these purposes may however be hindered by the resulting exposure of the underlying query logic, which may be proprietary and confidential. We therefore formalize and study the following problem: when a (small) subset of the query results along with their provenance is given, what information is revealed about the underlying query? Our model is based on the provenance semiring framework and applies to many previously proposed provenance models. We analyze two flavors of the problem: (1) how many queries may be consistent with a given provenance example? and (2) what is the complexity of inferring a consistent query, or one that is a “best fit”? Our theoretical analysis shows that there may be many (for some models, even infinitely many in the presence of self-joins) consistent queries, yet we provide practically efficient algorithms to find (best-fit) such queries. We experimentally show that the algorithms are generally successful in correctly reverse engineering queries, even when given only a few output examples and their provenance.
  • NLProveNAns: Natural Language Provenance for Non-Answers
    Daniel Deutch, Nave Frost, Amir Gilad, Tomer Haimovich, PVLDB 11(12), 2018
    Abstract
    Natural language (NL) interfaces to databases allow users without technical background to query the database and get the results. Users of such systems may be surprised by the absence of certain expected results. To this end, we propose to demonstrate NLProveNAns, a system that allows non-expert users to view explanations for non-answers of interest. The explanations are shown in an intuitive manner, by highlighting parts of the original NL query that are intuitively “responsible” for the absence of the expected result. Our solution builds upon and combines recent advancements in Natural Language Interfaces to Databases and models for why-not provenance. In particular, the system can provide explanations in one of two flavors corresponding to two different why-not provenance models: a short explanation based on the frontier picky model, and a detailed explanation based on the why-not polynomial model.
  • QuestPro: Queries in SPARQL Through Provenance
    Efrat Abramovitz, Daniel Deutch, Amir Gilad, PVLDB 11(12), 2018
    Abstract
    We propose to demonstrate QuestPro, a prototype interactive system aimed at allowing non-expert users to specify SPARQL queries. Notably, QuestPro makes an extensive use of provenance in deriving the SPARQL queries, in two ways. First, we ask users to provide example output nodes along with explanations that are then treated as the provenance of the underlying query, guiding the system’s search for a fitting query. We have designed an intuitive interface through which users can gradually build their explanations while understanding the connections between the different objects. The system then generates a set of candidate queries and uses provenance to explain each candidate, prompting user feedback to choose between them. We will demonstrate the usability of QuestPro using an ontology of academic publications, engaging the audience in the interactive process while explaining the under-the-hood model and algorithms.
  • Natural Language Explanations for Query Results (SIGMOD Research Highlight Award)
    Daniel Deutch, Nave Frost, Amir Gilad, SIGMOD Record 47(1), 2018
    Abstract
    Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL query structure. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization, with novel desiderata relevant to the NL case, and one that is based on summarization.
  • Interactive Inference of SPARQL Queries Using Provenance
    Efrat Abramovitz, Daniel Deutch, Amir Gilad, ICDE, 2018
    Abstract
    Inference of queries from their output examples has been extensively studied in multiple contexts as means to ease the formulation of queries. In this paper we propose a novel approach for the problem, based on provenance. The idea is to use provenance in two manners: first as an additional information that is associated with the given examples and explains their rationale; and then again as a way to show users a description of the difference between candidate queries, prompting their feedback. We have implemented the framework in the context of simple graph patterns and union thereof, and demonstrate its effectiveness in the context of multiple ontologies.
  • Efficient Provenance Tracking For Datalog Using Top-K Queries
    Daniel Deutch, Amir Gilad, Yuval Moskovitch, VLDB J. 27(2), 2018
    Abstract
    Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, defined here as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm that computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
  • Provenance for Non-Experts
    Daniel Deutch, Nave Frost, Amir Gilad, IEEE Data Engineering Bulletin 41(1), 2018
    Abstract
    The flourish of data-intensive systems that are geared towards direct use by non-experts, such as Natural Language question answering systems and query-by-example frameworks, calls for the incorporation of provenance management. Provenance is arguably even more important for such systems than for “classic” database applications. This is due to the elevated level of uncertainty associated with the typical ambiguity of user specifications (e.g., phrasing questions in Natural Language or through examples). Existing provenance solutions are not geared towards non-experts, and the typical complexity and size of their instances render them ill-suited for this goal. We outline in this paper our ongoing research and preliminary results, addressing these challenges towards developing provenance solutions that serve to explain computation results to non-expert users.
  • Provenance for Natural Language Queries (Best Paper Award)
    Daniel Deutch, Nave Frost, Amir Gilad, PVLDB 10(5), 2017
    Abstract
    Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL query structure. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization, with novel desiderata relevant to the NL case, and one that is based on summarization. We have implemented our solution in an end-to-end system supporting questions, answers and provenance, all expressed in NL. Our experiments, including a user study, indicate the quality of our solution and its scalability.
  • NLProv: Natural Language Provenance
    Daniel Deutch, Nave Frost, Amir Gilad, PVLDB 9(13), 2016
    Abstract
    We propose to present NLProv: an end-to-end Natural Language (NL) interface for database queries. Previous work has focused on interfaces for specifying NL questions, which are then compiled into queries in a formal language (e.g. SQL). We build upon this work, but focus on presenting a detailed form of the answers in Natural Language. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only which are the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL question structure. Furthermore, since provenance information is typically large, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization with novel desiderata relevant to the NL case, and one that is based on summarization.
  • QPlain: Query By Explanation
    Daniel Deutch, Amir Gilad, ICDE, 2016
    Abstract
    To assist non-specialists in formulating database queries, multiple frameworks that automatically infer queries from a set of input and output examples have been proposed. While highly useful, a shortcoming of the approach is that if users can only provide a small set of examples, many inherently different queries may qualify. We observe that additional information about the examples, in the form of their explanations, is useful in significantly focusing the set of qualifying queries. We propose to demonstrate QPlain, a system that learns conjunctive queries from examples and their explanations. We capture explanations of different levels of granularity and detail, by leveraging recently developed models for data provenance. Explanations are fed through an intuitive interface, are compiled to the appropriate provenance model, and are then used to derive proposed queries. We will demonstrate that it is feasible for non-specialists to provide examples with meaningful explanations, and that the presence of such explanations result in a much more focused set of queries which better match user intentions.
  • Selective Provenance for Datalog Programs Using Top-K Queries
    Daniel Deutch, Amir Gilad, Yuval Moskovitch, PVLDB 8(12), 2015
    Abstract
    Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (how-provenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) efficient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
  • Towards web-scale how-provenance
    Daniel Deutch, Amir Gilad, Yuval Moskovitch, ICDE Workshops, 2015
    Abstract
    The annotation of data with meta-data, and its propagation through data-intensive computation in a way that follows the transformations that the data undergoes (“how-provenance”), has many applications, including explanation of the computation results, assessing their trustworthiness and proving their correctness, evaluation in presence of incomplete or probabilistic information, view maintenance, etc. As data gets bigger, its transformations become more complex, and both are being relegated to the cloud, the role of provenance in these applications is even more crucial. But at the same time, the overhead incurred due to provenance computation, in terms of time, space and communication, may limit the scalability of how-provenance management systems. We envision an approach for addressing this complex problem, through allowing selective tracking of how-provenance, where the selection criteria are partly based on the meta-data itself. We illustrate use-cases in the web context, and highlight some challenges in this respect.
  • selP: Selective tracking and presentation of data provenance
    Daniel Deutch, Amir Gilad, Yuval Moskovitch, ICDE, 2015
    Abstract
    Highly expressive declarative languages, such as Datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such Datalog programs, and the large volume of data that they process, call for the tracking and presentation of data provenance. Provenance information is crucial for explaining and justifying the Datalog program results. However, the size of full provenance information is in many cases too large (and its concise representations are too complex) to allow its presentation to the user. To this end, we propose a demonstration of selP, a system that allows the selective presentation of provenance, based on user-specified top-k queries. We will demonstrate the usefulness of selP using a real-life program and data, in the context of Information Extraction.

Teaching

  • Extended Introduction to Computer Science (2016-2018)

  • Workshop: Google Technologies (2016)

Contact & CV

  • Blavatnik School of Computer Science, Schreiber Building, Databases Lab, M-20
  • CV: