Tel-Aviv University - Computer Science Colloquium

Sunday, May 28, 2006, 11:15-12:15

Room 309
Schreiber Building


Ronen Feldman

Bar Ilan University

Title: Improving Self-Supervised Relation Extraction from the Web




Web extraction systems attempt to use the immense amount of unlabeled
text on the Web to create large lists of entities and relations.
Unlike traditional IE methods, Web extraction systems do not label
every mention of the target entity or relation; instead, they focus on
extracting as many different instances as possible while keeping the
precision of the resulting list reasonably high. SRES is a
self-supervised Web relation extraction system that learns powerful
extraction patterns from unlabeled text, using short descriptions of
the target relations and their attributes. SRES automatically generates
the training data needed for its pattern-learning component. The
performance of SRES is further enhanced by classifying its output
instances using the properties of the extracted patterns. The features
we use for classification and the trained classification model are
independent of the target relation, which we demonstrate in a series
of experiments. We also compare the performance of SRES to that of the
state-of-the-art KnowItAll system and to that of its pattern-learning
component, which uses a simpler and less powerful pattern language
than SRES.