Patent · US Active

Open information extraction from the web

US8938410B2 · kind B2 · utility

Cited by	10
References	1
Claims	25
Family size	0

Assignee

University of Washington through its Center for Commercialization · US

Inventors

Michael J. Cafarella · Seattle, US
Michele Banko · Seattle, US
Oren Etzioni · Seattle, US

Key dates

Filing date	Dec 16, 2010
Grant date	Jan 20, 2015
Priority date	—
Expiry date	Apr 16, 2031

Classification

Technology area (CPC G)	Physics
CPC primary	G06Q10/02
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.
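
The flow described in the abstract (single pass, candidate tuples, trust filter, redundancy-based probability, queryable graph) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the token-splitting extractor, the hand-written trust_score rule that stands in for the learned extraction classifier, the threshold and p_single values, and the noisy-or formula that stands in for the redundancy-based assessor. The Self-Supervised Learner training step is not modeled at all.

```python
from collections import Counter, defaultdict


def extract_candidate(sentence):
    """Toy extractor (assumption): first token, middle tokens, last token."""
    tokens = sentence.rstrip(".").split()
    if len(tokens) < 3:
        return None
    return (tokens[0], " ".join(tokens[1:-1]), tokens[-1])


def trust_score(tup):
    """Hand-written stand-in for the learned extraction classifier (assumption)."""
    arg1, rel, arg2 = tup
    short_rel = len(rel.split()) <= 3
    proper_args = arg1[0].isupper() and arg2[0].isupper()
    return 0.9 if (short_rel and proper_args) else 0.2


def build_extraction_graph(corpus, threshold=0.5, p_single=0.6):
    """Single pass over the corpus, then a redundancy-based probability per tuple."""
    counts = Counter()
    for sentence in corpus:
        tup = extract_candidate(sentence)
        # Retain only candidates the stand-in classifier considers trustworthy.
        if tup is not None and trust_score(tup) >= threshold:
            counts[tup] += 1

    graph = defaultdict(list)
    for (arg1, rel, arg2), k in counts.items():
        # Noisy-or stand-in (assumption): each of the k sightings is treated
        # as independent evidence that is correct with probability p_single.
        prob = 1.0 - (1.0 - p_single) ** k
        graph[arg1].append((rel, arg2, prob))
    return graph


corpus = [
    "Edison invented Phonograph.",
    "Edison invented Phonograph.",
    "Einstein was born in Ulm.",
    "it rains.",
    "the dog chased the very old and tired cat.",
]
graph = build_extraction_graph(corpus)

# Query the graph: every retained relation whose first argument is "Edison".
for rel, arg2, prob in graph["Edison"]:
    print(f"Edison --{rel}--> {arg2} (p={prob:.2f})")
```

In this sketch a tuple seen in more sentences receives a higher probability, which is the general idea behind a redundancy-based assessment; the actual model, features, and thresholds used in the patent may differ substantially.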

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.