Patent · US Expired

Automatic extraction of metadata using a neural network

US6044375A · kind A · utility

183 Cited by

23 References

16 Claims

0 Family size

Assignee

Hewlett-Packard Company, L.P. · US

Inventors

Oded Shmueli · New York, US
Darryl Greig · Haifa, IL
Carl Staelin · Haifa, IL
Tami Tamir · Haifa, IL

Key dates

Filing date	Apr 30, 1998
Grant date	Mar 28, 2000
Priority date	—
Expiry date	Apr 30, 2018

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99943
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of automatically extracting metadata from a document. The method of the invention provides a computer readable document that includes blocks comprised of words, an authority list that includes common uses of a set of words, and a neural network trained to extract metadata from groupings of data called compounds. Compounds are created with one compound describing each of the blocks. Each compound includes the words making up the block, descriptive information about the block, and authority information associated with some of the words. The descriptive information may include such items as bounding box information, describing the size and position of the block, and font information, describing the size and type of font the words of the block use. The authority information is located by comparing each of the words from the block to the authority list. The compounds are processed through the neural network to generate metadata guesses, including word guesses, compound guesses and document guesses, along with confidence factors associated with the guesses indicating the likelihood that each of the guesses is correct. The method may additionally include providing a document knowledg…
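The compound-building step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Block`, `Compound`, `build_compound`, and `AUTHORITY_LIST`, the `(x, y, width, height)` bounding-box layout, the word normalization, and the sample data are all assumptions made for illustration. The neural network stage is left out, since the abstract gives no details of its structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """One block of words from the document (hypothetical structure)."""
    words: List[str]
    bbox: Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)
    font_name: str
    font_size: float


@dataclass
class Compound:
    """Words plus descriptive and authority information for one block."""
    words: List[str]
    bbox: Tuple[float, float, float, float]
    font_name: str
    font_size: float
    # Maps a word of the block to its common uses from the authority list.
    authority: Dict[str, List[str]] = field(default_factory=dict)


def build_compound(block: Block, authority_list: Dict[str, List[str]]) -> Compound:
    """Create one compound by comparing each word of the block to the authority list."""
    authority: Dict[str, List[str]] = {}
    for word in block.words:
        # Assumed normalization: lowercase and trim surrounding punctuation.
        key = word.lower().strip(".,;:")
        if key in authority_list:
            authority[word] = authority_list[key]
    return Compound(block.words, block.bbox, block.font_name, block.font_size, authority)


def build_compounds(blocks: List[Block], authority_list: Dict[str, List[str]]) -> List[Compound]:
    """One compound per block, as the abstract describes."""
    return [build_compound(b, authority_list) for b in blocks]


# Hypothetical authority list: word -> common uses of that word.
AUTHORITY_LIST = {
    "smith": ["surname"],
    "january": ["month"],
    "report": ["document type"],
}

# Hypothetical blocks, as a layout analyzer might produce them.
blocks = [
    Block(["Annual", "Report"], (72.0, 40.0, 300.0, 28.0), "Helvetica-Bold", 24.0),
    Block(["J.", "Smith,", "January", "1998"], (72.0, 80.0, 200.0, 12.0), "Times-Roman", 10.0),
]

compounds = build_compounds(blocks, AUTHORITY_LIST)
for c in compounds:
    print(c.words, c.font_size, c.authority)
```

In this sketch only some words of a block carry authority information, matching the abstract's wording; a trained network would then take each compound as input and emit guesses with confidence factors.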

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.