Patent · US · Expired

Using canonical forms to develop a dictionary of names in a text

US5832480A · kind A · utility

Cited by	82
References	4
Claims	17
Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Roy J. Byrd · Ossining, US
Misook A. Choi · Campion Road, US
Yael Ravin · Mount Kisco, US
Faye Nina Wacholder · Roslyn Heights, US

Key dates

Filing date	Jul 12, 1996
Grant date	Nov 3, 1998
Priority date	—
Expiry date	Jul 12, 2016

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99935
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify one or more proper names that appear in the documents as raw names. Each of the raw names has zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names of one or more documents are "cleaned" and "split" until certain "cleaning and splitting conditions" are no longer met to obtain a list of clean and split candidate names. Anchor names are selected from the list that unambiguously represent an entity type. The anchor names have one or more entity-type attribute values. Variant names, clean and split candidate names having one or more shared attribute (values) with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.
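
Illustrative sketch (not part of the patent record): the pipeline the abstract describes (clean and split raw names, pick anchor names, attach variants that share an attribute value, derive a canonical form) can be approximated in a few lines of Python. Everything specific here is an assumption made for illustration: the function names, the tiny title list, the rule that an anchor is a name with both a first and a last name, and the choice of "first last" as the canonical form. The patent's actual cleaning and splitting conditions are more elaborate; for example, this naive split on "and" would wrongly break a company name such as "Johnson and Johnson".

```python
import re

# Hypothetical rules, chosen only for this sketch.
TITLES = {"mr.", "mrs.", "ms.", "dr.", "president"}
LEADING = re.compile(r"^(?:the|a|an)\s+", re.IGNORECASE)  # leading substrings to drop
SPLIT = re.compile(r"\s*(?:,|\band\b)\s*")                # conjunctions that separate names


def clean_and_split(raw):
    """Split one raw name into candidate names and strip leading debris."""
    cleaned = []
    for part in SPLIT.split(raw):
        part = LEADING.sub("", part.strip(" .,;:"))
        if part:
            cleaned.append(part)
    return cleaned


def attributes(name):
    """Assign simplified person-name attributes: title, first, last."""
    tokens = name.split()
    attrs = {}
    if tokens and tokens[0].lower() in TITLES:
        attrs["title"] = tokens.pop(0)
    if len(tokens) >= 2:
        attrs["first"] = tokens[0]
    if tokens:
        attrs["last"] = tokens[-1]
    return attrs


def build_dictionary(raw_names):
    """Map each canonical form to its equivalence group of names."""
    # Step 1: clean and split, keeping first-seen order without duplicates.
    candidates = {}
    for raw in raw_names:
        for name in clean_and_split(raw):
            candidates.setdefault(name, attributes(name))

    # Step 2: anchors are candidates assumed unambiguous (first and last name).
    groups = {}        # canonical form -> list of names
    anchor_attrs = {}  # canonical form -> attributes of its anchor
    for name, attrs in candidates.items():
        if "first" in attrs and "last" in attrs:
            canonical = f'{attrs["first"]} {attrs["last"]}'
            anchor_attrs.setdefault(canonical, attrs)
            groups.setdefault(canonical, []).append(name)

    # Step 3: attach each remaining candidate to the single anchor sharing its
    # last name; otherwise it becomes its own one-member group.
    for name, attrs in candidates.items():
        if "first" in attrs and "last" in attrs:
            continue
        matches = [c for c, a in anchor_attrs.items()
                   if a["last"] == attrs.get("last")]
        target = matches[0] if len(matches) == 1 else name
        groups.setdefault(target, []).append(name)
    return groups
```

Under these assumptions, calling build_dictionary on the raw names "President Bill Clinton", "Mr. Clinton", "Bill Clinton and Al Gore", and "Gore" groups the three Clinton mentions under the canonical form "Bill Clinton" and the two Gore mentions under "Al Gore". A variant that matches more than one anchor is left in its own group rather than guessed at, which mirrors the abstract's requirement that anchors be unambiguous.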

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.