Patent · US Expired

Using canonical forms to develop a dictionary of names in a text

US5832480A · kind A · utility

82Cited by
4References
17Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJul 12, 1996
Grant dateNov 3, 1998
Priority date
Expiry dateJul 12, 2016

Classification

  • Technology area (CPC Y)Emerging Cross-Sectional Technologies
  • CPC primaryY10S707/99935
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify one or more proper names that appear in the documents as raw names. Each of the raw names has zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names of one or more documents are "cleaned" and "split" until certain "cleaning and splitting conditions" are no longer met to obtain a list of clean and split candidate names. Anchor names are selected from the list that unambiguously represent an entity type. The anchor names have one or more entity-type attribute values. Variant names, clean and split candidate names having one or more shared attribute (values) with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.