Patent · US · Expired

Method and apparatus for partitioning a database upon a timestamp, support values for phrases and generating a history of frequently occurring phrases

US6308172A · kind A · utility

64 Cited by

4 References

9 Claims

0 Family size

Assignee

International Business Machines Corporation · US

Inventors

Rakesh Agrawal · San Jose, US
Ramakrishnan Srikant · San Jose, US
Brian Lent · Bellevue, US

Key dates

Filing date	Jul 6, 1999
Grant date	Oct 23, 2001
Priority date	—
Expiry date	Jul 6, 2019

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99953
WIPO field	Basic materials chemistry
WIPO sector	Chemistry

Abstract

A method and apparatus for mining text databases, employing sequential pattern phrase identification and shape queries, to discover trends. The method passes over a desired database using a dynamically generated shape query. Documents within the database are selected based on specific classifications and user defined partitions. Once a partition is specified, transaction IDs are assigned to the words in the text documents depending on their placement within each document. The transaction IDs encode both the position of each word within the document as well as representing sentence, paragraph, and section breaks, and are represented in one embodiment as long integers with the sentence boundaries. A maximum and minimum gap between words in the phrases and the minimum support all phrases must meet for the selected time period may be specified. A generalized sequential pattern method is used to generate those phrases in each partition that meet the minimum support threshold. The shape query engine takes the set of phrases for the partition of interest and selects those that match a given shape query. A query may take the form of requesting a trend such as "recent upwards trend", "recen…
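The pipeline outlined in the abstract (position-encoding transaction IDs, gap-constrained phrase support per partition, and a per-phrase history that a trend query can inspect) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the jump sizes `SENTENCE_JUMP` and `PARAGRAPH_JUMP`, the helper names, the brute-force matching used in place of the generalized sequential pattern method, and the simple `is_recent_upward` check used in place of the shape query engine.

```python
import re
from collections import defaultdict

# Hypothetical jump sizes (assumptions): a boundary adds a large offset to the
# running transaction ID so that a small max_gap cannot span the boundary.
SENTENCE_JUMP = 1_000
PARAGRAPH_JUMP = 100_000


def assign_transaction_ids(text):
    """Return [(transaction_id, word), ...] for one document.

    IDs grow by 1 per word, plus a jump at each sentence or paragraph break.
    """
    tagged = []
    tid = 0
    for p_idx, paragraph in enumerate(text.split("\n\n")):
        if p_idx > 0:
            tid += PARAGRAPH_JUMP
        sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
        for s_idx, sentence in enumerate(sentences):
            if s_idx > 0:
                tid += SENTENCE_JUMP
            for word in re.findall(r"[a-z0-9]+", sentence.lower()):
                tid += 1
                tagged.append((tid, word))
    return tagged


def contains_phrase(tagged, phrase, min_gap=1, max_gap=1):
    """True if the words of `phrase` (non-empty list) occur in order, with each
    consecutive pair of transaction IDs differing by min_gap..max_gap."""
    positions = defaultdict(list)
    for tid, word in tagged:
        positions[word].append(tid)

    def extend(prev_tid, idx):
        # Try every occurrence of the next word that satisfies the gap bounds.
        if idx == len(phrase):
            return True
        return any(
            min_gap <= tid - prev_tid <= max_gap and extend(tid, idx + 1)
            for tid in positions.get(phrase[idx], [])
        )

    return any(extend(tid, 1) for tid in positions.get(phrase[0], []))


def support(docs, phrase, **gaps):
    """Fraction of documents in one partition that contain the phrase."""
    if not docs:
        return 0.0
    hits = sum(
        contains_phrase(assign_transaction_ids(doc), phrase, **gaps)
        for doc in docs
    )
    return hits / len(docs)


def phrase_history(partitions, phrase, **gaps):
    """Support of the phrase in each partition, ordered by partition label."""
    return [
        (label, support(docs, phrase, **gaps))
        for label, docs in sorted(partitions.items())
    ]


def is_recent_upward(history, periods=2):
    """Stand-in for a 'recent upwards trend' query (assumption): support rose
    strictly over each of the last `periods` steps."""
    values = [s for _, s in history][-(periods + 1):]
    return len(values) == periods + 1 and all(
        a < b for a, b in zip(values, values[1:])
    )


# Toy partitions keyed by year (made-up documents, for illustration only).
partitions = {
    1997: ["We study databases.", "Query engines are fast."],
    1998: ["Data mining is new.", "We like databases."],
    1999: ["Data mining grows.", "Text data mining works."],
}
history = phrase_history(partitions, ["data", "mining"])
print(history, is_recent_upward(history))
```

In this sketch a sentence break adds 1,000 to the running ID, so a phrase with `max_gap=1` cannot straddle two sentences, while a deliberately large `max_gap` can. A minimum support threshold would then be applied by keeping only phrases whose `support` value in a partition meets it.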

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.