Patent · US · Expired

Method and system for linearly detecting data deviations in a large database

US5813002A · kind A · utility

26 Cited by

28 References

15 Claims

0 Family size

Assignee

International Business Machines Corporation · US

Inventors

Rakesh Agrawal · San Jose, US
Andreas Arning · Rottenburg am Neckar, DE

Key dates

Filing date	Jul 31, 1996
Grant date	Sep 22, 1998
Priority date	—
Expiry date	Jul 31, 2016

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99936
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for detecting deviations in a database is disclosed, comprising the steps of: determining respective frequencies of occurrence for the attribute values of the data items, and identifying any itemset whose similarity value satisfies a predetermined criterion as a deviation, based on the frequencies of occurrence. The determination of the frequencies of occurrence includes computing an overall similarity value for the database, and for each first itemset, computing a difference between the overall similarity value and the similarity value of a second itemset. The second itemset has all the data items except those of the first itemset. Preferably, a smoothing factor is used for indicating how much dissimilarity in an itemset can be reduced by removing a subset of items from the itemset. The smoothing factor is evaluated as each item is incrementally removed from the itemset, thereby allowing a data item to be identified as a deviation when the difference in similarity value is the highest.
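
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions the abstract does not specify. Data items are taken to be tuples of categorical attribute values. The dissimilarity function is assumed to be the summed per-attribute Gini impurity, which depends only on the frequencies of occurrence. The smoothing factor is assumed to be SF = |remaining items| × (D(all) − D(remaining)). Candidate items are removed one at a time in order of their leave-one-out dissimilarity drop. The names `FrequencyTable` and `detect_deviations` are hypothetical. Each removal updates the frequency counts in O(k) time for k attributes, which keeps the main pass linear in the number of items.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Item = Tuple[str, ...]  # one data item: a tuple of categorical attribute values


class FrequencyTable:
    """Per-attribute value frequencies plus a running sum of squared counts."""

    def __init__(self, items: Sequence[Item]) -> None:
        self.n = len(items)
        self.k = len(items[0])
        self.counts = [Counter(item[a] for item in items) for a in range(self.k)]
        self.sq = [sum(c * c for c in cnt.values()) for cnt in self.counts]

    def dissimilarity(self) -> float:
        # Assumed D: sum over attributes of Gini impurity 1 - sum(p^2).
        if self.n == 0:
            return 0.0
        return sum(1.0 - s / self.n ** 2 for s in self.sq)

    def remove(self, item: Item) -> None:
        # O(k) update: c^2 -> (c-1)^2 changes the squared sum by -2c + 1.
        for a, v in enumerate(item):
            c = self.counts[a][v]
            self.sq[a] += -2 * c + 1
            self.counts[a][v] = c - 1
        self.n -= 1

    def add(self, item: Item) -> None:
        # Inverse of remove: c^2 -> (c+1)^2 changes the squared sum by 2c + 1.
        for a, v in enumerate(item):
            c = self.counts[a][v]
            self.sq[a] += 2 * c + 1
            self.counts[a][v] = c + 1
        self.n += 1


def detect_deviations(items: Sequence[Item]) -> List[int]:
    """Return indices of the exception set with the highest smoothing factor."""
    if len(items) < 2:
        return []
    table = FrequencyTable(items)
    n = table.n
    d_all = table.dissimilarity()  # overall dissimilarity of the whole database

    # Leave-one-out: how much does D drop when a single item is removed?
    drops = []
    for item in items:
        table.remove(item)
        drops.append(d_all - table.dissimilarity())
        table.add(item)

    # Remove items incrementally, largest drop first, scoring each prefix with
    # the assumed smoothing factor SF = |remaining| * (D(all) - D(remaining)).
    order = sorted(range(n), key=lambda j: drops[j], reverse=True)
    best_sf, best_r = 0.0, 0
    for r, j in enumerate(order[:-1], start=1):  # always keep at least one item
        table.remove(items[j])
        sf = (n - r) * (d_all - table.dissimilarity())
        if sf > best_sf:
            best_sf, best_r = sf, r
    return sorted(order[:best_r])
```

For example, with five identical `("red", "small")` rows followed by one `("blue", "large")` row, removing the last row drops the assumed dissimilarity to zero while keeping five items, so `detect_deviations` returns `[5]`. Weighting by the number of remaining items penalizes exception sets that "explain away" dissimilarity simply by discarding most of the data.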

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.