Similarity clustering in linear time with error-free retrieval using signature overlap with signature size matching
US9753964B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 19, 2017 |
| Grant date | Sep 5, 2017 |
| Priority date | — |
| Expiry date | Jan 19, 2037 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2218/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for a processing device to determine whether to assign a data item to at least one cluster of data items is disclosed. The processing device may identify a signature of the data item, the signature including a set of elements. The processing device may derive a first size value of the number of elements of the identified signature based on a set of size values of signatures that includes a maximum size value representing the largest number of elements in a signature. The processing device may derive a second size value of the number of elements of a second signature that is similar to the identified signature based on the set of size values of signatures. The processing device may select a subset of the set of elements of the identified signature to form at least one partial signature of the identified signature, wherein the number of elements in the partial signature represents the number of elements in common between a signature having the first size value and a second similar signature having the second size value. The processing device may combine the selected subset of elements into at least one token. The processing device may determine whether the at least one token is p…
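The size-matching idea in the abstract can be illustrated with a small Python sketch. For a signature of size `a` and a candidate similar signature of size `b`, it computes how many elements the two must share, then forms one token per subset of that size. Everything specific here is an assumption made for illustration and is not stated in the abstract: Jaccard similarity as the measure, the threshold `t`, the exhaustive subset enumeration, the token string format, and the names `min_overlap` and `tokens_for`. The enumeration is combinatorial, so this is not the linear-time procedure the title refers to, and the sketch stops at token formation.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def min_overlap(a: int, b: int, t: Fraction) -> int:
    """Smallest |A ∩ B| giving Jaccard(A, B) >= t when |A| = a and |B| = b.

    Assumption: similarity is Jaccard, so c / (a + b - c) >= t,
    which rearranges to c >= t * (a + b) / (1 + t).
    """
    return ceil(t * (a + b) / (1 + t))


def tokens_for(signature, b: int, t: Fraction) -> set:
    """Tokens for `signature` when matched against signatures of size `b`.

    Each token combines a partial signature (a subset whose size equals the
    required overlap) with the size pair. The format is hypothetical.
    """
    elems = sorted({str(e) for e in signature})
    a = len(elems)
    c = min_overlap(a, b, t)
    if c > min(a, b):
        return set()  # no signature of size b can reach the threshold
    lo, hi = min(a, b), max(a, b)  # canonical order so both sides agree
    return {f"{lo}-{hi}:" + ",".join(p) for p in combinations(elems, c)}


# Usage: two signatures of size 4 at an assumed threshold of 1/2.
t = Fraction(1, 2)
A = {"a", "b", "c", "d"}
B = {"a", "b", "c", "e"}  # Jaccard(A, B) = 3/5
C = {"a", "b", "x", "y"}  # Jaccard(A, C) = 2/6
shared_ab = tokens_for(A, len(B), t) & tokens_for(B, len(A), t)
shared_ac = tokens_for(A, len(C), t) & tokens_for(C, len(A), t)
print(shared_ab, shared_ac)
```

Under these assumptions, two signatures share a token exactly when their overlap reaches the required size, so a token match implies the threshold is met and a pair meeting the threshold always shares at least one token.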
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.