De-duplicating transaction records using targeted fuzzy matching
US12287767B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 30, 2024 |
| Grant date | Apr 29, 2025 |
| Priority date | — |
| Expiry date | Jan 30, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/412
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method is disclosed. The method includes obtaining, by a de-duplication server, a candidate pair of a plurality of digitally stored documents from a document database. In response, text elements are identified from each digitally stored document in the candidate pair, and the text elements are stored as document extraction attributes. The method then automatically computes and stores relative positional differences of the text elements between each digitally stored document of the candidate pair and a document similarity score based on the relative positional differences. The relative positional differences are compared with a similarity function to form a difference similarity vector for the candidate pair. The difference similarity vector comprises components corresponding to each relative positional difference. The components of the difference similarity vector are aggregated to determine a final score for the candidate pair. A document-level similarity metric is determined from the final score. The method includes determining whether the final score is above a cutoff value, and in response to determining that the final score for the candidate pair is above…
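The scoring steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TextElement` type, the use of Euclidean distance between same-labeled elements as the "relative positional difference", the exponential-decay similarity function, the mean as the aggregation, and the `scale` and `cutoff` values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import exp, hypot


@dataclass(frozen=True)
class TextElement:
    """Hypothetical extraction attribute: a labeled text element and its position."""
    label: str  # e.g. "total" or "date" (illustrative labels)
    x: float    # horizontal position, normalized to 0..1 (assumption)
    y: float    # vertical position, normalized to 0..1 (assumption)


def positional_differences(doc_a, doc_b):
    """Distance between elements that share a label in both documents."""
    by_label = {e.label: e for e in doc_b}
    return {
        e.label: hypot(e.x - by_label[e.label].x, e.y - by_label[e.label].y)
        for e in doc_a
        if e.label in by_label
    }


def similarity(diff, scale=0.05):
    """Assumed similarity function: 1.0 at zero difference, decaying toward 0."""
    return exp(-diff / scale)


def final_score(doc_a, doc_b):
    """Build the difference similarity vector and aggregate it with a mean."""
    diffs = positional_differences(doc_a, doc_b)
    vector = [similarity(d) for d in diffs.values()]
    return sum(vector) / len(vector) if vector else 0.0


def is_duplicate(doc_a, doc_b, cutoff=0.8):
    """Flag the candidate pair when the final score is above the cutoff."""
    return final_score(doc_a, doc_b) > cutoff


receipt = [TextElement("total", 0.1, 0.1), TextElement("date", 0.5, 0.5)]
rescan = [TextElement("total", 0.1, 0.1), TextElement("date", 0.5, 0.5)]
other = [TextElement("total", 0.9, 0.9), TextElement("date", 0.5, 0.5)]

print(is_duplicate(receipt, rescan))  # identical layouts
print(is_duplicate(receipt, other))   # "total" sits in a very different place
```

In this sketch each component of the vector corresponds to one positional difference, as the abstract describes. A real system would likely weight components or learn the aggregation rather than take a plain mean.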
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.