Deduplicating records received from multiple data sources
US12182088B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 15, 2023 |
| Grant date | Dec 31, 2024 |
| Priority date | — |
| Expiry date | Sep 15, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/2379
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes generating a plurality of pages from a plurality of records received from a plurality of data sources. Deduplication of the plurality of pages is facilitated based on a plurality of page metadata of the plurality of pages by performing the following for each page of the plurality of pages. A filtered set of potentially-intersecting pages is identified for the given page as a proper subset of the plurality of pages stored in a page storage system based on first comparison parameters, and an intersecting set of pages that include a row number intersection with the given page is identified as a proper subset of the filtered set of potentially-intersecting pages based on second comparison parameters. Records with row numbers included in row number intersections with other pages in the intersecting set of pages are removed from the given page.
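The two-stage narrowing described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the `Page` class, the use of a (lowest, highest) row-number range as the "first comparison parameters", the use of exact row-number sets as the "second comparison parameters", and the tie-break rule that a page only drops rows already held by a page with a lower `page_id` (so that one copy of each record survives).

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page: a batch of records keyed by row number."""
    page_id: int
    rows: dict  # row number -> record payload

    def row_range(self):
        # Cheap page metadata (assumed): lowest and highest row number.
        return (min(self.rows), max(self.rows)) if self.rows else None


def ranges_overlap(a, b):
    # Two closed ranges overlap when each starts before the other ends.
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]


def deduplicate(pages):
    """Remove from each page the rows it shares with earlier pages."""
    for page in pages:
        # Stage 1: filter to potentially-intersecting pages using only
        # range metadata. Assumed tie-break: compare against lower ids only.
        candidates = [
            other for other in pages
            if other.page_id < page.page_id
            and ranges_overlap(page.row_range(), other.row_range())
        ]
        # Stage 2: compute the actual row number intersections.
        duplicates = set()
        for other in candidates:
            duplicates |= page.rows.keys() & other.rows.keys()
        # Drop the duplicated records from the given page.
        for row_number in duplicates:
            del page.rows[row_number]
    return pages
```

In this sketch, stage 1 is deliberately coarse: a page whose range overlaps may still share no row numbers with the given page, which is why stage 2 checks the exact intersection before any record is removed.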
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.