Methods and apparatus for identification and analysis of temporally differing corpora
US9135243B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 15, 2013 |
| Grant date | Sep 15, 2015 |
| Priority date | — |
| Expiry date | Aug 20, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/19
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Differences between time-varying corpora are identified at the lexical-unit and/or phrase level. A corpus for a time period of interest is compared with a reference corpus. N-grams are generated for both the corpus of interest and the reference corpus, and the number of occurrences of each n-gram is counted. For each n-gram of the reference corpus, an average number of occurrences is determined. A difference value is then determined between the number of occurrences in the corpus of interest and the average number of occurrences, and each difference value is normalized. N-grams can be selected for display, or for further processing, on the basis of the normalized difference value. Further processing can include selecting a sample period. A plurality of reference corpora is produced such that the begin time of each sub-corpus differs from the begin time of the corpus of interest by an integer multiple of the sample period. A Word Cloud visualization is shown.
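The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the whitespace tokenizer, and the choice to normalize by the standard deviation of the reference counts (floored at 1.0) are all assumptions, since the abstract does not say how normalization is done. The sketch also assumes that reference sub-corpora begin earlier than the corpus of interest.

```python
from collections import Counter
from statistics import mean, pstdev


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n):
    """Count n-gram occurrences over all documents of one corpus."""
    counts = Counter()
    for doc in docs:
        # Assumption: lowercase whitespace tokenization stands in for
        # whatever lexical-unit segmentation the patent actually uses.
        counts.update(ngrams(doc.lower().split(), n))
    return counts


def normalized_differences(interest_docs, reference_corpora, n=1):
    """Score every n-gram by (count of interest - reference average) / spread.

    Assumption: 'spread' is the population standard deviation of the
    reference counts, floored at 1.0 to avoid division by zero.
    """
    interest = count_ngrams(interest_docs, n)
    refs = [count_ngrams(docs, n) for docs in reference_corpora]
    scores = {}
    for gram in set(interest).union(*refs):
        history = [ref[gram] for ref in refs]  # a Counter yields 0 if absent
        avg = mean(history)
        spread = max(pstdev(history), 1.0)
        scores[gram] = (interest[gram] - avg) / spread
    return scores


def reference_begin_times(interest_begin, sample_period, k):
    """Begin times that precede the corpus of interest by 1..k sample periods."""
    return [interest_begin - i * sample_period for i in range(1, k + 1)]
```

The n-grams with the largest scores would be the candidates for display, for example sized by score in a word cloud. `reference_begin_times` works with any types that support subtraction and integer scaling, such as `datetime` with `timedelta`, or plain integers.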
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.