Patent · US Active

Methods and apparatus for identification and analysis of temporally differing corpora

US10847144B1 · kind B1 · utility

Cited by: 0
References: 18
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 15, 2015
Grant date: Nov 24, 2020
Priority date
Expiry date: Sep 15, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L15/19
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Differences are identified, at the lexical unit and/or phrase level, between time-varying corpora. A corpus for a time period of interest is compared with a reference corpus. N-grams are generated for both the corpus of interest and reference corpus. Numbers of occurrences are counted. An average number of occurrences, for each n-gram of the reference corpus, is determined. A difference value, between number of occurrences in corpus of interest and average number of occurrences, is determined. Each difference value is normalized. N-grams can be selected for display, or for further processing, on the basis of the normalized difference value. Further processing can include selecting a sample period. A plurality of reference corpora are produced, where a begin time, for each sub-corpus of the plurality of reference corpora, differs, from a begin time for the corpus of interest, by an integer multiple of the sample period. Word Cloud visualization is shown.
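
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the whitespace tokenizer, and the normalization (dividing each difference by the reference average plus one) are assumptions made for illustration, since the abstract does not say how difference values are normalized. The sketch also assumes that the reference corpora begin earlier than the corpus of interest; the abstract only requires that the begin times differ by an integer multiple of the sample period.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_ngrams(text, n=1):
    """Count n-grams in a text, using a naive whitespace tokenizer (assumption)."""
    tokens = text.lower().split()
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def normalized_differences(interest_text, reference_texts, n=1):
    """Score each n-gram by how far its count in the corpus of interest
    departs from its average count across the reference corpora."""
    if not reference_texts:
        raise ValueError("at least one reference corpus is required")
    interest = count_ngrams(interest_text, n)
    refs = [count_ngrams(t, n) for t in reference_texts]
    vocab = set(interest).union(*refs)
    scores = {}
    for gram in vocab:
        # Average number of occurrences over the reference corpora.
        avg = sum(r[gram] for r in refs) / len(refs)
        diff = interest[gram] - avg
        # Assumed normalization: relative difference with +1 smoothing,
        # so n-grams absent from the references do not divide by zero.
        scores[gram] = diff / (avg + 1.0)
    return scores


def reference_begin_times(begin, sample_period, k):
    """Begin times for k reference corpora, each offset from the corpus of
    interest by an integer multiple of the sample period (earlier, by assumption)."""
    return [begin - i * sample_period for i in range(1, k + 1)]


if __name__ == "__main__":
    scores = normalized_differences(
        "outage outage outage login",
        ["login login outage", "login outage"],
    )
    # N-grams with the highest scores would be candidates for display,
    # for example in a word cloud.
    for gram, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(" ".join(gram), round(score, 3))
    print(reference_begin_times(datetime(2015, 9, 15), timedelta(days=7), 2))
```

With a weekly sample period, for example, each reference corpus covers the same span of the week as the corpus of interest, so that regular weekly patterns do not show up as differences.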

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.