Patent · US Active

Systems and methods for optimizing very large n-gram collections for speed and memory

US8572126B2 · kind B2 · utility

Cited by: 15
References: 2
Claims: 16
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jun 24, 2011
Grant date: Oct 29, 2013
Priority date
Expiry date: Jun 24, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/40
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer memory stores a data structure representing a ternary search tree (TST) representing multiple word n-grams for a corpus of documents. The data structure includes plural records in a first memory, each record representing a node of the TST and comprising plural fields. At least some n-grams have a sequence of units. The plurality of fields includes one for identifying a given unit of the sequence for a given node, one reserved for storing payload information for the given node, and plural child fields reserved for storing information for a first, second and third child nodes of the given node. The child fields store a null value indicating the absence of the child node or an identifier identifying a memory location of the child node. For at least one record, at least one of the child fields stores an identifier identifying a memory location of a memory different than the first memory.
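The record layout described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. All names (Record, TwoStoreTST, first_capacity) are hypothetical, the units are assumed to be words, the payload is assumed to be a count, and the "different memory" is modeled simply as a second Python list standing in for whatever second memory an implementation might use. A child field holds either None (no child) or a (store, index) identifier.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# A child identifier: (store id, index within that store). None means "no child".
Ref = Tuple[int, int]


@dataclass
class Record:
    unit: str                      # the unit of the n-gram held by this node
    payload: Optional[int] = None  # payload field, e.g. an n-gram count (assumed)
    lo: Optional[Ref] = None       # first child: units that sort before `unit`
    eq: Optional[Ref] = None       # second child: next unit of the sequence
    hi: Optional[Ref] = None       # third child: units that sort after `unit`


class TwoStoreTST:
    """Ternary search tree whose records live in two separate stores."""

    def __init__(self, first_capacity: int) -> None:
        # stores[0] models the first memory, stores[1] a different memory.
        self.stores: List[List[Record]] = [[], []]
        self.first_capacity = first_capacity
        self.root: Optional[Ref] = None

    def _alloc(self, unit: str) -> Ref:
        # Fill the first store up to its capacity, then spill into the second.
        store_id = 0 if len(self.stores[0]) < self.first_capacity else 1
        self.stores[store_id].append(Record(unit))
        return (store_id, len(self.stores[store_id]) - 1)

    def _get(self, ref: Ref) -> Record:
        return self.stores[ref[0]][ref[1]]

    def insert(self, ngram: Sequence[str], payload: int) -> None:
        if not ngram:
            raise ValueError("n-gram must contain at least one unit")
        if self.root is None:
            self.root = self._alloc(ngram[0])
        ref, i = self.root, 0
        while True:
            node = self._get(ref)
            unit = ngram[i]
            if unit < node.unit:
                if node.lo is None:
                    node.lo = self._alloc(unit)
                ref = node.lo
            elif unit > node.unit:
                if node.hi is None:
                    node.hi = self._alloc(unit)
                ref = node.hi
            else:
                i += 1
                if i == len(ngram):
                    node.payload = payload  # last unit: store the payload here
                    return
                if node.eq is None:
                    node.eq = self._alloc(ngram[i])
                ref = node.eq

    def lookup(self, ngram: Sequence[str]) -> Optional[int]:
        ref, i = self.root, 0
        while ref is not None and ngram:
            node = self._get(ref)
            unit = ngram[i]
            if unit < node.unit:
                ref = node.lo
            elif unit > node.unit:
                ref = node.hi
            else:
                i += 1
                if i == len(ngram):
                    return node.payload
                ref = node.eq
        return None


tst = TwoStoreTST(first_capacity=2)
tst.insert(("new", "york"), 5)
tst.insert(("new", "york", "city"), 3)
tst.insert(("new", "jersey"), 2)
```

With a first-store capacity of 2, the records for "new" and "york" land in the first store, while "city" and "jersey" spill into the second, so the child fields of the "york" record hold identifiers that point into the other store. Lookups follow those identifiers without caring which store a record lives in.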

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.