Patent · US Active

Identifying content and content relationship information associated with the content for ingestion into a corpus

US10642935B2 · kind B2 · utility

Cited by: 0
References: 5
Claims: 19
Family size: 0

Assignee

Inventor

Key dates

Filing date: May 12, 2014
Grant date: May 5, 2020
Priority date:
Expiry date: Jun 27, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/40
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A mechanism is provided, in a data processing system comprising a processor and a memory configured to implement a natural language processing (NLP) system, for identifying content relationship for content copied by a content identification mechanism. The content identification mechanism identifies content from a website and then identifies relationship content information associated with a current web page where the content is found. The content identification mechanism modifies a file structure associated with the content with the relationship content information. The content identification mechanism identifies one or more classification identifiers in order to classify the content. Finally, the content identification mechanism transmits the content and the file structure to a specific corpus based on the one or more classification identifiers.
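The steps named in the abstract (identify content, collect relationship information from the page where it was found, attach that information to the content's file structure, classify, and route to a corpus) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names ContentRecord, attach_relationships, classify, route and KEYWORDS are hypothetical, the "file structure" is modeled as a plain metadata dictionary, relationship information is assumed to mean the page URL, title and outgoing links, and the keyword-based classifier is a stand-in for whatever classification the claims actually describe.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageScanner(HTMLParser):
    """Collects the <title> text and <a href> targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


@dataclass
class ContentRecord:
    """Hypothetical container: copied content plus its 'file structure' metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def attach_relationships(record, page_url, page_html):
    """Assumption: relationship info = source URL, page title, outgoing links."""
    scanner = _PageScanner()
    scanner.feed(page_html)
    record.metadata["source_url"] = page_url
    record.metadata["page_title"] = scanner.title.strip()
    # Resolve relative links against the page URL so they stay meaningful
    # after the content leaves the website.
    record.metadata["related_links"] = [urljoin(page_url, h) for h in scanner.links]
    return record


# Hypothetical classification identifiers and trigger words.
KEYWORDS = {"medical": ("patient", "diagnosis"), "finance": ("loan", "interest")}


def classify(record):
    """Return every identifier whose trigger words occur, else 'general'."""
    text = record.content.lower()
    labels = [label for label, words in KEYWORDS.items()
              if any(word in text for word in words)]
    return labels or ["general"]


def route(record, corpora):
    """Store the record in each corpus selected by its classification identifiers."""
    labels = classify(record)
    record.metadata["classification"] = labels
    for label in labels:
        corpora.setdefault(label, []).append(record)
    return labels
```

In this sketch, calling attach_relationships on a record and then route(record, corpora) with an initially empty dictionary leaves the record, together with its relationship metadata, filed under each matching identifier. In-memory lists stand in for the corpora here; a real system would transmit the content and file structure to separate stores.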

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.