Patent · US Active

Building of a web corpus with the help of a reference web crawl

US9529911B2 · kind B2 · utility

Cited by: 65
References: 1
Claims: 7
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 11, 2013
Grant date: Dec 27, 2016
Priority date
Expiry date: Oct 8, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Computer-implemented method for building a web corpus (WCD) comprising the steps of: sending by a web crawler (WC) a query to a reference web crawl agent (RWCA), this query containing at least one identifier of a resource; receiving by the web crawler (WC) a response from the reference web crawl agent (RWCA); if this response does not contain the resource identified by the identifier, downloading by the web crawler (WC) the resource from the website (WS) corresponding to the identifier and adding the resource to the web corpus (WCD); and if this response contains the resource identified by the identifier, adding the resource to the web corpus (WCD).
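
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (build_web_corpus, query_rwca, download_from_website), the in-memory dictionary standing in for the reference web crawl, and the choice of one identifier per query are all assumptions made for illustration. The abstract only requires that a query carry at least one identifier.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_web_corpus(
    identifiers: Iterable[str],
    query_rwca: Callable[[str], Optional[bytes]],
    download_from_website: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Build a corpus (WCD) keyed by resource identifier.

    query_rwca is a hypothetical stand-in for the reference web crawl
    agent (RWCA): it returns the resource, or None when its response
    does not contain the resource.
    """
    corpus: Dict[str, bytes] = {}
    for identifier in identifiers:
        # Step 1 and 2: send the query to the RWCA and read its response.
        resource = query_rwca(identifier)
        if resource is None:
            # Response lacks the resource: fetch it from the website (WS).
            resource = download_from_website(identifier)
        # In both branches the resource is added to the corpus.
        corpus[identifier] = resource
    return corpus


# In-memory stand-ins (assumptions, not real services).
reference_crawl: Dict[str, bytes] = {"http://example.com/a": b"cached A"}
downloads: List[str] = []


def fake_rwca(identifier: str) -> Optional[bytes]:
    return reference_crawl.get(identifier)


def fake_download(identifier: str) -> bytes:
    downloads.append(identifier)  # record which identifiers hit the website
    return b"live " + identifier.encode()


corpus = build_web_corpus(
    ["http://example.com/a", "http://example.com/b"],
    fake_rwca,
    fake_download,
)
```

In this sketch only the identifier missing from the reference crawl triggers a download from the website; the other resource is taken directly from the agent's response.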

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.