Patent · US Active

Citation record extraction system and method

US8429520B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 13
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 12, 2010
Grant date: Apr 23, 2013
Priority date
Expiry date: Aug 2, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/137
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A citation record extraction system is provided for extracting citation records from publication list pages having different layouts and contents. An HTML rendering engine receives a publication list web page and parses it to obtain layout information of the web page. A web page sequence builder generates a web page characteristic sequence for the web page according to the layout information. A web page repeated pattern analyzer analyzes repeated patterns presented in the web page characteristic sequence, screens out non-citation records therefrom, and obtains a citation record of the publication list web page.
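
The repeated-pattern step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the one-letter line symbols (H, T, A, F), the brute-force tandem-repeat search, and the "contains a four-digit year" screening rule are all assumptions chosen for illustration, and the function names are hypothetical. The sketch assumes a rendering step has already reduced each visual line of the page to a (symbol, text) pair.

```python
import re
from typing import List, Tuple

# Hypothetical screening rule: a citation record mentions a four-digit year.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def longest_tandem_repeat(seq: List[str], max_period: int = 5) -> Tuple[int, int, int]:
    """Return (start, period, repeats) of the back-to-back repeat covering the most symbols."""
    best = (0, 1, 1)  # fallback: no repetition found
    for period in range(1, max_period + 1):
        for start in range(len(seq) - period + 1):
            unit = seq[start:start + period]
            repeats, pos = 1, start + period
            # Count how many times the unit repeats immediately after itself.
            while seq[pos:pos + period] == unit:
                repeats += 1
                pos += period
            if repeats > 1 and repeats * period > best[1] * best[2]:
                best = (start, period, repeats)
    return best


def extract_citations(lines: List[Tuple[str, str]]) -> List[str]:
    """Group lines by the dominant repeated pattern, then screen out non-citations."""
    symbols = [sym for sym, _ in lines]
    start, period, repeats = longest_tandem_repeat(symbols)
    if repeats < 2:
        return []
    records = []
    for i in range(repeats):
        chunk = lines[start + i * period:start + (i + 1) * period]
        text = " | ".join(t for _, t in chunk)
        if YEAR.search(text):  # assumed heuristic, not taken from the patent
            records.append(text)
    return records


# Illustrative input: H = heading, T = title line, A = author/venue line, F = footer.
lines = [
    ("H", "Publications"),
    ("T", "Mining layout sequences"),
    ("A", "A. Author, Example Conf., 2009"),
    ("T", "Contact"),
    ("A", "Room 12, Building 3"),
    ("T", "Another paper title"),
    ("A", "B. Author, Example Journal, 2008"),
    ("F", "Last updated recently"),
]
citations = extract_citations(lines)
```

Here the characteristic sequence is H T A T A T A F, so the dominant repeated unit is (T, A). The "Contact" block matches that layout pattern but fails the screening rule, which mirrors the abstract's point that repeated patterns alone are not enough to identify citation records.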

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.