Citation record extraction system and method
US8429520B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 12, 2010 |
| Grant date | Apr 23, 2013 |
| Priority date | — |
| Expiry date | Aug 2, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/137
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A citation record extraction system is provided for extracting citation records from publication list pages that differ in layout and content. An HTML rendering engine receives a publication list web page and parses it to obtain the page's layout information. A web page sequence builder generates a characteristic sequence for the web page from that layout information. A web page repeated pattern analyzer analyzes the repeated patterns present in the characteristic sequence, screens out non-citation records, and obtains the citation records of the publication list web page.
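The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the element representation (tag, text pairs), the function names, the use of tag names as sequence symbols, and the coverage-based scoring are all assumptions made for illustration, and the "screening" step here simply discards elements that fall outside the dominant repeated pattern.

```python
from typing import List, Tuple

# Hypothetical stand-in for a rendering engine's output: (tag, text) pairs.
Element = Tuple[str, str]

def build_sequence(elements: List[Element]) -> List[str]:
    """Map each rendered element to a symbol (assumed here: its tag name)."""
    return [tag for tag, _ in elements]

def count_non_overlapping(seq: List[str], pattern: Tuple[str, ...]) -> int:
    """Greedily count non-overlapping occurrences of pattern in seq."""
    n, i, count = len(pattern), 0, 0
    while i + n <= len(seq):
        if tuple(seq[i:i + n]) == pattern:
            count += 1
            i += n
        else:
            i += 1
    return count

def find_repeated_pattern(seq: List[str], min_len: int = 2,
                          max_len: int = 5) -> Tuple[str, ...]:
    """Pick the pattern that repeats at least twice and covers the most symbols."""
    best, best_score = (), 0
    for n in range(min_len, max_len + 1):
        candidates = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        for pattern in sorted(candidates):  # sorted for deterministic ties
            count = count_non_overlapping(seq, pattern)
            if count >= 2 and count * n > best_score:
                best, best_score = pattern, count * n
    return best

def extract_records(elements: List[Element],
                    pattern: Tuple[str, ...]) -> List[List[str]]:
    """Collect the text of each pattern occurrence; drop everything else."""
    seq, n, i, records = build_sequence(elements), len(pattern), 0, []
    while n and i + n <= len(seq):
        if tuple(seq[i:i + n]) == pattern:
            records.append([text for _, text in elements[i:i + n]])
            i += n
        else:
            i += 1  # element outside the pattern: treated as a non-citation
    return records

# Made-up sample page: a heading, three citation-like entries, a nav link.
sample_page: List[Element] = [
    ("h1", "Publications"),
    ("b", "Title A"), ("i", "Venue A"), ("span", "2009"),
    ("b", "Title B"), ("i", "Venue B"), ("span", "2010"),
    ("b", "Title C"), ("i", "Venue C"), ("span", "2008"),
    ("a", "Home"),
]

pattern = find_repeated_pattern(build_sequence(sample_page))
records = extract_records(sample_page, pattern)
```

On the sample page, the dominant pattern is the three-symbol run `b, i, span`, so the heading and the navigation link are left out of `records`. A real system would derive symbols from richer layout features (position, font, size) than tag names alone.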
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.