Searching multilingual documents based on document structure extraction
US10691734B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 21, 2017 |
| Grant date | Jun 23, 2020 |
| Priority date | — |
| Expiry date | Aug 5, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/58
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An approach is provided for searching multilingual documents. Structure components are extracted from multilingual documents. Based on the extracted components, the documents are grouped into classifications including respective sets of documents expressed in different respective natural languages. A natural language in a query is detected. One of the documents is selected based on the document having content indicated by the query and the natural language of the document matching the detected natural language. Structure components of the selected document are extracted. Based on the extracted structure components of the selected document, one of the classifications is identified as including the selected document. Other document(s) in the classification are identified and presented as having content that matches the content of the selected document. The natural language(s) of the other document(s) are each different from the natural language of the selected document.
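The abstract does not say how structure components are extracted, how the query language is detected, or how content is matched. The Python sketch below walks through the claimed flow under stated assumptions only. A document's "structure components" are taken to be the ordered sequence of block tags (such as `h1`, `p`, `table`), which stays the same across translations. Language detection is a toy stopword-overlap heuristic. Content matching is simple keyword overlap. `Document`, `extract_structure`, `detect_language`, `classify`, `search`, and the sample documents are all hypothetical names chosen for illustration, not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy stopword lists standing in for a real language detector.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "how", "to"},
    "de": {"der", "die", "und", "ist", "wie", "zu"},
    "fr": {"le", "la", "et", "est", "comment", "de"},
}

@dataclass
class Document:
    doc_id: str
    language: str
    # Each block is a (structure_tag, text) pair; the tag names are assumptions.
    blocks: list

def extract_structure(doc):
    """Assumed language-independent signature: the ordered block tags."""
    return tuple(tag for tag, _ in doc.blocks)

def detect_language(text):
    """Toy detector: the language whose stopwords overlap the text most wins."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

def classify(docs):
    """Group documents with identical structure signatures into classifications."""
    groups = defaultdict(list)
    for doc in docs:
        groups[extract_structure(doc)].append(doc)
    return groups

def search(query, docs):
    groups = classify(docs)
    lang = detect_language(query)
    terms = set(query.lower().split()) - STOPWORDS[lang]

    def score(doc):
        # Number of query terms that appear among the document's words.
        words = set(" ".join(text for _, text in doc.blocks).lower().split())
        return len(terms & words)

    # Select the best-matching document written in the query's language.
    candidates = [d for d in docs if d.language == lang]
    if not candidates:
        return None, []
    selected = max(candidates, key=score)
    if score(selected) == 0:
        return None, []
    # Re-extract the selected document's structure to find its classification,
    # then return the members written in other languages.
    group = groups[extract_structure(selected)]
    others = [d for d in group if d.language != selected.language]
    return selected, others

# Hypothetical sample corpus: three translations sharing one layout, plus an unrelated page.
docs = [
    Document("en-1", "en", [("h1", "Installing the printer"),
                            ("p", "Connect the cable and install the driver"),
                            ("table", "Specs")]),
    Document("de-1", "de", [("h1", "Drucker installieren"),
                            ("p", "Kabel anschliessen und Treiber installieren"),
                            ("table", "Daten")]),
    Document("fr-1", "fr", [("h1", "Installer l'imprimante"),
                            ("p", "Branchez le cable et installez le pilote"),
                            ("table", "Specs")]),
    Document("en-2", "en", [("h1", "Warranty"), ("p", "Terms of the warranty")]),
]

selected, others = search("how to install the printer driver", docs)
print(selected.doc_id, [d.doc_id for d in others])
```

With this sample corpus, the English query selects `en-1`. The German and French documents share its `(h1, p, table)` signature, so they are returned as matching content in other languages. Grouping on an exact structure signature is a deliberate simplification. A practical system would likely tolerate small layout differences between translations.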
Source: USPTO / EPO open patent data.