System and method for modelling and profiling in multiple languages
US9026542B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 23, 2010 |
| Grant date | May 5, 2015 |
| Priority date | — |
| Expiry date | Jun 6, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/337
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for generating feature vectors of documents in different languages are provided. The feature vectors provide scores associated with keywords defined in a base language for use by a profiler for generating or updating a user profile. The system and method use a plurality of keyword sets comprising: a base language keyword set comprising a plurality of base language keywords each associated with a respective identifier (ID); and a second language keyword set comprising a plurality of second language keywords each corresponding in meaning to a respective one of the base language keywords and associated with the ID of the corresponding base language keyword. One of a plurality of tokenizers is selected to parse a document based on the language of the document and to generate the feature vector using the keyword set of the corresponding language.
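The mechanism in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The keyword lists, the language codes, the regex tokenizer, the count-based scoring, and the names `KEYWORD_SETS`, `TOKENIZERS`, `feature_vector`, and `update_profile` are all assumptions made for illustration. The sketch shows only the central idea: keywords in a second language share the ID of the corresponding base language keyword, so documents in either language produce feature vectors over the same set of IDs.

```python
import re
from collections import Counter

# Hypothetical keyword sets. Each second-language keyword carries the ID
# of the base-language keyword it corresponds to in meaning.
KEYWORD_SETS = {
    "en": {"football": 1, "music": 2, "travel": 3},    # base language
    "fr": {"football": 1, "musique": 2, "voyage": 3},  # second language
}


def tokenize_words(text):
    """Assumed tokenizer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


# One tokenizer per language. Both entries reuse the same simple tokenizer
# here; a real system would likely need language-specific ones.
TOKENIZERS = {"en": tokenize_words, "fr": tokenize_words}


def feature_vector(text, language):
    """Return {keyword ID: score}, scoring each keyword by occurrence count."""
    tokenizer = TOKENIZERS[language]   # select the tokenizer by document language
    keywords = KEYWORD_SETS[language]  # use the keyword set of the same language
    counts = Counter(keywords[tok] for tok in tokenizer(text) if tok in keywords)
    return dict(counts)


def update_profile(profile, vector):
    """Assumed profiler step: add the vector's scores to a running profile."""
    updated = Counter(profile)
    updated.update(vector)
    return dict(updated)


if __name__ == "__main__":
    profile = {}
    profile = update_profile(profile, feature_vector("I love music and travel", "en"))
    profile = update_profile(profile, feature_vector("La musique et le voyage", "fr"))
    print(profile)
```

Because both documents are scored against shared IDs, the profiler never needs to know which language a given document was written in.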
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.