Privacy-preserving text language identification using homomorphic encryption
US9288039B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 1, 2014 |
| Grant date | Mar 15, 2016 |
| Priority date | — |
| Expiry date | Dec 1, 2034 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L9/008
- WIPO field: Digital communication
- WIPO sector: Electrical engineering
Abstract
A system and method for text language identification allow private information of a server and a client to be kept secret from each other. An encrypted score for each of a plurality of languages is received by the server from the client. The encrypted scores are generated by homomorphic addition of encrypted frequencies of n-grams in a list of n-grams extracted from text. The unencrypted list is not provided to the server. The encrypted frequencies of the n-grams in the list are extracted using encrypted resources which, for each of the plurality of languages, include an encrypted frequency for each of a set of n-grams. At the server, the encrypted scores are decrypted to generate unencrypted scores and information is provided to the client based on the unencrypted scores from which the client is able to identify a language for the text.
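The protocol in the abstract can be illustrated with a minimal Python sketch. The abstract does not name an encryption scheme, so this sketch assumes Paillier, a standard additively homomorphic scheme in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The following are all hypothetical choices made for illustration only, not details taken from the patent: the class and function names (`Server`, `client_scores`, `trigrams`, `encrypt`, `add_encrypted`), the use of character trigrams, the integer frequency values, the plaintext n-gram keys in the encrypted resources, and the choice to have the server return the index of the best-scoring language. The key material is deliberately tiny and insecure.

```python
import math
import secrets

# Toy key material: two Mersenne primes. These are far too small and too
# structured for real use; they are chosen only to keep the example fast.
P = 2**31 - 1
Q = 2**61 - 1


def encrypt(n: int, m: int) -> int:
    """Paillier encryption under public key n, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


def trigrams(text: str) -> list[str]:
    """Character 3-grams of the lowercased text (hypothetical choice of n = 3)."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]


class Server:
    """Holds the private key and the plaintext per-language frequency tables."""

    def __init__(self, lang_freqs: dict[str, dict[str, int]]) -> None:
        self.langs = list(lang_freqs)
        self.n = P * Q  # public key
        self.lam = math.lcm(P - 1, Q - 1)  # private key (Python 3.9+)
        self.mu = pow(self.lam, -1, self.n)
        # Encrypted resources sent to the client: for each language,
        # n-gram -> Enc(frequency). Keys stay in plaintext (a simplification).
        self.resources = [
            {g: encrypt(self.n, f) for g, f in lang_freqs[lang].items()}
            for lang in self.langs
        ]

    def decrypt(self, c: int) -> int:
        n2 = self.n * self.n
        return (pow(c, self.lam, n2) - 1) // self.n * self.mu % self.n

    def answer(self, enc_scores: list[int]) -> int:
        """Decrypt the scores and return the index of the best-scoring language."""
        scores = [self.decrypt(c) for c in enc_scores]
        return max(range(len(scores)), key=scores.__getitem__)


def client_scores(n: int, resources: list[dict[str, int]], text: str) -> list[int]:
    """Client side: homomorphically sum encrypted frequencies of the text's n-grams.

    The n-gram list itself never leaves the client; only one ciphertext per
    language is returned to the server.
    """
    grams = trigrams(text)
    enc_scores = []
    for table in resources:
        acc = encrypt(n, 0)  # fresh encryption of a zero score
        for g in grams:
            if g in table:  # n-grams missing from the resource contribute nothing
                acc = add_encrypted(n, acc, table[g])
        enc_scores.append(acc)
    return enc_scores


if __name__ == "__main__":
    freqs = {
        "en": {"the": 50, "and": 30, "ing": 25, "ion": 10},
        "fr": {"les": 40, "ent": 35, "que": 30, "ion": 20},
    }
    server = Server(freqs)
    enc = client_scores(server.n, server.resources, "the king and the queen")
    print(server.langs[server.answer(enc)])  # en
```

With the sample tables, the English score decrypts to 155 (two occurrences of "the", plus "ing" and "and") and the French score decrypts to 30 (one "que"). In this sketch, the server sees only the per-language totals and never the client's n-gram list. The client, for its part, sees only ciphertexts of the server's frequencies.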
Source: USPTO / EPO open patent data.