Patent · US Active

Language agnostic machine learning model for title standardization

US11610109B2 · kind B2 · utility

Cited by	0
References	2
Claims	20
Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Sebastian Alexander Csar · New York, US
Uri Merhav · Rehovot, IL
Dan Shacham · Sunnyvale, US

Key dates

Filing date	Sep 26, 2018
Grant date	Mar 21, 2023
Priority date	—
Expiry date	Dec 14, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/20
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In an example embodiment, a system is provided whereby a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained whose input is a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy, or English strings), which produces a probability that the raw title and each candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and relations between languages.
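The input/output contract described in the abstract (one raw title, a list of candidate titles, one probability per candidate) can be illustrated with a small sketch. This is not the patented model: the abstract does not disclose an architecture, so the hashed character trigrams, the vector width `DIM`, the function names, and the fixed `scale` and `bias` values below are all assumptions standing in for a trained neural network. The sketch only shows how a scorer can operate on raw characters, with no language identification step, and return an independent probability for each candidate.

```python
import hashlib
import math

DIM = 64  # hypothetical vector width; a trained model would learn its own representation


def char_ngrams(text, n=3):
    """Character n-grams of a padded, lowercased string; no language detection needed."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def embed(text):
    """Hash each n-gram into a fixed-size count vector, then L2-normalize it."""
    vec = [0.0] * DIM
    for gram in char_ngrams(text):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def match_probabilities(raw_title, candidates, scale=8.0, bias=-4.0):
    """Return one probability per candidate: sigmoid of a scaled cosine similarity.

    scale and bias are illustrative constants (assumptions), not learned values.
    """
    query = embed(raw_title)
    probs = []
    for candidate in candidates:
        cosine = sum(a * b for a, b in zip(query, embed(candidate)))
        probs.append(1.0 / (1.0 + math.exp(-(scale * cosine + bias))))
    return probs
```

Calling `match_probabilities("software engineer", ["Software Engineer", "Nurse"])` returns two values between 0 and 1, with the first candidate scoring higher. Because the scorer shares character n-grams across languages, loan words with similar spellings would overlap naturally, which is the kind of cross-language benefit the abstract mentions; a trained model would learn such relations rather than rely on surface overlap.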

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.