Patent · US Active

Techniques for generating a training data set for a machine-learning model

US12299581B1 · kind B1 · utility

0 Cited by

2 References

20 Claims

0 Family size

Assignee

AMAZON TECHNOLOGIES, INC. · US

Inventors

Patrick Ian Wilson · Renton, US
Dmitry Zhiyanov · Redmond, US
Lichao Wang · Redmond, US
Jong-Wan Kim · Seongnam-si, KR
Srinivas K Yellala · Bellevue, US

Key dates

Filing date	Mar 26, 2021
Grant date	May 13, 2025
Priority date	—
Expiry date	Feb 28, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods are provided herein for generating a synthetic training data set that can be used to train a machine-learning model to identify when two addresses match (e.g., when a user-defined address and an authoritative address match). The addresses may each be tokenized. Each candidate address can be scored based on the number of tokens it shares with the user-defined address. The highest-scored candidate address may be selected as the matching address for the user-defined address. In some embodiments, a number of the remaining candidate addresses can be selected as negative examples (e.g., candidate addresses that do not match the user-defined address) based on, for example, historical delivery information associated with the corresponding addresses. In this manner, an expansive training data set may be generated using addresses associated with user profiles of an online service provider and a set of authoritative addresses obtained from an authoritative source.
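The abstract describes the pipeline only at a high level. The Python sketch below is one minimal reading of it under stated assumptions, not the patented implementation: the lowercase alphanumeric tokenizer, the `build_examples` helper, the `has_delivery_history` predicate standing in for "historical delivery information", and the `max_negatives` cap are all hypothetical. It scores each candidate by shared-token count, labels the top-ranked candidate as the positive example, and keeps some of the remaining candidates as negatives.

```python
import re
from dataclasses import dataclass
from typing import Callable


def tokenize(address: str) -> set[str]:
    # Assumed scheme: lowercase, then keep runs of letters/digits as tokens.
    return set(re.findall(r"[a-z0-9]+", address.lower()))


def common_token_score(user_tokens: set[str], candidate: str) -> int:
    # Score = number of distinct tokens shared with the user-defined address.
    return len(user_tokens & tokenize(candidate))


@dataclass(frozen=True)
class Example:
    user_address: str
    candidate: str
    label: int  # 1 = matching pair, 0 = non-matching pair


def build_examples(
    user_address: str,
    candidates: list[str],
    has_delivery_history: Callable[[str], bool],  # hypothetical signal
    max_negatives: int = 2,  # hypothetical cap on negatives per address
) -> list[Example]:
    if not candidates:
        raise ValueError("need at least one candidate address")
    user_tokens = tokenize(user_address)
    # Rank candidates by score; sorted() is stable, so ties keep input order.
    ranked = sorted(
        candidates,
        key=lambda c: common_token_score(user_tokens, c),
        reverse=True,
    )
    best, remaining = ranked[0], ranked[1:]
    examples = [Example(user_address, best, 1)]
    # Assumption: a remaining candidate with its own delivery history is a
    # distinct real address, so it is kept as a negative. Taking them in
    # ranked order favors "hard" negatives that share many tokens.
    negatives = [c for c in remaining if has_delivery_history(c)]
    examples += [Example(user_address, c, 0) for c in negatives[:max_negatives]]
    return examples


# Illustrative usage with made-up addresses and a made-up delivery history.
user = "123 Main St Apt 4, Seattle WA 98101"
candidates = [
    "123 Main St Apt 5 Seattle WA 98101",
    "456 Oak Ave Seattle WA 98101",
    "123 Main St Apt 4 Seattle WA 98101-1234",
    "123 Main St Seattle WA 98101",
]
delivered = {
    "123 Main St Apt 5 Seattle WA 98101",
    "456 Oak Ave Seattle WA 98101",
}
examples = build_examples(user, candidates, delivered.__contains__)
for ex in examples:
    print(ex.label, ex.candidate)
```

Ordering negatives by score is a design choice made here for illustration: near-miss addresses (such as a different unit in the same building) tend to be more informative for a matcher than unrelated ones. Raw token overlap also ties easily, so a real system would likely need a tie-breaking or normalization step (e.g., "St" vs. "Street") that the abstract does not specify.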

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.