Distributed database job data skew detection
US10713250B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 13, 2015 |
| Grant date | Jul 14, 2020 |
| Priority date | — |
| Expiry date | Aug 6, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/3346
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for identifying whether data skew is causing delays in a map phase and/or a reduce phase of a query of a distributed database. The system and method identify the values of various metrics relating to a database query. These metrics include map phase and reduce phase durations and various related metrics. The system and method gather statistics of multiple queries to determine correlation levels between the metrics and the map phase and reduce phase durations. Based on the statistics, the system and method determine whether one or both of the map and reduce phases for a query/response are taking longer than expected. If the durations are longer than expected, the system identifies the delay as caused by data skew and informs the originator of the query.
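The detection flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistical method, so everything here is an assumption made for illustration: the use of Pearson correlation, a least-squares line for the expected duration, the three-standard-deviation tolerance, and the hypothetical names `input_mb`, `map_seconds`, `detect_skew`, `min_corr`, and `tolerance`.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for ys as a function of xs."""
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    return slope, my - slope * mx


def detect_skew(history, query, metric, phase, min_corr=0.8, tolerance=3.0):
    """Return True/False if the phase ran longer than expected, or None
    when the metric is too weakly correlated with the phase duration
    to predict it. Thresholds are illustrative assumptions."""
    xs = [h[metric] for h in history]
    ys = [h[phase] for h in history]
    if abs(pearson(xs, ys)) < min_corr:
        return None
    slope, intercept = fit_line(xs, ys)
    # Spread of past durations around the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = stdev(residuals)
    expected = slope * query[metric] + intercept
    return (query[phase] - expected) > tolerance * spread


# Hypothetical statistics gathered from earlier queries.
history = [
    {"input_mb": 100, "map_seconds": 10.5},
    {"input_mb": 200, "map_seconds": 19.8},
    {"input_mb": 300, "map_seconds": 30.4},
    {"input_mb": 400, "map_seconds": 39.9},
    {"input_mb": 500, "map_seconds": 50.2},
]

normal_query = {"input_mb": 300, "map_seconds": 30.5}
slow_query = {"input_mb": 300, "map_seconds": 95.0}

if detect_skew(history, slow_query, "input_mb", "map_seconds"):
    print("Map phase slower than expected: possible data skew")
```

The same function could be applied to the reduce phase by passing a reduce-duration field as `phase`. Returning `None` for weakly correlated metrics keeps the sketch from flagging a delay on the basis of a metric that does not actually predict duration.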
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.