Systems and methods for de novo assembly of nucleotide sequence reads using a modified string graph
US11557374B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 13, 2018 |
| Grant date | Jan 17, 2023 |
| Priority date | — |
| Expiry date | Nov 18, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G16B50/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods are presented for automatic de novo assembly of a set of unordered read sequences into one or more larger nucleotide sequences. The method first creates two identical sets of the reads, divides each read in both sets into smaller, sorted mer sequences, and then compares the mers of each read in the first set to the mers of each read in the second set to exhaustively identify overlapping segments. The overlap information is used to construct a modified assembly string graph; traversing this graph produces a sorted string graph layout file that lists all the reads ordered left to right, each with its approximate starting offset position. The layout file is then processed by a novel multiple sequence alignment system that uses mer matches among all the reads overlapping at a given position to place matching individual bases from each read into columns, from which an overall consensus sequence is determined.
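The three stages named in the abstract (mer-based overlap detection, a left-to-right layout with approximate offsets, and column-wise consensus) can be illustrated with a minimal Python sketch. It is not the patented method: it swaps the two sorted mer sets for a hash index, replaces the modified string graph with simple offset propagation, and stacks bases in gapless columns rather than performing a mer-anchored multiple sequence alignment. All function names, the parameters `k` and `min_shared`, and the sample reads are assumptions made for illustration, and the reads are assumed to be error-free and on the same strand.

```python
from collections import Counter, defaultdict


def mers(read, k):
    """Yield (mer, position) for every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], i


def find_overlaps(reads, k=4, min_shared=2):
    """Compare the mers of every read with those of every other read.

    Returns {(a, b): offset} with a < b, where offset is the estimated
    start of read b relative to read a (negative if b starts to the left).
    Assumption: a hash index stands in for the sorted mer comparison.
    """
    index = defaultdict(list)            # mer -> [(read id, position)]
    for rid, read in enumerate(reads):
        for mer, pos in mers(read, k):
            index[mer].append((rid, pos))

    votes = defaultdict(Counter)         # (a, b) -> Counter of offsets
    for hits in index.values():
        for a, pa in hits:
            for b, pb in hits:
                if a < b:
                    votes[(a, b)][pa - pb] += 1

    overlaps = {}
    for pair, counter in votes.items():
        offset, count = counter.most_common(1)[0]
        if count >= min_shared:          # require several agreeing mers
            overlaps[pair] = offset
    return overlaps


def layout(reads, overlaps):
    """Return [(start offset, read id)] sorted left to right.

    Offsets are propagated from read 0; reads not connected to read 0
    by any overlap are left out of the layout.
    """
    start = {0: 0}
    changed = True
    while changed:
        changed = False
        for (a, b), off in overlaps.items():
            if a in start and b not in start:
                start[b] = start[a] + off
                changed = True
            elif b in start and a not in start:
                start[a] = start[b] - off
                changed = True
    leftmost = min(start.values())
    return sorted((s - leftmost, rid) for rid, s in start.items())


def consensus(reads, placed):
    """Stack bases into columns by offset and take the majority base."""
    columns = defaultdict(Counter)
    for start, rid in placed:
        for i, base in enumerate(reads[rid]):
            columns[start + i][base] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))


# Hypothetical unordered reads drawn from one 23-base sequence.
example_reads = ["CAAGGCTTACGG", "CTTACGGATCCA", "ACGTTGCAAGGC"]
example_layout = layout(example_reads, find_overlaps(example_reads))
print(example_layout)                          # [(0, 2), (6, 0), (11, 1)]
print(consensus(example_reads, example_layout))  # ACGTTGCAAGGCTTACGGATCCA
```

Voting on the most common mer offset per read pair is one simple way to obtain the "approximate starting offset" the abstract mentions. A practical assembler would also need to handle sequencing errors, reverse complements, and repeats, none of which this sketch attempts.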
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.