Method and system for computationally identifying clusters within a set of sequences
US6109776A · kind A · utility
Assignee
Inventor
Key dates
| Filing date | Apr 21, 1998 |
| Grant date | Aug 29, 2000 |
| Priority date | — |
| Expiry date | Apr 21, 2018 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G16B30/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for computationally analyzing an initial set of patterns in order to identify subsets of patterns, called clusters, that contain common sub-patterns. The patterns of the initial set of patterns are represented as linear sequences of subunits, and the common sub-patterns occur as sub-sequences of subunits within the linear sequences starting at different positions within the different linear sequences. Variations in the offset and in the sequence of subunits within a common sub-pattern are considered in the analysis. In one embodiment, an initial set of oligonucleotide sequences produced by various biochemical techniques is computationally analyzed to identify clusters that may correspond to a number of different binding sites for DNA-binding proteins within one or more double-stranded DNA duplexes. The method places each oligonucleotide sequence within a new cluster and calculates an initial information weight matrix for that cluster. Then, other sequences from the initial set of sequences are added to the cluster and the information weight matrix of the cluster is re-computed until the information content of the information weight matrix falls below a…
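The seed-and-grow loop described in the abstract can be illustrated with a minimal sketch. This is not the patented method. It assumes sequences of equal length that are already aligned (no offset search), a uniform background over A, C, G and T, and the Shannon information content of a column-wise frequency matrix as a stand-in for the patent's "information weight matrix", whose exact definition is not given in this excerpt. The names `information_content`, `grow_cluster` and the cutoff parameter `min_bits` are hypothetical, and the stopping rule is only a guess, since the abstract is truncated at that point.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences with a uniform background


def information_content(seqs):
    """Total information (in bits) of the column-wise frequency matrix.

    Each column contributes log2(4) minus its Shannon entropy, so a fully
    conserved column is worth 2 bits and a uniform column 0 bits.
    """
    total = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(len(ALPHABET)) - entropy
    return total


def grow_cluster(seed, candidates, min_bits):
    """Greedily grow a cluster from one seed sequence.

    At each step, add the candidate that leaves the information content
    highest; stop when even the best addition would fall below min_bits
    (a hypothetical cutoff, not taken from the patent).
    """
    cluster = [seed]
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda s: information_content(cluster + [s]))
        if information_content(cluster + [best]) < min_bits:
            break
        cluster.append(best)
        remaining.remove(best)
    return cluster


if __name__ == "__main__":
    # Toy input: two identical sites, one single-mismatch site, one outlier.
    print(grow_cluster("ACGT", ["ACGT", "ACGA", "TTTT"], min_bits=6.0))
```

With this toy input and a 6-bit cutoff, the cluster keeps the seed, its duplicate and the single-mismatch sequence `ACGA`, and rejects `TTTT`, which would pull the total information below the cutoff. Handling variable offsets, as the abstract describes, would require also searching over alignment positions for each candidate.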
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.