Method for organizing semi-structured data into a taxonomy, based on tag-separated clustering
US7502765B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Dec 21, 2005 |
| Grant date | Mar 10, 2009 |
| Priority date | — |
| Expiry date | Dec 21, 2025 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F18/231
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method organizes semi-structured data into a taxonomy based on Tag-Separated (TS) clustering. The method comprises retrieving documents that contain semi-structured data. The semi-structured data comprises structured data, including structured data fields and tags, and unstructured data. The method selects a structured attribute type, which may be a categorical attribute, a numerical attribute, or a tag associated with annotated text, and an unstructured attribute type comprising a text attribute. It then clusters the semi-structured data from the retrieved documents into a plurality of clusters based on the selected structured and unstructured attribute types. For a categorical attribute, each category corresponds to a single cluster. For a numerical attribute, a clustering algorithm clusters the numerical data projected onto the range of that attribute. For an annotated text attribute, a monothetic clustering algorithm clusters the annotated text data according to the tags associated with a vocabulary for that data.
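The abstract names one clustering rule per structured attribute type but does not spell out the algorithms. The Python sketch below illustrates one way each rule could be realized. Everything in it is an assumption for illustration, not the patented method: the document layout (`id`, `fields`, `tags`), the hypothetical function names `cluster_categorical`, `cluster_numerical` and `cluster_monothetic`, the gap-cutting numerical clusterer, and the greedy tag-selection rule. It does not cover clustering the unstructured text attribute or how the structured and unstructured results are combined.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical document shape (an assumption, not taken from the patent):
#   {"id": str, "fields": {attribute: value}, "tags": [str, ...]}
Doc = dict
Clusters = Dict[str, List[str]]  # cluster label -> document ids


def cluster_categorical(docs: List[Doc], attr: str) -> Clusters:
    """Categorical attribute: each distinct category becomes one cluster."""
    clusters: Clusters = defaultdict(list)
    for d in docs:
        clusters[str(d["fields"].get(attr, "<missing>"))].append(d["id"])
    return dict(clusters)


def cluster_numerical(docs: List[Doc], attr: str, k: int) -> Clusters:
    """Numerical attribute: project values onto the attribute's range and
    cut the k-1 widest gaps between neighboring sorted values.

    In one dimension this yields the same partition as single-linkage
    hierarchical clustering stopped at k clusters.
    """
    pts = sorted((d["fields"][attr], d["id"]) for d in docs if attr in d["fields"])
    if not pts:
        return {}
    # Never ask for more clusters than there are distinct values,
    # so every cut falls on a strictly positive gap.
    k = max(1, min(k, len({v for v, _ in pts})))
    gaps = [(pts[i + 1][0] - pts[i][0], i) for i in range(len(pts) - 1)]
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[: k - 1])

    clusters: Clusters = {}
    start = 0
    for end in cut_after + [len(pts) - 1]:
        segment = pts[start : end + 1]
        label = f"[{segment[0][0]}, {segment[-1][0]}]"  # sub-range of the attribute
        clusters[label] = [doc_id for _, doc_id in segment]
        start = end + 1
    return clusters


def cluster_monothetic(docs: List[Doc], vocabulary: List[str]) -> Clusters:
    """Annotated text: greedy monothetic clustering over a tag vocabulary.

    Each cluster is defined by a single tag: repeatedly pick the vocabulary
    tag carried by the most still-unassigned documents, group those
    documents under it, and remove them. Documents with no vocabulary tag
    end up in an "<untagged>" cluster.
    """
    vocab = set(vocabulary)
    remaining = {d["id"]: set(d.get("tags", [])) & vocab for d in docs}
    clusters: Clusters = {}
    while True:
        counts = Counter(t for tags in remaining.values() for t in tags)
        if not counts:
            break
        # Most frequent tag first; ties broken alphabetically.
        tag, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        members = [i for i, tags in remaining.items() if tag in tags]
        clusters[tag] = members
        for i in members:
            del remaining[i]
    if remaining:
        clusters["<untagged>"] = list(remaining)
    return clusters


if __name__ == "__main__":
    docs = [
        {"id": "d1", "fields": {"brand": "acme", "price": 10}, "tags": ["battery"]},
        {"id": "d2", "fields": {"brand": "acme", "price": 12}, "tags": ["battery", "screen"]},
        {"id": "d3", "fields": {"brand": "zeta", "price": 95}, "tags": ["screen"]},
        {"id": "d4", "fields": {"brand": "zeta", "price": 99}, "tags": []},
    ]
    print(cluster_categorical(docs, "brand"))
    # -> {'acme': ['d1', 'd2'], 'zeta': ['d3', 'd4']}
    print(cluster_numerical(docs, "price", k=2))
    # -> {'[10, 12]': ['d1', 'd2'], '[95, 99]': ['d3', 'd4']}
    print(cluster_monothetic(docs, ["battery", "screen"]))
    # -> {'battery': ['d1', 'd2'], 'screen': ['d3'], '<untagged>': ['d4']}
```

Cutting the widest gaps is used here only because it is deterministic and easy to check; k-means or a density-based clusterer over the same attribute range would be an equally plausible stand-in for the unspecified "clustering algorithm". Likewise, the greedy most-frequent-tag rule is just one simple way to keep every cluster defined by a single tag, which is what makes the clustering monothetic.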
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.