Patent · US · Expired

Method for organizing semi-structured data into a taxonomy, based on tag-separated clustering

US7502765B2 · kind B2 · utility

Cited by: 8
References: 11
Claims: 12
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 21, 2005
Grant date: Mar 10, 2009
Priority date
Expiry date: Dec 21, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F18/231
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method organizes semi-structured data into a taxonomy, based on Tag-Separated (TS) clustering. The method comprises retrieving documents including the semi-structured data. The semi-structured data comprises structured data including structured data fields and tags, and unstructured data. The method selects a structured attribute type including any of a categorical attribute, a numerical attribute, and a tag associated with annotated text, and an unstructured attribute type including a text attribute. The method clusters the semi-structured data from the retrieved documents into a plurality of clusters based on the selected structured attribute type and the selected unstructured attribute type. For a categorical attribute, each category corresponds to a single cluster. For a numerical attribute, a clustering algorithm clusters numerical data projected onto a range of the numerical attribute. For an annotated text attribute, a monothetic clustering algorithm clusters annotated text data according to tags associated with a vocabulary for the annotated text data.
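The three per-attribute clustering rules in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout, the field names (`product`, `price`, `tags`), and all three function names are hypothetical. The widest-gap split is only a stand-in for whatever numerical clustering algorithm the method uses, and the tag grouping is one simple reading of "monothetic", in which membership in a cluster is decided by a single tag.

```python
from collections import defaultdict

# Hypothetical sample records: one categorical field, one numerical field,
# and a list of tags attached to annotated text.
docs = [
    {"id": 1, "product": "laptop", "price": 900, "tags": ["battery", "screen"]},
    {"id": 2, "product": "laptop", "price": 950, "tags": ["battery"]},
    {"id": 3, "product": "phone", "price": 300, "tags": ["screen"]},
    {"id": 4, "product": "phone", "price": 320, "tags": ["battery", "signal"]},
]


def cluster_categorical(docs, field):
    """Categorical attribute: each distinct category is its own cluster."""
    clusters = defaultdict(list)
    for doc in docs:
        clusters[doc[field]].append(doc["id"])
    return dict(clusters)


def cluster_numerical(docs, field, k):
    """Numerical attribute: project onto the value range and split the
    sorted values at the k-1 widest gaps (an assumed stand-in algorithm)."""
    ordered = sorted(docs, key=lambda d: d[field])
    # Gap i lies between ordered[i] and ordered[i + 1].
    gaps = [
        (ordered[i + 1][field] - ordered[i][field], i)
        for i in range(len(ordered) - 1)
    ]
    cuts = sorted(i for _, i in sorted(gaps, reverse=True)[: k - 1])
    clusters, start = [], 0
    for cut in cuts:
        clusters.append([d["id"] for d in ordered[start : cut + 1]])
        start = cut + 1
    clusters.append([d["id"] for d in ordered[start:]])
    return clusters


def cluster_by_tag(docs, field):
    """Annotated text: monothetic grouping, one cluster per tag.
    A document carrying several tags appears in several clusters."""
    clusters = defaultdict(list)
    for doc in docs:
        for tag in doc[field]:
            clusters[tag].append(doc["id"])
    return dict(clusters)


by_product = cluster_categorical(docs, "product")
by_price = cluster_numerical(docs, "price", k=2)
by_tag = cluster_by_tag(docs, "tags")
```

On the sample records, the categorical rule yields one cluster per product, the numerical rule separates the two low prices from the two high ones, and the tag rule places document 1 under both `battery` and `screen`.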

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.