Publication
A modified hyper plane clustering algorithm allows for efficient and accurate
clustering of extremely large datasets.
Authors
Sharma A, Podolsky R, Zhao J, McIndoe RA.
Submitted By
Richard McIndoe on 3/6/2009
Status
Published
Journal
Bioinformatics (Oxford, England)
Year
2009
Date Published
5/1/2009
Volume : Pages
Not Specified : Not Specified
PubMed Reference
19261720
Abstract
Motivation: As the number of publicly available microarray experiments
increases, the ability to analyze extremely large data sets across multiple
experiments becomes critical. There is a requirement to develop algorithms which
are fast and can cluster extremely large datasets without affecting the cluster
quality. Clustering is an unsupervised exploratory technique applied to
microarray data to find similar data structures or expression patterns. Because
of the high I/O costs involved and large distance matrices calculated, most of
the agglomerative clustering algorithms fail on large datasets (30,000+
genes/200+ arrays). In this paper we propose a new two-stage algorithm which
partitions the high dimensional space associated with microarray data using
hyper planes. The first stage is based on the BIRCH (Balanced Iterative Reducing
and Clustering using Hierarchies) algorithm with the second stage being a
conventional k-Means clustering technique. This algorithm has been implemented
in a software tool (HPCluster) designed to cluster gene expression data. We
compared the clustering results using the two stage hyper plane algorithm with
the conventional k-Means algorithm from other available programs. Because the
first stage traverses the data in a single scan, the performance and speed
increase substantially. The data reduction accomplished in the first stage of
the algorithm reduces the memory requirements, allowing us to cluster 44,460
genes without failure, and significantly decreases the time to complete when
compared to popular k-Means programs. The software was written in C# (.NET 1.1).
Availability: The program is freely available and can be downloaded from
http://www.amdcc.org/bioinformatics.
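The two-stage idea in the abstract (a single-scan data reduction followed by conventional k-Means on the reduced set) can be illustrated with a minimal Python sketch. This is not the HPCluster implementation, which was written in C# and is not reproduced here. It assumes a flat list of (count, linear sum) summaries in place of BIRCH's CF-tree, does not attempt the paper's hyper plane partitioning, and uses hypothetical names (`single_scan_summaries`, `weighted_kmeans`, `threshold`) chosen only for illustration.

```python
import math
import random


def single_scan_summaries(points, threshold):
    """Stage 1 (BIRCH-like, simplified): one pass over the data.

    Each point is absorbed into the nearest summary if that summary's
    centroid lies within `threshold`; otherwise it starts a new summary.
    Assumption: a flat list replaces BIRCH's CF-tree.
    """
    summaries = []  # each entry: [count, linear_sum_vector]
    for p in points:
        best, best_d = None, float("inf")
        for s in summaries:
            centroid = [v / s[0] for v in s[1]]
            d = math.dist(p, centroid)
            if d < best_d:
                best, best_d = s, d
        if best is not None and best_d <= threshold:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
        else:
            summaries.append([1, list(p)])
    # Return (centroid, count) pairs: the reduced dataset.
    return [([v / n for v in ls], n) for n, ls in summaries]


def nearest(center_list, c):
    """Index of the center closest to point c."""
    return min(range(len(center_list)), key=lambda j: math.dist(c, center_list[j]))


def weighted_kmeans(summaries, k, iters=20, seed=0):
    """Stage 2: conventional k-Means (Lloyd iterations) on the summaries,
    weighting each summary centroid by the number of points it absorbed."""
    rng = random.Random(seed)
    centers = [list(c) for c, _ in rng.sample(summaries, k)]
    for _ in range(iters):
        labels = [nearest(centers, c) for c, _ in summaries]
        for j in range(k):
            members = [(c, w) for (c, w), l in zip(summaries, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                total = sum(w for _, w in members)
                dim = len(members[0][0])
                centers[j] = [
                    sum(c[i] * w for c, w in members) / total for i in range(dim)
                ]
    labels = [nearest(centers, c) for c, _ in summaries]
    return centers, labels


# Toy data standing in for expression profiles (made up for illustration).
points = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
summaries = single_scan_summaries(points, threshold=1.0)
centers, labels = weighted_kmeans(summaries, k=2)
```

In this sketch the second stage only ever sees the summaries, so its memory use and running time depend on the number of summaries rather than on the number of original points, which is the kind of saving the abstract attributes to the first stage.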
Investigators with authorship
Name
Institution
Richard McIndoe
Augusta University