ParaSAM is an application for significance analysis of microarrays using
This software is distributed under the Open Software License v3.0 agreement.
To download the software, please read and accept the license.
We designed the software so that it can be deployed on most Windows operating systems. The applications are written in C# for the .NET Framework v1.1. The application's parallelism comes from using a web service to perform the distance calculations and cluster assignments. Because of this, at least one computer must have Internet Information Services (IIS 5.0 or later) installed and running.
The application was designed in a modular fashion to provide flexibility both in
deployment and in the user interface. The application is made of three software components:
- A Web Service - this software component is the main computational workhorse
and resides on the "compute nodes". Data and the cluster centroids are sent to the
web service, where the distance calculations and cluster assignments are performed.
Note that once the data is sent to the web service, it never leaves.
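A minimal sketch of the work one compute node might do each round, written in Python rather than the application's C# for brevity. The class name `ComputeNode`, its `load`/`assign` methods, and the choice to return per-cluster partial sums and counts alongside the performance function are assumptions for illustration, not the ParaSAM web service interface. The performance function is assumed here to be the sum of squared Euclidean distances from each row to its assigned centroid.

```python
from typing import List, Sequence


class ComputeNode:
    """Hypothetical stand-in for one compute node (web service)."""

    def __init__(self) -> None:
        self.rows: List[List[float]] = []

    def load(self, rows: Sequence[Sequence[float]]) -> None:
        # The data partition is sent once and stays on the node.
        self.rows = [[float(x) for x in r] for r in rows]

    def assign(self, centroids: Sequence[Sequence[float]]):
        """Assign each local row to its nearest centroid (Euclidean distance).

        Returns (labels, per-cluster sums, per-cluster counts, performance),
        so new centroids can be computed without the raw data leaving the node.
        """
        k, dim = len(centroids), len(centroids[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        labels: List[int] = []
        perf = 0.0  # assumed performance function: sum of squared distances
        for row in self.rows:
            # Squared Euclidean distance from this row to every centroid.
            d = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
            best = min(range(k), key=d.__getitem__)
            labels.append(best)
            counts[best] += 1
            perf += d[best]
            for j, x in enumerate(row):
                sums[best][j] += x
        return labels, sums, counts, perf
```

For example, a node holding `[[0, 0], [1, 0], [10, 10]]` and given centroids `[[0, 0], [10, 10]]` returns labels `[0, 0, 1]`, counts `[2, 1]` and a performance value of `1.0`.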
- A Main API - this software component is a .NET dynamic link library (DLL)
used by the user interfaces to orchestrate the activities of the compute nodes.
The API is responsible for managing the ThreadPool and working with the web services
to perform the K-means clustering algorithm across the compute nodes. The API provides
all the methods and properties needed to run an analysis. We will provide documentation
for the API in case anyone wants to use it in another application.
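The API's partition-and-dispatch pattern can be sketched with Python's `concurrent.futures.ThreadPoolExecutor` standing in for the .NET ThreadPool. `partition` and `fan_out` are hypothetical helpers, and each "proxy" is any callable standing in for a web service call; a real proxy would issue a network request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
R = TypeVar("R")


def partition(rows: Sequence[A], n_nodes: int) -> List[List[A]]:
    """Split rows into n_nodes contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


def fan_out(proxies: Sequence[Callable[[A], R]], args: Sequence[A]) -> List[R]:
    """Call each proxy with its argument on a thread pool and wait for all results."""
    with ThreadPoolExecutor(max_workers=len(proxies)) as pool:
        futures = [pool.submit(proxy, arg) for proxy, arg in zip(proxies, args)]
        # result() blocks until that node's call finishes; order matches proxies.
        return [f.result() for f in futures]
```

With `sum` playing the role of a node, `partition(list(range(10)), 3)` gives `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`, and `fan_out([sum] * 3, ...)` on those chunks gives `[6, 15, 24]`.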
- User Interfaces - this software component provides the actual application
that the user interacts with to run the programs. We provide two different user
interfaces: a stand-alone Windows application and a web application.
  - ParaSAM stand-alone Windows application. The Windows application can be installed
  on any Windows machine, whether or not IIS is installed. This application
  provides easy file management, compute node management, program options and a results
  window for viewing and saving data.
  - ParaSAM web application. This interface requires IIS to be installed on the
  computer. The web application provides the same functionality as the stand-alone
  application, but requires that each data set to be analyzed be uploaded to the server.
Although the software was created using this modular design, end users only need
to concern themselves with the web service and the user interface they want to use.
The basic steps of the ParaSAM algorithm are:
- The user opens or uploads the data to be analyzed.
- The user selects whether to cluster genes, arrays or both.
- The user selects the number of clusters and compute nodes to use in the algorithm.
- The user selects the method to initialize the centroids for the first round.
- The algorithm partitions the data based on the number of nodes used.
- The algorithm creates an array of web proxies used to connect to the compute nodes.
- The algorithm initializes the centroids based on the method selected by the user.
- The algorithm asynchronously sends the data and the initial centroids to the compute nodes.
- Each compute node calculates the Euclidean distance matrix and assigns each data
point on that node to its nearest cluster centroid.
- Once all the compute nodes finish the cluster assignments, each node returns its
performance function, and these values are summed across all nodes. The summed
performance function is used to calculate the new centroids.
- The algorithm sends the new centroids back to the compute nodes for another round
of cluster assignments.
- The algorithm ends when the performance function does not change between rounds.
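The steps above can be sketched end to end as a distributed K-means loop in Python. This is an illustration under stated assumptions, not ParaSAM's implementation: `Node` and `parallel_kmeans` are hypothetical names, nodes run in-process instead of behind a web service, centroids are initialized by picking `k` random rows (the source does not list its initialization methods), the performance function is taken to be the sum of squared distances, and nodes are assumed to also return per-cluster sums and counts so the new centroids can be computed without the data leaving the nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class Node:
    """Hypothetical in-process stand-in for one compute node (web service)."""

    def __init__(self, rows: Sequence[Sequence[float]]) -> None:
        # The partition is copied once and stays on the node for every round.
        self.rows = [[float(x) for x in r] for r in rows]

    def assign(self, centroids: Sequence[Sequence[float]]):
        """Nearest-centroid assignment plus per-cluster partial sums and counts."""
        k, dim = len(centroids), len(centroids[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        labels: List[int] = []
        perf = 0.0  # assumed performance function: sum of squared distances
        for row in self.rows:
            d = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
            best = min(range(k), key=d.__getitem__)
            labels.append(best)
            counts[best] += 1
            perf += d[best]
            for j, x in enumerate(row):
                sums[best][j] += x
        return labels, sums, counts, perf


def parallel_kmeans(data: Sequence[Sequence[float]], k: int, n_nodes: int,
                    max_rounds: int = 100, seed: int = 0):
    # Partition the rows into n_nodes contiguous chunks, one per node.
    base, extra = divmod(len(data), n_nodes)
    nodes, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)
        nodes.append(Node(data[start:start + size]))
        start += size

    # Hypothetical initialization method: k distinct rows chosen at random.
    rng = random.Random(seed)
    centroids = [[float(x) for x in r] for r in rng.sample(list(data), k)]

    prev_perf = None
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        for _ in range(max_rounds):
            # Send the current centroids to every node concurrently and wait.
            results = list(pool.map(lambda node: node.assign(centroids), nodes))
            perf = sum(r[3] for r in results)  # summed across nodes
            if perf == prev_perf:  # stop when the performance function is unchanged
                break
            prev_perf = perf
            # Combine the partial sums and counts into new centroids.
            for c in range(k):
                count = sum(r[2][c] for r in results)
                if count:  # an empty cluster keeps its previous centroid
                    centroids[c] = [sum(r[1][c][j] for r in results) / count
                                    for j in range(len(centroids[c]))]

    # Labels from the last assignment round, in the original row order.
    labels = [label for r in results for label in r[0]]
    return labels, centroids, perf
```

For `data = [[0, 0], [0, 1], [10, 10], [10, 11]]` with `k=2`, the first two rows end up sharing one label and the last two the other, with a final performance value of `1.0`, regardless of how many nodes are used. Clustering arrays instead of genes would, under the usual convention that rows are genes, amount to passing the transposed matrix, e.g. `[list(col) for col in zip(*data)]`.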