The K-Means algorithm requires the user to enter the number of clusters to search for. Often this is not known ahead of time, so MPCluster has the ability to estimate the number of clusters present.
This estimation has two drawbacks, so it is not performed automatically. First, cluster estimation can take a long time to process. It is heavily multi-threaded so that it can take advantage of modern multi-processor PCs. Despite this, it can still take a while, and a single-processor PC could take an hour to calculate an estimate.
Second, because the underlying cluster analysis is stochastic, the estimates vary slightly from one run to the next. Because of these drawbacks, the estimation function should be considered slightly experimental. Future versions will almost certainly improve both the quality of the estimation and the processing speed.
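The run-to-run variation mentioned above can be illustrated with a minimal sketch of ordinary K-Means (Lloyd's algorithm) on one-dimensional data. This is not MPCluster's code; the function name `kmeans_1d`, the sample data, and the seeds are assumptions chosen for illustration. The random choice of starting centres is what allows different runs to settle on different clusterings, and so on different quality scores.

```python
import random


def kmeans_1d(values, k, seed, iterations=20):
    """Plain Lloyd's K-Means on 1-D data.

    Returns (sorted centres, within-cluster sum of squares).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Random starting centres: the source of run-to-run variation.
    centres = rng.sample(values, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centres[i]) ** 2)
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    wcss = sum(min((v - c) ** 2 for c in centres) for v in values)
    return sorted(centres), wcss


data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
for seed in range(5):
    centres, wcss = kmeans_1d(data, k=3, seed=seed)
    print(seed, [round(c, 2) for c in centres], round(wcss, 3))
```

Any estimate built on scores like these inherits their dependence on the starting centres, which is one plausible reason that repeated estimates can differ slightly.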
Before estimating the number of clusters, you should set the Input Data and Cluster Options parameters on the main MPCluster dialog box. These will be used during the estimation process and changing them may change the number of clusters that can be found. After these parameters have been set, press the Estimate button on the main MPCluster dialog box:
This will display the Estimate the Number of Clusters dialog box:
MPCluster requires a range of cluster counts to analyze. Enter the minimum and maximum values of the range to search. The processing time is directly related to the maximum ("To") value. Setting this too high will greatly increase the processing time, so try to keep it as low as possible.
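As a rough illustration of why the "To" value matters, the sketch below counts point-to-centre distance computations under the assumption that every cluster count in the range is analyzed with standard K-Means, where one pass over n points with k clusters needs n × k distance computations. MPCluster's actual cost model is not documented here, so the function `distance_evaluations` and its figures are hypothetical and only indicate the trend.

```python
def distance_evaluations(n_points, from_value, to_value):
    """Distance computations for one K-Means pass at each candidate cluster count.

    Assumes standard K-Means cost (n_points * k per pass); not MPCluster's own model.
    """
    return sum(n_points * k for k in range(from_value, to_value + 1))


print(distance_evaluations(1000, 2, 10))  # 54000
print(distance_evaluations(1000, 2, 20))  # 209000
```

Under this assumption, doubling "To" from 10 to 20 almost quadruples the work per pass, which is consistent with the advice to keep the maximum low.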
The calculated estimate will include any fixed clusters that you may have defined. Therefore, the minimum ("From") value will be set to the number of pre-defined fixed clusters plus one. That is, there must be at least one non-fixed cluster.
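The rule for the minimum value can be written as a small check. The names `minimum_from` and `validate_range` are hypothetical and do not correspond to anything in MPCluster. The sketch only encodes the rule stated above, plus the assumption that "To" may not be smaller than "From".

```python
def minimum_from(fixed_clusters: int) -> int:
    """Smallest allowed "From" value: all fixed clusters plus one non-fixed cluster."""
    if fixed_clusters < 0:
        raise ValueError("fixed_clusters cannot be negative")
    return fixed_clusters + 1


def validate_range(from_value: int, to_value: int, fixed_clusters: int = 0) -> range:
    """Return the candidate cluster counts to analyze, or raise ValueError."""
    lowest = minimum_from(fixed_clusters)
    if from_value < lowest:
        raise ValueError(f'"From" must be at least {lowest}')
    if to_value < from_value:
        raise ValueError('"To" must not be smaller than "From"')
    return range(from_value, to_value + 1)


# With 3 fixed clusters, the search can start no lower than 4.
print(minimum_from(3))                 # 4
print(list(validate_range(4, 6, 3)))   # [4, 5, 6]
```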
After the range has been set, press the Start Estimation button to start the estimation process. Progress will be indicated with the following progress dialog box:
The cluster table building stage takes the longest. It tends to start quickly and then slow down. The actual estimation process completes very quickly after the table has been built.
After the estimation process has completed, the Calculated Estimate value is set in the dialog box:
Press Use this estimate to close the dialog box and use the estimate as the number of clusters in the main cluster calculation. Press Cancel to close the dialog box and discard the estimate.