[Fwd: Need advice about Principal Component Analysis (PCA)]
- Subject: [Fwd: Need advice about Principal Component Analysis (PCA)]
- From: Ron Boisvert <boisvert@nist.gov>
- Date: Tue, 26 Jun 2007 13:11:57 -0400
Sender: "Jim Cant" <cant_jim@hotmail.com>
Subject: Need advice about Principal Component Analysis (PCA)
Hi,
I have a few questions (below) about Principal Component Analysis (PCA)
which I am hoping someone will help me with. I ask this because I'm trying
two PCA packages* and they give different results. My fear is that I
know just enough to be dangerous.
I haven't been able to find answers either on-line or in any of the
linear algebra books in our library.
Thanks for your help; it is greatly appreciated.
Jim Cant
I apologize for the length of the questions; I opted for clarity rather than
brevity.
1. Under what conditions are two sets of eigenvectors and associated
eigenvalues considered equal?
My hunch is that
(i) IF all corresponding eigenvectors are the same scalar
multiple of each other,
AND
(ii) IF the ratio of corresponding eigenvalues from each set is
the same, i.e. the eigenvalues are scalar multiples of each other,
THEN
the results are equivalent.
1b. What if (i) is relaxed so that the two eigenvectors in each
corresponding pair are scalar multiples of each other, but the
multiplier differs from pair to pair?
1c. What if the multiplier is the same for all pairs but sometimes
differs in sign?
2. When calculating the covariance matrix, does one use the
deviation of each observation from the mean of all
observations for that feature, or from the mean of all observations
over all features? From what I read, the first is the correct
approach, but these two packages seem to differ.
3. Does the order of the calculated eigenvectors have any
significance? It seems they are often returned sorted by
eigenvalue. I ask because in my data, each feature is an image
taken at a particular time interval after a perturbation, giving
the data an inherent ordering. I'm concerned that if I consider
the data after sorting, it may be difficult to 'attribute' an
eigenvector to a particular underlying cause (if the sort order
changes).
4. Can anyone point me to some sample data along with the results of
an eigenvalue analysis of that data? This would help a lot in testing.
Even better, is there a way to programmatically generate test data where
the eigenvectors/values are known?
5. Are there any other packages to do PCA that you'd recommend?
* The two packages are
JAMA from NIST (http://math.nist.gov/javanumerics/jama/)
and
BIJ, Bio-medical Imaging in Java (http://bij.isi.uu.nl/)
I can only get BIJ to agree with JAMA if the raw data has
only 2 features and the mean of the observations is 0 (before
analysis). Looking at the BIJ source code, it appears that when
calculating the covariance matrix, the deviations are taken with
respect to the mean of all observations over all features rather than
the per-feature mean. (Also, the calculation of the mean itself appears
suspect.)
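On question 1: if Av = λv, then A(cv) = λ(cv) for any nonzero c, so each eigenvector is determined only up to its own nonzero scale factor. The multiplier may therefore differ from pair to pair (1b), and a sign flip (1c) is just the case c = -1; many libraries return unit-length vectors, which leaves only the sign free. The eigenvalues of a given matrix, by contrast, are unique, so they should match outright rather than merely be proportional. A single common ratio often points to the two packages scaling the covariance matrix differently (for example dividing by n versus n - 1), which rescales every eigenvalue by the same factor and leaves the eigenvectors unchanged. The sketch below, in plain Python with hypothetical helper names (not the JAMA or BIJ API), compares two decompositions on that basis. It assumes the eigenvalues are distinct: for a repeated eigenvalue any basis of its eigenspace is valid, so a vector-by-vector comparison can fail even when both answers are right.

```python
import math


def unit(v):
    """Scale v to length 1 (eigenvectors are only defined up to a nonzero factor)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("the zero vector is not an eigenvector")
    return [x / norm for x in v]


def same_eigenpairs(vals1, vecs1, vals2, vecs2, rel_tol=1e-9, abs_tol=1e-12):
    """True if two eigen-decompositions agree up to a per-vector scale and sign.

    Assumes both results list the eigenpairs in the same order and that the
    eigenvalues are distinct.
    """
    if len(vals1) != len(vals2) or len(vecs1) != len(vecs2):
        return False
    # Eigenvalues of one matrix are unique, so they must match directly.
    if not all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(vals1, vals2)):
        return False
    for v1, v2 in zip(vecs1, vecs2):
        a, b = unit(v1), unit(v2)
        # Undo a possible sign flip using the sign of the dot product.
        s = 1.0 if sum(x * y for x, y in zip(a, b)) >= 0 else -1.0
        if not all(math.isclose(x, s * y, rel_tol=rel_tol, abs_tol=abs_tol)
                   for x, y in zip(a, b)):
            return False
    return True


def eigenvalue_ratio(vals1, vals2, rel_tol=1e-9):
    """Common factor r with vals2[k] == r * vals1[k] for all k, else None.

    Assumes every entry of vals1 is nonzero. A common r (such as n / (n - 1))
    hints that the two packages normalize the covariance matrix differently.
    """
    ratios = [b / a for a, b in zip(vals1, vals2)]
    r = ratios[0]
    return r if all(math.isclose(x, r, rel_tol=rel_tol) for x in ratios) else None


if __name__ == "__main__":
    # Eigenpairs of [[2, 1], [1, 2]]: 3 with [1, 1] and 1 with [1, -1].
    print(same_eigenpairs([3.0, 1.0], [[1.0, 1.0], [1.0, -1.0]],
                          [3.0, 1.0], [[-2.0, -2.0], [5.0, -5.0]]))  # True
    print(eigenvalue_ratio([3.0, 1.0], [6.0, 2.0]))                  # 2.0
```

In this view, condition (ii) in the hunch is too loose: proportional eigenvalues indicate two differently scaled matrices, not identical results, while (i) can be relaxed all the way to an independent scale and sign per vector.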
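On question 2 and the BIJ observation: the usual PCA convention is the first option, subtracting each feature's own mean. The plain-Python sketch below (the function name `covariance` and the toy numbers are made up for illustration; this is not the JAMA or BIJ API) contrasts the two centerings. It also suggests why the packages agree on zero-mean data: if every feature's mean is already 0, the grand mean is 0 as well, so both centerings subtract the same thing. That would not by itself explain the 2-feature restriction.

```python
def covariance(data, center="feature"):
    """Sample covariance matrix of `data` (rows = observations, columns = features).

    center="feature": subtract each column's own mean (the usual PCA convention).
    center="grand":   subtract a single mean taken over every entry of `data`
                      (the alternative raised in question 2).
    Divides by n - 1; a package dividing by n instead rescales every
    eigenvalue by the same factor but leaves the eigenvectors unchanged.
    """
    n, m = len(data), len(data[0])
    if center == "feature":
        means = [sum(row[j] for row in data) / n for j in range(m)]
    elif center == "grand":
        grand = sum(sum(row) for row in data) / (n * m)
        means = [grand] * m
    else:
        raise ValueError("center must be 'feature' or 'grand'")
    # Deviations of each observation from the chosen mean(s).
    dev = [[row[j] - means[j] for j in range(m)] for row in data]
    return [[sum(d[j] * d[k] for d in dev) / (n - 1) for k in range(m)]
            for j in range(m)]


if __name__ == "__main__":
    toy = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]      # made-up example data
    print(covariance(toy))             # [[1.0, 2.0], [2.0, 4.0]]
    print(covariance(toy, "grand"))    # [[38.5, -35.5], [-35.5, 41.5]]
    zero_mean = [[-1.0, 2.0], [0.0, -1.0], [1.0, -1.0]]  # every column sums to 0
    print(covariance(zero_mean) == covariance(zero_mean, "grand"))  # True
```

On the toy data the grand-mean version even flips the sign of the covariance between the two features, so the resulting eigenvectors will generally differ as well.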
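On question 3: for a covariance matrix, each eigenvalue equals the variance of the data projected onto its eigenvector, which is why PCA results are conventionally listed by decreasing eigenvalue; beyond that convention, the order in which a solver returns them is an implementation detail. Sorting moves whole eigenpairs and never reorders the entries inside a vector, so entry j of every eigenvector remains the weight of feature j (here, the image taken at the j-th time interval). To follow one component across runs whose sort order changes, it can help to match eigenvectors by direction rather than by rank. A sketch, with hypothetical function names and toy vectors:

```python
import math


def sort_by_variance(values, vectors):
    """Eigenpairs as (original_index, eigenvalue, eigenvector), largest eigenvalue first.

    Each eigenvector is moved as a whole: entry j still belongs to feature j.
    """
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    return [(k, values[k], vectors[k]) for k in order]


def abs_cosine(u, v):
    """|cos| of the angle between u and v, so a sign flip does not matter."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return abs(dot) / (nu * nv)


def match_components(vecs_a, vecs_b):
    """For each vector in vecs_a, the index of the most nearly parallel vector in vecs_b."""
    return [max(range(len(vecs_b)), key=lambda k: abs_cosine(u, vecs_b[k]))
            for u in vecs_a]


if __name__ == "__main__":
    vals = [1.0, 3.0, 2.0]                               # toy eigenvalues
    vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for k, lam, v in sort_by_variance(vals, vecs):
        print(k, lam, v)          # original index 1 first, then 2, then 0
    # A later run returned the same directions in another order, one sign-flipped.
    print(match_components([[0.0, -2.0, 0.0], [1.0, 0.0, 0.0]], vecs))   # [1, 0]
```

The greedy matching in `match_components` can pair two vectors with the same partner when components are nearly degenerate; a strict one-to-one assignment would need more care.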
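On question 4: one way to get test data with known answers is to choose an orthonormal matrix Q and eigenvalues λ_k yourself and form C = Q diag(λ) Q^T. Then C q_k = Q diag(λ) Q^T q_k = λ_k q_k, so column k of Q is an eigenvector of C with eigenvalue λ_k. The sketch below takes Q to be a Householder reflection I - 2ww^T/(w^T w), which is orthogonal and symmetric for any nonzero w (one convenient choice among many). It can also produce raw observations x = Q diag(sqrt(λ)) z with z drawn from a standard normal distribution, whose covariance is C; their sample covariance only approaches C as the number of observations grows, so PCA results on them should be compared with a tolerance. Function names are hypothetical, and the eigenvalues must be non-negative to serve as variances.

```python
import math
import random


def householder(w):
    """Orthogonal, symmetric matrix H = I - 2 w w^T / (w^T w)."""
    ww = sum(x * x for x in w)
    if ww == 0.0:
        raise ValueError("w must be nonzero")
    n = len(w)
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / ww for j in range(n)]
            for i in range(n)]


def matrix_with_eigenpairs(eigenvalues, w):
    """Return (C, vectors): C = Q diag(eigenvalues) Q^T with Q = householder(w).

    vectors[k] is column k of Q, a unit eigenvector of C for eigenvalues[k].
    """
    q = householder(w)
    n = len(q)
    c = [[sum(q[i][k] * eigenvalues[k] * q[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    vectors = [[q[i][k] for i in range(n)] for k in range(n)]
    return c, vectors


def sample_observations(eigenvalues, w, n_obs, seed=0):
    """Observations x = Q diag(sqrt(eigenvalues)) z with z ~ N(0, I), one per row."""
    rng = random.Random(seed)              # seeded so test runs are repeatable
    q = householder(w)
    n = len(q)
    scale = [math.sqrt(lam) for lam in eigenvalues]
    data = []
    for _ in range(n_obs):
        z = [rng.gauss(0.0, 1.0) * s for s in scale]
        data.append([sum(q[i][k] * z[k] for k in range(n)) for i in range(n)])
    return data


if __name__ == "__main__":
    lams = [5.0, 2.0, 0.5]                     # chosen eigenvalues (known answers)
    c, vecs = matrix_with_eigenpairs(lams, [1.0, 2.0, 2.0])
    for lam, v in zip(lams, vecs):
        cv = [sum(c[i][j] * v[j] for j in range(3)) for i in range(3)]
        print(lam, max(abs(cv[i] - lam * v[i]) for i in range(3)))  # residual near 0
    data = sample_observations(lams, [1.0, 2.0, 2.0], n_obs=1000)
```

Feeding C directly to an eigen-solver tests the decomposition alone; feeding the sampled observations tests the whole pipeline, including the centering and covariance step from question 2.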