[Fwd: eigenvectors differences with matlab on high dimensional data.]
- Subject: [Fwd: eigenvectors differences with matlab on high dimensional data.]
- From: Ronald Boisvert <boisvert@nist.gov>
- Date: Tue, 08 Jun 2010 09:35:55 -0400
Sender: Miguel Garcia Torres <mgarciat@ull.es>
Subject: eigenvectors differences with matlab on high dimensional data.
Hi all,
I am using Jama to perform PCA via SVD, and I am comparing the results
with those obtained with Matlab. When the number of points is greater
than or equal to the number of variables, I get the same results as
Matlab, to very good precision.
But if the number of variables is greater than the number of points,
the last component of the eigenvectors has poor precision.
To clarify, let A be the [m,n] data matrix, where m is the number of
points and n is the number of variables. If m >= n, the eigenvectors
are the columns of V. In this case the values agree with the Matlab
results.
If m < n, I transpose the matrix before performing the decomposition,
so the eigenvectors are the columns of U. In this case, the last
element of each eigenvector has a value which is far from the one
obtained in Matlab.
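For readers following the transposition step, the identity it relies on can be written out as below. This is only a sketch under stated assumptions: A is the centred data matrix, U, \Sigma, V are the factors of its thin SVD, \sigma_i are its singular values, and the 1/(m-1) normalisation of the covariance is an assumed convention (the usual sample covariance), not something stated in this message.

```latex
\begin{align*}
A &= U \Sigma V^{T}
  \quad\Longrightarrow\quad
  A^{T} = V \Sigma U^{T}, \\
\frac{1}{m-1} A^{T} A &= V \, \frac{\Sigma^{2}}{m-1} \, V^{T},
  \qquad
  \lambda_i = \frac{\sigma_i^{2}}{m-1}.
\end{align*}
```

So the right singular vectors of A (the columns of V) are the left singular vectors of A^T, and in both cases they are the eigenvectors of the covariance matrix of the variables.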
You can download the matrices from
http://webpages.ull.es/users/mgarciat/pca_hd.tgz
This archive contains the following files:
pca_hd_p20.csv -> the high-dimensional data (hd), with 20 decimals.
pca_hd_eigenvectors.csv -> the eigenvectors obtained with Matlab.
pca_hd_eigenvaluess.csv -> the eigenvalues obtained with Matlab.
When I compare, I get differences such as the following (the expected
Matlab value first, then the value I obtain):
[604,6]: expected: 0.015720974334793 -0.004136528914528481
[604,10]: expected: 0.052244404045946 -0.046277291821854284
[604,11]: expected: -0.021269147636881 -0.0011694770500378223
[604,12]: expected: 0.052382661544416 0.010050589203392619
[604,13]: expected: -0.021673352215208 0.012748585718630039
[604,17]: expected: 0.022376430522196 -0.01744958012069039
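Singular vectors are only determined up to sign, so a comparison like the one above is usually easier to read when each column is sign-aligned first. The following is a minimal sketch of such a check, not the code used to produce the numbers above; the class name EigenvectorCompare and the method maxColumnDiff are hypothetical and are not part of Jama.

```java
// Hypothetical helper, not part of Jama or of the code attached to this message.
final class EigenvectorCompare {

    /**
     * Largest absolute difference between column j of expected and column j
     * of actual, after flipping the sign of the actual column when the two
     * columns point in opposite directions.
     */
    static double maxColumnDiff(double[][] expected, double[][] actual, int j) {
        // The sign of the dot product tells whether the two columns point
        // the same way (>= 0) or in opposite directions (< 0).
        double dot = 0.0;
        for (int i = 0; i < expected.length; i++) {
            dot += expected[i][j] * actual[i][j];
        }
        double sign = dot < 0 ? -1.0 : 1.0;

        // Compare element by element using the aligned sign.
        double max = 0.0;
        for (int i = 0; i < expected.length; i++) {
            max = Math.max(max, Math.abs(expected[i][j] - sign * actual[i][j]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Two small matrices whose second columns differ only by sign.
        double[][] expected = { {0.6, 0.8}, {0.8, -0.6}, {0.0, 0.0} };
        double[][] actual   = { {0.6, -0.8}, {0.8, 0.6}, {0.0, 0.0} };
        for (int j = 0; j < 2; j++) {
            System.out.println("column " + j + ": " + maxColumnDiff(expected, actual, j));
        }
    }
}
```

A difference that remains large after this alignment cannot be explained by a sign flip alone.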
I would be very grateful if someone could check this and explain
whether these differences correspond to an error or not. I am still
writing some of the methods (in Java), but I could send the code if
someone requests it for testing.
Thanks in advance,
MiguelGT
PS. Here I attach some code in Java.
-----------------To read csv file into an array of doubles-----------------------
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

private static double[][] readMatrix(String fname) throws IOException {
    List<double[]> rows = new ArrayList<double[]>();
    // try-with-resources (Java 7+) closes the reader even if parsing fails
    try (BufferedReader br = new BufferedReader(new FileReader(fname))) {
        String line;
        while ((line = br.readLine()) != null) {
            // each CSV line is one row of the matrix
            String[] svalues = line.split(",");
            double[] row = new double[svalues.length];
            for (int i = 0; i < svalues.length; i++) {
                row[i] = Double.parseDouble(svalues[i]);
            }
            rows.add(row);
        }
    }
    // copy the list of rows into a double[][]
    return rows.toArray(new double[rows.size()][]);
}
---------------------------------------------------------------------------------------------
------To obtain the mean values of each column----------
public static double[] columnMeans(double[][] data) {
    // one mean per variable; Java zero-initialises new double arrays
    double[] mean = new double[data[0].length];
    // accumulate the column sums
    for (int r = 0; r < data.length; r++) {
        for (int c = 0; c < data[r].length; c++) {
            mean[c] += data[r][c];
        }
    }
    // divide by the number of rows (points)
    for (int e = 0; e < mean.length; e++) {
        mean[e] /= (double) data.length;
    }
    return mean;
}
------------------------------------------------------------------------
----To center the data-----------------------
public static double[][] centerData(final double[][] data, double[] mean) {
    double[][] cdata = new double[data.length][];
    for (int r = 0; r < data.length; r++) {
        cdata[r] = new double[data[r].length];
        // subtract the column mean from every entry of the row
        for (int c = 0; c < data[r].length; c++) {
            cdata[r][c] = data[r][c] - mean[c];
        }
    }
    return cdata;
}
------------------------------------------------------------------
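As a quick sanity check on the two helpers above, the column means of the centred data should come out as zero up to rounding. The sketch below repeats compact copies of columnMeans and centerData so that it runs on its own; the class name CenteringCheck and the small 3x2 matrix are made up for illustration and have nothing to do with the data in the archive.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the two methods mirror the helpers above.
final class CenteringCheck {

    static double[] columnMeans(double[][] data) {
        double[] mean = new double[data[0].length];
        for (double[] row : data) {
            for (int c = 0; c < row.length; c++) {
                mean[c] += row[c];
            }
        }
        for (int c = 0; c < mean.length; c++) {
            mean[c] /= data.length;
        }
        return mean;
    }

    static double[][] centerData(double[][] data, double[] mean) {
        double[][] cdata = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            cdata[r] = new double[data[r].length];
            for (int c = 0; c < data[r].length; c++) {
                cdata[r][c] = data[r][c] - mean[c];
            }
        }
        return cdata;
    }

    public static void main(String[] args) {
        // Made-up data: 3 points, 2 variables.
        double[][] data = { {1.0, 10.0}, {2.0, 20.0}, {3.0, 60.0} };
        double[][] centered = centerData(data, columnMeans(data));
        // The means of the centred columns should be (numerically) zero.
        System.out.println(Arrays.toString(columnMeans(centered)));
    }
}
```

On real data the residual means will typically be tiny nonzero numbers rather than exact zeros, so a tolerance is needed when checking them.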