A Corrected Criterion for Selecting the Optimum Number of Principal Components
Abstract
Determining the optimum number of components to retain is a key problem in principal component analysis (PCA). Besides rule-of-thumb estimates, there exist several sophisticated methods for automatically selecting the dimensionality of the data. Based on the probabilistic PCA model, Minka (2001) proposed an approximate Bayesian model selection criterion. In this paper we correct this criterion and present a modified version. We compare the novel criterion with various other approaches in a simulation study. Furthermore, we use it to find the optimum number of principal components in hyper-spectral skin cancer images.
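To illustrate what automatic selection of the dimensionality looks like under the probabilistic PCA model, the following is a minimal sketch that scores each candidate number of components with a plain BIC. It is not the corrected criterion presented in the paper, nor Minka's Laplace approximation; it only shows the general shape of such a selection rule. The function names `ppca_bic` and `select_k`, and the example eigenvalues, are hypothetical and chosen for illustration.

```python
import math


def ppca_bic(eigenvalues, n_samples, k):
    """BIC score of a probabilistic PCA model that retains k components.

    eigenvalues: positive eigenvalues of the sample covariance matrix,
                 sorted in descending order (assumed input, not computed here).
    """
    d = len(eigenvalues)
    if not 0 <= k < d:
        raise ValueError("k must satisfy 0 <= k < d")
    # Maximum-likelihood noise variance: mean of the discarded eigenvalues.
    sigma2 = sum(eigenvalues[k:]) / (d - k)
    # Maximised log-likelihood of the probabilistic PCA model.
    log_lik = -0.5 * n_samples * (
        d * math.log(2 * math.pi)
        + sum(math.log(lam) for lam in eigenvalues[:k])
        + (d - k) * math.log(sigma2)
        + d
    )
    # Free parameters: mean (d), loadings up to rotation, noise variance (1).
    n_params = d + d * k - k * (k - 1) // 2 + 1
    return log_lik - 0.5 * n_params * math.log(n_samples)


def select_k(eigenvalues, n_samples):
    """Return the number of components with the highest BIC score."""
    d = len(eigenvalues)
    return max(range(d), key=lambda k: ppca_bic(eigenvalues, n_samples, k))


# Two dominant directions followed by a flat noise floor.
example_eigenvalues = [10.0, 8.0, 1.0, 1.0, 1.0, 1.0]
print(select_k(example_eigenvalues, n_samples=500))  # prints 2
```

In this toy spectrum the likelihood stops improving once the two large eigenvalues are retained, so the parameter penalty rules out any larger model. BIC is a cruder approximation to the model evidence than a Laplace-type criterion, which is why it serves here only as a baseline sketch.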
Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods. New York: Wiley.
Bishop, C. (2008). Pattern Recognition and Machine Learning. Berlin: Springer.
Cichocki, A., and Amari, S. (2002). Adaptive Blind Signal and Image Processing. Chichester: Wiley.
James, A. (1954). Normal Multivariate Analysis and the Orthogonal Group. Annals of Mathematical Statistics, 25, 40-75.
Kazianka, H. (2007). Classification Techniques for Hyper-Spectral Medical Image Data. Unpublished master’s thesis, University of Klagenfurt.
Kazianka, H., Leitner, R., and Pilz, J. (2008). Segmentation and Classification of Hyper-Spectral Skin Data. In C. Preisach, H. Burkhardt, L. Schmidt-Thieme, and R. Decker (Eds.), Data Analysis, Machine Learning and Applications (pp. 245-252). Berlin: Springer.
Khatri, C., and Mardia, K. (1977). The von Mises-Fisher Distribution in Orientation Statistics. Journal of the Royal Statistical Society, Series B, 39, 95-106.
Leonowicz, Z., Karvanen, J., Tanaka, T., and Rezmer, J. (2004). Model Order Selection Criteria: Comparative Study and Applications. In Proceedings of the VIth International Workshop CPEE 2004 (pp. 193-196). Warsaw: University of Technology.
Lindley, D. (1980). Approximate Bayesian Statistics Methods. In J. Bernardo, M. de Groot, D. Lindley, and A. Smith (Eds.), Bayesian Statistics (pp. 223-237). Valencia: University Press.
Minka, T. (2001). Automatic Choice of Dimensionality for PCA. Advances in Neural Information Processing Systems, 13, 598-604.
Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, 2, 559-572.
Smidl, V., and Quinn, A. (2005). The Variational Bayes Method in Signal Processing. Berlin: Springer.
Tipping, M., and Bishop, C. (1999). Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 61, 611-622.
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The CC BY license allows users to copy, distribute, transmit, and adapt an article, including for commercial purposes, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.