A Corrected Criterion for Selecting the Optimum Number of Principal Components

Authors

  • Hannes Kazianka Institute of Statistics, University of Klagenfurt
  • Jürgen Pilz Institute of Statistics, University of Klagenfurt

DOI

https://doi.org/10.17713/ajs.v38i3.268

Abstract

Determining the optimum number of components to retain is a key problem in principal component analysis (PCA). Besides rule-of-thumb estimates, there exist several sophisticated methods for automatically selecting the dimensionality of the data. Based on the probabilistic PCA model, Minka (2001) proposed an approximate Bayesian model selection criterion. In this paper we correct this criterion and present a modified version. We compare the novel criterion with various other approaches in a simulation study. Furthermore, we use it to find the optimum number of principal components in hyper-spectral skin cancer images.

References

Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods. New York: Wiley.

Bishop, C. (2008). Pattern Recognition and Machine Learning. Berlin: Springer.

Cichocki, A., and Amari, S. (2002). Adaptive Blind Signal and Image Processing. Chichester: Wiley.

James, A. (1954). Normal Multivariate Analysis and the Orthogonal Group. Annals of Mathematical Statistics, 25, 40-75.

Kazianka, H. (2007). Classification Techniques for Hyper-Spectral Medical Image Data. Unpublished master’s thesis, University of Klagenfurt.

Kazianka, H., Leitner, R., and Pilz, J. (2008). Segmentation and Classification of Hyper-Spectral Skin Data. In C. Preisach, H. Burkhardt, L. Schmidt-Thieme, and R. Decker (Eds.), Data Analysis, Machine Learning and Applications (pp. 245-252). Berlin: Springer.

Khatri, C., and Mardia, K. (1977). The von Mises-Fisher Distribution in Orientation Statistics. Journal of the Royal Statistical Society, Series B, 39, 95-106.

Leonowicz, Z., Karvanen, J., Tanaka, T., and Rezmer, J. (2004). Model Order Selection Criteria: Comparative Study and Applications. In Proceedings of the VIth International Workshop CPEE 2004 (pp. 193-196). Warsaw: University of Technology.

Lindley, D. (1980). Approximate Bayesian Statistics Methods. In J. Bernardo, M. de Groot, D. Lindley, and A. Smith (Eds.), Bayesian Statistics (pp. 223-237). Valencia: University Press.

Minka, T. (2001). Automatic Choice of Dimensionality for PCA. Advances in Neural Information Processing Systems, 13, 598-604.

Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, 2, 559-572.

Smidl, V., and Quinn, A. (2005). The Variational Bayes Method in Signal Processing. Berlin: Springer.

Tipping, M., and Bishop, C. (1999). Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, Series B, 61, 611-622.

Published

2016-04-03

How to Cite

Kazianka, H., & Pilz, J. (2016). A Corrected Criterion for Selecting the Optimum Number of Principal Components. Austrian Journal of Statistics, 38(3), 135–150. https://doi.org/10.17713/ajs.v38i3.268

Section

Articles