Exploring Compositional Data with the CoDa-Dendrogram
DOI:
https://doi.org/10.17713/ajs.v40i1&2.202
Abstract
Within the special geometry of the simplex, the sample space of compositional data, compositional orthonormal coordinates allow the application of any multivariate statistical approach. The search for meaningful coordinates has suggested balances between two groups of parts, based on a sequential binary partition of a D-part composition, and a representation in the form of a CoDa-dendrogram. Projected samples are represented in a dendrogram-like graph showing: (a) the way of grouping parts; (b) the explanatory role of subcompositions generated in the partition process; (c) the decomposition of the variance; (d) the center and quantiles of each balance. The representation is useful for interpreting balances and for describing the sample in a single diagram, independently of the number of parts. Samples from two or more populations, as well as several samples from the same population, can also be represented in the same graph, as long as the same parts are recorded for all of them. The approach is illustrated with an example of food consumption in Europe.
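As a rough illustration of the balances mentioned in the abstract, the following minimal Python sketch computes the coordinates associated with a sequential binary partition. It assumes the usual definition of a balance between a group of r parts and a group of s parts, sqrt(r*s/(r+s)) times the log-ratio of the geometric means of the two groups. The function names, the 4-part composition, and the sign-coded partition are hypothetical and are not taken from the article.

```python
import math


def balance(x, plus, minus):
    """Balance between two groups of parts of the composition x.

    plus and minus are lists of part indices (numerator and denominator group).
    Assumed definition: sqrt(r*s/(r+s)) * ln(gmean(x_plus) / gmean(x_minus)).
    """
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(x[i]) for i in plus) / r
    mean_log_minus = sum(math.log(x[i]) for i in minus) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)


def sbp_balances(x, sbp):
    """Return the D-1 balances of x for a sign-coded sequential binary partition.

    Each row of sbp describes one partition step: +1 marks parts in the first
    group, -1 parts in the second group, 0 parts not involved in that step.
    """
    coords = []
    for row in sbp:
        plus = [i for i, code in enumerate(row) if code == 1]
        minus = [i for i, code in enumerate(row) if code == -1]
        coords.append(balance(x, plus, minus))
    return coords


# Hypothetical partition of a 4-part composition (illustration only):
# step 1 splits {1, 2} from {3, 4}; steps 2 and 3 split each pair.
sbp = [
    [1, 1, -1, -1],
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]
x = [0.1, 0.2, 0.3, 0.4]
coords = sbp_balances(x, sbp)
print(coords)
```

Each row of the partition corresponds to one node of a CoDa-dendrogram. Computing the balances for every observation in a sample would give the per-balance variances, centers, and quantiles that the diagram summarizes.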
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution License (CC-BY) allows users to copy, distribute and transmit an article, adapt the article and make commercial use of the article. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.