Exploring Compositional Data with the CoDa-Dendrogram

Authors

  • Vera Pawlowsky-Glahn University of Girona, Spain
  • Juan Jose Egozcue Technical University of Catalonia, Barcelona, Spain

DOI:

https://doi.org/10.17713/ajs.v40i1&2.202

Abstract

Within the special geometry of the simplex, the sample space of compositional data, compositional orthonormal coordinates allow the application of any multivariate statistical approach. The search for meaningful coordinates has suggested balances (between two groups of parts), based on a sequential binary partition of a D-part composition, and a representation in the form of a CoDa-dendrogram. Projected samples are represented in a dendrogram-like graph showing: (a) the way of grouping parts; (b) the explanatory role of subcompositions generated in the partition process; (c) the decomposition of the variance; (d) the center and quantiles of each balance. The representation is useful for interpreting balances and for describing the sample in a single diagram independently of the number of parts. Also, samples of two or more populations, as well as several samples from the same population, can be represented in the same graph, as long as they have the same parts registered. The approach is illustrated with an example of food consumption in Europe.
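
As a rough illustration of the balances and the variance decomposition mentioned in the abstract, the following Python sketch computes balance coordinates from a sequential binary partition and checks that their variances add up to the total variance. It is not the authors' implementation: the function names, the sign-matrix encoding of the partition (+1 and -1 for the two groups, 0 for parts not involved), and the simulated 4-part data are all assumptions made for this example. The balance formula used is the standard one, sqrt(r*s/(r+s)) * ln(g(x+)/g(x-)), with g the geometric mean and r, s the sizes of the two groups.

```python
import numpy as np

def balance_basis(sbp):
    """Rows hold the log-contrast coefficients of each balance of the SBP."""
    sbp = np.asarray(sbp, dtype=float)
    psi = np.zeros_like(sbp)
    for i, row in enumerate(sbp):
        r = np.sum(row > 0)  # number of parts in the +1 group
        s = np.sum(row < 0)  # number of parts in the -1 group
        psi[i, row > 0] = np.sqrt(s / (r * (r + s)))
        psi[i, row < 0] = -np.sqrt(r / (s * (r + s)))
    return psi

def balances(x, sbp):
    """Balance coordinates of the compositions in the rows of x."""
    return np.log(np.asarray(x, dtype=float)) @ balance_basis(sbp).T

# Hypothetical partition of a 4-part composition (an assumption for this sketch):
# step 1 separates {1, 2} from {3, 4}; steps 2 and 3 split each pair.
sbp = [[1, 1, -1, -1],
       [1, -1, 0, 0],
       [0, 0, 1, -1]]

# Simulated data, not taken from the paper.
rng = np.random.default_rng(0)
x = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=50)

psi = balance_basis(sbp)
b = balances(x, sbp)

# Centred log-ratio (clr) scores, used here only to obtain the total variance.
log_x = np.log(x)
clr = log_x - log_x.mean(axis=1, keepdims=True)

balance_variances = b.var(axis=0)
total_variance = clr.var(axis=0).sum()

print("variance of each balance:", balance_variances)
print("sum of balance variances:", balance_variances.sum())
print("total variance (clr):    ", total_variance)
```

Because the balances are orthonormal coordinates, the last two printed values should agree up to rounding. This additive split of the total variance across balances is the decomposition that item (c) of the abstract refers to.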

Published

2016-02-24

How to Cite

Pawlowsky-Glahn, V., & Egozcue, J. J. (2016). Exploring Compositional Data with the CoDa-Dendrogram. Austrian Journal of Statistics, 40(1&2), 103–113. https://doi.org/10.17713/ajs.v40i1&2.202

Section

Articles