Extracting Information from Interval Data Using Symbolic Principal Component Analysis

M. R. Oliveira, M. Vilela, A. Pacheco, Rui Valadas, Paulo Salvador


We introduce generic definitions of symbolic variance and covariance for random interval-valued variables that lead to a unified and insightful interpretation of four known symbolic principal component estimation methods: CPCA, VPCA, CIPCA, and SymCovPCA. Moreover, we propose truncated versions of symbolic principal components, which use a strict subset of the original symbolic variables, as a way to improve the interpretation of symbolic principal components. Finally, the analysis of a real dataset leads to a meaningful characterization of Internet traffic applications, while highlighting similarities between the symbolic principal component estimation methods considered in the paper.
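To make the setting concrete, the following is a minimal sketch of the centers method (CPCA) as it is commonly described in the symbolic data analysis literature: a classical PCA is run on the interval midpoints, and each observation's interval is then projected onto the resulting loadings to give an interval-valued score. It is an illustration only, not the estimators defined in the paper. The function name `cpca`, the use of the sample covariance with an n-1 denominator, and the toy data are all assumptions made for this example.

```python
import numpy as np

def cpca(lower, upper, n_components=2):
    """Sketch of centers-method PCA for interval data (hypothetical helper).

    lower, upper: arrays of shape (n_observations, n_variables) holding the
    interval bounds, with lower <= upper elementwise and n_variables >= 2.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Classical PCA on the interval midpoints (centers).
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    cov = np.cov(centers - mean, rowvar=False)  # assumption: n-1 denominator

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # Interval scores: for each variable, the contribution to a component is
    # loading * (bound - mean); the min/max over the two bounds, summed over
    # variables, gives the range of the projected hyper-rectangle.
    lo_proj = (lower - mean)[:, :, None] * loadings[None, :, :]
    hi_proj = (upper - mean)[:, :, None] * loadings[None, :, :]
    score_lower = np.minimum(lo_proj, hi_proj).sum(axis=1)
    score_upper = np.maximum(lo_proj, hi_proj).sum(axis=1)
    return eigvals[order], loadings, score_lower, score_upper

# Toy interval data: 4 observations, 2 interval-valued variables.
lower = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [6.0, 7.0]])
upper = np.array([[3.0, 4.0], [3.0, 5.0], [6.0, 6.0], [9.0, 8.0]])
eigvals, loadings, score_lower, score_upper = cpca(lower, upper)
```

Each row of `score_lower` and `score_upper` bounds one observation's symbolic principal component scores. The other methods named above differ in how the variability inside each interval enters the covariance estimate, which this sketch ignores entirely.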

