Treatment of Multivariate Outliers in Incomplete Business Survey Data

Authors

  • Marc Bill, FHNW School of Business
  • Beat Hulliger, FHNW School of Business

DOI:

https://doi.org/10.17713/ajs.v45i1.86

Abstract

The distribution of multivariate quantitative survey data is usually not normal. Skewed and semi-continuous distributions occur often. In addition, missing values and non-response are common. Together, these problems make multivariate outlier detection difficult. Examples of surveys where these problems occur are most business surveys and some household surveys, such as the Survey for the Statistics of Income and Living Conditions (SILC) of the European Union. Several methods for multivariate outlier detection are collected in the R package modi. This paper gives an overview of modi and its functions for outlier detection and the corresponding imputation. The use of the methods is illustrated with a business survey dataset. The discussion covers pre- and post-processing to deal with skewness and zero-inflation, the advantages and disadvantages of the methods, and the choice of parameters.

Published

2016-02-29

How to Cite

Bill, M., & Hulliger, B. (2016). Treatment of Multivariate Outliers in Incomplete Business Survey Data. Austrian Journal of Statistics, 45(1), 3-23. https://doi.org/10.17713/ajs.v45i1.86

Issue

Section

Special Issue on R