Domain-Based Benchmark Experiments: Exploratory and Inferential Analysis
Benchmark experiments are the method of choice for comparing learning algorithms empirically. For a collection of data sets, the empirical performance distributions of a set of learning algorithms are estimated, compared, and ordered, usually for each data set separately. The present manuscript extends this single data set-based approach to a joint analysis of the complete collection, the so-called problem domain. This makes it possible to decide which algorithms to deploy in a specific application, or to compare newly developed algorithms with well-known algorithms on established problem domains.
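The step from per-data-set orderings to a single domain-level ordering can be sketched in Python. This is a minimal sketch under stated assumptions: performance is a misclassification error (lower is better), each (data set, algorithm) pair has a list of replicate errors, and the domain-level ordering is a plain mean rank across data sets. Mean-rank aggregation is a simple stand-in chosen for illustration, not the analysis proposed in the manuscript. The data set names, algorithm names, and numbers are hypothetical placeholders, not benchmark results.

```python
from collections import defaultdict
from statistics import mean


def per_dataset_rankings(results):
    """Rank algorithms within each data set by mean replicate error.

    results: dict mapping (dataset, algorithm) -> list of replicate errors.
    Returns: dict mapping dataset -> {algorithm: rank}, rank 1 = lowest error.
    """
    by_dataset = defaultdict(dict)
    for (dataset, algorithm), errors in results.items():
        by_dataset[dataset][algorithm] = mean(errors)

    rankings = {}
    for dataset, perf in by_dataset.items():
        # Assumption: lower error is better; exact ties are broken
        # alphabetically, which is a simplification.
        ordered = sorted(perf, key=lambda alg: (perf[alg], alg))
        rankings[dataset] = {alg: rank for rank, alg in enumerate(ordered, start=1)}
    return rankings


def domain_ranking(rankings):
    """Aggregate per-data-set ranks into one domain-level ordering by mean rank."""
    ranks = defaultdict(list)
    for per_dataset in rankings.values():
        for algorithm, rank in per_dataset.items():
            ranks[algorithm].append(rank)
    avg_rank = {alg: mean(rs) for alg, rs in ranks.items()}
    # Sort by mean rank, then alphabetically to make ties deterministic.
    return sorted(avg_rank.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical placeholder errors: two data sets, three algorithms, two replicates.
results = {
    ("ds1", "lda"): [0.20, 0.22], ("ds1", "rf"): [0.15, 0.17], ("ds1", "svm"): [0.18, 0.16],
    ("ds2", "lda"): [0.30, 0.28], ("ds2", "rf"): [0.25, 0.27], ("ds2", "svm"): [0.31, 0.33],
}

ranking = domain_ranking(per_dataset_rankings(results))
print(ranking)
```

With these placeholder numbers, rf ranks first on both data sets, while lda and svm swap places between them, so they tie at a mean rank of 2.5 and are listed alphabetically. A mean rank discards how large the performance differences are, which is one reason a formal model of the whole domain is attractive.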
Specialized visualization methods allow for easy exploration of large amounts of benchmark data. Furthermore, we take the design of the benchmark experiment into account and use mixed-effects models to provide a formal statistical analysis. Two domain-based benchmark experiments demonstrate our methods: the UCI domain, a well-known domain when one is developing a new algorithm, and the Grasshopper domain, a domain where we want to find the best learning algorithm for a prediction component in an enterprise application software system.
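As a hedged illustration of what such a mixed-effects model can look like (an assumed generic specification, not necessarily the exact model fitted in the manuscript), the performance y of algorithm j on data set i in replication b can be written with a fixed algorithm effect and a random data set effect:

```latex
y_{ijb} = \mu + \alpha_j + d_i + \varepsilon_{ijb}, \qquad
d_i \sim \mathcal{N}(0, \sigma_d^2), \qquad
\varepsilon_{ijb} \sim \mathcal{N}(0, \sigma^2)
```

Here the \alpha_j are the algorithm effects of interest, d_i absorbs the variation between data sets that all algorithms share, and domain-level comparisons are made on contrasts such as \alpha_j - \alpha_k. Whether algorithm-by-data-set interactions or other error distributions are needed is a modelling decision this sketch leaves open.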
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.