Active Learning in Black-Box Settings
DOI: https://doi.org/10.17713/ajs.v40i1&2.204
Abstract
Active learning refers to settings in which a machine learning algorithm (the learner) can select the data from which it learns (selecting points and then obtaining their labels). By doing so it aims to achieve better accuracy, for example by avoiding training data that is redundant or unimportant. Active learning is particularly useful when the labeling cost is high. A common assumption is that the active learning algorithm knows the details of the underlying learning algorithm for which it obtains the data. In many practical settings, however, obtaining precise details of the learning algorithm may not be feasible. This makes the underlying algorithm in essence a black box: no knowledge of its internal workings is available, and only the inputs and the corresponding output estimates are accessible. As a result, many traditional approaches are inapplicable or, at the least, ineffective. Our motivation is therefore to use the only data that is accessible in black-box settings: the output estimates. We note that accuracy will improve only if the learner's output estimates change. We therefore propose an active learning criterion that uses the information contained in the changes of the output estimates.
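The abstract does not give the exact form of the criterion. As a rough illustration only, the following Python sketch scores each unlabeled candidate by how much the black box's output estimates on the pool would change if that candidate were added to the training set. It then picks the highest-scoring candidate. The learner is accessed only through train-then-predict calls.

The following are assumptions of this sketch and are not taken from the article:

* the unknown label is replaced by an average over a user-supplied set of hypothetical label values;
* change is measured as a mean squared difference;
* `least_squares_learner`, `output_change_score` and `select_next` are hypothetical names;
* the least-squares model merely plays the role of the black box.

```python
from statistics import mean
from typing import Callable, List, Sequence

# A trained model is usable only as x -> output estimate; its internals stay hidden.
Predictor = Callable[[float], float]
# A black-box learner: training inputs and labels in, opaque predictor out.
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def least_squares_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical stand-in for the black box: 1-D ordinary least squares.

    The selection code below never looks inside it; any Learner would do.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def output_change_score(learner: Learner, xs, ys, candidate: float,
                        pool: Sequence[float], label_values: Sequence[float]) -> float:
    """Mean squared change of output estimates on `pool` if `candidate` were labeled.

    The true label is unknown, so this sketch averages over hypothetical
    `label_values` (an assumption, not the article's specification).
    """
    base = learner(xs, ys)
    base_out = [base(p) for p in pool]
    changes: List[float] = []
    for label in label_values:
        # Retrain the black box with the candidate and one hypothetical label.
        model = learner(list(xs) + [candidate], list(ys) + [label])
        changes.append(mean((model(p) - b) ** 2 for p, b in zip(pool, base_out)))
    return mean(changes)


def select_next(learner: Learner, xs, ys, pool: Sequence[float],
                label_values: Sequence[float]) -> float:
    """Pick the pool point whose labeling is expected to change the outputs most."""
    return max(pool, key=lambda c: output_change_score(learner, xs, ys, c, pool, label_values))


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]
    pool = [1.0, 10.0]
    labels = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(select_next(least_squares_learner, xs, ys, pool, labels))  # 10.0
```

In this toy run, the far-away candidate 10.0 wins because its hypothetical labels would tilt the fitted line and shift every output estimate. Labeling the central point 1.0 would only shift the intercept (score 0.1875). The hypothetical labels must differ from the current predictions to matter: adding a point labeled exactly at its predicted value leaves a least-squares fit unchanged, so that label contributes zero change.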
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution (CC BY) License allows users to copy, distribute and transmit an article, adapt it, and make commercial use of it. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.