Active Learning

Resources for research on "inquisitive" machine learning.

Mailing List

A mailing list group is available for announcements and discussion of active learning research.

Active Learning Book

Synthesis Lectures on Artificial Intelligence and Machine Learning
Morgan & Claypool Publishers, June 2012, 114 pages

Burr Settles
Carnegie Mellon University

Online supplementary materials coming soon.

Abstract

The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This sort of approach is well-motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain.
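
As a rough illustration of the query loop described above (not code from the book), the sketch below learns a one-dimensional threshold classifier from a pool of unlabeled points. The learner repeatedly queries the unlabeled point closest to its current boundary estimate, which is a simple form of uncertainty sampling. The names make_oracle and active_learn_threshold, the hidden threshold 0.62, and the assumption that instances lie in [0, 1] are all hypothetical choices made for this example.

```python
def make_oracle(true_threshold):
    """Build a hypothetical oracle that labels x as 1 when x >= true_threshold."""
    def oracle(x):
        return 1 if x >= true_threshold else 0
    return oracle


def active_learn_threshold(pool, oracle, budget):
    """Estimate a decision threshold using at most `budget` label queries.

    Assumption for this sketch: every instance lies in [0, 1] and the
    labels are noise-free, so the boundary can be bracketed exactly.
    """
    low, high = 0.0, 1.0      # the boundary is known to lie in (low, high]
    unlabeled = list(pool)
    queries = 0
    while queries < budget:
        # Only points strictly inside the bracket are still informative.
        candidates = [x for x in unlabeled if low < x < high]
        if not candidates:
            break
        boundary = (low + high) / 2
        # Query the instance the current model is least certain about:
        # the one nearest the estimated boundary.
        x = min(candidates, key=lambda c: abs(c - boundary))
        unlabeled.remove(x)
        queries += 1
        if oracle(x) == 1:
            high = x          # boundary is at or below x
        else:
            low = x           # boundary is above x
    return (low + high) / 2, queries


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1, 1000)]   # 999 unlabeled instances
    estimate, used = active_learn_threshold(pool, make_oracle(0.62), budget=20)
    print(f"estimated threshold {estimate:.4f} after {used} queries")
```

In this noise-free setting each query roughly halves the interval that can contain the boundary, so the learner needs far fewer labels than the pool contains. Real problems involve noisy oracles and richer hypothesis classes, where selecting queries is considerably harder.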

This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms which have been organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address these open challenges and opportunities.

Table of Contents: Automating Inquiry / Uncertainty Sampling / Searching Through the Hypothesis Space / Minimizing Expected Error and Variance / Exploiting Structure in Data / Theory / Practical Considerations

Active Learning Literature Survey. This book is partially based on a popular unpublished literature survey, the contents of which are subsumed by and expanded on in the book. For historical interest, here are archival versions of that survey: 26-jan-2010, 09-jan-2009

Other Links and Resources

Software

  • DUALIST - active learning tool for text processing that solicits feedback on both instances and features; implemented in Java with a web-based user interface
  • Vowpal Wabbit - C++ library focused on large-scale and online machine learning, which includes selective sampling algorithms
  • Curious Snake - small active learning library for Python

Workshops

Online Lectures