Toybox is designed to enable an improved understanding of small-sample learning and of hand-object-vision interactions. The dataset contains video clips of structured, handheld transformations of 360 individual objects from 12 categories (cups, mugs, spoons, balls, cars, trucks, airplanes, helicopters, horses, cats, ducks, and giraffes), with over 2 million images in total.
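As a rough sketch of the dataset's scale, the category list and object count above imply the following breakdown (assuming the objects are split evenly across categories, which the stated totals suggest; the grouping comments are illustrative, not an official taxonomy):

```python
# The 12 Toybox categories as listed in the dataset description.
CATEGORIES = [
    "cups", "mugs", "spoons", "balls",             # household items
    "cars", "trucks", "airplanes", "helicopters",  # vehicles
    "horses", "cats", "ducks", "giraffes",         # animals
]

TOTAL_OBJECTS = 360  # individual physical objects in the dataset

# Assuming an even split, each category contributes the same number of objects.
per_category = TOTAL_OBJECTS // len(CATEGORIES)
print(f"{per_category} objects per category")  # 30 objects per category
```

Each of those objects then appears in multiple video clips of handheld transformations, which is how the collection reaches over 2 million frames in total.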

Selected publications

  • Wang, X., Ma, T., Ainooson, J., Cha, S., Wang, X., Molla, A., and Kunda, M. (2018). The Toybox dataset of egocentric visual object transformations.
  • Wang, X., Eliott, F., Ainooson, J., Palmer, J., and Kunda, M. (2017). An object is worth six thousand pictures: The egocentric, manual, multi-image (EMMI) dataset. In International Conference on Computer Vision Workshop on Egocentric Perception, Interaction, and Computing (EPIC@ICCV), Venice, Italy.

A sampling of the Toybox objects: