Toybox is designed to enable an improved understanding of small-sample learning and of hand-object-vision interactions. The dataset contains video clips of structured, handheld transformations of 360 individual objects from 12 categories (cups, mugs, spoons, balls, cars, trucks, airplanes, helicopters, horses, cats, ducks, and giraffes), with over 2 million images in total.
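As a rough sketch of the dataset's scale, the category list and object count above imply the following breakdown (assuming the objects are split evenly across categories, which the stated totals suggest; the grouping comments are illustrative, not an official taxonomy):

```python
# The 12 Toybox categories as listed in the dataset description.
CATEGORIES = [
    "cups", "mugs", "spoons", "balls",             # household items
    "cars", "trucks", "airplanes", "helicopters",  # vehicles
    "horses", "cats", "ducks", "giraffes",         # animals
]

TOTAL_OBJECTS = 360  # individual physical objects in the dataset

# Assuming an even split, each category contributes the same number of objects.
per_category = TOTAL_OBJECTS // len(CATEGORIES)
print(f"{per_category} objects per category")  # 30 objects per category
```

Each of those objects then appears in multiple video clips of handheld transformations, which is how the collection reaches over 2 million frames in total.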

Selected publications

  • Wang, X., Ma, T., Ainooson, J., Cha, S., Wang, X., Molla, A., and Kunda, M. (2018). The Toybox dataset of egocentric visual object transformations.
  • Wang, X., Eliott, F., Ainooson, J., Palmer, J., and Kunda, M. (2017). An object is worth six thousand pictures: The egocentric, manual, multi-image (EMMI) dataset. In International Conference on Computer Vision Workshop on Egocentric Perception, Interaction, and Computing (EPIC@ICCV), Venice, Italy.

A sampling of the Toybox objects: