TAU image annotator

The server presents results obtained using the HGLMM Fisher Vector method, which as of January 2015 achieves state-of-the-art results on the current image annotation and image search benchmarks.

The demo shows HGLMM applied to automatic image annotation. The text synthesis model used below was trained on a limited dataset, the 8,091 images of the Flickr-8k dataset (Hodosh et al., 2013), and therefore does not cover all types of images. Images are initially encoded with the VGG convnet (Simonyan and Zisserman, 2014), and individual words with the word2vec representation (Mikolov et al., 2013). Please refer to the report below for the details of representing paragraphs and matching between the text and the images. The details of synthesizing the text will be added soon.
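To give a rough feel for what matching between text and images involves, here is a minimal sketch in Python. It is not the method of the report: the word vectors are pooled by a plain average as a stand-in for the HGLMM Fisher Vector, and the image vector and word vectors are assumed to already live in one shared space. All names (mean_pool, cosine, rank_captions) and the three-dimensional toy vectors are hypothetical and made up for illustration; real VGG and word2vec features are much higher dimensional.

```python
import math


def mean_pool(word_vectors):
    """Average equal-length word vectors into one sentence vector.

    Assumption: a simple stand-in for the Fisher Vector pooling step.
    """
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_captions(image_vec, captions, embeddings):
    """Return (score, caption) pairs sorted from best to worst match.

    Words missing from `embeddings` are skipped; captions with no known
    words are dropped.
    """
    scored = []
    for caption in captions:
        vectors = [embeddings[w] for w in caption.lower().split() if w in embeddings]
        if not vectors:
            continue
        scored.append((cosine(image_vec, mean_pool(vectors)), caption))
    return sorted(scored, reverse=True)


# Toy data, invented for this sketch (not real word2vec or VGG output).
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0],
    "runs": [0.8, 0.2, 0.0],
    "boat": [0.0, 1.0, 0.0],
    "sails": [0.0, 0.8, 0.2],
}
TOY_IMAGE_VEC = [0.9, 0.1, 0.0]

ranking = rank_captions(TOY_IMAGE_VEC, ["dog runs", "boat sails"], TOY_EMBEDDINGS)
best_caption = ranking[0][1]
print(best_caption)
```

In this toy setup the caption whose pooled vector points in the same direction as the image vector is ranked first. The actual representation and matching procedure are described in the report cited below.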

Benjamin Klein, Guy Lev, Gil Sadeh, Lior Wolf. Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation. arXiv:1411.7399, 2014.

Acknowledgments: This research is supported by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI).