The server presents results obtained using the HGLMM Fisher Vector method, which, as of January 2015, achieves state-of-the-art results on the current image annotation and image search benchmarks.
The demo presents the capability of HGLMM in automatic image annotation. The text synthesis model used below was trained on a limited dataset, the 8,091 images of the Flickr8K dataset (Hodosh et al., 2013), and therefore does not cover all types of images. For the initial encoding of the images, we employ the VGG convnet (K. Simonyan and A. Zisserman, 2014). For the initial encoding of the individual words, the word2vec representation is used (Mikolov et al., 2013). Please refer to the report below for the details of representing paragraphs and matching between the text and the images. The details of synthesizing the text will be added soon.
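To illustrate how a set of per-word vectors is pooled into a single sentence-level descriptor, here is a minimal sketch of Fisher Vector pooling. For simplicity it uses a diagonal-covariance Gaussian mixture and only the mean-gradient terms; the actual method uses a hybrid Gaussian-Laplacian mixture (HGLMM), and all function and variable names here are illustrative, not from the released code.

```python
import numpy as np

def fisher_vector(word_vecs, means, covs, priors):
    """Mean-gradient Fisher Vector of a set of word vectors under a
    diagonal-covariance GMM (a simplified stand-in for HGLMM).
    word_vecs: (n, d) word embeddings; means, covs: (k, d); priors: (k,).
    Returns a (k * d,) pooled descriptor."""
    n, d = word_vecs.shape
    k = priors.shape[0]
    # log-likelihood of each word vector under each mixture component
    log_p = np.empty((n, k))
    for j in range(k):
        diff = word_vecs - means[j]
        log_p[:, j] = (np.log(priors[j])
                       - 0.5 * np.sum(np.log(2 * np.pi * covs[j]))
                       - 0.5 * np.sum(diff ** 2 / covs[j], axis=1))
    # soft-assignment posteriors (log-sum-exp normalization for stability)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # gradient with respect to the component means, Fisher-normalized
    fv = np.empty((k, d))
    for j in range(k):
        diff = (word_vecs - means[j]) / np.sqrt(covs[j])
        fv[j] = post[:, j] @ diff / (n * np.sqrt(priors[j]))
    return fv.ravel()
```

A sentence of n words thus maps to a fixed-length vector of dimension k * d regardless of n, which is what allows variable-length text to be matched against image descriptors.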
Acknowledgments: This research is supported by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI).