Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut


Abstract
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNetv2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.
Anthology ID:
P18-1238
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2556–2565
Language:
URL:
https://aclanthology.org/P18-1238/
DOI:
10.18653/v1/P18-1238
Bibkey:
Cite (ACL):
Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. 2018. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2556–2565, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning (Sharma et al., ACL 2018)
Copy Citation:
PDF:
https://aclanthology.org/P18-1238.pdf
Code
 google-research-datasets/conceptual-captions +  additional community code
Data
Conceptual CaptionsFlickr30kMS COCO