DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
Determiners are an important word class used in the referencing and quantification of nouns. However, existing datasets place less emphasis on determiners than on other word classes. Hence, we designed the DetermiNet dataset, a visuolinguistic dataset focused on the word class of determiners. It covers 25 determiners with 10,000 examples each, totalling 250,000 samples. All scenes were synthetically generated using Unity. The task is to predict bounding boxes that identify the objects of interest, constrained by the semantics of the determiners.
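To make the task concrete, here is a minimal Python sketch of determiner-constrained selection over a toy scene. The scene format, the `SceneObject` class, the `ground` function and the counting rules in `COUNT_RULES` are all hypothetical illustrations. They are not DetermiNet's data format or annotation logic, and many determiners (for example "some" or "few") carry richer semantics than a simple count.

```python
from dataclasses import dataclass

# Hypothetical scene representation (NOT the DetermiNet annotation format):
# each object has a class label and an axis-aligned box (x_min, y_min, x_max, y_max).
@dataclass(frozen=True)
class SceneObject:
    label: str
    box: tuple[float, float, float, float]

# Toy rules: how many matching objects a determiner selects.
# None means "select every matching object".
COUNT_RULES = {"a": 1, "an": 1, "two": 2, "three": 3, "all": None}

def ground(determiner: str, noun: str, scene: list[SceneObject]) -> list[tuple[float, float, float, float]]:
    """Return the boxes referred to by '<determiner> <noun>' under the toy rules."""
    if determiner not in COUNT_RULES:
        raise ValueError(f"determiner not covered by this sketch: {determiner!r}")
    # Objects whose label matches the noun are the only candidates.
    candidates = [obj.box for obj in scene if obj.label == noun]
    wanted = COUNT_RULES[determiner]
    if wanted is None:
        return candidates
    if len(candidates) < wanted:
        return []  # the phrase cannot be satisfied in this scene
    # Any `wanted` candidates would be valid; take the first ones deterministically.
    return candidates[:wanted]

scene = [
    SceneObject("apple", (0, 0, 1, 1)),
    SceneObject("apple", (2, 0, 3, 1)),
    SceneObject("orange", (4, 0, 5, 1)),
]
print(ground("all", "apple", scene))  # [(0, 0, 1, 1), (2, 0, 3, 1)]
print(ground("an", "apple", scene))   # [(0, 0, 1, 1)]
```

Note that "an apple" is satisfied by either apple in this toy scene, so a single fixed box is only one of several acceptable answers for such phrases.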
You may download DetermiNet, comprising 250,000 image-caption pairs, here: https://drive.google.com/drive/folders/1J5dleNxWvFUip5RBsTl6OqQBtpWO0r1k?usp=sharing
Download the real-image dataset, comprising 100 image-caption pairs, here: https://drive.google.com/drive/folders/1J5dleNxWvFUip5RBsTl6OqQBtpWO0r1k?usp=sharing
@misc{lee2023determinet,
title={DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners},
author={Clarence Lee and M Ganesh Kumar and Cheston Tan},
year={2023},
eprint={2309.03483},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
DetermiNet uses a modified ground truth to handle multiple annotations. To run your models for evaluation, refer to our GitHub repository.