DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners

Clarence Lee¹
M Ganesh Kumar²
Cheston Tan²
¹ Design and Artificial Intelligence, SUTD
² Centre for Frontier AI Research, A*STAR


Determiners are an important word class used in the referencing and quantification of nouns. However, existing datasets place less emphasis on determiners than on other word classes. Hence, we designed DetermiNet, a visuolinguistic dataset centered on the determiner word class. It comprises 25 determiners with 10,000 examples each, totalling 250,000 samples. All scenes were synthetically generated using Unity. The task is to predict bounding boxes that identify the objects of interest, constrained by the semantics of the determiners.


[Paper] [Supp] [Github]

Download the Dataset

You may download DetermiNet, comprising 250,000 image-caption pairs, here: https://drive.google.com/drive/folders/1J5dleNxWvFUip5RBsTl6OqQBtpWO0r1k?usp=sharing

Real Dataset

Download the real dataset, comprising 100 image-caption pairs, here: https://drive.google.com/drive/folders/1J5dleNxWvFUip5RBsTl6OqQBtpWO0r1k?usp=sharing

Citation Information

@misc{lee2023determinet,
      title={DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners}, 
      author={Clarence Lee and M Ganesh Kumar and Cheston Tan},
      year={2023},
      eprint={2309.03483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Evaluation scripts

DetermiNet uses a modified ground truth for multiple annotations. To evaluate your models, refer to our GitHub repository.
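
For readers unfamiliar with bounding-box evaluation, the sketch below illustrates the general idea of scoring a predicted box against several acceptable ground-truth boxes using intersection over union (IoU). It is not the official DetermiNet evaluation script. The box format `(x_min, y_min, x_max, y_max)` and the helper names `iou` and `best_match_iou` are assumptions made for illustration only; the actual procedure is in the GitHub repository.

```python
from typing import Sequence

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_match_iou(pred: Box, candidates: Sequence[Box]) -> float:
    """Highest IoU between one prediction and any acceptable ground-truth box.

    Illustrative only: when several objects are equally valid answers,
    a prediction is compared against each and the best overlap is kept.
    """
    return max((iou(pred, c) for c in candidates), default=0.0)
```

For example, `best_match_iou((0, 0, 2, 2), [(1, 1, 3, 3), (0, 0, 2, 2)])` compares the prediction with both candidate boxes and keeps the larger overlap, so a prediction that exactly matches any one acceptable object is not penalized for missing the others.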

DetermiNet examples (25 Determiners, 4 Determiner classes)

Articles

Possessives

Demonstratives

Quantifiers

Contact us