|
X. Li, J. Ding, M. Elhoseiny:
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
NeurIPS, 2024
|
|
Kirolos Ataallah, X. Shen, E. Abdelrahman, Essam Sleiman, Mingchen Zhuge, J. Ding, D. Zhu, Jürgen Schmidhuber, M. Elhoseiny:
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
ECCV, 2024
|
|
X. Li, J. Ding, Zhaoyang Chen, M. Elhoseiny:
Uni3DL: A Unified Model for 3D Vision-Language Understanding
ECCV, 2024
|