Y. Mohamed, Runjia Li, Ibrahim Said Ahmad, K. Haydarov, Philip Torr, Kenneth Church, M. Elhoseiny:
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
EMNLP, 2024
Botos Csaba, W. Zhang, Matthias Müller, Ser-Nam Lim, M. Elhoseiny, Philip H.S. Torr, Adel Bibi:
Label Delay in Online Continual Learning
NeurIPS, 2024
M. Ahmed, X. Li, Arpit Prajapati, M. Elhoseiny:
3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition
NeurIPS Datasets and Benchmarks Track, 2024
X. Li, J. Ding, M. Elhoseiny:
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
NeurIPS, 2024
Kirolos Ataallah, X. Shen, E. Abdelrahman, Essam Sleiman, Mingchen Zhuge, J. Ding, D. Zhu, Jürgen Schmidhuber, M. Elhoseiny:
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
ECCV, 2024
X. Shen, F. Farooq Khan, Abdelrahman Mohamed, M. Elhoseiny:
EmoTalker: Audio Driven Emotion Aware Talking Head Generation
ACCV, 2024
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, J. Chen, M. Elhoseiny, Ruohan Gao, Dinesh Manocha:
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
ECCV, 2024
K. Haydarov, X. Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, M. Elhoseiny:
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversation
ECCV, 2024
X. Li, J. Ding, Zhaoyang Chen, M. Elhoseiny:
Uni3DL: A Unified Model for 3D Vision-Language Understanding
ECCV, 2024
K. Haydarov, Aashiq Muhamed, X. Shen, Jovana Lazarevic, Ivan Skorokhodov, Chamuditha Jayanga Galappaththige, M. Elhoseiny:
Adversarial Text to Continuous Image Generation
CVPR, 2024
H. Slim, M. Elhoseiny:
ShapeWalk: A Benchmark for Compositional Shape Editing through Language-Guided Chains
CVPR, 2024
W. Zhang, Paul Janson, Rahaf Aljundi, M. Elhoseiny:
Overcoming Generic Knowledge Loss with Selective Parameter Update
CVPR, 2024
D. Zhu, J. Chen, K. Haydarov, X. Shen, W. Zhang, M. Elhoseiny:
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
TMLR, 2024
Yuanpeng Li, Joel Hestness, M. Elhoseiny, Liang Zhao, Kenneth Church:
Efficiently Disentangle Causal Representations
CPAL (Proceedings of Machine Learning Research), 2024
D. Zhu, J. Chen, X. Shen, X. Li, M. Elhoseiny:
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
ICLR, 2024
E. Abdelrahman, Mohamed Ayman, M. Ahmed, H. Slim, M. Elhoseiny:
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
ICLR, 2024
W. Zhang, Y. Mohamed, Bernard Ghanem, Philip Torr, Adel Bibi, M. Elhoseiny:
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
ICLR, 2024
Salman Khan, Izzeddin Teeti, Andrew Bradley, M. Elhoseiny, Fabio Cuzzolin:
A Hybrid Graph Network for Complex Activity Detection in Video
WACV, 2024
E. Abdelrahman, Pengzhan Sun, Li Erran Li, M. Elhoseiny:
ImageCaptioner^2: Image Captioner for Image Captioning Bias Amplification Assessment
AAAI, 2024