3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language

Winter Conference on Applications of Computer Vision (WACV) 2022

Abstract


A point cloud is a flexible 3D representation that efficiently models an object's surface geometry. However, these surface-centric properties also make it challenging to design tools that recognize and synthesize point clouds. This work presents a novel autoregressive model, 3DRefTransformer, which generates realistic point cloud samples either from scratch or conditioned on given semantic contexts. The model operates recurrently: each point is sampled from a conditional distribution given its previously generated points. Because point cloud object shapes are typically encoded by long-range inter-point dependencies, we augment the model with dedicated self-attention modules to capture these relations. Extensive evaluation shows that 3DRefTransformer performs well on both unconditional and conditional point cloud generation tasks with respect to fidelity, diversity, and semantic preservation. Further, the conditional 3DRefTransformer learns a smooth manifold of image conditions within which 3D shape interpolation and shape arithmetic can be performed.
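The autoregressive step described in the abstract, where each point is predicted from its previously generated points with self-attention over that history, can be illustrated with a minimal pure-Python sketch. It assumes identity query/key/value projections (no learned weights) and uses the most recent point as the query; the function name `causal_contexts` is hypothetical, and this is not the model's actual architecture.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_contexts(points: List[Point]) -> List[List[float]]:
    """Hypothetical helper: for each point i >= 1, summarize points 0..i-1
    with scaled dot-product attention.

    Assumptions (not from the paper): identity query/key/value projections,
    and the query is the most recent visible point, points[i - 1].
    """
    dim = len(points[0])
    scale = 1.0 / math.sqrt(dim)
    contexts = []
    for i in range(1, len(points)):
        query = points[i - 1]
        history = points[:i]  # only previously generated points are visible
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in history]
        weights = softmax(scores)  # attention weights sum to 1
        # Context vector: attention-weighted average of the visible history.
        context = [sum(w * p[d] for w, p in zip(weights, history)) for d in range(dim)]
        contexts.append(context)
    return contexts
```

Calling `causal_contexts([[0, 0, 0], [1, 0, 0], [0, 1, 0]])` yields one context vector per point to be predicted. Because step i only sees `points[:i]`, changing the final point never alters any context, which is the causal property an autoregressive factorization requires. A trained model would replace the identity projections with learned matrices and feed each context into a per-point conditional distribution.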

Citation

@inproceedings{abdelreheem2022reftransformer,
title={3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language},
author={Abdelreheem, Ahmed and Upadhyay, Ujjwal and Skorokhodov, Ivan and Yahya, Rawan Al and Chen, Jun and Elhoseiny, Mohamed},
booktitle={Winter Conference on Applications of Computer Vision},
year={2022}
}