Valentin Gabeur
I am a postdoctoral researcher at FAIR, Meta AI. My research focuses on multi-modal learning for video understanding, at the intersection of computer vision, audio processing, speech recognition and natural language understanding.
I completed my PhD in October 2022 at Inria and Grenoble-Alpes University, where I worked in the Thoth team on multi-modal learning, advised by Cordelia Schmid and Karteek Alahari. During that time, I also worked as a Student Researcher at Google AI Research. I received an MS in Robotics from Toulouse III University in 2018.
Prior to that, I worked for 6 years on industrial automation and machine design in China, France and the USA, mostly as a mechanical engineer. I received an MS in Engineering from ICAM Lille in 2011.
Email / CV / Google Scholar / LinkedIn / Twitter / GitHub
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi*, Valentin Gabeur*, Yuan-Ting Hu*, Ronghang Hu*, Chaitanya Ryali*, Tengyu Ma*, Haitham Khedr*, Roman Rädle*, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer*
arXiv, 2024  
paper / demo / project page / blog / dataset / github
Promptable visual segmentation in images and videos.
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur*, Paul Hongsuck Seo*, Arsha Nagrani*, Chen Sun, Karteek Alahari, Cordelia Schmid
INTERSPEECH, 2022  
arXiv / project page / bibtex
Leveraging the full frame visual context to improve speech recognition in videos.
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
WACV, 2022  
arXiv / bibtex
Pre-training strategy for learning multi-modal fusion from unlabelled videos.
Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
ECCV, 2020 (Spotlight paper)  
arXiv / code, models, data / bibtex
Cross-modal architecture to encode language captions and videos in a common embedding space.
CVPR 2020 Video Pentathlon Challenge: Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
CVPR Video Pentathlon Workshop, 2020 (First place)  
report / paper / challenge / recording
Winning approach for the CVPR 2020 Video Pentathlon Challenge, a video retrieval competition.
Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images
Valentin Gabeur, Jean-Sebastien Franco, Xavier Martin, Cordelia Schmid, Gregory Rogez
ICCV, 2019  
arXiv / bibtex
Efficient 3D shape representation through the combination of depth maps.