Detectron2 Tutorial - Search News

A Transformer-based Multimodal Feature Fusion Model for Video Captioning

Abstract: Video Captioning requires effective extraction and fusion of multimodal features, including visual, semantic, and textual information, to generate accurate natural language descriptions. To ...

IEEE

Traffic Sign Board Prediction Using Ensemble Model of YOLOv8 And Detectron2

Abstract: Detecting and classifying traffic signs greatly enhances road safety and enables smart decision-making for self-driving cars and intelligent modes of transportation.

GitHub

Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning

Open-Vocabulary Segmentation (OVS) has drawn increasing attention for its capacity to generalize segmentation beyond predefined categories. However, existing methods typically predict segmentation ...

GitHub

JEDE: Universal Jersey Number Detector for Sports

Tested on PyTorch 1.8.1.

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install opencv-python
git clone https ...
