Audio Visual Identity Database

Audio-Visual Fusion With Temporal Convolutional Attention Network for Speech Separation

Abstract: Currently, audio-visual speech separation methods utilize the speaker's audio and visual correlation information to help separate the speech of the target speaker. However, these methods ...

IEEE

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abstract: Referring audio-visual segmentation (Ref-AVS) aims to segment objects within audio-visual scenes using multimodal cues embedded in text expressions. While the Segment Anything Model (SAM) ...
