We use spatiotemporal attention module to enhance the extraction of visual features and combine the Graph Convolutional Network (GCN) and the Long and Short Term Memory (LSTM) to analyze the audio ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results