Abstract:

To learn video and speech representations that better capture emotional cues through auxiliary tasks, and to improve multi-modal fusion, this paper proposes a multi-modal sentiment recognition method based on multi-task learning. A multi-modal shared layer is used to learn sentiment information from the visual and acoustic modalities. Experiments on the MOSI and MOSEI datasets show that adding two auxiliary unimodal sentiment recognition tasks yields more effective unimodal sentiment representations and improves sentiment recognition accuracy by 0.8% and 2.5%, respectively.
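To make the overall idea concrete, the following is a minimal PyTorch sketch of the multi-task setup the abstract describes: a shared layer feeding a main multi-modal head plus two auxiliary unimodal heads trained jointly. This is an illustration under assumptions, not the paper's actual architecture; all layer sizes, module names, and the auxiliary loss weight `alpha` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSentimentModel(nn.Module):
    """Sketch: a shared layer feeds one multi-modal fusion head and
    two auxiliary unimodal sentiment heads (dims are illustrative)."""
    def __init__(self, visual_dim=35, acoustic_dim=74, hidden_dim=128):
        super().__init__()
        # Per-modality encoders (assumed simple MLPs here)
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        self.acoustic_enc = nn.Sequential(nn.Linear(acoustic_dim, hidden_dim), nn.ReLU())
        # Shared layer applied to both modalities before fusion
        self.shared = nn.Linear(hidden_dim, hidden_dim)
        # Main task head: fused multi-modal sentiment score
        self.fusion_head = nn.Linear(2 * hidden_dim, 1)
        # Auxiliary task heads: unimodal sentiment scores
        self.visual_head = nn.Linear(hidden_dim, 1)
        self.acoustic_head = nn.Linear(hidden_dim, 1)

    def forward(self, visual, acoustic):
        v = self.shared(self.visual_enc(visual))
        a = self.shared(self.acoustic_enc(acoustic))
        y_main = self.fusion_head(torch.cat([v, a], dim=-1))
        return y_main, self.visual_head(v), self.acoustic_head(a)

def multitask_loss(y_main, y_v, y_a, target, alpha=0.3):
    """Joint objective: main loss plus weighted auxiliary unimodal losses."""
    return (F.mse_loss(y_main, target)
            + alpha * (F.mse_loss(y_v, target) + F.mse_loss(y_a, target)))
```

The auxiliary heads only shape the shared representations during training; at inference, only the fused prediction `y_main` would be used.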
