Index: 112
Title: Transformer based multimodal scene recognition in soccer videos
Source: IEEE International Conference on Multimedia & Expo (ICME) Workshops, pp. 1-6