1 Zhejiang University
2 Tsinghua University
3 Peking University
4 Northeastern University
5 Harbin Engineering University
† Indicates corresponding author
The talented pianist, 1900, mesmerized the audience with his virtuosic performance of "Christmas Eve" while wearing a pristine white tuxedo and bow tie.
Chris Gardner, a man with a box in his hand, runs frantically through the city, dodging people and cars while being chased by a taxi driver who is honking.
Dancing in the rain, Don Lockwood twirls with joy, umbrella in hand, amidst city streets.
Alice fled through the mushroom forest, her heart racing as the Bandersnatch's ominous hisses and growls echoed behind her.
Overview of MMAD: MMAD uses multiple modality encoders to generate movie narration.
@inproceedings{ye2024mmad,
title={MMAD: Multi-modal Movie Audio Description},
author={Ye, Xiaojun and Chen, Junhao and Li, Xiang and Xin, Haidong and Li, Chao and Zhou, Sheng and Bu, Jiajun},
booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages={11415--11428},
year={2024}
}