1 Zhejiang University
2 Tsinghua University
3 Peking University
4 Northeastern University
5 Harbin Engineering University

† Indicates corresponding author
            
        The talented pianist, 1900, mesmerized the audience with his virtuosic performance of "Christmas Eve" while wearing a pristine white tuxedo and bow tie.
        Chris Gardner, a man with a box in his hand, runs frantically through the city, dodging people and cars while being chased by a taxi driver who is honking.
        Dancing in the rain, Don Lockwood twirls with joy, umbrella in hand, amidst city streets.
        Alice fled through the mushroom forest, her heart racing as the Bandersnatch's ominous hisses and growls echoed behind her.
          Overview of MMAD: MMAD consists of multiple modality encoders that are used to generate movie narration.
@inproceedings{ye2024mmad,
  title={MMAD: Multi-modal Movie Audio Description},
  author={Ye, Xiaojun and Chen, Junhao and Li, Xiang and Xin, Haidong and Li, Chao and Zhou, Sheng and Bu, Jiajun},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  pages={11415--11428},
  year={2024}
}