Publications

Journal articles

2023

  1. Time-Domain Speech Separation Networks With Graph Encoding Auxiliary
    Tingting Wang, Zexu Pan, Meng Ge, Zhen Yang, and Haizhou Li
    IEEE Signal Processing Letters, 2023

2022

  1. Selective Listening by Synchronizing Speech with Lips
    Zexu Pan, Ruijie Tao, Chenglin Xu, and Haizhou Li
    IEEE/ACM Trans. Audio, Speech, Lang. Process., 2022
  1. USEV: Universal Speaker Extraction With Visual Cue
    Zexu Pan, Meng Ge, and Haizhou Li
    IEEE/ACM Trans. Audio, Speech, Lang. Process., 2022
  1. Speaker Extraction with Co-Speech Gestures Cue
    Zexu Pan, Xinyuan Qian, and Haizhou Li
    IEEE Signal Processing Letters, 2022

Conference proceedings

2023

  1. ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting
    Zexu Pan, Wupeng Wang, Marvin Borsdorf, and Haizhou Li
    In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023
  1. Target Active Speaker Detection with Audio-visual Cues
    Yidi Jiang, Ruijie Tao, Zexu Pan, and Haizhou Li
    In Proc. INTERSPEECH, 2023
  1. Speaker Extraction with Detection of Presence and Absence of Target Speakers
    Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, and Yi Wang
    In Proc. INTERSPEECH, 2023
  1. Rethinking the Visual Cues in Audio-visual Speaker Extraction
    Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, and Shiliang Zhang
    In Proc. INTERSPEECH, 2023

2022

  1. A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
    Zexu Pan, Meng Ge, and Haizhou Li
    In Proc. INTERSPEECH, 2022
  1. VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
    Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, and Jianwu Dang
    In Proc. INTERSPEECH, 2022

2021

  1. Muse: Multi-Modal Target Speaker Extraction with Visual Cues
    Zexu Pan, Ruijie Tao, Chenglin Xu, and Haizhou Li
    In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021
  1. Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
    Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, and Haizhou Li
    In Proc. of the 29th ACM Int. Conf. on Multimedia, 2021
  1. Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
    Xinyuan Qian, Maulik Madhavi, Zexu Pan, Jiadong Wang, and Haizhou Li
    In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021

2020

  1. Multi-Modal Attention for Speech Emotion Recognition
    Zexu Pan, Zhaojie Luo, Jichen Yang, and Haizhou Li
    In Proc. INTERSPEECH, 2020

arXiv preprints

2023

  1. Towards End-to-end Speaker Diarization in the Wild
    Zexu Pan, Gordon Wichern, François G Germain, Aswin Subramanian, and Jonathan Le Roux
    Submitted to Autom. Speech Recognit. Understanding Workshop, 2023