[4th July, 11:30] Dr. Ramazan S. Aygun: Unsupervised Speaker Identification for TV News
Place: TH:A-1455
Television (TV) networks produce a tremendous amount of information every day. Identifying the speakers throughout a video would help to analyze and understand the video content. Previous research has usually identified speakers in TV shows and movies by matching against pre-trained face models of famous people.
News videos are challenging because new faces (or people) often appear. This paper proposes an unsupervised method that labels speakers using only the information available in the news video, without external information. Our proposed framework segments the audio by speaker, parses closed captions for speaker names, identifies talking persons, and performs optical character recognition for speaker names.
The presentation will show
- how speaker diarization, face recognition, face landmarking, natural language processing, and optical character recognition tools can be used effectively for speaker identification,
- how speakers who are not famous could be recognized using different modalities, and
- results for identifying speakers in CNN news, with an overall accuracy of 63.6%, including speakers who appear only once.
About the speaker