Abstract

Videos are among the most engaging media for delivering information and constitute the majority of the content generated online today. As human attention spans shrink, it is imperative to shorten videos while retaining most of their information. The main challenge is that the summaries humans find most intuitive are difficult for machines to learn and generalize. We present a simple approach to video summarization that uses Kernel Temporal Segmentation (KTS) for shot segmentation and a modified memory network module, based on global attention and combined with an LSTM, for learning shot scores. The modified memory network, termed the Global Attention Memory Module (GAMM), increases the learning capability of the model, and the addition of the LSTM enables it to learn better contextual features. Experiments on the benchmark datasets TVSum and SumMe show that our method outperforms the current state of the art by about 15%.