Video Scene Location Recognition Using AI: Abstract and Introduction

26 Jun 2024


(1) Lukáš Korel, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic;

(2) Petr Pulc, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic;

(3) Jirí Tumpach, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic;

(4) Martin Holena, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic.

Abstract and Introduction

ANN-Based Scene Classification



Conclusion and Future Research, Acknowledgments and References

Abstract: This paper provides insight into the possibility of scene recognition from a video sequence with a small set of repeated shooting locations (such as in television series) using artificial neural networks. The basic idea of the presented approach is to select a set of frames from each scene, transform them by a pre-trained single-image pre-processing convolutional network, and classify the scene location with subsequent layers of the neural network. The considered networks have been tested and compared on a dataset obtained from The Big Bang Theory television series. We have investigated different neural network layers to combine individual frames, particularly AveragePooling, MaxPooling, Product, Flatten, LSTM, and Bidirectional LSTM layers. We have observed that only some of the approaches are suitable for the task at hand.
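
To make the frame-combination step concrete, the following is a minimal pure-Python sketch of four of the combining operations named in the abstract (average pooling, max pooling, product, and flatten), applied to per-frame feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the function name `combine_frames`, the `mode` strings, and the toy input are hypothetical, and the feature vectors stand in for the output of the pre-trained convolutional network. The LSTM and Bidirectional LSTM variants are omitted because they require trained weights.

```python
from math import prod
from typing import List

Vector = List[float]


def combine_frames(features: List[Vector], mode: str) -> Vector:
    """Merge per-frame feature vectors into one scene-level vector.

    Hypothetical helper: `features` holds one vector per selected frame,
    all of the same length; `mode` picks the combining operation.
    """
    if not features:
        raise ValueError("need at least one frame")
    dim = len(features[0])
    if any(len(frame) != dim for frame in features):
        raise ValueError("all frame vectors must have the same length")

    if mode == "flatten":
        # Concatenate frames in order; output length is n_frames * dim.
        return [value for frame in features for value in frame]

    # zip(*features) groups the values of one feature across all frames.
    columns = list(zip(*features))
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "product":
        return [prod(col) for col in columns]
    raise ValueError(f"unknown mode: {mode}")


# Toy input (assumed values): 3 frames, 2 features per frame.
scene = [[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]
for mode in ("average", "max", "product", "flatten"):
    print(mode, combine_frames(scene, mode))
```

Average, max, and product yield a vector whose length is independent of the number of frames, whereas flatten preserves frame order at the cost of an output that grows with the number of frames. In each case the combined vector would then be passed to the classification layers.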

1 Introduction

People watching videos are able to recognize where the current scene is located. When watching a film or a series, they can recognize that a new scene takes place in a location they have already seen. Finally, people are able to understand the hierarchy of scenes. All this supports human comprehension of videos.

The role of location identification in scene recognition by humans motivated our research into scene location classification by artificial neural networks (ANNs). A more ambitious goal would be to build a system able to remember previously unknown video locations and, using this data, identify other video scenes shot in the same location and mark them with the same label. This paper reports a work in progress in that direction. It describes the employed methodology and presents first experimental results obtained with six kinds of neural networks.

The rest of the paper is organized as follows. The next section reviews existing approaches to this problem. Section 3 is divided into two parts: the first covers the preparation of the data before their use in ANNs, and the second covers the design of the ANNs used in our experiments. Finally, Section 4, the last section before the conclusion, presents the results of our experiments with these ANNs.

This paper is available on arXiv under the CC0 1.0 DEED license.