Crowd Behavior Analysis and Prediction using the Feature Fusion Framework

The growing number of people gathering in public places is a major cause of disasters due to overcrowding. Crowds in public places can become a source of panic, which results in disaster. An analytical study of crowd management was therefore performed; such a study is essential for the design of well-planned public spaces, surveillance coverage of every area, and transportation systems. Disasters caused by uncontrollable crowd behaviour involve loss of property, fatalities, or casualties. To avoid them, the crowd's behaviour was analysed. An MFF (multi-level feature fusion) framework was designed in this paper to predict crowd behaviour. The first level of multi-level feature fusion employs motion and appearance, the second level employs spatial connections, and the third level employs temporal features. The combination of these characteristics aids in the exploitation of crowd behaviour. MFF was evaluated on the web dataset using accuracy, precision, and recall as parameters. A comparative analysis against various existing methodologies was carried out, in which MFF reached an accuracy above 99 %.


INTRODUCTION
Crowd management is highly essential as it directly affects human lives. Massive crowd gatherings occur in various situations, such as games, religious gatherings, political events, concerts, society gatherings, educational events, and parties. (1) Unfamiliar events also occur. These events are distinguished and classified based on the objects that move in a crowded scene. The detection of motion patterns in surveillance videos is known as "abnormal behaviour detection". (2) In surveillance videos, a mechanism based on multiview clustering is applied to crowd gatherings. (3) A parameter-free framework is proposed for the identification of structural properties. This mechanism was used to generate feature points for crowd conduct detection. Moreover, an analysis of the behaviour shows that it can be categorized into subspaces. A classification methodology is introduced in this proposed work to manage a crowd that has an imbalance. All the categories of spatio-temporal features are meant to share a relationship, although dynamic features are also required for an efficient analysis of the crowd. (4) A semi-supervised deep learning framework is proposed for the examination of abnormal events in crowd gatherings. This deep learning mechanism aids in distinguishing irregularities in crowd behaviour. The system automatically detects object points in surveillance videos of crowd masses. Efficient predictions are obtained using the concept of entropy reduction. (5) Another proposed system, based on a deep learning model known as a CNN (Convolutional Neural Network), detects irregular crowd behaviour in crowd videos using motion-based information. The architecture used in this system helps to differentiate abnormal from normal behaviour.
(6) Crowd conduct analysis has become a necessity because of the disasters that follow unusual events at crowd gatherings, such as stampedes, natural disasters, attacks, bombings and, most commonly, the traffic jams that occur under normal conditions. These events need proper management in order to avoid disasters. As a result, video surveillance is critical for extracting and analysing human behaviour. Moreover, a smart automated monitoring system could handle these issues efficiently. Such a system would require feature optimization and analysis for accurate crowd management. (7) To analyse the unfamiliar events that occur in crowded scenarios, a framework based on deep learning with a semi-supervised approach has been introduced. In crowded gatherings, this deep learning mechanism efficiently detects irregularities and differences. Additionally, it provides automatic extraction of particular feature points and meaningful information about these features. A concept based on entropy minimization is also proposed for effective and efficient predictions. (8) Crowd analysis also has several implications for security. A human operator watching surveillance footage may miss hints that could lead to an emergency situation; an automated methodology is therefore more reliable and accurate. There have been numerous crowd disasters in the past, such as the one that occurred during the Love Parade in Germany in 2010. This disaster could have been avoided if a crowd analysis system had been in place. (9) Among other approaches, a data mining methodology based on Wi-Fi sensing was used to analyse crowd behaviour in crowded conditions. In addition, a detailed and specific data analysis is performed for the collection of probe requests and the estimation of object patterns. This study evaluates crowded conditions by using patterns based on spatial objects.
(10) However, crowd-based analysis is a difficult task that requires accurate and precise features for detection and for the other functions performed on surveillance videos. The extracted features are used in a crowd feature optimization process that applies a fused feature methodology to improve the quality of the result produced.

Motivation and contribution
The current situation regarding crowds at social gatherings and public places is a cause of crowd calamities and disasters. This can be avoided by using an automated system for crowd analysis. The increasing need for individuals to maintain social distance from one another in public places also makes the development of these methodologies highly essential. For better control, the management of transportation systems and local stations, and the design of public spaces used for event gatherings, necessitate an automated crowd management system. The absence of such a system could result in a disaster such as a stampede, as has occurred at previous events around the world. The system proposed in this paper is a crowd analysis system that extracts physical features and creates a fused feature system for improved and accurate management. Considering the above motivation, this research work makes the following contributions:
• First, surveillance videos are used to extract the necessary physical features of individuals in order to learn the crowd conduct. These features are thoroughly studied considering the structure of the crowd.
• This research work introduces MFF (multi-level feature fusion) to exploit the crowd behaviour features; it comprises three distinct levels of feature exploitation. The first level of feature fusion exploits the physical features, the second level exploits the spatial features, and the third level exploits the temporal features.
• To evaluate the performance of MFF, a web dataset was considered with the evaluation parameters accuracy, precision, and recall.
• Moreover, the performance analysis includes a comparison with various algorithms; MFF outperforms the existing methodologies.
Each feature in this research work was carefully evaluated to help study and enhance new features, as well as to learn the application of these features in the later part of this research.
The organization of this research work is as follows. The first section discusses previous crowd analysis, emphasizing the background of crowd management systems and their feature extraction processes, along with the motivation for and contribution of this work. The second section covers the existing methodologies, their shortcomings, and the various techniques that were used. The third section focuses on the development of a mathematical model for feature extraction and the fused feature process. The fourth section contains a comparative study of the features for a better evaluation.
Salud, Ciencia y Tecnología. 2022; 2:251

METHODS
In this section, the mathematical model of the crowd features is obtained using the fused feature methodology. The methodology for crowd analysis uses physical features that are collectively termed "crowd features". These crowd features are used for calculation and optimization, which in turn yield the fused features. The fused feature mechanism provides a detailed description of every feature used in this study. The proposed mechanism for feature fusion is based on motion features. The crowd feature optimizations include features such as direction, motion, and speed, among others, obtained from surveillance videos by detecting objects such as individuals in the crowd. Figure 1 shows the multilevel feature fusion and how the different levels of feature fusion occur.

Figure 1. Multilevel Feature Fusion
Assume that a frame of a crowd surveillance video taken from a dataset is represented as a matrix row, that is, as a vector. The frames are then represented in a form in which the coefficient i denotes the total number of frames in the crowd videos and the variable j denotes the total number of pixels available in each frame. This representation is used to obtain a mathematical model of low dimensionality. A matrix is created to eliminate the inconsistencies between matrix Q and matrix Y. The optimization over these inconsistencies is then carried out, with E used as the minimization parameter.
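As a rough illustration of the frame-matrix representation described above, the sketch below stacks each frame as a row vector, giving one row per frame and one column per pixel, and then separates a low-rank part from a residual. The truncated SVD, the function names `frames_to_matrix` and `low_rank_split`, the `rank` parameter, and the synthetic frames are assumptions made for illustration only; the optimization actually used in this work may differ.

```python
import numpy as np

def frames_to_matrix(frames):
    """Flatten each 2-D frame into one row of a (num_frames x num_pixels) matrix."""
    return np.stack([np.asarray(f, dtype=float).ravel() for f in frames])

def low_rank_split(Y, rank):
    """Split Y into a rank-limited matrix Q and a residual E = Y - Q.

    Assumption: a truncated SVD stands in for the low-dimensional model.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Q = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return Q, Y - Q

# Synthetic "video": six 4x5 frames that are scaled copies of one background,
# so the stacked matrix has rank 1 by construction.
rng = np.random.default_rng(0)
background = rng.random((4, 5))
frames = [background * (1.0 + 0.1 * t) for t in range(6)]

Y = frames_to_matrix(frames)      # one row per frame, one column per pixel
Q, E = low_rank_split(Y, rank=1)  # E collects what the low-rank model misses
print(Y.shape, float(np.abs(E).max()))
```

In this toy case the residual E is numerically zero because the frames are exactly rank one; on real footage, E would hold the moving foreground and noise that the low-dimensional model does not explain.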
Consider a video frame m whose foreground frame $D_m$ has a mapping size that is rearranged to the actual size of the frame. The information of the foreground frame for i and j is used to generate the feature weights $F_h = \lambda_h(D_m)$ and $F_k = \lambda_k(D_m)$, respectively, on the basis of the motion flow method. Considering the frame information, the Crowd Feature Objects are designed according to equation 2, in which $D_0$ is the threshold coefficient. This value remains constant, and the pixels of the foreground frame $D_m$ are denoted as $c(h,k)$.
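The role of the constant threshold $D_0$ can be sketched as follows. Since the exact form of equation 2 is not reproduced here, the binary rule below (keep a pixel $c(h,k)$ of $D_m$ when its value exceeds $D_0$) is an assumption, as are the function name `crowd_feature_mask` and the sample values.

```python
import numpy as np

def crowd_feature_mask(D_m, D_0):
    """Return a binary mask: 1 where the foreground value c(h, k) exceeds D_0, else 0.

    Assumption: a simple greater-than test stands in for the thresholding rule.
    """
    D_m = np.asarray(D_m, dtype=float)
    return (D_m > D_0).astype(np.uint8)

# Hypothetical 2x3 foreground frame with values in [0, 1].
D_m = np.array([[0.1, 0.9, 0.4],
                [0.7, 0.2, 0.8]])
mask = crowd_feature_mask(D_m, D_0=0.5)
print(mask)
```

Pixels that survive the threshold would then be the candidates from which motion features are computed.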

Second Level Feature Fusion
In this section, the crowd features are represented mathematically. The proposed model consists of a fused feature methodology based on the available data and on the mechanisms available for encoding and decoding. The features are represented considering the crowd feature optimization $J = \{j_1, j_2, \ldots, j_\omega\}$ in a dataset, as given by equation 3. In equation 3, $\omega$ denotes the total number of crowd features that are trained, and a further coefficient represents the controlling factor. $Y = \{p_1, p_2, \ldots, p_\omega\}$ is the set of all features implied by the total crowd features $H = \{h_1, h_2, \ldots, h_\omega\}$. The coefficients $Q_c$ and $Y$ are used to normalize the data model, which also includes regularization functions.

Third Level Feature Fusion
In the third level, crowd feature optimization is performed in the temporal domain using the fused feature methodology. Furthermore, the feature optimization performed in the first phase in the spatial domain is carried forward into the second phase.
The set of features for the kth frame in the mth block is represented as $H_{m,k} = \{k_1, k_2, \ldots, k_\omega\}$. Here, the crowd feature optimizer is defined by $h_{ç}$ with $ç \in [1, \omega]$, and the total number of optimizers used is denoted by $\omega$. The set of features is evaluated by a mechanism based on optical flow and is represented as $Y_{m,k} = \{p_1, \ldots, p_\omega\}$, and the set of features is indexed by $x$. Hence, a matrix of features and coefficients of size $\omega \times x$ is formed, and the $x$th column of the matrix is expressed as $p_x^x$. The set of columns is therefore expressed as $a_{m,lk} = \{p_1, \ldots, p_x\}$. The motion features are then generated for every frame in each of the blocks to obtain a representation at the frame level. Furthermore, all the frames used in the training of features are denoted as k, and every frame is divided into c blocks. The set of features used for testing, considering test frames k, is denoted as Ø' and is given by equation 5.
In that equation, $K_m$ represents the upper limit for the set of control features. The data model in the temporal domain is defined accordingly, together with a regularization term for the temporal domain.

In this case, the block representations are given as
Abnormal and normal conduct must also be distinguished by means of classifiers, given as $Y_m \in G^{e \times f}$. This completes the crowd feature optimization using the fused feature methodology in the temporal domain.
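The block-wise motion features used at this level can be sketched as below, under stated assumptions. The optical flow field is taken as given (an array of per-pixel displacements), each frame is divided into a square grid of blocks, and each block is summarised by its mean speed and mean direction. The function name `block_motion_features`, the grid layout, and the choice of these two statistics are illustrative assumptions and are not taken from the model above.

```python
import numpy as np

def block_motion_features(flow, blocks_per_side):
    """Summarise one frame's optical flow per block.

    flow: array of shape (H, W, 2) with per-pixel (dx, dy) displacements.
    Returns an array of shape (blocks_per_side**2, 2) holding, for each block
    in row-major order, [mean speed, direction of the mean displacement].
    """
    flow = np.asarray(flow, dtype=float)
    height, width, _ = flow.shape
    bh, bw = height // blocks_per_side, width // blocks_per_side
    feats = []
    for r in range(blocks_per_side):
        for c in range(blocks_per_side):
            patch = flow[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            dx, dy = patch[..., 0], patch[..., 1]
            speed = np.hypot(dx, dy).mean()               # average magnitude
            direction = np.arctan2(dy.mean(), dx.mean())  # angle in radians
            feats.append((speed, direction))
    return np.array(feats)

# Hypothetical 4x4 flow field: left half moves along +x, right half along +y.
flow = np.zeros((4, 4, 2))
flow[:, :2, 0] = 1.0
flow[:, 2:, 1] = 2.0

features = block_motion_features(flow, blocks_per_side=2)
frame_descriptor = features.ravel()  # frame-level representation: blocks concatenated
print(features)
```

Stacking `frame_descriptor` over consecutive frames would give a temporal sequence that a classifier could use to separate normal from abnormal conduct.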

Performance evaluation
This section presents the experimental results obtained with the multi-level feature fusion framework, applied with a fused feature methodology to video surveillance of crowd gatherings. The fused feature methodology aims to improve the quality of the extracted features by evaluating every extracted feature individually. The proposed methodology generates clear and specific descriptions of the features used for crowd pattern analysis. These features were analysed through various visual analytics methods, such as heat maps and video frames. The analysis was performed on various patterns of crowd behaviour with different structures.
Motion features were mainly used for the estimation step of the tracking and detection system that was applied afterwards. These motion features are the physical attributes, collectively termed crowd feature optimization, that are applied to various types of datasets representing different patterns of crowd gatherings. The performance of the crowd features and the fused feature methodology was measured using multiple performance metrics, namely precision, accuracy, area under the curve, and recall; these metrics support a comparative analysis. The comparative analysis was performed on several datasets to which different algorithms were applied, and the metric evaluations were used to identify the most efficient model. The experimental results demonstrate the efficiency of the fused features, which take into account structure features and shapes along with motion features. Unwanted features are eliminated or dropped out while the required features are fused.
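For reference, accuracy, precision, and recall for a binary normal/abnormal labelling can be computed as in the following minimal sketch. The label encoding (1 for abnormal, 0 for normal), the function name `classification_metrics`, and the sample labels are assumptions for illustration and are not results from this study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical labels: 1 = abnormal clip, 0 = normal clip.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalises normal clips flagged as abnormal, while recall penalises abnormal clips that are missed, which is why both are reported alongside accuracy.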

Dataset Details
The performance of the proposed system using the fused feature methodology was tested on a web dataset gathered from online sources such as Getty Images and ThoughtEquity. The web dataset consists of high-quality videos of urban areas and also includes documentary videos. It contains both normal and abnormal videos: 12 normal videos and 9 abnormal videos. Examples of normal crowd videos include pedestrians walking in a park, while abnormal crowd videos include crowds protesting, fighting, and trying to escape from attacks. Figure 2 shows a comparative analysis using different algorithms for the web dataset, in which multiple crowds in various environments and different frames can be seen. Figure 3 shows the accuracy results for the different corridor scenarios across various video frames.

Figure 2. Comparative Analysis using various algorithms for web data set
Figure 3 also presents the accuracy of several models compared across different crowd analysis methodologies. These techniques for crowd behaviour analysis include the Irregularity-Aware Semi-Supervised Deep Learning Model (IASSLM) (11), Convolutional Neural Networks (CNN) (12), WideResNet (13), and the IMFF model, all using web datasets. The highest accuracy was recorded with the MFF algorithm (99.36 %; Table 1), which exceeds the other crowd analysis methodologies by a large margin. Table 1 shows the accuracy results of all the algorithms that perform crowd analysis using the web dataset. Using this web dataset, various algorithms such as the Social Force Model (SFM) (14), Lagrangian Particle Trajectories (14), Two-Stream CNN (14), Cognitive Deep Model (14), Two Stream Deep VGG (14), Force Field (14), Cognition (ConvLSTM) (14), and MFF were also compared.
We also compared the precision and recall of the FCDLF mechanism with those of the proposed MFF mechanism (Table 2). MFF achieves a precision of 99.82 % compared with 94.5 % for FCDLF, and a recall of 98.89 % compared with 93.35 % for FCDLF. Table 1 also gives the exact accuracies of these algorithms, making the distinction between them more precise. The highest accuracy recorded is that of the IMFF algorithm, 99.56 %, which exceeds the other crowd analysis methodologies by a large margin. (15)

CONCLUSION
The current situation regarding crowds at social gatherings and public places is a cause of crowd calamities and disasters. This can be avoided by using an automated system for crowd analysis. The increasing need for individuals to maintain social distance from one another in public places also makes the development of these methodologies highly essential. Our research work introduces multi-level feature fusion (MFF) to predict crowd behaviour. MFF comprises three different levels: the first level of feature fusion focuses on the physical features, the second level focuses on the spatial features, and the third level focuses on the temporal features. MFF exploits the crowd features and enhances the prediction. MFF was evaluated on the web dataset in terms of accuracy, precision, and recall. A comparative analysis against the existing model was carried out, in which MFF achieved an accuracy of 99.36 %, a precision of 99.82 %, and a recall of 98.89 %. Although MFF outperforms the existing model, several research gaps remain, since crowd behaviour is highly unpredictable and individual, and further evaluation on multiple datasets is required.

Figure 3. Accuracy results considering the corridor scenario

Figure 4. Efficiency considering the courtyard data frame

Figure 5. Classification error considering the courtyard data frame

Table 1. Methodologies Comparison In Terms Of Accuracy

Table 2. Precision and Recall Comparison