Application of Spatiotemporal Analysis and Knowledge Discovery for Databases in the Bureau of Fire Protection as Incident Report System: Tool for Improving Fire Services

Purpose – This study aims to contribute to the fire research by developing a fire report management system for the BFP that can analyze spatiotemporal attributes of fire and


INTRODUCTION
The discovery of fire in prehistoric times has allowed humankind to achieve great strides in civilization. It can be assumed that it was also at the dawn of the Fire Age that our ancestors discovered the destructive power that fire brings with it. In the modern world, fire incidents, whether caused by natural disasters (such as bush fires) or human negligence, pose a grave threat to people, property, and the environment, resulting in physical, emotional, and economic damage (Yao & Zhang, 2016).

The Philippines has seen a decline in the number of fire incidents in recent years, though figures for human casualties and property losses remain high. The Bureau of Fire Protection (BFP) recorded a total of 14,197 fire incidents nationwide in 2017, considerably lower than the 19,292 cases recorded in 2016, though the number of civilian deaths (from 285 to 304) and the amount of damages (from P3.08B to P7.86B) both went up (Bautista, 2018). The National Capital Region (NCR), with the city of Manila at its heart, has perennially been the region with the highest number of reported fires nationwide, with 4,645 incidents and 105 civilian fatalities in 2017. Fire incidents, though not usually classified as disasters, remain a sizable challenge for the country given the huge losses they bring about. Velasco (2013), however, notes that unlike natural disasters, fire incidents can be prevented and the losses can be mitigated through risk reduction.
The City Headquarters of the BFP in Manila collects fire incident reports monthly from the 16 fire stations located in the city. The agency keeps these records mainly for reporting purposes. The data, however, can be made more useful if important information is extracted from them for analysis. The results of such analysis may provide clues, such as patterns of fire incidence, that can be considered when planning fire prevention and risk reduction strategies. Such analysis can be facilitated by the use of specialized software and applications. Incidentally, extraction and analysis of information from databases is an area on which information technology (IT) professionals have focused their recent research efforts.
In 2018, the University of the Philippines in Cebu started a fire hazard mapping and fire spread modeling project called FireCheck (Bongcac, 2019; Nazario, 2019). The FireCheck Project team has developed a mobile application that shows three-dimensional (3D) community maps where information necessary for pre-fire response planning, such as building properties and road attributes, can be displayed (Nazario, 2019). Whether the project employs BFP fire incident data in its fire risk assessment and evaluation process has not been reported, however.
Also, Alqassim & Daeid (2014) and, locally, Bringula & Balahadia (2018) conducted spatiotemporal analyses of fire causes. Apart from the previously mentioned study that involved one of the authors, there is little research in the Philippines that uses a similar approach. Without backing from such research, fire prevention programs in the country may not have a solid grounding from which to base policies and programs that address the hazards that fire incidents cause.
This study aims to bridge this gap in local fire research by developing a fire report management system for the BFP that employs data mining and Knowledge Discovery in Databases (KDD) methods to analyze fire incidents data for spatiotemporal patterns that are instrumental in the prediction of levels of fire risk, efficient allocation of fire prevention resources, and effective planning and policy formulation for fire prevention programs.
This paper consists of Section 2 (Methodologies), Section 3 (Proposed System), and Section 4 (Conclusions and Recommendation). Figure 1 presents the Knowledge Discovery in Databases (KDD) process, which consists of five major phases: Selection, Preprocessing, Transformation, Data Mining, and Evaluation (Cheng & Wang, 2006).

Data Selection
This study used data from fire incidents in Manila, the capital and the second-largest city of the Philippines. With a population of 1.78 million in 2016 and 42,857 individuals per square kilometer, it is one of the world's most densely populated cities. A large number of its residents, in particular the urban poor, live in makeshift houses built with light materials, which accounts for the rapid spread of fire during fire incidents.

The data were obtained from fire incident reports provided by the city headquarters of the BFP in Manila. There were 3,506 cases recorded from January 1, 2011 to December 31, 2016. The entries in the reports contained information on the time, date, location, cause, alarm level, cost of damages, and establishment involved in the fire incidents, as shown in Table 1. The reporting format was standardized throughout the 14 municipalities comprising the capital city. The reports were imported into spreadsheet software and the records were kept for future reference.

Table 1
Attribute                 Description
Year                      Year when a fire occurs
Location                  Location or address of the fire incident
Causes                    Causes of the fire incident
Establishment Involved    Type of establishment involved in the fire
Alert Level               Fire level alert of the fire occurrence
Amount of Damages         The determined amount of fire damages

Data Preprocessing
The spreadsheet containing the fire incident records was examined and formatted to ensure a consistent layout. The entries had columns for date, time, location, cause of the fire, alarm level, amount of damages, number of fatalities, and establishment involved. Additional columns were created for geographical coordinates of fire locations, which were manually geocoded based on the date of the incident. Incomplete or unidentified addresses or locations were either looked up on the Internet or verified by residents of the affected barangays. The causes of fire incidents were categorized into groups, and the other attributes such as time, day, month, year, and district were coded accordingly.
A focus group discussion with the Operations and Arson Department of the BFP was held to discuss unclear entries in the fire reports, as well as to gain an understanding of the nature of fire incidents, the protocols that the fire agency follows when responding to such incidents, and how reporting is done. Additional documents were also checked to fill gaps in a few records. After the review, a master list of all fire incident reports was compiled in one spreadsheet file using Microsoft Excel.
Data preprocessing was done with SMOTE, using WEKA, and with One-Hot Encoding to identify features of the dataset that affect the results of the evaluation. SMOTE, which stands for Synthetic Minority Over-sampling Technique, is a method for adjusting an imbalanced dataset to produce a balanced one. It improves on random oversampling by generating synthetic minority-class cases, interpolated between existing minority-class cases and their nearest neighbors, so that the majority and minority classes are represented more evenly. One-Hot Encoding, on the other hand, is a process in which categorical variables are converted into a numerical form that machine learning (ML) algorithms can use for better prediction results.
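The interpolation idea behind SMOTE can be illustrated with a minimal sketch. This is not the WEKA filter used in the study; the function name `smote_sample`, the two numeric features, and the sample values are hypothetical and serve only to show how a synthetic minority case is placed on the line segment between an existing case and one of its nearest neighbors.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    record and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority records by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical numeric features for a minority alert class (illustrative values only).
minority = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (4.0, 2.0)]
new_points = smote_sample(minority)
print(new_points)
```

Because every synthetic point lies between two real minority cases, the new cases stay inside the region already occupied by that class rather than being exact duplicates.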

Data Transformation
A final transformation was done on the dataset before performing data mining and knowledge discovery by formatting all textual attributes to lowercase form, removing commas and periods in the numerical figures, and, most importantly, by applying the Microsoft Excel function of converting categorical data to numerical data for the application of One-Hot Encoding. This procedure was necessary because text values passed to classifiers in the Python scikit-learn package are case-sensitive, so differently written forms of the same word are treated as different from one another even if they have the same meaning. These transformations are crucial in knowledge extraction to avoid compromising the reliability of the results. Lastly, the results of each data model were verified through training and testing validation in Python.
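The text clean-up described above can be sketched in a few lines of Python. The study performed these steps in Microsoft Excel; the helper names `normalize_text` and `parse_amount` are hypothetical, and the sketch assumes that damage figures are whole-peso amounts written with commas as thousands separators and, at most, a stray trailing period.

```python
def normalize_text(value):
    """Lowercase and trim a textual attribute so that 'Tondo ' and 'tondo' match."""
    return value.strip().lower()

def parse_amount(value):
    """Turn a damages figure such as '1,250,000.' into an integer.
    Assumption: commas are thousands separators and any period is trailing."""
    cleaned = value.replace(",", "").strip().rstrip(".")
    return int(float(cleaned))

print(normalize_text("  Residential-Commercial "))  # residential-commercial
print(parse_amount("1,250,000."))                   # 1250000
```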

Data Mining
The outputs of several classifiers, all of which are supervised machine learning algorithms, were tested and evaluated to find the most accurate predictive model for producing new knowledge or identifying patterns from the fire incidents dataset. The classifiers used were k-Nearest Neighbors (KNN), Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Decision Tree.
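To make the notion of a supervised classifier concrete, the sketch below implements one of the listed algorithms, KNN, in plain Python. It is an illustration only and not the scikit-learn implementation used in the study; the function `knn_predict`, the four one-hot columns, and the labels are hypothetical.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training rows."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical one-hot rows: [is_residential, is_commercial, is_morning, is_night]
train_X = [
    [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
    [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1],
]
train_y = ["alert_1", "alert_1", "alert_2", "alert_2", "alert_1", "alert_2"]

prediction = knn_predict(train_X, train_y, [1, 0, 1, 0])
print(prediction)
```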

Interpretation/Evaluation
Interpretation of the mined data was performed to extract knowledge by visualizing the patterns and the data models generated and, when necessary, by iteratively reviewing the previous steps of the process. The dataset was divided into two parts, the training set and the testing set. Entries in the dataset were first grouped according to the fire alert level, then 30% were randomly selected from each alert level. These entries comprised the training set, while the remaining 70% was used as the testing set, as was done in a previous study (Balahadia et al., 2019). Several data analysis algorithms were tested to determine the best classifier model based on accuracy, F-measure, and precision, as shown in Table 2.
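The grouping-then-sampling procedure described above is a stratified split. A minimal sketch follows, assuming each record is a dictionary with an alert-level field; the function name `stratified_split`, the field name `alert`, and the toy records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, label_key, train_fraction=0.30, seed=42):
    """Randomly pick train_fraction of the records within each label group
    for training and keep the rest for testing."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]          # copy so the input order is left untouched
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Toy data: 40 records spread evenly over four alert classes.
records = [{"id": i, "alert": f"alert_{i % 4}"} for i in range(40)]
train, test = stratified_split(records, "alert")
print(len(train), len(test))
```

Sampling within each alert level keeps the class proportions of the training set equal to those of the full dataset.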

Visualization
The selected classifiers were evaluated in terms of accuracy in classifying given labels. The classifier with the highest accuracy was then further evaluated in terms of precision and recall through a Confusion Matrix table. The output produced by the best classifier was shown to the BFP personnel for review and consideration, and recommendations were provided for specific areas of fire prevention and risk reduction programs that can be improved based on the results of the previous analyses.
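The evaluation measures named above can be computed directly from a confusion matrix. The sketch below uses made-up actual and predicted labels (not results from the study) and hypothetical helper names to show how accuracy, precision, and recall are derived.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def precision_recall(matrix, label):
    """Precision = TP / all predicted as label; recall = TP / all actually label."""
    tp = matrix[(label, label)]
    predicted_as = sum(n for (a, p), n in matrix.items() if p == label)
    actually = sum(n for (a, p), n in matrix.items() if a == label)
    precision = tp / predicted_as if predicted_as else 0.0
    recall = tp / actually if actually else 0.0
    return precision, recall

# Made-up labels for illustration only.
actual    = ["a1", "a1", "a2", "a2", "a3", "a3", "a3", "a1"]
predicted = ["a1", "a2", "a2", "a2", "a3", "a1", "a3", "a1"]

matrix = confusion_matrix(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy, precision_recall(matrix, "a2"))
```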

Spatiotemporal Analysis
Spatial attributes and temporal attributes of the fire incidents datasets were arranged in order and numbered separately. These spatiotemporal characteristics were then loaded into the system developed by the researchers for visualization through mapping. Geographical coordinates for locations of fire incidents were produced using Geocode software and plotted on a map of Manila using Google Maps API. These procedures allowed for the identification of fire-prone areas and the frequency of fire incidents based on the previous six-year records of fire incidents in the city, which should be useful in planning preemptive fire incident response and prevention strategies.
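A simple way to picture this step is to count incidents per map cell and per hour of the day. The sketch below is not the system's implementation and does not call the Google Maps API; the function `grid_cell`, the cell size, and the coordinates are hypothetical values chosen for illustration.

```python
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.005):
    """Snap a coordinate to a grid cell index.
    Assumption: 0.005 degrees, roughly half a kilometre of latitude, per cell."""
    return (int(lat // cell_deg), int(lon // cell_deg))

# Hypothetical geocoded incidents (coordinates and hours are made up).
incidents = [
    {"lat": 14.6012, "lon": 120.9812, "hour": 2},
    {"lat": 14.6013, "lon": 120.9814, "hour": 14},
    {"lat": 14.5871, "lon": 121.0003, "hour": 2},
]

cell_counts = Counter(grid_cell(i["lat"], i["lon"]) for i in incidents)  # spatial
hour_counts = Counter(i["hour"] for i in incidents)                      # temporal

hotspot, hits = cell_counts.most_common(1)[0]
print(hotspot, hits, hour_counts.most_common(1))
```

Cells with the highest counts correspond to candidate fire-prone areas, and the hourly counts give the matching temporal profile.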

Software Development Model
The Agile Model was employed in the development of the fire report management system for the BFP. It banks on teamwork, constant user feedback, continuous improvement, and the ability to adapt to changing requirements for the success of a project.
The process consisted of several phases:
Phase 1: Data were gathered from the BFP, and the literature on fire incident patterns and spatiotemporal analysis was reviewed.
Phase 2: All gathered data were reviewed and evaluated to formulate the plans, objectives, target clients, and concepts of the system.
Phase 3: After data evaluation, all requirements needed in the study were collected. Specialized software applications such as Python, WEKA, and Java, as well as Adobe Photoshop for the design of the website, were used.
Phase 4: The flow of the application was analyzed, its layout was designed and, finally, the model for forecasting and training the data was produced.
Phase 5: After the development stage, the application was tested to check for errors or bugs and to verify that the classifiers and algorithms were working properly, so that appropriate solutions could be applied.
Phase 6: Testing and evaluation methods were applied to assess the performance of the system.
Phase 7: After the evaluation, the researcher set forth a plan to improve the maintenance of the application accordingly, followed by the implementation of such improvements.

Developed System
The fire report management system is a web-based application that allows access to historical fire incident reports from the BFP. It is made up of five main modules: (1) Announcement and Posting Module; (2) Fire Report Module; (3) Mapping Module; (4) Fire Pattern Module; and (5) Account and Setting Module. The users (BFP Admin, BFP Stations Personnel, and Community) have different levels of authority in accessing the system.

Announcement and Posting Module
This module allows community users and BFP Stations Personnel to view all announcements and activities posted by the BFP Admin. It also displays News, Updates, BFP Activities and Programs, Gallery, and Do's and Don'ts during Fire Occurrences. Its main purpose is to spread awareness and educate the community about fire occurrences.

Fire Report Module
BFP Stations Personnel were provided with user accounts to enable them to transmit fire incident reports through the system. All the reports are saved in a database for easy retrieval. Descriptive analytics and spatiotemporal analysis of all fire reports generated by the BFP Admin are applied in this module, from which a summary of fire incident reports from all 16 BFP fire stations in Manila is displayed through graphs and tables.

Mapping Module
The historical data stored in the database can be used to plot locations of fire incidents on a map through the use of geographical coordinates. Clicking a marker on the map will show details about the fire incidents in that location. The map also provides information about the degree of fire concentration (low, moderate, or high) in every municipality/district of Manila, as shown in Figure 2. This will inform the community of fire-prone areas, or fire hot spots, in the city.
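The mapping of incident counts to a concentration level can be sketched as a simple threshold rule. The cut-off values, district labels, and counts below are hypothetical; the paper does not state the thresholds the system actually uses.

```python
def concentration_level(count, low_max=50, moderate_max=150):
    """Classify a district's incident count as low, moderate, or high.
    The thresholds are illustrative assumptions, not values from the study."""
    if count <= low_max:
        return "low"
    if count <= moderate_max:
        return "moderate"
    return "high"

# Made-up counts per district for illustration.
district_counts = {"District A": 32, "District B": 120, "District C": 310}
levels = {name: concentration_level(n) for name, n in district_counts.items()}
print(levels)
```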

Fire Pattern Module
This module will generate the knowledge discovered during the data mining process of the KDD. The classifier and developed model will identify the alert level of fire in a specific location in Manila, generate an output showing patterns of fire incidents and current fire prevention activities in the area, and provide recommendations, as shown in Figure 3. Also, this module allows the user to generate a dataset based on the attributes chosen in the combo boxes through the Generate Recommendations section, as shown in

Account and Setting Module
System access for BFP station personnel is authorized through this module upon the approval of the BFP administrator. Additionally, the admin can add, edit, and update all the announcements, news, galleries, activities, and programs of the BFP in the system through this module.

KDD Pre-Processing (SMOTE)
The SMOTE technique was employed in preprocessing the dataset for KDD. SMOTE is a powerful oversampling method used in machine learning with imbalanced high-dimensional data. In processing the dataset, the focus was on the following alert categories: alert levels 1/2/3, verification, as per request, and not indicated, each of which contained the number of records needed to perform the data preprocessing before building the model.
Trial-and-error oversampling runs were performed until a balanced dataset was reached with SMOTE (200%), as presented in Figure 5. Using the classes Fire Alert 1, 2, and 3 plus a fourth, low-level class (not indicated, verification, as per request), with 459 cases per class, gave a better performance in balancing the dataset.
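The effect of an oversampling percentage on class size can be worked out with simple arithmetic. The sketch assumes that the percentage denotes the number of synthetic cases created relative to the original class size, which is how the percentage option of the WEKA SMOTE filter is commonly described; the class size of 100 is a hypothetical figure, not one reported in the study.

```python
def size_after_smote(n_original, percentage):
    """Class size after adding `percentage`% synthetic cases to n_original cases.
    Assumption: 200% means two synthetic cases per original case."""
    n_synthetic = round(n_original * percentage / 100)
    return n_original + n_synthetic

print(size_after_smote(100, 200))  # 100 original + 200 synthetic = 300
```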

KDD Pre-Processing (One-Hot Encoding)
A preprocessing technique called One-Hot Encoding was adopted to transform textual data into numerical data that machines can understand. The technique was applied to data from the BFP fire incident records for the following attributes: time of the day (in 12-hour format), date, month, year, day, district, type of causes, type of establishment, alert level, and amount of damages, as presented in Table 3. From the original 10 features, a total of 123 features were generated through this technique.
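The growth from a few categorical attributes to many binary features can be seen in a minimal sketch. The helper names `fit_one_hot` and `one_hot`, the three columns, and the sample rows are hypothetical; the study's own encoding covered ten attributes and produced 123 features.

```python
def fit_one_hot(rows, columns):
    """Collect the sorted distinct values of each categorical column."""
    return {c: sorted({r[c] for r in rows}) for c in columns}

def one_hot(row, categories):
    """Encode one row as a 0/1 vector with one position per (column, value) pair."""
    vec = []
    for col, values in categories.items():
        vec.extend(1 if row[col] == v else 0 for v in values)
    return vec

# Illustrative rows only; values are not taken from the BFP records.
rows = [
    {"district": "tondo", "cause": "electrical", "day": "sunday"},
    {"district": "pandacan", "cause": "under investigation", "day": "monday"},
    {"district": "intramuros", "cause": "electrical", "day": "sunday"},
]

categories = fit_one_hot(rows, ["district", "cause", "day"])
n_features = sum(len(v) for v in categories.values())  # 3 + 2 + 2 distinct values
encoded = [one_hot(r, categories) for r in rows]
print(n_features, encoded[0])
```

Each original attribute contributes as many binary features as it has distinct values, which is why the feature count grows well beyond the number of original columns.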

Knowledge Discovery for Database (KDD) Data Mining
Extensive preprocessing of data produced a merged dataset consisting of 1,836 entries (number of fire incidents), which corresponds to the four balanced alert classes of 459 cases each. The merged dataset was divided into two parts, the training set and the testing set, using Python. The entries were categorized according to the fire alert level, 30% of which were randomly selected and used as the training set, while the rest comprised the testing set. After data validation, a series of experiments was performed to identify the most appropriate data mining model. The Decision Tree (DT) model turned out to be the best classifier, with 95.9% accuracy.
Several IF-THEN-ELSE tables containing decision rules were created to facilitate interpretation of the Decision Tree (DT) results. These rules are shown in Tables 4, 5, and 6. Alert Level 1 covers cases with a low amount of damages, mostly falling under the category 'Under Investigation'. The rules for Alert Level 2 illustrate different scenarios that applied mostly to incidents that affected residential-commercial types of establishments in the district of Intramuros. The DT rules for Alert Level 3, which indicate a high alert level, involve a high amount of damages. Fire incidents categorized as Alert Level 3 mostly occurred in the morning and on Sundays, when most people are in their homes, and a large number of them happened in the district of Pandacan.
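The form that such IF-THEN-ELSE rules take can be shown with a short sketch. The rules below are hypothetical and only loosely follow the prose description above; they are not the rules in Tables 4, 5, and 6, and the damage thresholds `HIGH_DAMAGES` and `LOW_DAMAGES` are invented for illustration.

```python
HIGH_DAMAGES = 1_000_000  # hypothetical threshold in pesos
LOW_DAMAGES = 100_000     # hypothetical threshold in pesos

def classify_alert(incident):
    """Apply illustrative IF-THEN-ELSE rules to one incident record."""
    if (incident["damages"] >= HIGH_DAMAGES
            and incident["day"] == "sunday"
            and incident["period"] == "morning"):
        return "alert_3"
    elif (incident["establishment"] == "residential-commercial"
            and incident["district"] == "intramuros"):
        return "alert_2"
    elif (incident["damages"] < LOW_DAMAGES
            and incident["cause"] == "under investigation"):
        return "alert_1"
    else:
        return "unclassified"

# Made-up incident for illustration.
sample = {
    "damages": 2_500_000, "day": "sunday", "period": "morning",
    "establishment": "residential", "district": "pandacan",
    "cause": "electrical",
}
print(classify_alert(sample))
```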