Published
2021-12-15

Big Data technology in the analysis of the state of the COVID-19 pandemic in Colombia

DOI: https://doi.org/10.22490/25394088.5612
Section
Original article
Jorge Luis Quintero López
Andrés Arismendi Ramírez
Ángela Liceth Pérez Rendón

During the current pandemic, there is a need to process the large volumes of information generated by reported positive cases in order to identify patterns that support timely contingency measures. This study proposes processing a data set covering the general population of Colombia, with information from March and April 2021, in order to characterize, georeference, and predict cases, giving value to the data and improving understanding of the dynamics of the virus. Three models were used (Naive Bayes, Random Forest, and the J-48 decision tree), seeking to identify the virus with greater precision. Using the Weka application, it is concluded that the model that best fits the prediction is the J-48 tree classification algorithm, which correctly classified 99.24% of instances with a Kappa value of 0.9266, indicating very high agreement in class assignment. The study used 221,583 instances, and the prediction used 30 instances taken from the original database of approximately 2,774,465 records. Statistical tests were applied to identify the correlation between attributes, which supports the correctness of the modeling used for prediction. This process is a potential input for supporting management processes in society and for informing public health decisions.
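Both reported figures, the percentage of correctly classified instances and the Kappa statistic, can be derived from a classifier's confusion matrix. The following is a minimal sketch in Python, not the authors' Weka workflow. The function name `accuracy_and_kappa` and the 2x2 matrix are hypothetical and chosen only for illustration; they are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of instances whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no instances")

    # Observed agreement: share of instances on the main diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example (illustrative counts only).
example = [
    [90, 10],  # true class 0: 90 correct, 10 misclassified
    [5, 95],   # true class 1: 5 misclassified, 95 correct
]
accuracy, kappa = accuracy_and_kappa(example)
print(f"accuracy={accuracy:.4f} kappa={kappa:.4f}")
```

Kappa corrects the raw accuracy for the agreement that would occur by chance given the class distribution, which is why it is typically lower than the percentage of correctly classified instances, as in the values reported above.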