Big Data and Different Subspace Clustering Approaches: From Social Media Promotion to Genome Mapping
Peer-reviewed article
Submitted: 27-04-2023
Revised: 13-05-2023
Accepted: 19-06-2023
Published: 20-06-2023
Editor: Fasi Ahamad Shaik, https://orcid.org/0000-0002-1216-5035
DOI:
https://doi.org/10.56294/saludcyt2023413
Keywords:
Big Data, Clustering, Subspace, Classification, Integrative Review
Abstract
In the current era of information technology, information is the most important factor in determining how different paradigms will progress. That information must be extracted from a vast store of data. The growth in the amount of data being analyzed and interpreted is a direct consequence of more powerful processing platforms, greater available storage, and the shift toward electronic platforms. This paper presents a comprehensive study of Big Data, its characteristics, and the role played by subspace clustering algorithms. Its main contribution is a review of a large body of earlier research, followed by a thorough account of the different ways other authors have classified subspace clustering methods. Significant algorithms are also presented, each with a brief explanation, to serve as a reference for future work.
License
Copyright 2023 Vijaya Kishore Veparala, Vattikunta Kalpana

This work is licensed under a Creative Commons Attribution 4.0 International License. Unless otherwise stated, associated published material is distributed under the same license.