Big Data and different subspace clustering approaches: From social media promotion to genome mapping

Authors

DOI:

https://doi.org/10.56294/saludcyt2023413

Keywords:

Big Data, Clustering, Subspace, Classification, Integrative review

Abstract

In the current era of information technology, information is the most important factor in determining how different paradigms will progress. This information must be extracted from a vast store of data. The growth in the volume of data analyzed and interpreted is a direct consequence of the proliferation of more powerful processing platforms, the increase in available storage space, and the transition toward electronic platforms. This paper presents a comprehensive study of Big Data, its characteristics, and the role played by subspace clustering algorithms. Its main contribution is a review of many earlier studies and a thorough presentation of the different ways in which other authors have classified subspace clustering methods. In addition, significant algorithms are presented, each with a brief explanation, and can serve as a reference for future development.

Published

2023-06-20

How to cite

Kishore Veparala V, Kalpana V. Big Data and different subspace clustering approaches: From social media promotion to genome mapping. Salud, Ciencia y Tecnología [Internet]. 2023 Jun 20 [cited 2023 Nov 28];3:413. Available from: https://revista.saludcyt.ar/ojs/index.php/sct/article/view/413

Issue

Section

Literature reviews

Categories