Analysis of Flora Distribution in Sumatra Through a Data Lakehouse System Using Spatial Analysis Interpolation Based on Hadoop and Apache Spark

Authors

  • Elok Fiola Elok Institut Teknologi Sumatera
  • Asa Do’a Uyi Institut Teknologi Sumatera
  • Dea Mutia Risani Institut Teknologi Sumatera
  • Yohana Manik Institut Teknologi Sumatera
  • Ardika Satria Institut Teknologi Sumatera
  • Luluk Muthoharoh Institut Teknologi Sumatera

DOI:

https://doi.org/10.57203/session.v4i2.2026.12-20

Keywords:

Hadoop, Apache Spark, Data Lakehouse, Spatial Analysis, Flora Distribution

Abstract

Flora biodiversity on Sumatra is under growing pressure from environmental change, and the capacity to manage large-scale biodiversity data remains limited. Data-driven conservation therefore needs an approach that can integrate and analyze such data efficiently. This study develops a spatial analysis system built on a data lakehouse, using Hadoop and Apache Spark, to map flora distribution in Sumatra. The data come from the Global Biodiversity Information Facility (GBIF) for the period 2019–2023 and are processed with the Medallion architecture (Bronze, Silver, Gold) through an Extract–Transform–Load (ETL) pipeline in Apache Spark. The results show processing up to 16 times faster and a 28% gain in storage efficiency. These gains make large-scale data integration practical, so that flora distribution patterns can be identified more clearly and comprehensively. Analysis of 12,840 species shows that Near Threatened species dominate (58.4%), followed by Least Concern (40.8%) and Endangered (0.7%), with occurrences concentrated in the western and central regions of Sumatra. These findings indicate that most of the recorded flora is at some level of conservation risk, and they support the effectiveness of combining a data lakehouse with spatial analysis for data-driven conservation decision-making.
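The Bronze, Silver, and Gold stages named in the abstract can be illustrated with a minimal sketch. It uses plain Python instead of Apache Spark so that it runs without a cluster, and it is not the authors' pipeline. The bounding box, the sample records, the species names, and the helper names `to_silver` and `to_gold` are all hypothetical; the column names follow the Darwin Core style used in GBIF downloads and are assumed here.

```python
from collections import Counter

# Assumption: a rough bounding box around Sumatra, for illustration only.
SUMATRA_BBOX = {"lat_min": -6.0, "lat_max": 6.0, "lon_min": 95.0, "lon_max": 106.0}

# Bronze layer: raw records kept as ingested (made-up rows, strings untouched).
bronze = [
    {"species": "Species A", "decimalLatitude": "-0.9", "decimalLongitude": "100.4", "year": "2020", "iucnRedListCategory": "NT"},
    {"species": "Species A", "decimalLatitude": "-0.9", "decimalLongitude": "100.4", "year": "2020", "iucnRedListCategory": "NT"},
    {"species": "Species B", "decimalLatitude": "2.1", "decimalLongitude": "98.9", "year": "2021", "iucnRedListCategory": "LC"},
    {"species": "Species C", "decimalLatitude": "1.5", "decimalLongitude": "99.0", "year": "2018", "iucnRedListCategory": "EN"},
    {"species": "Species D", "decimalLatitude": "-7.5", "decimalLongitude": "110.0", "year": "2022", "iucnRedListCategory": "LC"},
    {"species": "Species E", "decimalLatitude": "", "decimalLongitude": "101.0", "year": "2022", "iucnRedListCategory": "NT"},
    {"species": "Species F", "decimalLatitude": "-2.0", "decimalLongitude": "103.5", "year": "2023", "iucnRedListCategory": "NT"},
]


def to_silver(bronze_rows):
    """Bronze -> Silver: parse types, drop invalid or out-of-scope rows, deduplicate."""
    seen = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
            year = int(row["year"])
        except (KeyError, TypeError, ValueError):
            continue  # unparseable coordinates or year
        species = row.get("species")
        in_bbox = (
            SUMATRA_BBOX["lat_min"] <= lat <= SUMATRA_BBOX["lat_max"]
            and SUMATRA_BBOX["lon_min"] <= lon <= SUMATRA_BBOX["lon_max"]
        )
        if not species or not in_bbox or not 2019 <= year <= 2023:
            continue
        key = (species, lat, lon, year)
        if key in seen:
            continue  # exact duplicate occurrence
        seen.add(key)
        silver_rows.append(
            {"species": species, "lat": lat, "lon": lon, "year": year,
             "status": row.get("iucnRedListCategory")}
        )
    return silver_rows


def to_gold(silver_rows):
    """Silver -> Gold: percentage of distinct species per conservation category."""
    status_by_species = {r["species"]: r["status"] for r in silver_rows}
    counts = Counter(status_by_species.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)
```

In a Spark-based lakehouse, each stage would typically be a DataFrame transformation written to its own storage layer; the sketch only mirrors the order of the steps (raw ingest, cleaning and filtering, aggregation into category shares like those reported in the abstract).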

 


Published

01-04-2026

How to Cite

Analisis Persebaran Flora di Sumatera Melalui Sistem Data Lakehouse Menggunakan Interpolasi Spatial Analysis Berbasis Hadoop dan Apache Spark. (2026). Software Development, Digital Business Intelligence, and Computer Engineering, 4(2), 12-20. https://doi.org/10.57203/session.v4i2.2026.12-20
