Analysis of Flora Distribution in Sumatra through a Data Lakehouse System Using Interpolation-Based Spatial Analysis on Hadoop and Apache Spark
DOI:
https://doi.org/10.57203/session.v4i2.2026.12-20
Keywords:
Hadoop, Apache Spark, Data Lakehouse, Spatial Analysis, Flora Distribution
Abstract
Flora biodiversity on Sumatra Island is under increasing pressure from environmental change, while the limited capacity to manage large-scale biodiversity data hampers efforts to monitor it. Addressing this calls for an approach that can integrate and analyze such data efficiently in support of data-driven conservation. This study develops a spatial analysis system built on a data lakehouse with Hadoop and Apache Spark to map the distribution of flora across Sumatra. Data from the Global Biodiversity Information Facility (GBIF) for the period 2019–2023 are processed with Apache Spark through an Extract–Transform–Load (ETL) pipeline organized according to the Medallion architecture (Bronze, Silver, and Gold layers). The results show that processing is up to 16 times faster and that storage efficiency improves by 28%. These gains make large-scale data integration feasible, so that flora distribution patterns can be identified more clearly and comprehensively. Of the 12,840 species analyzed, most are classified as Near Threatened (58.4%), followed by Least Concern (40.8%) and Endangered (0.7%), and their occurrences are concentrated in western and central Sumatra. These findings indicate that most of the flora analyzed are of conservation concern, and they demonstrate that integrating a data lakehouse with spatial analysis can effectively support data-driven conservation decision-making.
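The Bronze, Silver, and Gold stages described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: plain Python stands in for the Apache Spark jobs, the column names (`species`, `decimalLatitude`, `decimalLongitude`, `year`, `iucnRedListCategory`) are assumed to follow GBIF occurrence-download fields, and the Sumatra bounding box, the deduplication key, the 1° grid, and all function names are hypothetical choices for illustration. The grid step is simple binning of occurrence points, not the interpolation method named in the title.

```python
import math
from collections import Counter
from typing import Iterable

# Assumed rough bounding box for Sumatra (illustrative, not taken from the study).
SUMATRA_BBOX = (-6.0, 6.0, 95.0, 106.0)  # (min_lat, max_lat, min_lon, max_lon)
STUDY_YEARS = range(2019, 2024)  # 2019-2023 inclusive, the study period


def bronze(raw_rows: Iterable[dict]) -> list[dict]:
    """Bronze: land the raw records unchanged (shallow copies, no cleaning)."""
    return [dict(r) for r in raw_rows]


def silver(rows: list[dict]) -> list[dict]:
    """Silver: drop incomplete rows, keep the study period and region, deduplicate."""
    min_lat, max_lat, min_lon, max_lon = SUMATRA_BBOX
    seen: set = set()
    cleaned = []
    for r in rows:
        lat, lon, year = r.get("decimalLatitude"), r.get("decimalLongitude"), r.get("year")
        if not r.get("species") or lat is None or lon is None or year not in STUDY_YEARS:
            continue
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        # Hypothetical rule: same species, same point, same year counts as a duplicate.
        key = (r["species"], lat, lon, year)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


def gold_category_share(rows: list[dict]) -> dict[str, float]:
    """Gold: percentage of distinct species in each IUCN Red List category."""
    category_by_species: dict[str, str] = {}
    for r in rows:
        # Missing categories are labelled "NE" (Not Evaluated); the first value seen wins.
        category_by_species.setdefault(r["species"], r.get("iucnRedListCategory") or "NE")
    counts = Counter(category_by_species.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


def gold_grid_counts(rows: list[dict], cell_deg: float = 1.0) -> dict[tuple[int, int], int]:
    """Gold: occurrence counts per cell of a regular lat/lon grid (simple binning)."""
    cells: Counter = Counter()
    for r in rows:
        cell = (math.floor(r["decimalLatitude"] / cell_deg),
                math.floor(r["decimalLongitude"] / cell_deg))
        cells[cell] += 1
    return dict(cells)


# Hypothetical sample records (not real GBIF data) to exercise the pipeline.
SAMPLE_RAW = [
    {"species": "Species A", "decimalLatitude": 0.5, "decimalLongitude": 100.4, "year": 2020, "iucnRedListCategory": "NT"},
    {"species": "Species A", "decimalLatitude": 0.5, "decimalLongitude": 100.4, "year": 2020, "iucnRedListCategory": "NT"},  # duplicate
    {"species": "Species B", "decimalLatitude": -0.9, "decimalLongitude": 100.3, "year": 2021, "iucnRedListCategory": "LC"},
    {"species": "Species C", "decimalLatitude": None, "decimalLongitude": 101.4, "year": 2022, "iucnRedListCategory": "EN"},  # no coordinates
    {"species": "Species D", "decimalLatitude": 3.6, "decimalLongitude": 98.7, "year": 2018, "iucnRedListCategory": "LC"},  # outside period
]

if __name__ == "__main__":
    silver_rows = silver(bronze(SAMPLE_RAW))
    print(gold_category_share(silver_rows))  # {'NT': 50.0, 'LC': 50.0}
    print(gold_grid_counts(silver_rows))     # {(0, 100): 1, (-1, 100): 1}
```

On the toy sample, the Silver step keeps two of the five records (one duplicate, one row without coordinates, and one row outside 2019–2023 are dropped), and the two Gold functions produce a per-category species share and a per-cell occurrence count. In a Spark deployment, each stage would more plausibly be a separate job that writes its layer to storage, so that later layers can be rebuilt without re-ingesting the raw data.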
License
Copyright (c) 2026 Elok Fiola Elok, Asa Do’a Uyi, Dea Mutia Risani, Yohana Manik, Ardika Satria, Luluk Muthoharoh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright:
Authors who publish their manuscripts in this journal agree to the following conditions.
- The copyright on each article belongs to the author(s).
- The author(s) acknowledge that Jurnal SESSION: Software Development, Digital Business Intelligence, and Computer Engineering has the right of first publication under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal SESSION: Software Development, Digital Business Intelligence, and Computer Engineering.
License:
Jurnal SESSION: Software Development, Digital Business Intelligence, and Computer Engineering is licensed under a Creative Commons Attribution 4.0 International License.
This license permits anyone to copy and redistribute this material in any medium or format, and to remix, transform, and build upon it for any purpose, including commercial purposes, provided that appropriate credit is given to the original work.

