PALM-SEA: Historical Palm Leaf Manuscript Analysis for Southeast Asia Regions

Nimol Thuon*, Jun Du, Jianshu Zhang, Zhenrong Zhang, Jiefeng Ma, Ranysakol Thuon, Panhapin Theang, Chan Theara Sok

NERC-SLIP, USTC, China | iFlyTek Research, China | Institute of Technology of Cambodia | Universitas Gadjah Mada, Indonesia

Project Overview

The PALM-SEA project is dedicated to the digital preservation and analysis of historical palm leaf manuscripts from Southeast Asia. These manuscripts are invaluable cultural artifacts, containing centuries of knowledge on topics ranging from religious texts and historical records to literature and traditional medicine. However, their organic nature makes them highly susceptible to degradation, posing a significant risk to this heritage.

Our project addresses this challenge by creating the largest publicly available, multi-script dataset of palm leaf manuscripts, covering the Sundanese, Balinese, and Khmer scripts. We develop and benchmark advanced computational methods for critical tasks such as document enhancement, isolated glyph classification, and full text recognition. By building robust digital tools, we aim to unlock the rich information held within these manuscripts for scholars, historians, and future generations.

Collage of palm leaf manuscripts

Project Tasks & Details

Our research is structured around four core tasks, each addressing a specific challenge in the digital analysis of palm leaf manuscripts.

Manuscript Collections

Sundanese Manuscripts

Originating from West Java, these manuscripts are written in the Old Sundanese script. The collection showcases the script's distinct rounded letterforms and complex ligatures. Key challenges include high character shape variability due to natural wear and the presence of overlapping text lines, requiring advanced image enhancement.

Balinese Manuscripts

From Bali and Lombok, these texts cover a rich array of topics. The intricate Balinese script features a mix of base consonants and vowel diacritics, creating complex ligatures and stacked forms. The presence of decorative elements intertwined with the script complicates segmentation and recognition.

Khmer Manuscripts

From Cambodia, these manuscripts use one of the oldest scripts in Southeast Asia. The Khmer script is known for its curvilinear shapes and unique subscript characters, which add complexity to the text structure. Many manuscripts are severely degraded, with faint or fragmented characters that require specialized restoration techniques.

Mixed Script Dataset

This collection combines samples from all three scripts to enable robust multi-script analysis, script identification, and cross-lingual studies. It is curated to include variations in script styles and degradation patterns, simulating real-world challenges and fostering the development of generalized models.
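As a rough illustration of how such a mixed collection could be assembled, the sketch below draws an equal number of samples from each script and attaches a script label for script identification. It is a minimal sketch under stated assumptions, not the project's actual curation pipeline: the function name `build_mixed_split`, the file names, and the per-script counts are all hypothetical.

```python
import random
from collections import Counter


def build_mixed_split(samples_by_script, per_script, seed=0):
    """Draw the same number of samples from each script and label them.

    samples_by_script: dict mapping a script name to a list of sample paths.
    per_script: number of samples to draw from every script.
    Returns a shuffled list of (path, script_label) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    mixed = []
    for script, paths in sorted(samples_by_script.items()):
        if len(paths) < per_script:
            raise ValueError(
                f"{script}: only {len(paths)} samples, need {per_script}"
            )
        # Sample without replacement so no image appears twice.
        for path in rng.sample(paths, per_script):
            mixed.append((path, script))
    rng.shuffle(mixed)  # interleave scripts instead of grouping them
    return mixed


# Hypothetical file names and counts, for illustration only.
samples_by_script = {
    "sundanese": [f"sundanese_{i:03d}.png" for i in range(50)],
    "balinese": [f"balinese_{i:03d}.png" for i in range(80)],
    "khmer": [f"khmer_{i:03d}.png" for i in range(65)],
}

mixed = build_mixed_split(samples_by_script, per_script=40)
counts = Counter(script for _, script in mixed)
```

Equal per-script sampling is only one possible design choice; a real curation step would likely also balance degradation levels and writing styles, which this sketch does not model.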

Publications

Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis

Nimol Thuon, et al. (2025). Pattern Recognition Letters.

A Low-Intervention Dual-Loop Iterative Process for Efficient Dataset Expansion and Classification in Palm Leaf Manuscript Analysis

Nimol Thuon, et al. (2025). International Journal on Document Analysis and Recognition (IJDAR), Special Issue track ICDAR-IJDAR 2025.

Generate, transform, and clean: the role of GANs and transformers in palm leaf manuscript generation and enhancement

Nimol Thuon, et al. (2024). International Journal on Document Analysis and Recognition (IJDAR), Special Issue track ICDAR-IJDAR 2024.

KhmerFormer: Multi-Scale CNNs-Transformer with External Attention for Ancient Khmer Isolated Glyph Classification

Thuon, N., et al. (2024). Asia-Pacific Signal and Information Processing Association Annual Summit (APSIPA 2024).

Improving Isolated Glyph Classification Task for Palm Leaf Manuscripts

Thuon, N., et al. (2022). International Conference on Frontiers in Handwriting Recognition 2022 (ICFHR 2022).

Syllable Analysis Data Augmentation for Khmer Ancient Palm leaf Recognition

Thuon, N., et al. (2022). Asia-Pacific Signal and Information Processing Association Annual Summit (APSIPA 2022).

Conclusion & Impact

Research Contribution

This research introduces novel methodologies (e.g., PALM-GANs for enhancement, EFF for classification, and SADA for text synthesis) that establish new state-of-the-art benchmarks for the analysis of complex, low-resource historical manuscripts.

Cultural Impact

Beyond technical advancements, this work plays a crucial role in the digital preservation of Southeast Asian cultural heritage. By making the contents of endangered manuscripts accessible, we empower new forms of scholarly inquiry and public engagement.

Future Work

Future directions include expanding the dataset to more scripts (e.g., Javanese, Lontara), integrating multimodal approaches (combining visual and linguistic cues), and developing interactive, AI-assisted tools for historians and linguists.

Acknowledgment

This work was primarily conducted by Nimol Thuon, with funding support from the Chinese Academy of Sciences (CAS), The World Academy of Sciences (TWAS, Italy), and the National Natural Science Foundation of China (NSFC). The author also acknowledges the valuable contributions and collaboration from partners in Cambodia, China, and Indonesia.
