Flora Batava (1800-1934): From Historical Citizen Science to Plant Humanities Dataset

Occurrence Observation
Latest version published by FLORON Plant Conservation Netherlands on Apr 12, 2026

Download the latest version of this resource data as a Darwin Core Archive (DwC-A) or the resource metadata as EML or RTF:

Data as a DwC-A file: 11,565 records in Dutch (447 KB); update frequency: not planned
Metadata as an EML file: in English (11 KB)
Metadata as an RTF file: in English (9 KB)

Description

Flora Batava: people, plants, locations lists 11,500+ records of all species in the first illustrated flora of the Netherlands, published in 28 volumes between 1800 and 1934. The dataset includes information about the plants, the people who observed them in each locality, and the publication of each volume. KB, the National Library of the Netherlands, holds both original and digitized source material. From the latter, data was segmented and extracted using a generative AI model (OpenAI’s GPT-4), then checked and corrected manually. By including social (e.g., observers’ names, sex) and historical (e.g., old plant names, publication history) information, this dataset facilitates research in plant humanities, botanical heritage, and the social history of science.

Data Records

The data in this occurrence resource has been published as a Darwin Core Archive (DwC-A), which is a standardized format for sharing biodiversity data as a set of one or more data tables. The core data table contains 11,565 records.
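As an illustration of the archive layout, the following is a minimal Python sketch for reading the core table of a Darwin Core Archive using only the standard library. It relies on the archive's meta.xml descriptor to locate the core file and its columns. The function name read_core_rows is hypothetical, and the sketch assumes the core file is delimited text without quoted fields; the actual file names and columns in this resource's archive are not listed on this page and should be checked in its meta.xml.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the meta.xml descriptor of a Darwin Core Archive.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_rows(archive):
    """Return the core table of a Darwin Core Archive as a list of dicts.

    `archive` is a path or a binary file-like object holding the zip file.
    Assumption: the core file is delimited text without quoted fields.
    """
    with zipfile.ZipFile(archive) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        core = meta.find(f"{DWC_TEXT_NS}core")
        location = core.find(
            f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location"
        ).text.strip()
        # meta.xml spells the delimiter as an escape sequence such as "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        header_lines = int(core.get("ignoreHeaderLines", "0"))
        encoding = core.get("encoding", "UTF-8")

        # Map column index -> short term name (last segment of the term URI).
        columns = {}
        id_element = core.find(f"{DWC_TEXT_NS}id")
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall(f"{DWC_TEXT_NS}field"):
            if field.get("index") is not None:
                name = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = name

        rows = []
        with zf.open(location) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            reader = csv.reader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            for line_number, values in enumerate(reader):
                if line_number < header_lines:
                    continue  # skip header line(s) declared in meta.xml
                rows.append(
                    {name: values[i] for i, name in columns.items() if i < len(values)}
                )
        return rows
```

Called on the downloaded archive, for example read_core_rows("dwca-flora-batava.zip") (a hypothetical file name), this should return one dictionary per record in the core table.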

This IPT archives the data and thus serves as the data repository. The data and resource metadata are available for download in the downloads section. The versions table lists other versions of the resource that have been made publicly available and allows tracking changes made to the resource over time.

Versions

The table below shows only published versions of the resource that are publicly accessible.

How to cite

Researchers should cite this work as follows:

Teixeira-Costa L, van Gelder E, Sparrius L, Karsdorp F (2026). Flora Batava (1800-1934): From Historical Citizen Science to Plant Humanities Dataset. Version 1.1. FLORON Plant Conservation Netherlands. Occurrence dataset. https://www.verspreidingsatlas.nl/ipt/resource?r=flora-batava&v=1.1

Rights

Researchers should respect the following rights statement:

The publisher and rights holder of this work is FLORON Plant Conservation Netherlands. This work is licensed under a Creative Commons Attribution (CC-BY 4.0) License.

GBIF Registration

This resource has been registered with GBIF and assigned the following GBIF UUID: 863890c7-c5ce-4fd2-ad32-c3bdf510c2b2. FLORON Plant Conservation Netherlands publishes this resource and is itself registered in GBIF as a data publisher endorsed by the Netherlands Biodiversity Information Facility.
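The UUID can be used as a dataset key with the public GBIF API. The Python sketch below only builds the request URLs and makes no network calls; the helper names dataset_url and occurrence_search_url are hypothetical. Whether and when the records appear in GBIF's occurrence index is not stated on this page, so the search URL is shown as an assumption about typical use.

```python
import uuid
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"
# GBIF UUID of this resource, as given in the registration statement.
DATASET_KEY = "863890c7-c5ce-4fd2-ad32-c3bdf510c2b2"


def dataset_url(key):
    """Return the GBIF API URL for a dataset's registry metadata."""
    key = str(uuid.UUID(key))  # raises ValueError if the key is malformed
    return f"{GBIF_API}/dataset/{key}"


def occurrence_search_url(key, limit=20, offset=0):
    """Return a GBIF occurrence search URL restricted to one dataset."""
    key = str(uuid.UUID(key))
    query = urlencode({"datasetKey": key, "limit": limit, "offset": offset})
    return f"{GBIF_API}/occurrence/search?{query}"
```

The resulting URLs can be opened in a browser or passed to any HTTP client; paging through results is done by increasing offset.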

Keywords

Occurrence; Observation

Contacts

Laurens Sparrius
  • Originator
  • Point Of Contact
FLORON Plant Conservation Netherlands
Nijmegen
NL
Luiza Teixeira-Costa
  • Metadata Provider
Royal Netherlands Academy of Arts & Sciences (KNAW)
Amsterdam
NL
Esther van Gelder
  • Custodian Steward
KB nationale bibliotheek
Den Haag
NL
Folgert Karsdorp
  • Principal Investigator
Royal Netherlands Academy of Arts & Sciences (KNAW)
Amsterdam
NL

Geographic Coverage

The Netherlands and surroundings

Bounding Coordinates South West [49.325, 1.978], North East [53.645, 8.438]
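A bounding box like this one is often used as a quick sanity check on geocoded records. The Python sketch below reads the pairs as [latitude, longitude] in decimal degrees, which is an interpretation of the values given here; the function name in_coverage is hypothetical.

```python
# Bounding coordinates from the metadata, read as [latitude, longitude].
SOUTH, WEST = 49.325, 1.978
NORTH, EAST = 53.645, 8.438


def in_coverage(lat, lon):
    """Return True if a point lies inside the stated bounding box (edges included)."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST
```

For example, a point near Nijmegen (roughly 51.84, 5.86) falls inside the box, while a point near Paris (roughly 48.86, 2.35) falls outside it.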

Temporal Coverage

Start Date / End Date 1790-01-01 / 1934-12-31

Project Data

No Description available

Title Flora Batava

The personnel involved in the project:

Luiza Teixeira-Costa
Esther van Gelder
Laurens Sparrius
Folgert Karsdorp

Sampling Methods

Scans of the journal's pages were processed with Optical Character Recognition and Handwritten Text Recognition. Text segmentation was used to classify paragraphs with labels such as “species names”, “flowering time”, “classification”, “sexual characteristics”, “species traits”, “habitat”, “medicinal use”, and “domestic use”. Observations were extracted from the habitat sections using generative AI. Locality names were geocoded with Nominatim (R) and generative AI. The data was enriched with information from the national checklists and a database of historical observers of flora and fauna.

Study Extent Observations of plants and fungi published in the Flora Batava journal (1800–1934).
Quality Control After data extraction, all entries were individually checked against the source material and, where needed, manually corrected for spelling and accuracy. Entries were manually checked again during geocoding and data enrichment.

Method step description:

  1. For a detailed description see https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.497

Bibliographic Citations

  1. Teixeira-Costa L, van Gelder E, Sparrius LB, Karsdorp F (2026). Flora Batava (1800-1934): From Historical Citizen Science to Plant Humanities Dataset. Journal of Open Humanities Data 12: 4. https://doi.org/10.5334/johd.497

Additional Metadata

Alternative Identifiers 863890c7-c5ce-4fd2-ad32-c3bdf510c2b2
https://www.verspreidingsatlas.nl/ipt/resource?r=flora-batava