AI helps identify species with 97% accuracy thanks to Austria’s HPC and Friederike Barkmann’s ecological research.
At first glance, butterflies and supercomputers have nothing in common. On closer inspection, however, they are more deeply connected than one might think, at least in the world of Friederike Barkmann, an ecologist at the University of Innsbruck. Using artificial intelligence (AI) and a supercomputer, she has significantly simplified the identification of butterflies from photos. EuroCC Austria supported her in training the model on one of Austria's high-performance computers. The first interim results are now in: the model achieves an accuracy of 97%, and the training time per epoch has been reduced by 90%.
Butterflies are not only popular photography subjects but also crucial indicators of biodiversity, which makes monitoring them important for environmental protection. In Austria, monitoring takes place as part of the "Viel-Falter Monitoring" initiative (viel-falter.at), in which scientists from the University of Innsbruck work together with volunteers. In addition to these systematic observations, many volunteers also record incidental butterfly sightings.
Over the past few years, volunteers have uploaded several hundred thousand images of native butterflies via the free Butterfly App (schmetterlingsapp.at) from Blühendes Österreich, a non-profit initiative of the BILLA Private Foundation. These images serve as valuable data sources for purposes such as Red List assessments or species distribution maps – and as the foundation for Friederike Barkmann’s research on new machine learning models.
Testing various models to find the best one
Traditionally, identifying butterflies and verifying uploaded photos has been a manual task for experts. This process is not only time-consuming and costly but also unreliable: no single expert can identify every species with certainty. This led to the idea of letting artificial intelligence (AI) do the job by training a machine learning model. The goal is to test various models to determine which one best identifies butterflies and other animals. This would not only give wings to butterfly identification but also benefit other biodiversity initiatives. Achieving this, however, requires enormous amounts of data, which in turn demands significant computing power.
Accelerated data processing with machine learning
Ecologist Friederike Barkmann worked with 529,835 images of 162 butterfly species to train different machine learning models. In projects like this, the more images available for a given species, the easier it is for the model to classify that species correctly. However, biodiversity data typically exhibit a phenomenon known as "class imbalance": some species have over 10,000 photos, while others have as few as ten. The challenge is to ensure that these "smaller" species are weighted appropriately during training.
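The article does not say which weighting scheme Barkmann used. One common approach is inverse-frequency ("balanced") class weighting, the same formula scikit-learn's `compute_class_weight("balanced", ...)` uses: each class gets the weight `n_samples / (n_classes * class_count)`, so rare classes count for more in the loss. The sketch below is a plain-Python illustration of that formula; the two species labels are hypothetical and simply mirror the extremes mentioned above.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical labels: one very common and one very rare species
labels = ["common_species"] * 10_000 + ["rare_species"] * 10
weights = balanced_class_weights(labels)

for cls, w in weights.items():
    # weight * count is identical for every class, so each species
    # contributes the same total weight to the training loss
    print(cls, round(w, 4), round(w * labels.count(cls), 1))
```

In a real training setup, such weights would typically be passed to a weighted loss function, for example via the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`.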
Parallel processing improves performance
Barkmann began her work on the University of Innsbruck's supercomputer, LEO5. Modern high-performance computers like LEO5 are equipped with powerful graphics processing units (GPUs), which are particularly well suited to training AI models. However, a single GPU can only do so much, and several GPUs working together are considerably faster. That is why supercomputing experts distribute the workload across multiple GPUs, an approach known as "parallelisation".
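The article does not detail how the parallelisation was implemented. The most common form for training neural networks is data parallelism: each GPU processes its own slice of a batch, and the resulting gradients are averaged so that all GPUs apply the same update. The toy sketch below simulates this in plain Python, assuming a one-parameter linear model in place of the real network and simulated "workers" in place of GPUs. Frameworks such as PyTorch's `DistributedDataParallel` perform the same gradient averaging for real networks across processes.

```python
def grad_mse(w, xs, ys):
    """Gradient d/dw of the mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq, n_workers):
    """Split seq into n_workers contiguous shards of equal size."""
    if len(seq) % n_workers:
        raise ValueError("batch size must be divisible by the number of workers")
    size = len(seq) // n_workers
    return [seq[i * size:(i + 1) * size] for i in range(n_workers)]


def data_parallel_grad(w, xs, ys, n_workers=4):
    """Simulate data parallelism: each worker (one 'GPU') handles one shard."""
    local_grads = [
        grad_mse(w, xs_part, ys_part)
        for xs_part, ys_part in zip(shard(xs, n_workers), shard(ys, n_workers))
    ]
    # Averaging the per-worker gradients plays the role of the all-reduce step
    return sum(local_grads) / n_workers


xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
parallel = data_parallel_grad(0.5, xs, ys)
print(full, parallel)  # -59.5 -59.5
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why splitting the work across GPUs speeds up training without changing what the model learns from each batch.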
Initially, Barkmann trained the model on the massive dataset using a single GPU. While the lengthy computation ran, she could have gone for a leisurely walk and some butterfly-watching on the outskirts of the city. However, supercomputing expert Andreas Lindner from EuroCC Austria saw a way to optimise the process and implemented parallelisation across four GPUs on the Innsbruck supercomputer. This reduced the training time for a single epoch (out of 50 in total) from two hours to just twelve minutes, a 90% time saving. Furthermore, the model's accuracy was exceptionally high: as of early December 2024, it achieved 97% identification accuracy on the test dataset, meaning that 97% of the butterflies in it were correctly classified.
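The timing figures above can be checked with simple arithmetic. The per-epoch times and the epoch count are taken from the text; the totals assume, for illustration only, that every epoch takes the same amount of time, which the article does not state explicitly.

```python
MINUTES_PER_EPOCH_1GPU = 120  # two hours per epoch on a single GPU
MINUTES_PER_EPOCH_4GPU = 12   # twelve minutes per epoch on four GPUs
EPOCHS = 50                   # total number of epochs

speedup = MINUTES_PER_EPOCH_1GPU / MINUTES_PER_EPOCH_4GPU
time_saving = 1 - MINUTES_PER_EPOCH_4GPU / MINUTES_PER_EPOCH_1GPU

# Assumption: constant time per epoch
total_hours_1gpu = EPOCHS * MINUTES_PER_EPOCH_1GPU / 60
total_hours_4gpu = EPOCHS * MINUTES_PER_EPOCH_4GPU / 60

print(f"{speedup:.0f}x faster, {time_saving:.0%} saved")  # 10x faster, 90% saved
print(f"{total_hours_1gpu:.0f} h vs {total_hours_4gpu:.0f} h for {EPOCHS} epochs")  # 100 h vs 10 h for 50 epochs
```

Under that assumption, a full training run shrinks from roughly four days of computation to an afternoon.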
High-quality data leads to accurate models
For every AI image recognition tool, the rule is simple: the more training images available, the more accurate the model can become. However, Barkmann's project also included butterfly species with relatively few images, so the class imbalance issue arose again: while some species had 5,000 images, others had only 70. To address this, Barkmann augmented the dataset by transforming existing images, for example by rotating them or cropping smaller sections, so that the model had more data to learn from. Despite these efforts, species with fewer images remained harder to classify accurately. The study found that when a species had at least 1,662 images, the model identified it with at least 90% accuracy.
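The exact augmentation pipeline is not described in the article. As a minimal sketch of the two transformations it mentions, rotating and cropping, the plain-Python example below treats an image as a list of pixel rows; the 4 × 4 "photo", the crop size and the number of variants are hypothetical. Real pipelines would typically use a library such as torchvision (for example `transforms.RandomRotation` and `transforms.RandomResizedCrop`) on actual image files.

```python
import random


def rotate90(img):
    """Rotate an image (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img, size, rng):
    """Cut a random size x size window out of the image (size <= height, width)."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img, n_variants, crop_size, seed=0):
    """Create n_variants randomly rotated and cropped copies of one image."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        out = img
        for _ in range(rng.randrange(4)):  # rotate by 0, 90, 180 or 270 degrees
            out = rotate90(out)
        variants.append(random_crop(out, crop_size, rng))
    return variants


# A hypothetical 4 x 4 grayscale "photo" with pixel values 0..15
photo = [[r * 4 + c for c in range(4)] for r in range(4)]
extra = augment(photo, n_variants=5, crop_size=3)
print(len(extra), "new training images from one photo")
```

Because every variant is derived from the same original photo, augmentation adds variation but no genuinely new information, which is consistent with rare species remaining harder to classify.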
Links
From her work with more than 500,000 images of butterfly and moth species, Friederike Barkmann compiled the world's largest dataset of Austrian butterflies and moths, published in August 2025 in Scientific Data (Nature). The dataset is publicly available to scientists around the globe: "Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels".
Department of Ecology, University of Innsbruck: https://www.uibk.ac.at/de/ecology
Butterfly app: https://www.schmetterlingsapp.at
Viel-Falter Monitoring: www.viel-falter.at
Code for LEONARDO on GitHub: https://github.com/FriederikeBarkmann/CNN_butterfly_identification
First trained model: https://huggingface.co/RikeB/CNN_butterfly_identification