NCC Finland
NCC Finland is part of the EuroCC2 project, which establishes national HPC competence centres in European countries. NCC Finland's mission is to support and improve the ability of Finnish business users to make use of high-performance computing, data analytics and artificial intelligence. Through NCC Finland, companies have access to the computing capacity of the EuroHPC LUMI supercomputer. NCC Finland is operated by CSC – IT Center for Science.
Industrial organisations involved:
The Finnish company Root Signals (now Scorable), based in Helsinki and in Palo Alto, USA, is a leader in automated quality control of AI applications, chatbots and agents powered by large language models (LLMs). It provides companies with versatile tools to measure and improve the reliability and efficiency of LLM applications. Its newest work, the Root Judge LLM, was developed on the LUMI supercomputer.
Technical/scientific Challenge:
LLM evaluation and comparison have been extremely hard problems since the start of the LLM boom. LLM behaviour is sometimes hard to predict, and evaluating its reliability requires a dedicated LLM. Root Judge was designed and trained on millions of evaluation tasks. Massive amounts of data, both open-source and synthetic evaluation data, were needed to train an evaluation-specific LLM: a judge LLM.
Solution:
Root Signals' Root Judge LLM has been developed with the LUMI supercomputer.
“We used almost 400 GPUs to develop our Root Judge evaluation LLM. We released it as open source with fully open weights, available also for commercial use. We benchmarked it against leading LLMs from OpenAI and Anthropic as well as open-source LLMs, and it outperformed them in the evaluation tasks,” says Oguzhan Gencoglu, AI team leader at Root Signals.
Few organisations have access to a large number of GPUs, so Root Signals made the commercially significant decision to quantise the Judge model, so that it needs fewer GPUs and still performs well on GPUs that can be bought off the shelf.
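To illustrate why quantisation reduces the hardware needed, the following is a minimal sketch and not Root Signals' actual method. It estimates the memory taken by model weights at different precisions and shows a simple symmetric 8-bit quantisation of a few values. The parameter count, the sample weights and all function names are hypothetical and chosen only for illustration.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_int8(weights):
    """Symmetric per-tensor quantisation: floats -> integer codes in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    params = 70e9  # hypothetical model size, not stated in this story
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gib(params, bits):.1f} GiB")

    weights = [0.12, -0.5, 0.33, 0.0, 0.49]  # made-up sample values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    print("codes:", codes)
    print("restored:", [round(r, 3) for r in restored])
```

Halving the number of bits per weight roughly halves the memory the weights occupy, which is what allows a quantised model to fit on fewer or smaller GPUs. The price is a small rounding error per weight, bounded here by half the scale value.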
Business impact:
Root Judge is now fast, accessible and affordable. This has reinforced the market position of Root Signals' LLM evaluation tools and accelerated both the company's growth and the development of a large service portfolio around Root Judge, with rich features for measuring, detecting and intervening in AI reliability issues.
Root Judge is specifically fine-tuned to judge the reliability of another LLM, i.e. to detect hallucinations and to provide transparent justifications for its scoring. This helps end users and developers evaluate and optimise their LLMs, ultimately building trust in AI-driven evaluation.
Benefits:
* Vast computing power for developing the evaluation LLM, together with expert support
* Development of the groundbreaking open-source model Root Judge, fine-tuned to judge the reliability of another LLM
* Greater trust in AI-driven evaluation

Success story highlights:
* Keywords: LLM, Judge LLM
* Industry Sector: AI, Software development
* Technology: EuroHPC LUMI
Contact:
NCC Finland: Development Manager Dan Still, CSC, dan.still@csc.fi
Tiina Leiponen, tiina.leiponen@csc.fi
Acknowledgements: This project has received support from Business Finland.
