TASK Quarterly

Game of Questions: An automated method for unconventional evaluation of Large Language Models

Abstract

The rapid advancement of Large Language Models (LLMs) has created a need for methods to evaluate their performance, particularly in assessing their domain-specific knowledge and the ability to apply such knowledge in reasoning tasks. Current benchmarks often require substantial manual effort for test case construction and answer scoring. We address this limitation by providing a robust, automatic evaluation method that relies only on unstructured domain text. We introduce the Game of Questions, a method that allows the model's knowledge to be tested via an interaction with another model, inspired by the popular web-based game Akinator. The approach requires minimal input from the evaluator and no prepared questions, making it convenient to apply.

Keywords:

Large Language Model, benchmark

Details

Issue
Vol. 29 No. 4 (2025)
Section
Research article
Published
2026-05-25
DOI:
https://doi.org/10.34808/tq2025/29.4/b
License:

Copyright (c) 2026 TASK Quarterly

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors
