Game of Questions: An automated method for unconventional evaluation of Large Language Models
Abstract
The rapid advancement of Large Language Models (LLMs) has created a need for methods to evaluate their performance, particularly their domain-specific knowledge and their ability to apply that knowledge in reasoning tasks. Current benchmarks often require substantial manual effort to construct test cases and score answers. We address this limitation with a robust, automatic evaluation method that relies only on unstructured domain text. We introduce the Game of Questions, a method inspired by the popular web-based game Akinator that tests a model's knowledge through interaction with another model. The approach requires minimal input from the evaluator and no prepared questions, making it convenient to apply.
Keywords:
Large Language Model, benchmark
Details
- Issue: Vol. 29 No. 4 (2025)
- Section: Research article
- Published: 2026-05-25
- DOI: https://doi.org/10.34808/tq2025/29.4/b
- License:
Copyright (c) 2026 TASK Quarterly

This work is licensed under a Creative Commons Attribution 4.0 International License.
