About this game
This game uses the 1,400-question probe set from Bojie Li's paper, Incompressible Knowledge Probes, and the public IKP repository.
The paper calibrates factual probe performance against model size with a log-linear curve. Here, your automatically graded answers are mapped onto that same curve, so the result is an LLM-equivalent factual capacity estimate, not a claim about human brains.
The original benchmark uses an LLM judge with strict rules. This site uses a deterministic approximation: normalized exact/fuzzy matching, numeric tolerance, alternate-answer handling, and refusal detection.
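As a rough illustration of what such a deterministic grader could look like, the sketch below combines the four ingredients named above. It is not the site's actual code: the refusal phrase list, the 0.9 fuzzy threshold, the 1% numeric tolerance, and the function names are all hypothetical choices made for the example.

```python
import math
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


# Hypothetical settings; the site's real phrase list and thresholds may differ.
REFUSALS = {normalize(p) for p in ("I don't know", "not sure", "no idea", "skip")}
FUZZY_THRESHOLD = 0.9
NUMERIC_REL_TOL = 0.01


def as_number(text: str):
    """Return the text as a float (thousands separators allowed), else None."""
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def grade(answer: str, accepted: list[str]) -> str:
    """Classify an answer as 'correct', 'refusal', or 'wrong'."""
    norm = normalize(answer)
    if norm == "" or norm in REFUSALS:
        return "refusal"
    num = as_number(answer)
    for candidate in accepted:  # alternate answers are tried in turn
        cand_num = as_number(candidate)
        if num is not None and cand_num is not None:
            if math.isclose(num, cand_num, rel_tol=NUMERIC_REL_TOL):
                return "correct"
            continue  # numeric answers are never fuzzy-matched
        cand_norm = normalize(candidate)
        if norm == cand_norm:
            return "correct"
        if SequenceMatcher(None, norm, cand_norm).ratio() >= FUZZY_THRESHOLD:
            return "correct"
    return "wrong"
```

Under these assumed settings, a small misspelling of an accepted answer still counts as correct, while a number that is off by more than 1% counts as wrong.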
The displayed IKP score is a penalized score, not raw accuracy: correct answers count as +1, refusals as 0, and wrong guesses as -0.5, and the result is floored at 0. The paper's evaluation code uses this hallucination penalty and floors tier scores at zero before averaging tiers; this game applies the same penalty to the sample of questions you have answered.
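The penalty can be written out in a few lines. This is a minimal sketch that assumes the score is the mean of the per-answer values over the questions answered so far; the function name and that averaging step are assumptions, not taken from the paper's code.

```python
def ikp_score(correct: int, refused: int, wrong: int) -> float:
    """Penalized score: +1 per correct, 0 per refusal, -0.5 per wrong guess.

    Assumption: the total is averaged over all answered questions,
    then floored at 0 for display.
    """
    total = correct + refused + wrong
    if total == 0:
        return 0.0  # nothing answered yet
    raw = (1.0 * correct + 0.0 * refused - 0.5 * wrong) / total
    return max(0.0, raw)
```

For example, 6 correct answers, 2 refusals, and 2 wrong guesses give (6 - 1) / 10 = 0.5, whereas 1 correct answer and 4 wrong guesses give a negative raw value that is floored to 0.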
The estimate starts from a deliberately conservative prior: before seeing your answers, the 90% equivalent-parameter interval is 10M to 1B parameters. After each answer, your observed IKP score is mapped through log10(params_B) = 6.790 * score - 0.899. The score uncertainty from your answered sample and the paper's approximate calibration error are combined with that prior in log-parameter space. The displayed 90% CI (equiv) is the resulting posterior interval.
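One way to picture this update is a Gaussian prior and a Gaussian observation in log10-parameter space, combined by precision weighting. The sketch below makes several assumptions that the text above does not confirm: that params_B is measured in billions, that both distributions are normal, that the prior is centered on the midpoint of its interval, and that the calibration error is 0.5 in log10 units (a placeholder value). Only the slope, the intercept, and the 10M to 1B prior interval come from the description.

```python
import math
from statistics import NormalDist

SLOPE, INTERCEPT = 6.790, -0.899          # from the calibration curve above
Z90 = NormalDist().inv_cdf(0.95)          # two-sided 90% interval, about 1.645

# Prior 90% interval of 10M to 1B parameters, expressed in billions (assumed unit).
PRIOR_LO, PRIOR_HI = math.log10(0.01), math.log10(1.0)
PRIOR_MEAN = (PRIOR_LO + PRIOR_HI) / 2
PRIOR_SD = (PRIOR_HI - PRIOR_LO) / (2 * Z90)

CALIBRATION_SD = 0.5                      # hypothetical placeholder, in log10 units


def point_estimate_b(score: float) -> float:
    """Equivalent parameters (billions) from the curve alone, ignoring the prior."""
    return 10 ** (SLOPE * score + INTERCEPT)


def posterior_interval_b(score: float, score_se: float,
                         calibration_sd: float = CALIBRATION_SD):
    """Return (low, median, high) of the 90% posterior interval, in billions."""
    obs_mean = SLOPE * score + INTERCEPT
    # Score uncertainty is scaled by the slope, then added in quadrature
    # to the calibration error.
    obs_sd = math.hypot(SLOPE * score_se, calibration_sd)
    prior_prec, obs_prec = 1 / PRIOR_SD ** 2, 1 / obs_sd ** 2
    post_var = 1 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * PRIOR_MEAN + obs_prec * obs_mean)
    half = Z90 * math.sqrt(post_var)
    return 10 ** (post_mean - half), 10 ** post_mean, 10 ** (post_mean + half)
```

With a very uncertain score the interval stays close to the prior's 10M to 1B range, and it narrows and shifts toward the curve's point estimate as the standard error of the score shrinks.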