Humanity’s Last Exam: Evaluating AI’s Understanding of Complex Knowledge
Introduction
Humanity’s Last Exam (HLE) is a benchmark designed to test large language models (LLMs) across a wide range of academic subjects. It comprises a curated set of exam questions drawn from many disciplines, with math and the natural sciences making up the largest share and the humanities and social sciences a smaller one. The goal is to evaluate the reasoning, comprehension, and problem-solving abilities of LLMs across these fields.
Architecture of HLE
1. Structure & Design
HLE consists of 3,000 exam questions drawn from more than a hundred subjects, grouped into the following high-level categories:
- Math (42%)
- Physics (11%)
- Biology/Medicine (11%)
- Computer Science/Artificial Intelligence (9%)
- Humanities/Social Science (8%)
- Chemistry (6%)
- Engineering (5%)
- Other (8%)
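Together, these categories span a broad range of topics and give the LLMs a diverse evaluation set, although the distribution is weighted heavily toward math, which accounts for 42% of the questions.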
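To make the distribution concrete, the percentages can be converted into approximate question counts. The sketch below assumes the 3,000-question total and the rounded shares listed above. The names `CATEGORY_SHARES`, `TOTAL_QUESTIONS`, and `approximate_counts` are illustrative, and the resulting counts are estimates derived from rounded percentages, not official per-category figures.

```python
# Category shares as listed in the article (rounded percentages).
CATEGORY_SHARES = {
    "Math": 42,
    "Physics": 11,
    "Biology/Medicine": 11,
    "Computer Science/Artificial Intelligence": 9,
    "Humanities/Social Science": 8,
    "Chemistry": 6,
    "Engineering": 5,
    "Other": 8,
}

# Total number of questions stated in the article.
TOTAL_QUESTIONS = 3000


def approximate_counts(shares, total):
    """Convert integer percentage shares into approximate question counts."""
    return {name: total * pct // 100 for name, pct in shares.items()}


# Sanity check: the listed shares should account for the whole benchmark.
assert sum(CATEGORY_SHARES.values()) == 100

counts = approximate_counts(CATEGORY_SHARES, TOTAL_QUESTIONS)

# Print categories from largest to smallest estimated count.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<45}{n:>6}")
```

Under these assumptions, math alone accounts for roughly 1,260 questions, while Humanities/Social Science accounts for about 240, which shows how strongly the benchmark leans toward quantitative subjects.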