Humanity’s Last Exam: Evaluating AI’s Understanding of Complex Knowledge

U.V.

4 min read · Jan 29, 2025

Introduction

Humanity’s Last Exam (HLE) is a benchmark designed to test large language models (LLMs) on difficult questions from a wide range of academic subjects, not only the humanities. It comprises exam questions curated from many disciplines, with math and the natural sciences making up the largest share. The goal is to evaluate the reasoning, comprehension, and problem-solving abilities of LLMs on expert-level knowledge.

Architecture of HLE

1. Structure & Design

HLE consists of 3,000 exam questions sourced from over a hundred subjects, grouped into high-level categories. The categories and their approximate shares of the question set are:

Math (42%)
Physics (11%)
Biology/Medicine (11%)
Computer Science/Artificial Intelligence (9%)
Humanities/Social Science (8%)
Chemistry (6%)
Engineering (5%)
Other (8%)

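Together these categories span a broad range of topics, giving the LLMs a diverse evaluation set, although the mix is weighted heavily toward math, which accounts for more than two fifths of the questions.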
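
To make the percentages more concrete, here is a minimal sketch that converts the category shares listed above into approximate question counts. It assumes the 3,000-question total and the rounded percentages from this article; the names `CATEGORY_SHARES`, `TOTAL_QUESTIONS`, and `approximate_counts` are made up for illustration and are not part of any official HLE tooling. Because the published shares are rounded, the resulting counts are estimates rather than exact figures.

```python
# Category shares (percent of the dataset), copied from the list above.
CATEGORY_SHARES = {
    "Math": 42,
    "Physics": 11,
    "Biology/Medicine": 11,
    "Computer Science/Artificial Intelligence": 9,
    "Humanities/Social Science": 8,
    "Chemistry": 6,
    "Engineering": 5,
    "Other": 8,
}

# Total number of questions as stated in the article (assumption for this sketch).
TOTAL_QUESTIONS = 3000


def approximate_counts(shares, total):
    """Estimate the number of questions per category from rounded percentages."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


# Sanity check: the listed shares should add up to 100%.
assert sum(CATEGORY_SHARES.values()) == 100

counts = approximate_counts(CATEGORY_SHARES, TOTAL_QUESTIONS)
for name, n in counts.items():
    print(f"{name}: ~{n} questions")
```

Under these assumptions, Humanities/Social Science works out to roughly 240 questions, compared with about 1,260 for Math.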