Humanity’s Last Exam: Evaluating AI’s Understanding of Complex Knowledge

U.V.
4 min read · Jan 29, 2025

Introduction

Humanity’s Last Exam (HLE) is a benchmark designed to test large language models (LLMs) on difficult questions across a wide range of academic subjects. Despite what the name might suggest, it is not limited to the humanities: its curated exam questions span mathematics, the natural sciences, engineering, and the humanities and social sciences. The goal is to evaluate the reasoning, comprehension, and problem-solving abilities of LLMs across this breadth of human knowledge.

Architecture of HLE

1. Structure & Design

HLE consists of 3,000 exam questions sourced from over a hundred subjects, grouped into high-level categories. The categories and their approximate shares of the dataset are:

  • Math (42%)
  • Physics (11%)
  • Biology/Medicine (11%)
  • Computer Science/Artificial Intelligence (9%)
  • Humanities/Social Science (8%)
  • Chemistry (6%)
  • Engineering (5%)
  • Other (8%)

These categories cover a broad spectrum of topics, ensuring a diverse evaluation set for the LLMs.
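
To make these proportions concrete, here is a minimal Python sketch that converts the listed percentages into approximate question counts, assuming the 3,000-question total stated above. The names `CATEGORY_SHARES`, `TOTAL_QUESTIONS`, and `approximate_counts` are illustrative and not part of any HLE tooling, and because the published percentages are rounded, the results are estimates rather than the dataset's actual per-category counts.

```python
# Category shares as listed in the article (percent of all questions).
CATEGORY_SHARES = {
    "Math": 42,
    "Physics": 11,
    "Biology/Medicine": 11,
    "Computer Science/Artificial Intelligence": 9,
    "Humanities/Social Science": 8,
    "Chemistry": 6,
    "Engineering": 5,
    "Other": 8,
}

# Dataset size stated in the article.
TOTAL_QUESTIONS = 3000


def approximate_counts(shares, total):
    """Convert percentage shares into approximate question counts."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


# Sanity check: the listed shares should cover the whole dataset.
assert sum(CATEGORY_SHARES.values()) == 100

counts = approximate_counts(CATEGORY_SHARES, TOTAL_QUESTIONS)
for name, n in counts.items():
    print(f"{name}: ~{n} questions")
```

Under these assumptions, Math alone accounts for roughly 1,260 questions, while Humanities/Social Science accounts for roughly 240, which shows how heavily the benchmark leans toward quantitative subjects.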

2. Dataset Pipeline & Processing
