On the Measure of Intelligence

id:

1911.01547

Authors:

François Chollet

Published:

2019-11-05

arXiv:

https://arxiv.org/abs/1911.01547

PDF:

https://arxiv.org/pdf/1911.01547

DOI:

N/A

Journal Reference:

N/A

Primary Category:

cs.AI

Categories:

cs.AI

Comment:

N/A

github_url:

N/A

abstract

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to “buy” arbitrary levels of skills for a system, in a way that masks the system’s own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

premise

Intelligence is not demonstrated by skill at any particular task, but rather by the efficiency with which a system can turn its experience and priors into skill at new, previously unknown tasks. This efficiency should be measured with respect to information (priors + experience) and generalization difficulty.

generated by ALTER

outline

Context and History

  • Need for actionable intelligence definition

  • Two divergent views: task-specific vs general learning

  • Evolution of AI evaluation methods

A New Perspective

  • Critical assessment of measuring skill alone

  • Intelligence as skill-acquisition efficiency

  • Grounding general intelligence in human scope

  • Core knowledge from developmental psychology

Benchmark Proposal: ARC

  • Description and goals: evaluating general intelligence

  • Core Knowledge priors as foundation

  • Comparison with psychometric tests

  • Discussion of strengths, weaknesses, alternatives

quotes

“Intelligence lies in the process of acquiring skills at tasks you cannot prepare for. This process involves multiple features: rapid abstraction of the structure of new challenges, the ability to solve tasks not seen during training, and crucially, the efficient use of a broad knowledge prior.”

—(p.27)

“To make progress towards the promise of our field, we need precise, quantitative definitions and measures of intelligence – in particular human-like general intelligence.”

—(p.3)

“In plain English: intelligence is the rate at which a learner turns its experience and priors into new skills at valuable tasks that involve uncertainty and adaptation.”

—(p.40)

generated by ALTER

notes

summary

1. Brief Overview

This paper, “On the Measure of Intelligence,” by François Chollet, critically examines existing definitions and evaluation methods of intelligence in both psychology and AI. It argues that current AI benchmarks, focused on task-specific skills, fail to adequately measure true intelligence, which is better characterized as skill-acquisition efficiency. Chollet proposes a new formal definition of intelligence based on Algorithmic Information Theory and introduces the Abstraction and Reasoning Corpus (ARC) as a novel benchmark designed to evaluate human-like general fluid intelligence in AI systems.

2. Key Points

  • Current AI benchmarks prioritize skill in specific tasks, neglecting the broader ability to acquire new skills efficiently.

  • Skill-acquisition efficiency, quantified with respect to scope, generalization difficulty, priors, and experience, is proposed as a more accurate definition of intelligence.

  • The g-factor (general intelligence) from psychometrics is analogous to extreme generalization in AI.

  • ARC, a new benchmark, addresses limitations of previous AI evaluation methods by focusing on human-like general fluid intelligence using novel tasks and explicit priors.

  • The paper formalizes the concept of intelligence using Algorithmic Information Theory, providing quantitative measures of generalization difficulty, priors, and experience (a schematic rendering follows this list).

  • ARC incorporates key aspects of psychometrics, such as the use of a broad battery of tasks and a hierarchy of abilities, but avoids the reliance on crystallized abilities and acquired prior knowledge that limits traditional psychometric tests.
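
The formalization highlighted above can be condensed into a schematic formula. This is a simplification for orientation only: the paper's full expression additionally weights tasks by value, sums over curricula, and defines every term through Algorithmic Information Theory, so GD (generalization difficulty), P (priors), and E (experience) are used here loosely, in the spirit of the quoted "plain English" definition rather than as the paper's exact notation.

    % Schematic only: the intelligence of a system IS over a scope of
    % tasks, read as "generalization difficulty overcome per unit of
    % information (priors + experience) consumed".
    \[
      I_{IS,\,\mathrm{scope}} \;\propto\;
      \operatorname*{Avg}_{T \in \mathrm{scope}}
      \frac{GD_{IS,T}}{P_{IS,T} + E_{IS,T}}
    \]

Read this way, two systems that reach the same skill level are distinguished by how little prior knowledge and experience each needed relative to how hard the required generalization was, which is exactly the premise stated at the top of these notes.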

3. Notable Quotes

  • “Looked at in one way, everyone knows what intelligence is; looked at in another way, no one does.” - Robert J. Sternberg, 2000

  • “In the distant future I see open fields for far more important researches. Psychology will be based on a new foundation, that of the necessary acquirement of each mental power and capacity by gradation.” - Charles Darwin, 1859

  • “Presumably the child brain is something like a notebook as one buys it from the stationer’s. Rather little mechanism, and lots of blank sheets.” - Alan Turing, 1950

4. Primary Themes

  • The inadequacy of current AI evaluation metrics: The paper argues that existing methods focus too narrowly on task-specific skills, neglecting the broader capacities of intelligence, such as generalization and learning.

  • A new definition of intelligence: The author proposes a formal definition grounded in Algorithmic Information Theory, emphasizing skill-acquisition efficiency as the core characteristic of intelligence.

  • The importance of generalization: The paper extensively explores different facets of generalization, from robustness and flexibility to extreme generalization, linking these to the hierarchical structure of cognitive abilities identified in psychometrics.

  • The ARC benchmark: The paper introduces a new benchmark dataset (ARC) designed to test for human-like general intelligence, addressing shortcomings of previous approaches. It explicitly defines the assumed innate knowledge priors used in designing the tasks (a minimal sketch of the task format follows this list).

  • Bridging AI and psychometrics: The paper draws parallels between AI evaluation and the field of psychometrics, advocating for the adoption of certain principles from psychometrics in AI evaluation, while avoiding pitfalls inherent in the direct application of traditional psychometric tests to AI.
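
To make the ARC discussion concrete: the paper releases ARC at github.com/fchollet/ARC, where each task is a JSON file with "train" demonstration pairs and "test" pairs, and each grid is a list of rows of integers 0-9 (symbol colors). The minimal Python sketch below shows what loading and scoring a task looks like under those assumptions; load_task, evaluate, the identity solve placeholder, and the filename are illustrative, not code from the paper.

    import json

    def load_task(path):
        """Read one ARC task: {"train": [...], "test": [...]}, where each
        pair maps an "input" grid to an "output" grid, and a grid is a
        list of rows of integers 0-9."""
        with open(path) as f:
            return json.load(f)

    def solve(grid):
        # Placeholder: a real test-taker must infer the transformation
        # from the train pairs alone; identity only lets the sketch run.
        return grid

    def evaluate(task, solver):
        """Exact-match rate over the task's test pairs. The paper scores
        a pair as solved only if the produced grid matches the expected
        output exactly (up to three attempts per pair are allowed)."""
        hits = [solver(pair["input"]) == pair["output"]
                for pair in task["test"]]
        return sum(hits) / len(hits)

    # Hypothetical filename; real task files in the repository carry
    # hexadecimal identifiers.
    task = load_task("data/training/0a1b2c3d.json")
    print(f"exact-match rate on test pairs: {evaluate(task, solve):.2f}")

What the sketch cannot show is where ARC's difficulty lies: the evaluation tasks are unique and unseen, so no transformation can be hard-coded in advance; a solver may only assume the Core Knowledge priors the paper lists explicitly.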