Benchmarks and performance measures in artificial intelligence

Anders Jonsson

Artificial Intelligence and Machine Learning Group

Universitat Pompeu Fabra

HUMAINT workshop

6 March 2018

Anders Jonsson

Benchmarks in AI

Sequential decision making

Perform a sequence of actions to achieve a given objective

Each decision has an impact on future decisions!

Sequential decision making

Reinforcement learning:

Effect of actions initially unknown

Intervention: perform actions to test hypotheses about them

Aim: in a given state s, estimate the value Q(s, a) of an action a and/or a policy π(·|s) for action selection

AI planning:

System has a model of the actions

Aim: compute a sequence of actions in advance
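The contrast between the two settings above can be sketched on a toy problem. The code below is a minimal illustration, not taken from the talk: the five-state corridor, the names step, plan, q_learning and greedy_policy, and all parameter values are assumptions chosen for the example. The planner is allowed to query the model before acting, while the learner only sees the transitions it samples and uses them to estimate Q(s, a) with standard tabular Q-learning.

```python
import random
from collections import deque

# Hypothetical toy problem: a corridor with states 0..4 and the goal at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic dynamics: returns (next state, reward, goal reached?)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def plan(start):
    """AI planning: breadth-first search over the known model, done in advance."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == GOAL:
            return actions
        for a in ACTIONS:
            nxt, _, _ = step(state, a)  # the planner may consult the model freely
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Reinforcement learning: estimate Q(s, a) from sampled transitions only."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes try an action to test it.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


def greedy_policy(Q):
    """Pick, in every state, the action with the highest estimated value."""
    return {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES)}


print("plan from state 0:", plan(0))
print("learned greedy policy:", greedy_policy(q_learning()))
```

On this corridor both approaches end up moving right in every non-goal state; the difference is that the planner gets there by search over a given model, whereas the learner has to discover the effect of each action through interaction.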

Evaluation criteria

Theoretical analysis:

Performance bounds: how far from optimal is an AI algorithm?

Time complexity: how fast is it?

Memory complexity: how much memory does it use?

Empirical evaluation:

How does an AI algorithm perform in practice?

Empirical performance measures

What do we measure?

Winning

Scoring points

What do we compare to?

Optimal or near-optimal

Other algorithms

Problems with empirical evaluation

Strong incentive to boost the performance of one's own algorithm

Practices in reinforcement learning [Henderson et al. 2017]:

Run X trials, report average of 3 best runs

Omit network architecture, random seeds, hyperparameters

Implement own version of other researchers’ algorithms

Claims of human-level performance
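To see why reporting only the best runs is a problem, consider the sketch below. It is purely illustrative: the returns are synthetic Gaussian draws, and simulate_returns, mean_of_all, mean_of_top_k and every parameter value are hypothetical names and numbers, not data from Henderson et al. The point is only that averaging the 3 best of X trials can never give a lower figure than averaging all X trials, so the selection step on its own inflates the reported score.

```python
import random
import statistics


def simulate_returns(n_trials, true_mean=100.0, noise_std=20.0, seed=0):
    """Synthetic final returns of n_trials independent training runs (assumed Gaussian)."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, noise_std) for _ in range(n_trials)]


def mean_of_all(returns):
    """Honest summary: average over every trial that was run."""
    return statistics.mean(returns)


def mean_of_top_k(returns, k=3):
    """The questionable practice: keep only the k best trials before averaging."""
    return statistics.mean(sorted(returns, reverse=True)[:k])


returns = simulate_returns(n_trials=20)
print(f"mean over all runs : {mean_of_all(returns):.1f}")
print(f"mean of 3 best runs: {mean_of_top_k(returns):.1f}")
```

The gap between the two printed numbers comes entirely from the selection, since all runs are drawn from the same distribution.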

Benchmarks

Set of instances that are representative of problem difficulty

More unbiased comparison

Difficult to artificially boost the performance of an algorithm

More likely that results generalize

Benchmarks

OpenAI Gym

Project Malmo

Competitions

International Planning Competition

General Video Game AI Competition

AIIDE StarCraft AI Competition

Image classification

Problems with benchmarks

Might not accurately reflect real-world problems

Might require large amounts of computational power

Excessive focus on winning leads to algorithms that do not really advance the state of the art (e.g. portfolio algorithms)

AI in the real world

Clear and relevant performance criteria

Appropriate, publicly available benchmarks

Proper statistical comparisons

Independent verification and reproduction
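One possible reading of "proper statistical comparisons" is to test whether the difference between two algorithms' scores could be explained by chance alone. The sketch below uses a two-sided permutation test on the difference of means; this particular test, the function name permutation_test and the example scores are assumptions made for illustration, not a method prescribed in the talk. Fixing the seed also makes the analysis itself reproducible.

```python
import random
import statistics


def permutation_test(scores_a, scores_b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns an estimated p-value: the fraction of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(scores_a) - statistics.mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign scores to the two algorithms
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-seed scores of two algorithms on the same benchmark.
algo_a = [10, 11, 12, 13, 14]
algo_b = [0, 1, 2, 3, 4]
print("estimated p-value:", permutation_test(algo_a, algo_b))
```

A small p-value suggests the gap is unlikely to be an artifact of which seeds happened to be run; with only a handful of runs per algorithm, the test has limited power, which is itself an argument for reporting more trials.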

Lifelong learning

System that operates for long periods of time

Task is not fixed but changes, some tasks initially unknown

New objects and actions become available over time

Questions
