Post on 01-Feb-2020
Benchmarks and performance measures in artificial intelligence
Anders Jonsson
Artificial Intelligence and Machine Learning Group
Universitat Pompeu Fabra
HUMAINT workshop
6 March 2018
Anders Jonsson
Benchmarks in AI
Sequential decision making
Perform a sequence of actions to achieve a given objective
Each decision has an impact on future decisions!
Sequential decision making
Reinforcement learning:
Effect of actions initially unknown
Intervention: perform actions to test hypotheses about them
Aim: in a given state s, estimate the value Q(s, a) of an action a and/or a policy π(·|s) for action selection
AI planning:
System has a model of the actions
Aim: compute a sequence of actions in advance
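To make the reinforcement learning setting concrete, here is a minimal sketch of tabular Q-learning, one standard way to estimate Q(s, a) when the effect of actions is initially unknown. The chain environment, the function names and all parameter values are assumptions chosen for illustration; none of them come from the slides.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Entering the last state
    yields reward 1 and ends the episode; every other step yields 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy selection: explore with probability epsilon
            # (or when the two estimates tie), otherwise act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # Environment step. The learner only observes (s, a, r, s_next),
            # it has no model of this transition rule.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update towards the one-step bootstrapped target.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

def greedy_policy(q):
    """Return the greedy action for each state under the estimates q."""
    return [max(range(2), key=lambda a: q_s[a]) for q_s in q]

q = q_learning()
print(greedy_policy(q))
```

An AI planner, by contrast, would be handed the transition rule directly and could compute the action sequence (here, always move right) without any interaction.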
Evaluation criteria
Theoretical analysis:
Performance bounds: how far from optimal is an AI algorithm?
Time complexity: how fast is it?
Memory complexity: how much memory does it use?
Empirical evaluation:
How does an AI algorithm perform in practice?
Empirical performance measures
What do we measure?
Winning
Scoring points
?
What do we compare to?
Optimal or near-optimal
Human
Other algorithms
?
Problems with empirical evaluation
Strong incentive to boost the performance of one's own algorithm
Practices in reinforcement learning [Henderson et al. 2017]:
Run X trials, report average of 3 best runs
Omit network architecture, random seeds, hyperparameters
Implement own version of other researchers' algorithms
Claims of human-level performance
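A small simulation suggests why reporting only the best runs is misleading. The sketch below assumes a hypothetical algorithm whose per-trial return is drawn from a normal distribution with a fixed true mean. The numbers and function names are illustrative assumptions, not data from Henderson et al.

```python
import random
import statistics

def simulate_returns(n_trials, true_mean=100.0, noise_std=20.0, seed=0):
    """Draw one final return per trial for a hypothetical algorithm.

    Each trial differs only in its random seed, modelled here as
    Gaussian noise around the same true mean.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_mean, noise_std) for _ in range(n_trials)]

def mean_of_best(returns, k=3):
    """Average of the k highest returns, i.e. the cherry-picked figure."""
    return statistics.mean(sorted(returns, reverse=True)[:k])

returns = simulate_returns(n_trials=20)
print(f"mean over all 20 trials: {statistics.mean(returns):.1f}")
print(f"mean of the 3 best runs: {mean_of_best(returns):.1f}")
```

The mean of the best k runs can never be lower than the mean over all runs, and the gap grows with the number of trials X, so the reported figure partly measures how many times the experiment was repeated.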
Benchmarks
Set of instances that are representative of problem difficulty
More unbiased comparison
Difficult to artificially boost the performance of an algorithm
More likely that results generalize
Benchmarks
Atari
OpenAI Gym
Project Malmo
Competitions
International Planning Competition
General Video Game AI Competition
AIIDE StarCraft AI Competition
Image classification
Problems with benchmarks
Might not accurately reflect real-world problems
Might require large amounts of computational power
Excessive focus on winning leads to algorithms that do not really advance the state-of-the-art (e.g. portfolio algorithms)
AI in the real world
Clear and relevant performance criteria
Appropriate, publicly available benchmarks
Proper statistical comparisons
Independent verification and reproduction
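One possible reading of "proper statistical comparisons" is to report an interval for the difference between two algorithms rather than two bare averages. The sketch below uses a percentile bootstrap over per-seed scores. The choice of the bootstrap, the function name and the scores are all assumptions made for illustration; the scores are made up.

```python
import random
import statistics

def bootstrap_diff_ci(scores_a, scores_b, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap interval for mean(scores_a) - mean(scores_b).

    scores_a and scores_b hold one score per independent run (seed)
    of each algorithm on the same benchmark.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each algorithm's runs with replacement.
        resample_a = rng.choices(scores_a, k=len(scores_a))
        resample_b = rng.choices(scores_b, k=len(scores_b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Hypothetical per-seed scores, invented for this example.
scores_a = [212.0, 198.5, 240.1, 187.3, 225.9, 203.4, 219.8, 194.6]
scores_b = [201.2, 190.7, 215.3, 208.8, 183.9, 199.5, 221.0, 195.1]
low, high = bootstrap_diff_ci(scores_a, scores_b)
print(f"95% interval for the difference in mean score: [{low:.1f}, {high:.1f}]")
```

If the interval contains zero, the data do not support the claim that one algorithm outperforms the other. Fixing and publishing the seeds also helps with independent reproduction.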
Lifelong learning
System that operates for long periods of time
Task is not fixed but changes, some tasks initially unknown
New objects and actions become available over time
Questions