Portugal

Improving the Automatic Evaluation of Problem Solutions in Programming Contests

Pedro Ribeiro and Pedro Guerreiro
Plovdiv – IOI’2009, P. Ribeiro, P. Guerreiro
Presentation Overview

• Automatic Evaluation: Past and Present
  – The case of the IOI
• A possible path for improving evaluation
  – Developing only a function (not a complete program)
  – Abstract Input/Output
  – Repeating the same function call (+ clock precision)
  – No hints on expected complexity
  – Examining runtime behaviour as tests increase in size
• Some preliminary results
• Conclusions
Programming Contests: (Automatic) Evaluation

• All programming contests need an efficient and fair way of distinguishing submitted solutions
• What do we evaluate?
  – Correctness: does the program produce correct answers for all instances of the problem?
  – Efficiency: does it do so fast enough? Does it have the necessary time and memory complexity?
Programming Contests

• Classic way of evaluating:
  – Set of pre-defined tests (inputs)
  – Run the program with the tests and check its output
• The IOI has been doing this in almost the same way since the beginning, with two major advances:
  – Manual evaluation -> Automatic evaluation
  – Individual tests -> Grouped tests
• Although the IOI has 3 different types of tasks, the core of the event is still batch tasks
IOI Types of Tasks

IOI Year   Batch Tasks   Reactive Tasks   Output Only
2009           7              1               0
2008           6              0               0
2007           5              1               0
2006           4              0               2
2005           5              1               0
2004           5              0               1
Programming Contests

• Correctness: almost a “black art”
  – “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra)
• Efficiency:
  – Typically the judges create a set of model solutions of different complexities
  – Tests are designed so that the model solutions achieve the planned number of points
  – Considerable amount of tuning (environment)
  – Considerable amount of manpower needed
  – More difficult to introduce new languages
Ideas: Single Function

• Solve the problem by writing a specific function (as opposed to a complete program)
• Motivation:
  – Concentrate on the core algorithm (fewer distractions)
  – Can be used at earlier stages of learning
  – Opportunities for new ways of testing (more control over the submitted code)
• It is already done in other types of contests:
  – TopCoder
  – Teaching environments (Ribeiro and Guerreiro, 2008)
Ideas: I/O Abstraction

• The input and output should be “abstract” and not specific to a language
• How to do it:
  – Input already in memory, passed as function arguments (simple form, no complex data structures)
  – Output as the function return value(s)
• Motivation:
  – Fewer information-processing details
  – Less complicated problem statements
  – We can measure the time spent in the solution (not in I/O)
  – More balanced performance between languages
Idea: Repeat Function Calls

• In the past, we used smaller input sizes
• With the increased speed of computers, we currently use huge input sizes
  – Clock resolution is poor: small instances finish in an instant
  – We need to distinguish small asymptotic complexities
  – Historic fact: the smallest time limit used at an IOI:
    • IOI 2007, problem “training”: 0.3 seconds
• Future?
  – Ever more speed -> ever bigger input sizes
Idea: Repeat Function Calls

• Problems become completely detached from reality:
  – Ex: IOI 2007 “Sails”: a ship with 100,000 masts
Idea: Repeat Function Calls

• Real-world analogy: how can we measure the thickness of a sheet of paper with a standard ruler that lacks the necessary accuracy?
  – If a stack of 100 sheets measures 1 cm, then each sheet is ~0.1 mm
• We can use the same idea on functions!
  – Running once on a small instance may appear instantaneous,
  but
  – running it multiple times takes more than 0.00s!
Idea: Repeat Function Calls

• Run the same function several times and compute the average time
• Pros:
  – Input sizes can be smaller and related to the problem
  – We can concentrate on the quality of the test cases and rely less on randomization to produce big test cases that are impossible to verify manually
• Cons:
  – We must be careful with memory persistence between successive function calls
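A minimal sketch of this repeated-call timing, in the spirit of the stacked-sheets analogy (the helper name and time budget are illustrative):

```python
import time

def average_runtime(func, arg, min_total=1.0):
    """Call func(arg) repeatedly until the aggregate wall-clock time
    exceeds min_total seconds, then return the average time per call.
    Like stacking sheets of paper, this makes calls that are individually
    below the clock's resolution measurable in aggregate."""
    calls, total = 0, 0.0
    while total < min_total:
        start = time.perf_counter()
        func(arg)
        total += time.perf_counter() - start
        calls += 1
    return total / calls

# Example: time a simple linear-cost function on a small input.
avg = average_runtime(lambda n: sum(range(n)), 10_000, min_total=0.2)
print(f"average time per call: {avg:.2e} s")
```

Note the slide’s caveat: if the submitted function caches results or keeps state between calls, later calls may be artificially fast, so a real grader would have to guard against memory persistence.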
Idea: No Hints on Complexity

• When we give limits for the input:
  – we simplify implementation details and avoid the need for dynamic memory allocation,
  but
  – we disclose the complexity required for the problem
• Trained students can identify precisely the complexity needed
• This has a great impact on the problem-solving aspect:
  – Different mindset: “I know which complexity I’m looking for, and I settle for a solution that achieves it”
  vs
  – A scientific approach, as with real-world open problems
    • Ex: is there a polynomial solution for this problem?
Idea: No Hints on Complexity

• Give limits for implementation purposes, but make it clear that they are not related to the efficiency sought
• A more scientific and open-ended approach
• Contestants need to think about how to really solve the problem (and not about how to produce a program that passes the test cases)
• Does not overemphasize the runtime of a particular language
  – (“let me run a test with the maximum limits and see if it finishes in X seconds on this machine with this language”)
Idea: Runtime Behaviour as Tests Increase

• Typically we measure efficiency by creating a set of tests such that different model solutions achieve different numbers of points,
but
• not passing a test does not imply that the required complexity was not achieved (other factors intervene)
  – Passing just means that the test case is solved within the constraints
• A lot of manpower is needed for the model solutions and the fine-tuning (compiler version, computer speed, language used, etc.)
Idea: Runtime Behaviour as Tests Increase

• How can we improve on that?
• Pen and paper are not an option for large-scale evaluation
  – We need automatic processes
• We have different tests and different time measurements: why don’t we use all this information?
• Plot the runtime as the data size increases and do some curve fitting
  – It is impossible to determine the complexity of every program, but even a trivial (imperfect) fit can show more information than just knowing which test cases are passed
Some Preliminary Results

• As a proof of concept, a simple problem:
  – Input: a sequence of integers, e.g. 1 -2 3 10 -4 3 -6 4 -1 1
  – Output: the subsequence of consecutive integers with maximum sum
• Only ask for the function, with the I/O already given
• Small input limit (only 100)
• Measure time by running multiple times (until the aggregated time reaches 1s)
• Use random data for N = 1, 4, 8, 12, …, 64
Some Preliminary Results

• Implemented 3 model solutions:
  – A: O(N^3) – iterate over all possible intervals in O(N^2), then iterate through each interval to compute its sum in O(N)
  – B: O(N^2) – iterate over all possible intervals in O(N^2), with O(1) checking of each sum using accumulated sums
  – C: O(N) – iterate through the sequence keeping a partial sum; whenever the partial sum becomes negative, it cannot contribute to the best interval, so “reset” it to zero and continue
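The O(N) model solution (C) described above can be sketched as follows; this is a reconstruction from the slide’s description (the function name is illustrative), not the authors’ actual code:

```python
def max_consecutive_sum(seq):
    """O(N) solution: keep a running partial sum and reset it to zero
    whenever it becomes negative, since a negative prefix can never
    contribute to the best interval."""
    best = seq[0]          # assumes at least one element
    partial = 0
    for v in seq:
        partial += v
        if partial > best:
            best = partial
        if partial < 0:
            partial = 0    # a negative partial sum cannot help later
    return best

# The example sequence from the problem statement:
print(max_consecutive_sum([1, -2, 3, 10, -4, 3, -6, 4, -1, 1]))  # prints 13
```

Initializing best to the first element keeps the function correct even when every number in the sequence is negative.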
Some Preliminary Results

• Plot Time(N) / Time(1)
• Simple correlation measure with candidate functions (for each solution, the highest correlation matches its actual complexity):

Solution   log N    N        N log N   N^2      N^3      N^4      2^N      N!
A          0.6848   0.9264   0.9515    0.9912   0.9993   0.9869   0.6033   0.5722
B          0.7524   0.9666   0.9835    0.9998   0.9848   0.9564   0.5469   0.5183
C          0.8624   0.9952   0.9927    0.9586   0.8985   0.8417   0.4136   0.3906
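The correlation measure behind such a table can be sketched as follows (a plain Pearson correlation between the measured times and candidate growth functions; the helper names are illustrative):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A subset of the candidate growth functions from the table.
CANDIDATES = {
    "log N": lambda n: math.log(n),
    "N": lambda n: float(n),
    "N log N": lambda n: n * math.log(n),
    "N^2": lambda n: float(n ** 2),
    "N^3": lambda n: float(n ** 3),
}

def best_fit(sizes, times):
    """Correlate the measured times against each candidate growth
    function and return the name of the best-matching one."""
    scores = {name: pearson([f(n) for n in sizes], times)
              for name, f in CANDIDATES.items()}
    return max(scores, key=scores.get)
```

For a quadratic-time submission measured at several sizes, best_fit would pick N^2; as the slides note, this is a heuristic indication of runtime behaviour, not a proof of complexity.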
Some Preliminary Results

• It is out of scope here to give a more detailed mathematical analysis
  – We could use other statistical measures
• We know that it is impossible to automatically compute and prove complexities,
but
• this simple approach gives meaningful results:
  – The runtime is consistent with and correlated to a certain function, and therefore appears to grow following a pattern that we were able to identify
  – Ex: linear -> appears to take twice the time when the data doubles
Some Preliminary Results

• What could this give us?
  – More information from the same test cases
  – The possibility of giving students automatic feedback on runtime behaviour
  – The possibility of identifying runtime behaviours for which no model solutions were created (less manpower!)
  – Independence from language-specific details

• Ex: problem “Archery”, IOI 2009, Day 1
  – There were solutions with O(N^2 R), O(N^3), O(N^2 log N), O(N^2), O(N log N), …
  – No need to code them all in all languages and then tune!
Conclusion

• 20 years of IOI: computers are much faster, but the style of evaluation is still the same
• Setting up test cases is time-consuming and requires manpower
• We need to think of ways to improve evaluation
• Our proposal, geared towards more informal contests or teaching environments, can offer:
  – No distraction with I/O
  – No large data sets
  – More natural problem statements
  – No hints on complexity (an open-ended approach)
  – No need to implement many model solutions
  – New languages can be added without changing the tests
• More work is still needed to obtain a robust system, but we feel that these ideas (or some of them) can be used in practice
• Future: can evaluation be improved in other ways?
The End
• And that’s all! :-)
Questions?
Pedro Ribeiro ([email protected])
Pedro Guerreiro ([email protected])