trading agent competition (tac) jon lerner, silas xu, wilfred yeung cs286r, 3 march 2004

Trading Agent Competition (TAC)

Jon Lerner, Silas Xu, Wilfred Yeung

CS286r, 3 March 2004

TAC Overview

International competition

Intended to spur research into trading agent design

First held in July 2000

TAC Classic and TAC SCM scenarios

TAC Classic

Each team is in charge of a virtual travel agent

Agents try to find travel packages for virtual clients

All clients wish to travel over the same five-day period

Clients are not all equal: each has different preferences for certain types of travel packages

Travel Packages

Each contains flight info, hotel type, and entertainment tickets

To gain positive utility from a client, agents must construct feasible packages. Feasible means:

  Arrival date strictly less than departure date

  Same hotel reserved during all intermediate nights

  At most one entertainment event per night

  At most one of each type of entertainment ticket
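The four feasibility conditions can be checked mechanically. The sketch below is only an illustration: the `Package` class, the day numbering (arrival on days 1 to 4, departure on days 2 to 5), the ticket names, and the requirement that events fall on a night of the stay are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    arrival: int                      # arrival day (assumed numbering 1..4)
    departure: int                    # departure day (assumed numbering 2..5)
    hotel_by_night: dict              # night -> "high" or "low"
    entertainment: dict = field(default_factory=dict)  # night -> ticket type


def is_feasible(pkg):
    # Arrival date strictly less than departure date.
    if pkg.arrival >= pkg.departure:
        return False
    nights = range(pkg.arrival, pkg.departure)
    # Same hotel reserved on every night of the stay.
    hotels = {pkg.hotel_by_night.get(night) for night in nights}
    if len(hotels) != 1 or None in hotels:
        return False
    # Keying by night allows at most one event per night; we additionally
    # assume events must fall on a night of the stay.
    if any(night not in nights for night in pkg.entertainment):
        return False
    # At most one ticket of each entertainment type.
    kinds = list(pkg.entertainment.values())
    return len(kinds) == len(set(kinds))
```

For example, a two-night stay in the high-quality hotel with two different events is feasible, while the same stay split across both hotels is not.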

Flights

Clients have preferences for ideal arrival/departure dates

Infinite supply of flights sold through continuously clearing auctions

Prices set by a random walk

Prices later set to drift upwards to discourage waiting

No resale or exchange of flights permitted
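A random walk with upward drift can be simulated in a few lines. This is a minimal sketch: the function name, the uniform step distribution, and all numeric parameters are placeholders chosen for illustration, not the actual TAC price process.

```python
import random


def simulate_flight_price(start, steps, drift, step_size, lo, hi, seed=0):
    """Random walk with drift; every parameter value is a hypothetical choice."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(steps):
        # Each update is a fixed upward drift plus a symmetric random shock.
        change = drift + rng.uniform(-step_size, step_size)
        # Keep the price inside the assumed bounds [lo, hi].
        prices.append(min(hi, max(lo, prices[-1] + change)))
    return prices
```

With `drift` larger than `step_size` every update is positive, which is the extreme case of the incentive on the slide: waiting can only make the ticket more expensive.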

Hotels

Two hotels – high quality and low quality, 16 rooms per hotel per night

Sold through ascending, multi-unit, sixteenth-price auctions: one auction for all rooms in a single hotel on a single night

Periodically a random auction closes to encourage agents to bid

Clients have different values for high and low quality hotels
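In a sixteenth-price auction the 16 highest unit bids each win a room and all winners pay the 16th-highest bid. A minimal sketch of the clearing rule follows; the function name, the tuple format for bids, and the zero price when fewer than 16 bids exist are assumptions.

```python
def clear_sixteenth_price(bids, rooms=16):
    """bids: list of (bidder, price) unit bids. Returns (price, winners)."""
    # Rank unit bids from highest to lowest price.
    ranked = sorted(bids, key=lambda bid: bid[1], reverse=True)
    winners = ranked[:rooms]
    # Every winner pays the lowest winning bid (the 16th highest);
    # with fewer bids than rooms we assume a price of zero.
    price = winners[-1][1] if len(ranked) >= rooms else 0.0
    return price, [bidder for bidder, _ in winners]
```

Because the price is set by the marginal winning bid, a single agent bidding very high on a room rarely moves the price, unless many agents do the same.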

Entertainment

Three types of entertainment available

Clients have value for each type

Each agent has initial endowment of tickets

Buy and sell tickets through continuous double auction
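In a continuous double auction, an incoming order trades immediately if it crosses the best resting order on the other side; otherwise it waits in the book. The sketch below handles single-unit orders for one ticket type. The class name and the convention that a trade executes at the resting order's price are assumptions.

```python
import heapq


class DoubleAuction:
    def __init__(self):
        self.bids = []    # resting buy prices, stored negated (max-heap)
        self.asks = []    # resting sell prices (min-heap)
        self.trades = []  # prices of executed trades

    def submit(self, side, price):
        """Submit a one-unit order; return the trade price or None."""
        if side == "buy":
            if self.asks and self.asks[0] <= price:
                trade = heapq.heappop(self.asks)   # hit the lowest ask
                self.trades.append(trade)
                return trade
            heapq.heappush(self.bids, -price)
        else:
            if self.bids and -self.bids[0] >= price:
                trade = -heapq.heappop(self.bids)  # hit the highest bid
                self.trades.append(trade)
                return trade
            heapq.heappush(self.asks, price)
        return None
```

Unlike the hotel auctions, there is no closing moment to wait for: any order that crosses the book clears at once.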

Agent Themes

Agents have to address:

  When to bid

  What to bid on

  How much to bid

Combinatorial preferences, but not combinatorial auctions

Strategies

What strategies come to mind?

What AI techniques might be useful?

Simple vs. complicated strategies

How quickly should you adapt as the game progresses?

Use of historical data vs. focus on current game only

Play the game vs. play the players

living agents (Living Systems AG)

Winner: TAC 2001

Makes two assumptions:

  1. Steadily increasing flight prices favor early decisions for flight tickets.

  2. Especially the good performing teams are following a strategy to maximize their own utility. They are not trying to take the risk to reduce other teams’ utility.

Simple strategy

Makes substantial use of historical data

Barely any monitoring of, or adapting to, changing conditions

Benefits from other agents’ complicated algorithms to control price

Open-loop, play the players

living agents: Determining Hotel and Flight Bids

Assume hotel auction will clear at historical levels

Using these as hotel prices, initial flight prices, and client preferences, determine optimal client trips

Immediately place bids based on this optimum:

  Purchase corresponding flights immediately

  Place offers for required hotels at prices high enough to ensure successful acquisition
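With hotel prices fixed at historical levels, choosing a client's trip reduces to enumerating the few possible arrival/departure/hotel combinations. The sketch below is an illustration under assumptions not stated in the slides: a base trip value of 1000, a penalty of 100 per day of deviation from the preferred dates, a bonus for the high-quality hotel, and the function and argument names themselves. Entertainment is ignored.

```python
from itertools import product


def best_trip(pref_arr, pref_dep, hotel_bonus, in_price, out_price, hotel_price):
    """Return (net value, (arrival, departure, hotel)) of the best trip.

    in_price / out_price: day -> current flight price
    hotel_price: hotel -> night -> assumed (historical) clearing price
    """
    best = (0.0, None)  # booking no trip at all is worth zero
    for arr, dep, hotel in product(range(1, 5), range(2, 6), ("high", "low")):
        if arr >= dep:
            continue
        # Assumed utility: base value, minus date penalty, plus hotel bonus.
        penalty = 100 * (abs(arr - pref_arr) + abs(dep - pref_dep))
        bonus = hotel_bonus if hotel == "high" else 0
        rooms = sum(hotel_price[hotel][night] for night in range(arr, dep))
        net = 1000 - penalty + bonus - in_price[arr] - out_price[dep] - rooms
        if net > best[0]:
            best = (net, (arr, dep, hotel))
    return best
```

An open-loop agent in this style would run the search once per client at the start of the game and then commit to the result.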

Entertainment Auction

Immediately makes a fixed decision as to which entertainment to attempt to buy/sell, assuming the historical clearing price of about $80

Opportunistically buys and sells around this point

Puts in final reservation prices at the seven-minute mark

How good is living agents?

Risky

If hotel bids are not high enough, fails to complete trips, resulting in huge loss of points.

If hotel clears at living agents’ bid, potentially pays much more than necessary

After placing initial bid, does not monitor hotel or flight auctions at all

Clearly not all agents could use this strategy (Hotel auctions)

Simple

  Buys flights immediately, avoiding cost of waiting

Relies on historical data

  Contains information from many games

  But how sensitive is the evolution of the game to changes in client preferences, or changes in opponents’ strategy?

Applicability

Use of historical data for predictive information

Feasibility of simple strategies that ignore feedback

Play against the players (not prices), under the assumption that other agents keep things relatively efficient.

ATTac (AT&T Research)

Winner: TAC 2000

Uses sophisticated machine-learning techniques to predict future hotel prices based on the current situation

Buys flights based on cost-benefit analysis of committing versus waiting

Minute-by-minute reoptimization of bids based on holdings and predictions

The heart of ATTac

Assumption: Because of many unknowns, exactly predicting the price of a hotel room is hopeless.

Instead, regard the closing price as a random variable that needs to be estimated, conditional on our current state of knowledge:

  Number of minutes remaining in game

  Ask price of each hotel

  Flight prices

  Historical data

Construct a model of the probability distribution over clearing prices (based on a boosting algorithm), stochastically sample prices, and compute expected profit
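The "sample prices, then compute expected profit" step is a Monte Carlo average. In the sketch below the learned distribution is replaced by an arbitrary sampling function, so this shows the sampling step only, not the boosting model; `expected_profit`, `room_profit`, `sample_closing_price`, and the numbers in them are hypothetical.

```python
import random


def expected_profit(profit_given_price, sample_price, n_samples=1000, seed=0):
    """Average the profit over closing prices drawn from a price model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += profit_given_price(sample_price(rng))
    return total / n_samples


# Hypothetical example: a room is worth 200 and is bought only if profitable.
def room_profit(price):
    return max(0.0, 200.0 - price)


# Stand-in for the learned model: the room closes at 100 or 300 with equal odds.
def sample_closing_price(rng):
    return rng.choice([100.0, 300.0])
```

With these placeholder numbers the profit at the mean price of 200 is zero, while the sampled average is close to 50, which illustrates why a distribution over prices carries more information than a single point estimate.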

The high-level algorithm

Denote the most profitable allocation of goods at any time by G*

When first flight quotes are posted:

  Compute G* with current holdings and expected prices

  Buy the flights in G* for which the expected cost of postponing commitment exceeds the expected benefit of postponing commitment

Starting 1 minute before each hotel close:

  Compute G* with current holdings and expected prices

  Buy the flights in G* for which the expected cost of postponing commitment exceeds the expected benefit of postponing commitment

  Bid hotel room expected marginal values given holdings, new flights, and expected hotel purchases

Last minute: Buy remaining flights as needed by G*

In parallel (continuously): Buy/sell entertainment tickets based on their expected values
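The flight rule in the steps above compares two expectations. The sketch below uses a deliberately crude model that is an assumption of this example, not ATTac's actual calculation: the cost of waiting is the expected price increase, and the benefit of waiting is the chance that the flight drops out of G*, in which case buying now would have wasted the current price.

```python
def should_buy_now(price_now, expected_price_rise, prob_flight_dropped):
    """Commit to a flight when waiting is expected to cost more than it saves.

    All three inputs are hypothetical estimates supplied by the caller.
    """
    cost_of_postponing = expected_price_rise
    # If the flight later leaves G*, an early purchase is money lost.
    benefit_of_postponing = prob_flight_dropped * price_now
    return cost_of_postponing > benefit_of_postponing
```

For instance, a 300-unit flight expected to rise by 30 with a 5% chance of being dropped is bought immediately (30 > 15), while one expected to rise by 5 with a 20% chance of being dropped is postponed (5 < 60).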

The boosting algorithm: solving conditional density estimation problems

Start with ordered pairs (x, y), with x being a vector that describes auction-specific features and y being the difference between closing price and current price

Aim of boosting is, given current x, to estimate the conditional distribution of y

Construct a conditional distribution function that minimizes the sum of the negative log likelihood of y given x over all training samples

Use this conditional distribution function to map x to a predicted distribution of y
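To make the objective concrete, the sketch below fits a much simpler conditional density model than boosting, a smoothed histogram per feature bucket, and scores it by the summed negative log likelihood. It is a stand-in technique chosen for brevity; the function names, the bin edges, and the use of a single hashable feature for x are assumptions.

```python
import math
from collections import Counter, defaultdict


def bin_index(y, edges):
    # Index of the bin that y falls into, given sorted bin edges.
    return sum(y >= edge for edge in edges)


def fit_conditional_histogram(samples, edges, smoothing=1.0):
    """samples: list of (x, y); returns density(x) -> list of bin probabilities."""
    counts = defaultdict(Counter)
    for x, y in samples:
        counts[x][bin_index(y, edges)] += 1
    n_bins = len(edges) + 1

    def density(x):
        c = counts.get(x, Counter())
        total = sum(c.values()) + smoothing * n_bins
        return [(c[b] + smoothing) / total for b in range(n_bins)]

    return density


def negative_log_likelihood(density, samples, edges):
    # The training objective: sum of -log p(y | x) over all samples.
    return -sum(math.log(density(x)[bin_index(y, edges)]) for x, y in samples)
```

A model that conditions on x should achieve a lower negative log likelihood on its training pairs than one that assigns the same probability to every bin regardless of x.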

living agents vs. ATTac

Two very different approaches

Statistically insignificant difference in scores in TAC 2001

Open and Closed Loop Processes

Closed-loop: system feeds information back into itself.

  Examines the world in an effort to validate the world model.

  Appropriate for real-world environments in which feedback is necessary to validate agent actions.

Open-loop: no feedback from the environment to the agent.

  Output from processes is considered complete upon execution.

  Appropriate for simulated rather than real environments (in general, tasks are not performed perfectly by the agent).

  Generally more efficient for the same reason.

Walverine: (Closed-loop)

Model-based: flights and hotels

  Predicts hotel prices by Walrasian equilibrium

  Derives expected demand from the 64 clients’ preferences and initial flight prices, which influence clients’ choice of travel days

  Constructs bids that maximize the expected value of the bid

Model-free: entertainment

  Q-learning from thousands of auction instances (aside on model-based vs. model-free learning)

No empirically tuned parameters
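A Walrasian equilibrium price is one at which demand equals supply, and a standard way to search for it is tatonnement: raise the price while there is excess demand and lower it while there is excess supply. The sketch below does this for a single room-night with 16 rooms. The demand function (64 clients with made-up values, each wanting one room) and all parameter values are illustrative assumptions, not Walverine's demand model.

```python
def tatonnement(demand, supply, price=0.0, step=1.0, max_iter=1000):
    """Adjust the price in proportion to excess demand until the market clears."""
    for _ in range(max_iter):
        excess = demand(price) - supply
        if excess == 0:
            break
        price += step * excess
    return price


# Hypothetical demand: 64 clients valuing the room at 10, 20, ..., 640.
values = [10.0 * i for i in range(1, 65)]


def demand(price):
    # A client demands a room when its value is at least the price.
    return sum(v >= price for v in values)
```

Calling `tatonnement(demand, 16)` on this toy market stops at a price just above 480, where exactly the 16 highest-value clients still want a room.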

SouthamptonTAC: (Closed-loop)

Adaptive agent: varies strategy according to market conditions

3 classifications for environments:

  Non-competitive (agent gets hotels at low prices)

  Semi-competitive (medium prices)

  Competitive (prices of hotels high)

Classification based on current game and outcomes of recent games

Non-competitive:

  Buys all flights at beginning of game

  Never changes itinerary of clients

SouthamptonTAC: (Closed-loop)

Competitive:

  Rapidly rising prices – buy at beginning

  Stagnant prices – buy near the end

Fuzzy reasoning to predict hotel clearing prices

  3 rule bases

  Factors include: price of hotel, price of counterpart hotel, price change in previous minute, price change in counterpart hotel in previous minute

Continuously assesses game type
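The three-way classification can be pictured with fuzzy memberships over hotel prices. The sketch below is purely illustrative: the thresholds, the piecewise-linear membership shapes, and the use of a single average price as input are assumptions and do not reproduce SouthamptonTAC's rule bases.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def classify_environment(avg_hotel_price, low=50.0, mid=150.0, high=250.0):
    """Return (label, memberships) for a game; thresholds are hypothetical."""
    # Membership falls linearly from 1 at `low` to 0 at `mid`.
    non = clamp01((mid - avg_hotel_price) / (mid - low))
    # Membership rises linearly from 0 at `mid` to 1 at `high`.
    comp = clamp01((avg_hotel_price - mid) / (high - mid))
    # The remainder goes to the middle class, peaking at `mid`.
    semi = 1.0 - non - comp
    memberships = {
        "non-competitive": non,
        "semi-competitive": semi,
        "competitive": comp,
    }
    label = max(memberships, key=memberships.get)
    return label, memberships
```

Re-running the classification each minute with updated prices mirrors the idea of continuously assessing the game type.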

ROXY-BOT: (Open-loop)

Two-phase bidding policy:

  Solve completion problem: optimization based on a tree structure using beam search that only partially expands the tree. [Greenwald]

  Valuate goods in that set: marginal utility calculator, MU(x) = V(N) – V(N|x)

Computing prices (historical data):

  Point estimates (’00)

  Estimated price distributions (’01): averaging MU across many samples of the estimated price distribution

  Monte Carlo simulation to evaluate bidding policy (’02)
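Averaging marginal utility over price samples can give a different bid than evaluating it at a single point estimate. The sketch below uses a toy setting that is an assumption of this example: one client, two substitute goods A and B, and marginal utility of A read as the best achievable value with A owned for free minus the best value with A unavailable.

```python
def marginal_utility(value_a, value_b, price_b):
    """MU of good A when the only alternative, B, costs price_b."""
    with_a = max(value_a, value_b - price_b, 0.0)  # A is already owned
    without_a = max(value_b - price_b, 0.0)        # A cannot be obtained
    return with_a - without_a


def average_mu(value_a, value_b, price_samples):
    """Average MU of A across sampled prices for B."""
    total = sum(marginal_utility(value_a, value_b, p) for p in price_samples)
    return total / len(price_samples)
```

With values of 100 for A and 150 for B, the samples 20 and 100 for B's price give an average MU of 25, whereas the point estimate at the mean price of 60 gives only 10.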

Whitebear (Winner in ’02, Open-loop)

Flights:

  A: buy everything

  B: buy only what is absolutely necessary

  Combination: buy everything except dangerous tickets

Hotels (predictions are simply historical averages):

  A: bid a small increment above current prices

  B: bid marginal utility

  Combination: use A unless MU is high, in which case use B

Domain-specific, extensive experimentation

Not necessarily optimal set of goods, no learning
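The combined hotel rule is a two-branch decision. A minimal sketch follows, in which the increment and the threshold for a "high" marginal utility are made-up placeholders; the slides do not give the values Whitebear used.

```python
def hotel_bid(current_price, marginal_utility, increment=10.0, high_mu=500.0):
    """Combination rule: strategy A by default, strategy B when MU is high.

    `increment` and `high_mu` are hypothetical tuning constants.
    """
    if marginal_utility >= high_mu:
        return marginal_utility           # B: bid the marginal utility
    return current_price + increment      # A: just above the current price
```

For a room quoted at 100, a marginal utility of 200 yields a bid of 110, while a marginal utility of 800 yields a bid of 800.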

Summary: Open vs Closed

All else equal, an open-loop strategy is better:

  Simple

  Avoids waiting costs (higher prices)

Predictability of price is the determining factor:

  Perfectly predictable – open-loop

  Large price variance – closed-loop (open-loop picks the goods at the start and may pay a lot)

  Small price variance – closed-loop is optimal, but adds complexity for potentially small benefit

