
A Theory of Universal AI

Literature: Marcus Hutter; Kirchherr, Li, and Vitanyi

Presenter: Prashant J. Doshi

CS594: Optimal Decision Making

A Theory of Universal Artificial Intelligence – p.1/18

Roadmap

Claim

Background

Key Concepts

The AIμ Model in Functional Form

The AIμ Model in Recursive Form

The Universal AIξ Model

Example Constants and Limits

Application: Sequential Decision Theory

Conclusions


Claim

Development of a universally optimal AI model

Universal: parameterless, unbiased, model-free

Optimal: no other program can learn or solve the task faster


Background

Decision Theory

Solves the problem of rational agent behaviour in uncertain worlds, given an environment

Known prior probability distribution $\mu$ over the environment

Solomonoff’s Universal Induction

Solves the problem of sequence prediction for an unknown prior distribution

Predict the continuation $x_n$ of a given binary sequence $x_1 \dots x_{n-1}$


Background

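Solomonoff’s result: the expected Euclidean distance between $\xi$ and $\mu$ is finite

$$\sum_{k=1}^{\infty} \sum_{x_{1:k}} \mu(x_{<k}) \bigl(\xi(x_k \mid x_{<k}) - \mu(x_k \mid x_{<k})\bigr)^2 < \tfrac{1}{2} \ln 2 \cdot K(\mu) + O(1)$$

Convergence: $\xi(x_n \mid x_{<n}) \to \mu(x_n \mid x_{<n})$ as $n \to \infty$ (with $\mu$-probability 1)

$\xi$ is the Universal Probability Distribution; up to a multiplicative constant it dominates every enumerable semimeasure $\rho$: $\xi(x) \stackrel{\times}{\ge} 2^{-K(\rho)} \cdot \rho(x)$

$K(\mu)$ is the Kolmogorov Complexity of $\mu$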
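The convergence of the universal predictor to the true distribution can be illustrated on a toy scale. The following is a minimal sketch, not Solomonoff's construction: it assumes a finite class of three Bernoulli sources in place of all enumerable semimeasures, and hand-picked code lengths in place of Kolmogorov complexities. The names `mixture_predict`, `biases` and `code_lengths` are hypothetical.

```python
import random

def mixture_predict(biases, weights, bits):
    """Return the mixture probability that the next bit is 1, given bits."""
    total = sum(weights)
    posterior = [w / total for w in weights]
    for b in bits:
        # Bayes update: scale each weight by the likelihood of the observed bit.
        posterior = [w * (th if b == 1 else 1.0 - th)
                     for w, th in zip(posterior, biases)]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    return sum(w * th for w, th in zip(posterior, biases))

# Assumed model class: three coin biases with assumed code lengths in bits.
biases = [0.5, 0.8, 0.2]
code_lengths = [1, 2, 3]
weights = [2.0 ** -n for n in code_lengths]   # prior weight 2^(-code length)

rng = random.Random(0)
true_bias = 0.8                               # plays the role of the unknown mu
bits = [1 if rng.random() < true_bias else 0 for _ in range(500)]

for n in (0, 10, 100, 500):
    print(n, round(mixture_predict(biases, weights, bits[:n]), 4))
```

With no data the prediction is the prior-weighted average of the biases; as more bits arrive, the posterior concentrates on the source that generated them and the prediction approaches the true bias.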


Key Concepts

The Cybernetic or Agent Model

Agent $p : X^* \to Y^*$, with $y_{1:k} = p(x_{<k})$

$p$ is a partial function / chronological Turing machine

Environment $q : Y^* \to X^*$, with $x_{1:k} = q(y_{1:k})$

$q$ is a partial function / chronological Turing machine
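The interaction between agent and environment in the cybernetic model can be sketched as a simple loop. This is a minimal illustration under assumptions: the agent and environment here return only their latest symbol instead of the whole output string, and the toy `p` and `q` are hypothetical stand-ins for chronological Turing machines.

```python
def interact(p, q, cycles):
    """Run agent p against environment q for a number of cycles."""
    xs, ys = [], []
    for _ in range(cycles):
        y = p(tuple(xs))      # y_k depends only on the earlier inputs x_{<k}
        ys.append(y)
        x = q(tuple(ys))      # x_k depends only on the outputs y_{1:k}
        xs.append(x)
    return ys, xs

# Hypothetical toy agent: repeat the last input, starting with 0.
def p(x_hist):
    return x_hist[-1] if x_hist else 0

# Hypothetical toy environment: answer with the negation of the latest output.
def q(y_hist):
    return 1 - y_hist[-1]

ys, xs = interact(p, q, 4)
print(ys, xs)
```

The ordering inside the loop is what "chronological" refers to: each output is produced before the input of the same cycle is seen.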


Key Concepts

History: $y_{<k}x_{<k} := y_1x_1\, y_2x_2\, y_3x_3 \dots y_{k-1}x_{k-1}$

Probability of input given the history

$$\mu(x_k \mid y_{<k}x_{<k}y_k) = \frac{\mu(y_{1:k}x_{1:k})}{\mu(y_{<k}x_{<k})}$$

$$\mu(y_{1:k}x_{1:k}) = \mu(x_k \mid y_{<k}x_{<k}y_k) \cdot \mu(x_{k-1} \mid y_{<k-1}x_{<k-1}y_{k-1}) \cdots \mu(x_2 \mid y_{<2}x_{<2}y_2) \cdot \mu(x_1 \mid y_1)$$
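The chain rule for the probability of a history can be checked numerically. The sketch below assumes a hypothetical conditional `mu_cond(x, history, y)` that echoes the agent's output with probability 0.9; any conditional distribution with the same signature would do.

```python
def history_probability(mu_cond, ys, xs):
    """Probability of inputs xs given outputs ys, as a product of conditionals."""
    prob = 1.0
    for i in range(len(xs)):
        # History before cycle i+1: pairs (y_1, x_1), ..., (y_i, x_i).
        history = tuple(zip(ys[:i], xs[:i]))
        prob *= mu_cond(xs[i], history, ys[i])
    return prob

# Hypothetical environment: echoes the current output with probability 0.9.
def mu_cond(x, history, y):
    return 0.9 if x == y else 0.1

print(history_probability(mu_cond, ys=[0, 1, 1], xs=[0, 1, 0]))
```

In this example two inputs echo the output and one does not, so the product is 0.9 · 0.9 · 0.1.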


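The AIμ Model in Functional Form

Task: Derive $p^*$ which maximizes the total credit over a predefined lifetime ($T$)

For a known deterministic environment $q$: $C_{km}(p, q) := \sum_{i=k}^{m} c(x_i)$, where $x_i$ is the $i$-th input when $p$ interacts with $q$

Optimal policy

$$p^* := \arg\max_p C_{1T}(p, q), \qquad C_{1T}(p^*, q) \ge C_{1T}(p, q) \quad \forall p$$

For a prior distribution over environments $\mu(q)$

Let $\dot Q_k := \{q : q(\dot y_{<k}) = \dot x_{<k}\}$ be the set of all environments that produce the history $\dot y_{<k}\dot x_{<k} := \dot y_1\dot x_1\, \dot y_2\dot x_2 \dots \dot y_{k-1}\dot x_{k-1}$ (dotted symbols denote the actually realized history)

$$C^{\mu}_{km}(p \mid \dot y_{<k}\dot x_{<k}) := \sum_{q \in \dot Q_k} \mu(q)\, C_{km}(p, q)$$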


The AIμ Model in Functional Form

AIμ maximizes expected future reward over the next $h_k = m_k - k + 1$ (horizon) cycles

Optimal policy

$$p^*_k := \arg\max_{p :\, p(\dot x_{<k}) = \dot y_{<k} y_k} C^{\mu}_{k m_k}(p \mid \dot y_{<k}\dot x_{<k})$$

$$p^*(\dot x_{<k}) := p^*_k(\dot x_{<k})$$

Best output

$$\dot y_k := \arg\max_{y_k} \max_{p :\, p(\dot x_{<k}) = \dot y_{<k} y_k} C^{\mu}_{k m_k}(p \mid \dot y_{<k}\dot x_{<k})$$

$p^*$ is computable if $X$, $Y$ and $m_k$ are finite
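The functional form weights the credit of a policy by the prior over all environments that are consistent with the history. The sketch below illustrates this for a finite set of deterministic environments. It is an illustration under assumptions: the credit is taken to be the input itself, c(x) = x, environments return only their latest input, and all names (`run`, `credit`, `expected_credit`, `q_zero`, `q_one`, `always_zero`, `switch_to_one`) are hypothetical.

```python
def run(p, q, m):
    """Interact for m cycles: y_i = p(x_{<i}), x_i = q(y_{1:i})."""
    ys, xs = [], []
    for _ in range(m):
        ys.append(p(tuple(xs)))
        xs.append(q(tuple(ys)))
    return ys, xs

def credit(p, q, k, m):
    """Sum of the credits c(x_i) = x_i over cycles k..m (1-based)."""
    _, xs = run(p, q, m)
    return sum(xs[k - 1:m])

def expected_credit(p, envs, prior, y_hist, x_hist, m):
    """Prior-weighted credit over environments consistent with the history.

    The policy p is assumed to reproduce y_hist when fed x_hist.
    """
    k = len(x_hist) + 1
    total = 0.0
    for q, weight in zip(envs, prior):
        # q is consistent if it reproduces every observed input.
        consistent = all(q(tuple(y_hist[:i + 1])) == x_hist[i]
                         for i in range(len(x_hist)))
        if consistent:
            total += weight * credit(p, q, k, m)
    return total

# Hypothetical environments: each rewards one particular output.
def q_zero(ys):
    return 1 if ys[-1] == 0 else 0

def q_one(ys):
    return 1 if ys[-1] == 1 else 0

# Hypothetical policies, both consistent with a first output of 0.
def always_zero(xs):
    return 0

def switch_to_one(xs):
    return 0 if not xs else 1

envs, prior = [q_zero, q_one], [0.5, 0.5]
print(expected_credit(always_zero, envs, prior, [0], [1], 4))
print(expected_credit(switch_to_one, envs, prior, [0], [1], 4))
```

After observing output 0 followed by input 1, only `q_zero` remains consistent, so the policy that keeps outputting 0 scores higher. As in the definition, the weights are not renormalized, which does not change which policy attains the maximum.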


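The AIμ Model in Recursive Form

Task: Derive the expected reward sum in cycles $k$ to $m$ using the expected reward sum in cycles $k+1$ to $m$

$$C^{\mu}_{km}(\dot y_{<k}\dot x_{<k} y_k) := \sum_{x_k} \bigl[c(x_k) + C^{*\mu}_{k+1,m}(\dot y_{<k}\dot x_{<k} y_k x_k)\bigr] \cdot \mu(x_k \mid \dot y_{<k}\dot x_{<k} y_k)$$

Optimal expected reward

$$C^{*\mu}_{km}(\dot y_{<k}\dot x_{<k}) := \max_{y_k} C^{\mu}_{km}(\dot y_{<k}\dot x_{<k} y_k)$$

$$C^{*\mu}_{km}(\dot y_{<k}\dot x_{<k}) = \max_{y_k} \sum_{x_k} \bigl[c(x_k) + C^{*\mu}_{k+1,m}(\dot y_{<k}\dot x_{<k} y_k x_k)\bigr] \cdot \mu(x_k \mid \dot y_{<k}\dot x_{<k} y_k)$$

Best output

$$\dot y_k := \arg\max_{y_k} C^{\mu}_{k m_k}(\dot y_{<k}\dot x_{<k} y_k)$$

Expectimax sequence

$$\dot y_k := \arg\max_{y_k} \sum_{x_k} \max_{y_{k+1}} \sum_{x_{k+1}} \cdots \max_{y_{m_k}} \sum_{x_{m_k}} \bigl(c(x_k) + \dots + c(x_{m_k})\bigr) \cdot \mu(x_{k:m_k} \mid \dot y_{<k}\dot x_{<k}\, y_{k:m_k})$$

where $\mu(x_{k:m_k} \mid \dot y_{<k}\dot x_{<k}\, y_{k:m_k})$ is the product of the per-cycle conditionals $\mu(x_i \mid \cdot)$ for $i = k, \dots, m_k$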
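The recursion over cycles translates directly into a recursive expectimax search. The following is a minimal sketch under assumptions: the credit is c(x) = x by default, the value beyond the last cycle is taken to be 0, and the toy conditional `mu_cond` (output 1 is rewarded with probability 0.9, output 0 with probability 0.2) is hypothetical.

```python
def expectimax(mu_cond, history, k, m, Y, X, c=lambda x: x):
    """Return (optimal expected credit for cycles k..m, best output y_k)."""
    if k > m:
        return 0.0, None          # assumed base case: no credit after cycle m
    best_value, best_y = float("-inf"), None
    for y in Y:                   # maximize over the agent's output
        value = 0.0
        for x in X:               # take the expectation over the input
            prob = mu_cond(x, history, y)
            if prob == 0.0:
                continue
            future, _ = expectimax(mu_cond, history + ((y, x),),
                                   k + 1, m, Y, X, c)
            value += prob * (c(x) + future)
        if value > best_value:
            best_value, best_y = value, y
    return best_value, best_y

# Hypothetical environment: input 1 (the reward) is likelier after output 1.
def mu_cond(x, history, y):
    p_one = 0.9 if y == 1 else 0.2
    return p_one if x == 1 else 1.0 - p_one

value, best_y = expectimax(mu_cond, (), 1, 3, Y=(0, 1), X=(0, 1))
print(value, best_y)
```

The search visits every output and input combination up to the horizon, so its cost grows as (|Y| · |X|) to the power of the horizon; the sketch is only practical for tiny alphabets and short horizons.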


The AIμ Model in Recursive Form

Functional AIμ ⇔ Recursive AIμ

$$\mu(y_{1:k}x_{1:k}) = \sum_{q :\, q(y_{1:k}) = x_{1:k}} \mu(q)$$


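The Universal AIξ Model

Task: Replace the true but unknown prior probability $\mu$ with $\xi$

In the Functional AIμ model

$$\dot y_k := \arg\max_{y_k} \max_{p :\, p(\dot x_{<k}) = \dot y_{<k} y_k} \sum_{q :\, q(\dot y_{<k}) = \dot x_{<k}} \mu(q)\, C_{k m_k}(p, q)$$

$\Downarrow$

$$\dot y_k := \arg\max_{y_k} \max_{p :\, p(\dot x_{<k}) = \dot y_{<k} y_k} \sum_{q :\, q(\dot y_{<k}) = \dot x_{<k}} 2^{-l(q)}\, C_{k m_k}(p, q)$$

where $l(q)$ is the length of the program $q$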



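Task: Replace the true but unknown prior probability $\mu$ with $\xi$

In the Recursive AIξ model

$$\dot y_k := \arg\max_{y_k} \sum_{x_k} \max_{y_{k+1}} \sum_{x_{k+1}} \cdots \max_{y_{m_k}} \sum_{x_{m_k}} \bigl(c(x_k) + \dots + c(x_{m_k})\bigr) \cdot \mu(x_{k:m_k} \mid \dot y_{<k}\dot x_{<k}\, y_{k:m_k})$$

$\Downarrow$

$$\dot y_k := \arg\max_{y_k} \sum_{x_k} \max_{y_{k+1}} \sum_{x_{k+1}} \cdots \max_{y_{m_k}} \sum_{x_{m_k}} \bigl(c(x_k) + \dots + c(x_{m_k})\bigr) \cdot \xi(x_{k:m_k} \mid \dot y_{<k}\dot x_{<k}\, y_{k:m_k})$$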



Task: Show the convergence of AIξ to AIμ

Utilize Solomonoff’s result generalized from $|X| = 2$ to an arbitrary alphabet; the $y$'s are pure spectators

$$\sum_{k=1}^{\infty} \sum_{x_{1:k}} \mu(y_{<k}x_{<k}) \bigl(\xi(x_k \mid y_{<k}x_{<k}y_k) - \mu(x_k \mid y_{<k}x_{<k}y_k)\bigr)^2 < \tfrac{1}{2} \ln 2 \cdot K(\mu) + O(1)$$

$\xi(x_k \mid y_{<k}x_{<k}y_k) \to \mu(x_k \mid y_{<k}x_{<k}y_k)$ as $k \to \infty$ (with $\mu$-probability 1)

Outputs $\dot y_k$ of the AIξ model converge to the outputs $\dot y_k$ of the AIμ model, at least for a bounded horizon $h_k$
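Replacing the unknown prior by weights of 2 to the power of minus the program length can be mimicked on a finite scale. The sketch below is an illustration only: the universal model sums over all chronological programs, whereas this code assumes just two deterministic environments with made-up program lengths, a credit c(x) = x, and a horizon of one cycle. The names `xi_cond`, `greedy_output`, `q_zero` and `q_one` are hypothetical.

```python
def xi_cond(x, history, y, envs, lengths):
    """Mixture probability of input x after the history and output y."""
    ys = tuple(h[0] for h in history)
    num = den = 0.0
    for q, length in zip(envs, lengths):
        # Keep q only if it reproduces every input observed so far.
        if all(q(ys[:i + 1]) == history[i][1] for i in range(len(history))):
            w = 2.0 ** -length
            den += w
            if q(ys + (y,)) == x:
                num += w
    return num / den if den > 0 else 0.0

def greedy_output(history, envs, lengths, Y=(0, 1), X=(0, 1)):
    """Output with the highest mixture-expected credit for the next cycle."""
    return max(Y, key=lambda y: sum(x * xi_cond(x, history, y, envs, lengths)
                                    for x in X))

# Hypothetical environments: each rewards one particular output.
def q_zero(ys):
    return 1 if ys[-1] == 0 else 0

def q_one(ys):
    return 1 if ys[-1] == 1 else 0

envs = [q_zero, q_one]
lengths = [2, 3]    # assumed program lengths in bits

print(greedy_output((), envs, lengths))
print(greedy_output(((0, 0),), envs, lengths))
```

Before any interaction the shorter program dominates the mixture, so the agent favours output 0. Once the history rules that program out, the remaining environment determines the prediction and the preferred output switches, which is a small-scale picture of the mixture converging to the true environment.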

Example Constants and Limits

$$1 \;\ll\; \langle l(y_k x_k) \rangle \;\ll\; k \;\ll\; T \;\ll\; |Y \times X|$$

$$1 \;\ll\; 2^{16} \;\ll\; 2^{24} \;\ll\; 2^{32} \;\ll\; 2^{65536}$$

The four inequalities, from left to right, express:

(a) The agent's interface is wide

(b) The interface can be sufficiently explored

(c) The death is far away

(d) Most input/output combinations do not occur

These limits are never used in proofs, but we are only interested in theorems which do not degenerate under the above limits
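The example constants can be checked with a few lines of integer arithmetic. The sketch assumes that the interface width is measured in bits per cycle, so that the number of distinct input/output pairs is 2 raised to that width; the variable names are hypothetical.

```python
from fractions import Fraction

interface_bits = 2 ** 16           # average length of one output/input pair
cycle = 2 ** 24                    # current cycle k
lifetime = 2 ** 32                 # lifetime T
io_pairs = 2 ** interface_bits     # number of distinct output/input pairs

# The chain of inequalities from the example constants.
assert 1 < interface_bits < cycle < lifetime < io_pairs

# At most one pair occurs per cycle, so at most this fraction is ever seen.
seen_fraction = Fraction(lifetime, io_pairs)
print(io_pairs.bit_length() - 1)   # exponent of |Y x X|
print(seen_fraction == Fraction(1, 2 ** (interface_bits - 32)))
```

Under this reading, limit (d) follows from the others: a lifetime of 2^32 cycles can touch only a vanishing fraction of the 2^65536 possible pairs.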


Application

Sequential Decision Theory (MDP)

Bellman equation for optimal policy

$$p^*(s) = \arg\max_a \sum_{s'} M^a_{ss'} V^*(s'), \qquad V^*(s) = R(s) + \max_a \sum_{s'} M^a_{ss'} V^*(s')$$

Apply the AIμ model

$$S = (Y \times X)^*, \quad A = Y, \quad a = y_k, \quad M^a_{ss'} = \mu(x_k \mid y_{<k}x_{<k}y_k)$$

$$s = y_{<k}x_{<k}, \quad R(s) = c(x_{k-1}), \quad V^*(s) = C^{*\mu}_{k-1,m}(y_{<k}x_{<k}) = c(x_{k-1}) + C^{*\mu}_{km}(y_{<k}x_{<k})$$

$$s' = y_{1:k}x_{1:k}, \quad R(s') = c(x_k), \quad V^*(s') = C^{*\mu}_{km}(y_{1:k}x_{1:k}) = c(x_k) + C^{*\mu}_{k+1,m}(y_{1:k}x_{1:k})$$
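The Bellman equation for a small MDP with state rewards can be sketched as a repeated backup. This is a minimal illustration under assumptions: it uses a finite horizon (one backup per remaining cycle) instead of a fixed point, and the two-state, two-action MDP with transition tables `M` and rewards `R` is hypothetical.

```python
def bellman_backup(M, R, V):
    """One backup: V'(s) = R(s) + max_a sum_t M[a][s][t] * V(t)."""
    n = len(R)
    new_V, policy = [], []
    for s in range(n):
        q_values = [sum(M[a][s][t] * V[t] for t in range(n))
                    for a in range(len(M))]
        best_a = max(range(len(M)), key=lambda a: q_values[a])
        policy.append(best_a)
        new_V.append(R[s] + q_values[best_a])
    return new_V, policy

# Hypothetical MDP: action 0 stays in place, action 1 switches state.
M = [
    [[1.0, 0.0], [0.0, 1.0]],   # action 0
    [[0.0, 1.0], [1.0, 0.0]],   # action 1
]
R = [0.0, 1.0]                  # only state 1 is rewarded

V = list(R)                     # value with no cycles left is the state reward
for _ in range(3):
    V, policy = bellman_backup(M, R, V)
print(V, policy)
```

The resulting policy switches out of the unrewarded state and stays in the rewarded one. In the AIμ identification the state is the complete history, so the state space grows with every cycle and such a table is no longer finite.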


Application

Observations

We use the complete history as the environment state

The AIμ model does not assume: Markovian property, stationary environment, accessible environment

Other applications

Game Playing

Function Minimization

Supervised Learning

Bold Claim: AIμ is the most general model


Conclusion

A parameterless model of AI based on Decision Theory and Algorithmic Probability is presented

Makes minimal assumptions about the environment

Is the AIξ model computable?

Future Work

Derive value and reward bounds for the AIξ model

Apply the AIξ model to more problem classes

