dtai thesis topics - ku leuven · travian: a massively multiplayer real-time strategy game...

Post on 23-Aug-2020


DTAI Thesis Topics

Dept. Computer Science, KU Leuven, 2018-2019

http://people.cs.kuleuven.be/~luc.deraedt/dtaithesis18-19.pdf

Luc De Raedt


Lab for Declarative Languages and Artificial Intelligence

Machine Learning

Declarative Languages and Systems

4 ZAP, 1 res. manager, 1 res. expert, ± 4 post-docs, ± 25 Ph.D. students

5 ZAP, ± 3 post-docs, ± 12 Ph.D. students (Bruynooghe retired)

AI is hot!

Self-driving cars

Eve (the robot scientist)

Siri

IBM Watson in Jeopardy and “Machine Reading”

AlphaGo — (Deep) learning …


DTAI's focus on AI


Machine Learning & Data Mining

how to extract knowledge from data

Uncertainty reasoning

how to represent and reason about uncertainty

Knowledge Representation

how to represent and reason about knowledge

DTAI's focus on Declarative Languages


Declarative = specify the what rather than the how

Different types of languages

Logic

Functional

Constraints

Probabilistic

DTAI's methodology involves


Fundamental research (theoretical as well as empirical)

Systems, Solvers and Software Applications

A thesis can focus on one or more aspects, depending on the student's interests

This presentation does not go in depth about techniques but every thesis does

This presentation

Overview of research

illustrations of possible thesis topics.

List of contact persons for topics

Full information — see online

Own topic

should be aligned with the professor's interests


Research Topics

Probabilistic Programming and Statistical Relational Learning

Predictive Learning and Clustering

Graph and Network Mining

Static Analysis for Declarative Programming Languages

Exploratory Data Mining

Privacy, Non-discrimination and Ethical aspects

Knowledge-Base Systems

Constraints

(dtai.cs.kuleuven.be/research)

Functional Programming

Automated Data Science

Verification of AI and ML


Sports Analytics

Text and Web

Health

Games

Applications

Engineering & Sensors

Humor (Comp. Creativity)

Robotics

Applications of the Knowledge Base System Paradigm


Research topics


“Standard” machine learning: develop new algorithms for machine learning

Decision trees, predictive clustering, probabilistic graphical models

Evaluation of machine learning (ROC etc.)

Predictive learning and clustering

Contact: Hendrik Blockeel, Jesse Davis


Automated Data Science

Contact: Luc De Raedt, Anton Dries

Can we (partly) automate data science?

Can we automatically derive the right features? The right representations?

Can we automatically discover what we can learn / predict?

Can we learn constraints?

Example: database about students, professors, courses, and marks …

The SYNTH project: the democratisation of Data Science, the automation of Data Science


Automated Data Science

Contact: Luc De Raedt, Anton Dries

Inductive Programming

FlashFill in Excel

Motivation: Flash-fill


Automated Data Science

Contact: Luc De Raedt, Anton Dries

Learning constraints

Can we recover formulas from a CSV file?

What are the formulas here?

T1[:, 6] = SUM(T1[:, 3:5], row)

T2[:, 2] = SUMIF(T1[:, 1]=T2[:, 1], T1[:, 6])
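As a rough illustration of what "recovering formulas" could look like, the sketch below searches a small table for columns that equal the row-wise sum of a contiguous block of other columns, in the spirit of T1[:, 6] = SUM(T1[:, 3:5], row). The table contents and the function name `find_row_sum_formulas` are hypothetical, and brute-force enumeration is only an assumption made for this example, not the method used in the project.

```python
from itertools import combinations


def find_row_sum_formulas(table):
    """Return (target, start, end) triples (0-based, inclusive) such that
    column `target` equals the row-wise sum of columns start..end."""
    n_cols = len(table[0])
    found = []
    for start, end in combinations(range(n_cols), 2):
        for target in range(n_cols):
            if start <= target <= end:
                continue  # a column cannot be part of its own sum
            if all(row[target] == sum(row[start:end + 1]) for row in table):
                found.append((target, start, end))
    return found


# Hypothetical data: the last column is the sum of the three columns before it.
table = [
    [1, 10, 20, 30, 60],
    [2, 5, 5, 5, 15],
    [3, 7, 0, 1, 8],
]
print(find_row_sum_formulas(table))
```

A conditional aggregate such as SUMIF would need a second table and a join condition, which this sketch leaves out.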


Key open question in AI — integrate

Probabilistic reasoning

Logical or relational representations

Machine learning

Contact: Luc De Raedt, Hendrik Blockeel, Jesse Davis, Gerda Janssens

statistical relational learning

Probabilistic Programming and Statistical Relational Learning

probabilistic programming


E.g. ProbLog: a probabilistic Prolog

P( hears_alarm(john) | burglary = true) ?

Challenges on inference, learning, implementation, application, ...

Bogdan Moldovan, Ingo Thon, Jesse Davis, and Luc de Raedt

disjoint sum problem in contrast to other PPLs (e.g., Prism) that make the mutually exclusiveness assumption to avoid this NP-hard problem. The disjoint sum problem arises when two proofs overlap. We solve this using the Karp and Luby algorithm [9]. Third, only those possible worlds that agree with the evidence are relevant for approximating the conditional probability. We employ an AND/OR tree rooted at the evidence, representing all such possible worlds, and probabilistically traverse the tree to generate only those samples where the evidence holds. The AND/OR tree is needed to deal with ProbLog’s underlying non-deterministic nature, also distinguishing our approach from those applied to functional programming languages. Finally, in contrast to some other languages, we also provide support for numeric random variables and discrete distributions.

2 Background

We first review some basic concepts of logic programming: An atom $pred(t_1, ..., t_n)$ consists of a predicate pred/n of arity n and terms $t_i$. A term is either a (lower-case) constant, an (upper-case) variable, or a functor func/n applied to n terms. A definite clause is an expression of the form $h \leftarrow b_1, ..., b_n$, where $h$ and the $b_i$ are atoms. It states that $h$ is true whenever all $b_i$ are true. If n is 0, we have a fact $f$, which expresses that $f$ is true. A substitution $\theta = \{X_1 = t_1, ..., X_n = t_n\}$ maps each variable $X_i$ to a term $t_i$. Applying a substitution $\theta$ to an atom $a$ yields $a\theta$, in which each occurrence of $X_i$ in $a$ is replaced with $t_i$.

A ProbLog [12, 2] program consists of a set of labeled facts $p_i :: c_i$, where $p_i$ is a probability value and $c_i$ a fact, and a set of definite clauses. Each ground instance of such a fact represents a random variable that is true with probability $p_i$. We use the following ProbLog program as a running example in the paper:

0.05 :: burglary.

0.01 :: earthquake.

0.7 :: hears_alarm(john).

0.6 :: hears_alarm(mary).

alarm :- burglary.

alarm :- earthquake.

calls(Pers) :- alarm, hears_alarm(Pers).

It has the random variables burglary, earthquake, hears_alarm(john) and hears_alarm(mary), and states that there is an alarm whenever there is a burglary or an earthquake. The last clause states that if there is an alarm and a person hears the alarm, that person will call.
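To make the possible-worlds reading of this program concrete, here is a minimal Python sketch that enumerates all truth assignments to the four probabilistic facts and sums the weights of the worlds in which a query holds. This brute-force enumeration is an assumption chosen for clarity; it is neither ProbLog's inference engine nor the sampling approach described in the excerpt, and the names `FACTS`, `calls` and `probability` are hypothetical.

```python
from itertools import product

# Probabilistic facts of the running example.
FACTS = {
    "burglary": 0.05,
    "earthquake": 0.01,
    "hears_alarm(john)": 0.7,
    "hears_alarm(mary)": 0.6,
}


def calls(world, person):
    """calls(Pers) :- alarm, hears_alarm(Pers), with alarm :- burglary ; earthquake."""
    alarm = world["burglary"] or world["earthquake"]
    return alarm and world[f"hears_alarm({person})"]


def probability(query, evidence=lambda world: True):
    """P(query | evidence), computed by summing over all 2^n possible worlds."""
    names = list(FACTS)
    p_evidence = 0.0
    p_joint = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= FACTS[name] if world[name] else 1.0 - FACTS[name]
        if evidence(world):
            p_evidence += weight
            if query(world):
                p_joint += weight
    return p_joint / p_evidence


p_calls_john = probability(lambda w: calls(w, "john"))
p_burglary_given_call = probability(
    lambda w: w["burglary"], evidence=lambda w: calls(w, "john")
)
print(p_calls_john, p_burglary_given_call)
```

Enumeration is exponential in the number of facts, which is one reason the excerpt turns to sampling restricted to worlds that agree with the evidence.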

To model univariate discrete distributions (e.g., uniform, Poisson), we also allow for discrete distribution probabilistic facts $X \sim \phi :: f$. $X$ is a logical variable appearing in atom $f$ and $\phi$ a probability density function. For example, $X \sim uniform(7) :: apples(X)$ specifies that apples(X) is true with X sampled from the set of integers between 1 and 7 with equal probability. Each grounding of all the variables (except X) in f denotes a random variable. All random variables (discrete distributions or probabilistic facts) are marginally independent.

The semantics of the ProbLog program is then given by probability distributions over subsets of the facts $f_i$ (called subprograms) and sample values for the numeric variables in the uniform and Poisson distributions. Each ground proba-

Probabilistic Programming and Statistical Relational Learning

Action and activity learning / Dynamics


Travian: A massively multiplayer real-time strategy game

Commercial game run by TravianGames GmbH

~3,000,000 players spread over different “worlds”

[Thon et al. ECML 08]

Can we build a model of this world? Can we use it for playing better?

Probabilistic Programming and Statistical Relational Learning


Contact: Luc De Raedt, Hendrik Blockeel, Jesse Davis, Bettina Berendt & Wannes Meert

Verification of software has a long tradition (e.g., model checking techniques)

How to verify systems that learn? That use AI?

Our approach: combine principles of probabilistic logics with verification

Topics

inductive synthesis of specifications

Markov Decision Processes (& reinforcement learning)

Derive properties of learned systems …

Verifying AI & ML systems


Learn probabilistic-logic model

[Figure: robot actions on a shelf: push, tap, grasp]

Moldovan et al. ICRA 12, 13, 14

Robotics

Contact: Luc De Raedt

Robotics (and Vision)

The visual genome


Contact: Bettina Berendt

Help users manage friends and privacy by

data mining

Socially Aware Data Mining

Focus on privacy and (anti-)discrimination

Graph and Network Mining

Contact: Bettina Berendt, Jesse Davis

Extraction of information from the web / social media

Taxonomy learning

Machine reading / Natural language processing

Text and Web

Contact: Marc Denecker, Gerda Janssens

IDP

Advanced KBS system developed by group

FO(.) language rooted in predicate logic and logic programming

separation of domain knowledge and problem solving

Language extensions to increase expressivity

E.g. design patterns for FO(.) (past thesis)

Better solvers and more inference methods

E.g. a solver for rational numbers (past thesis)


Knowledge-Base Systems

Contact: Marc Denecker, Gerda Janssens

Three themes for students :

logical modeling of interesting AI problem + expressing AI knowledge domains

logical analysis and implementation of software systems and tasks + software  by applying inference on specifications

Advanced algorithmics and implementation + extending/optimising the IDP software package.


Knowledge-Base Systems

Analysing medieval manuscripts

DAG coloring & extension

- monks copied texts
- resulting in variants (colors)
- reconstruct history

vocabulary Vms {
    extern vocabulary V
    IsSource(Manuscript)
}

theory Tms : Vms {
    {
        ! x : IsSource(x) <- ~ ? y : CopiedBy(y, x) & VariantIn(y) = VariantIn(x).
    }
}

term NbOfSources : Vms {
    #{ x : IsSource(x) }
}

procedure minSources(feature) {
    setvocabulary(feature, Vms)
    return minimize(Tms, feature, NbOfSources)[1]
}

- special-purpose data mining program: 400 lines of Perl, bugs
- description of the problem in IDP: 15 lines, correct, somewhat faster

Logical modeling of AI problems

Applications of the Knowledge Base System Paradigm

Contact: Marc Denecker, Gerda Janssens

Software = Knowledge Base + Logical Inference + User Interface

E.g., An interactive configuration system for an insurance company

AIM: Build cheap, correct, reusable, maintainable software from a logical specification

Applications of the Knowledge Base System Paradigm

Winning the RuleML Challenge

Insurance application

Propagation constraints

and choices

Fill out necessary values

Contact: Marc Denecker, Gerda Janssens

Advanced algorithmics and implementation + extending/optimising the IDP software package.

help us win the next CP or ASP competition

+ E.g., structuring search space as a hierarchy of search problems

+ E.g., linear programming techniques in IDP

+ E.g., improved computation of definitions

+ E.g., algorithms for revision inference (updating solutions)


Knowledge-Base Systems

Contact: Tom Schrijvers, Marc Denecker, & Luc De Raedt


Constraints

• Hyper heuristics to solve constraint satisfaction and optimization problems: formalisation

• Search Heuristics

• Role in IDP

• Role in Data Mining

• Learning of constraints

Functional Programming

Contact: Tom Schrijvers

Haskell

★ Explicit Side-Effects

★ Advanced Type Systems

★ Domain-Specific Languages

★ Much more…

Monads, Effect Handlers, Transformers

Type Classes, Polymorphism, Kinds

Design, Infrastructure, Applications

EXPLANATION:

You know functional programming from the Haskell language, taught in the course Declaratieve Talen.

In our research we work on all aspects of functional languages, and Haskell in particular.

Current topics are:
- explicit side effects such as monads
- advanced type system features
- domain-specific languages
- and much more

Functional Programming

Anonymous Functions

1936: λ calculus (Alonzo Church)

1958: Lisp (John McCarthy)

1973: ML (Robin Milner)

1987: Haskell (Haskell Committee)

Functional Languages Mainstream

2007: C#

2011: C++11

2014: Java 8, Swift

FP now mainstream

Haskell in industry

Haskell Language + GHC Compiler

Early Adopters, Widespread Adoption

Finance, Telecom, Many Others

EXPLANATION:

Many interesting challenges arise from the growing mainstream adoption of functional programming.

More and more companies are starting to use functional languages such as Haskell and F# (F-sharp), and mainstream languages such as Java and C# are adopting functional concepts.

201: The Oracle of Haskell


Functional Programming

GHC compiler

abs x | x >= 0 = x
      | x < 0  = -x

your oracle

✓ exhaustive guards

EXPLANATION:

Develop an oracle that checks whether the guards in Haskell programs cover all cases.

Static Analysis for Declarative Programming Languages

Contact: Tom Schrijvers, Gerda Janssens

Declarative Programming Languages

★ Type Checking

★ Termination Analysis

★ Reasoning about Coroutines

EXPLANATION:

You know the declarative language Prolog from the course Declaratieve Talen.

In our research we work on the automatic analysis of Prolog programs.

Current topics are:
- a type checker to make Prolog statically typed
- determining whether programs terminate
- analysing complex control flow such as coroutines

Declarative Programming Languages

Automatically Inferring Properties of Interest

append([],L,L).
append([X|Xs],Ys,[X|Zs]) :- append(Xs,Ys,Zs).

powerful, dynamic, flexible

optimisation, correctness, termination

EXPLANATION:

Declarative languages such as Prolog are very powerful, dynamic and flexible.

The challenge is to automatically derive important properties of Prolog programs, in order to check whether they are correct, whether they always terminate, and how they can be compiled efficiently.

bugs

Declarative Programming Languages

Industrial-Strength Static Types for Prolog

Prosyn Expert System

1 MegaLoC Prolog

Case Study: Industrial Partner

Prolog Program + Types

your type checker

EXPLANATION:

Prolog is an untyped language. This makes it easy for typing mistakes to introduce bugs that are hard to track down.

In this thesis you develop a type system for Prolog: the programmer writes type signatures for their predicates, and your type checker uses them to find bugs.

You evaluate your type checker on the Prosyn expert system of our industrial partner, which consists of 1 million lines of Prolog code.

Application Areas

 


Contact: Wannes Meert

Industry

Introduction

• Airplanes collect many flight parameters

• Airplane health & reliability extremely important

• BUT: Ground maintenance checks cost flying time

• Automating diagnostics and predicting when the airplane will need repairs = win-win

Image source: http://www.b737.org.uk/737ng.htm

Theses with: Boeing, Jetairfly, EuroMillions Basketball League, 3E, Sirris, Thomson-Reuters, Xenit, Pepite, Melexis, Flanders Make, imec, CERN, …

Questions?

Sources:

• http://www.b737.org.uk/737ng.htm

• Anomaly Detection Based on Aggregation of Indicators, T. Rabenoro & J. Lacaille, Proceedings of 23rd annual Belgian-Dutch Conference on Machine Learning (Benelearn 2014), p 64-71

• http://techcrunch.com/2010/03/17/today-in-history-the-flight-data-recorder/

• Boeing 737 Bleed Air System Video

• Boeing 737NG Aircraft Maintenance Manual, Part I: System Description Section (SDS), ATA Chapters 21, 36, Boeing Proprietary (not published)

4.2. Estimating the skeleton configuration

Figure 4.5: bad skeleton particle Figure 4.6: good skeleton particle

Figure 4.7: particles after one iteration Figure 4.8: after two iterations

Correction to obtain actual knee position

As discussed in section 3.1.2 the point clouds do not correspond to the actual limbs, but rather to the surfaces of the limbs facing the camera. So if we were to take the endpoint of the cylinders as the limb positions, the points would correspond to the top points. Figure 4.11 shows the positions of the markers on the leg during the measurement with the Vicon system. The knee marker is not located on the "top" of the knee, to which the end points of our cylinders correspond. To take this into

Machine Learning for sports

Soccer & basketball

E-sports

Sports Analytics

Contact: Jesse Davis

Tasks

Strategy detection

Performance analysis & prediction

Scouting


Sports Analytics

Thesis Topics

Soccer analytics

Model flow of a game

Quantify team performance

Learn aging curves of players

Basketball analytics

Detect surprising events

Sports Analytics

Tasks

Continuous monitoring

Injury risk profiles

Health

Thesis topics

Performance management and Injury prevention

Sensor fusion for surface detection and skill detection in runners

Kinect monitoring for qualitative feedback during rehabilitation


Health

Engineering & Sensors

Contact: Wannes Meert, Jesse Davis

http://www.pom-sbo.org

http://www.pom2sbo.org

ejector

Mebios-KU Leuven setup

Engineering & Sensors

Contact: Wannes Meert, Jesse Davis, Luc De Raedt

Badminton-playing robot: http://www.youtube.com/watch?v=StPZLZq01Xs

See also Health & Sports

Engineering & Sensors

Contact: Wannes Meert, Jesse Davis, Luc De Raedt

Large Hadron Collider maintenance (CERN)

Introduction

• Airplanes collect many flight parameters

• Airplane health & reliability extremely important

• BUT: Ground maintenance checks cost flying time

• Automating maintenance checkups by letting the airplane diagnose itself using its data = win-win

Image source: http://www.b737.org.uk/737ng.htm

Analysing data from airplanes

Games

learning to solve science tests formulated in natural language (like SAT, GMAT, GRE, …)

Tests as a testbed for intelligent behavior, for “reasoning”

Allen AI Institute, Levesque’s Winograd test, IBM Watson …

AI Challenges

Contact: Luc De Raedt, Jesse Davis, Anton Dries, Hendrik Blockeel

Three machines A, B and C produce 50 percent, 30 percent and 20 percent of the total production respectively. The percentage of defective pieces is 3 percent, 4 percent and 5 percent respectively. One chooses a piece. It is defective. What is the probability that it originates from machine A?
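For reference, the answer a system would have to reach here follows from Bayes' rule. The short sketch below works it out with exact fractions; the variable and function names are hypothetical and the code is hand-written for this one problem, not an example of solving it from text.

```python
from fractions import Fraction

# Share of total production and defect rate per machine, as stated in the problem.
share = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
defect_rate = {"A": Fraction(3, 100), "B": Fraction(4, 100), "C": Fraction(5, 100)}


def posterior(machine):
    """P(machine | defective) = P(defective | machine) P(machine) / P(defective)."""
    p_defective = sum(share[m] * defect_rate[m] for m in share)
    return share[machine] * defect_rate[machine] / p_defective


print(posterior("A"))  # 15/37
```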

Problem

A die is thrown 3 times. Find the probability that the sum of the dots is at least 5.

Mike has a bag with 4 red marbles and 3 green marbles. He takes one marble from the bag and it is red. What is the probability that the second marble he takes from the bag is also red?

In a group of 10 people, 60 percent have brown eyes. Two people are selected from the group. What is the probability that neither of them has brown eyes?

Suppose 0.1 percent of the population is infected with a certain disease. On a medical test for the disease, 98 percent of those infected give a positive result while 1 percent of those not infected give a positive result. If a randomly chosen person is tested and gives a positive result, what is the probability the person has the disease?

A gin hand consists of 10 cards from a deck of 52 cards, containing 13 hearts, 13 diamonds, 13 clubs, and 13 spades. Find the probability that a gin hand has all 10 cards of the same suit.
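The problems above all reduce to counting or to Bayes' rule once the text has been understood. As a hand-written reference, the sketch below computes each answer exactly. It assumes that marbles and people are drawn without replacement, which the problem texts suggest but do not state, and all variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Die thrown 3 times: enumerate all 6^3 outcomes and count those with sum >= 5.
rolls = list(product(range(1, 7), repeat=3))
p_dice = Fraction(sum(1 for r in rolls if sum(r) >= 5), len(rolls))

# Marbles: one red marble is already out, so 3 red remain among 6 marbles.
p_marble = Fraction(4 - 1, 4 + 3 - 1)

# Eyes: 6 of the 10 people have brown eyes; pick both people from the other 4.
p_eyes = Fraction(comb(4, 2), comb(10, 2))

# Disease test: Bayes' rule with a prior of 0.1 percent.
prior = Fraction(1, 1000)
p_pos_infected = Fraction(98, 100)
p_pos_healthy = Fraction(1, 100)
p_disease = (prior * p_pos_infected) / (
    prior * p_pos_infected + (1 - prior) * p_pos_healthy
)

# Gin hand: choose a suit, then 10 of its 13 cards, out of all 10-card hands.
p_gin = Fraction(4 * comb(13, 10), comb(52, 10))

print(p_dice, p_marble, p_eyes, p_disease, p_gin)
```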

GOAL: solve the problem directly from text

Luc De Raedt

Artificial intelligence, reasoning about uncertainty, action- and activity learning, machine learning, data mining, constraint programming, probabilistic programming (ProbLog), automated data science, language for mining and learning. Applications in natural language, vision, robotics, automatic programming. Verification of AI and ML.

Hendrik Blockeel

Machine learning, data mining, probabilistic logics, declarative languages for data mining.  

Application domains include bio-informatics, arts, history, compiler development, optimization.

Jesse Davis

Machine learning, data mining for personalized medicine. Artificial intelligence, statistical relational learning, transfer learning

Applications in healthcare (e.g., clinical practice, physical therapy, medical and biological texts, etc.). Applications to sport (e.g., football and basketball)

Bettina Berendt

Web mining, privacy, social media, user issues

Wannes Meert

Probabilistic programming and methods. Data Science Applications. Applications in engineering. Collaborations with industry.

Anton Dries

Constraint programming, probabilistic programming, data mining. Automated Data Science.

Design and implementation of AI systems and their applications.

Gerda Janssens

Performant probabilistic ILP data mining systems, integration of logic programming techniques in the knowledge representation language FO(.), program analysis and abstract interpretation, implementations of logic programs, verification of functional equivalence of C programs

Danny De Schreye

Computational creativity in Humor

Marc Denecker

Constraint programming, Knowledge Base Systems, SAT solving, declarative languages (formal modelling languages),

Applications in configuration, scheduling, optimization, security, business rule systems, executable formal software specifications, logical workflow languages.

Tom Schrijvers

Functional programming, constraint and logic programming, type systems, programming language theory, programming language design and implementation, program analysis

Check out dtai-web for more details

Questions?

It is advisable to contact promotors or daily advisors before selecting a topic.

Also, attend the thesis info market after the Easter holidays.