
Machine Learning – Basic Concepts

Ute Schmid

Cognitive Systems, Applied Computer Science, University of Bamberg
www.uni-bamberg.de/cogsys

Last change: November 4, 2020


Artificial Intelligence (AI) and Machine Learning (ML)

If we are ever to make claims of creating an artificial intelligence, we must address issues in natural language, automated reasoning, and machine learning. (George F. Luger)

↪→ ML is one topic among many in AI
↪→ neural networks are one family of ML algorithms among many


[Timeline: history of AI]
- 1956: birth of AI; AI programs: game playing, learning, problem solving, planning, text understanding, machine translation
- 1974–1980: 1st AI Winter (Lighthill Report: AI works only for toy problems)
- Knowledge-based systems: automated inference, expert systems, Lisp, Prolog
- 1987–1993: 2nd AI Winter (knowledge engineering bottleneck, failure of the 5th Generation Project)
- 1997: IBM's Deep Blue beats the chess world champion
- 2000–2008: "winter without end"; AI under other names: Cognitive Systems, Intelligent Systems/Agents; machine learning: random forests, neural networks, SVM, Naive Bayes, AdaBoost
- 2008: "Big Bang" of Deep Learning
- 2011: IBM's Watson wins Jeopardy!
- 2012: Google Brain recognizes the image of a cat
- 2016: Google DeepMind's AlphaGo
- Three waves of AI: 1st wave: describe (handcrafted knowledge); 2nd wave: categorize (statistical learning); 3rd wave: explain (contextual adaptation, human-centered AI)


AI Perspective

AI: Computational principles to generate general intelligent behavior

- automated reasoning
- knowledge representation
- planning
- autonomous agents
- language understanding
- machine learning ← one principle among others

Machine Learning in AI
- Arthur Samuel: programs which play checkers (started 1952, at IBM)
- Donald Michie: Reinforcement Learning for Tic-tac-toe (1963)
- Gerald Dejong: Explanation-Based Learning (1981): a computer algorithm analyses data and creates a general rule it can follow, discarding unimportant data

- Stephen Muggleton: Inductive Logic Programming (1991)

[Image credits: https://www.ibm.com/ibm/history/ibm100/images/icp/A138918I23240Y22/us__en_us__ibm100__700_series__checkers__620x350.jpg and https://miro.medium.com/max/768/0*oyFmAciLmqLVxeB0.jpg]


Origins and Application Domains of Machine Learning

[Figure: Venn diagram of the fields overlapping with Machine Learning. Origins of ML: Artificial Intelligence, Signal Processing/Pattern Recognition, Data Bases. Applications of ML: Data Science, Data Mining/Big Data]


ML as Multidisciplinary Field

Machine learning is inherently a multidisciplinary field:

artificial intelligence

probability theory, statistics

computational complexity theory

information theory

philosophy

psychology

neurobiology

. . .

ML and Big Data
- ML offers useful methods to analyze big/huge data sets
- more data does not in general mean better results
- learning from relevant data saves material and energy resources
- see our DFG project Dare2Del

ML and Data Science
- Data science is an interdisciplinary field combining statistics, databases, information retrieval, and machine learning


Human Learning

Learning is crucial for flexibility and adaptability (survival in changing environments)

Learning is improvement of knowledge and skills from experience (in contrast to maturation)

Types of learning
- Rote learning, learning by heart
- Concept learning (cat, table, safe behavior)
- Skill acquisition (riding a bike, playing the piano)
- Strategy learning (making a mathematical proof, writing a computer program, planning a birthday party)
- Meta-learning (selecting a suitable learning strategy, evaluating one's own performance)


What is Machine Learning?

How can we make computers learn from experience?

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

(Tom Mitchell, 1997)

Experience: Data represented as feature sets (e.g., demographic data and medical history); images (bitmaps); moves in a game; observations (e.g., sequences of events); ...

Task: Learn to classify risk/no risk of getting some illness; learn to recognize objects (cat); learn to play chess; learn to predict the next event

Performance measure: Typically some accuracy measure (percent correct, estimated for unseen data!); Michie: also comprehensibility and 'operational effectiveness' of learned models for humans
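
To make the definition concrete, here is a minimal sketch of the E/T/P framing (scikit-learn and its bundled breast-cancer data set are illustrative assumptions, not part of the slides):

```python
# Hypothetical illustration of Mitchell's (E, T, P) framing.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)        # experience E: feature vectors with labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # task T: classify risk/no risk
accuracy = accuracy_score(y_test, model.predict(X_test))  # performance P: percent correct on unseen data
print(f"P (accuracy on held-out data): {accuracy:.2f}")
```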


Merit of Machine Learning

Great practical value in many application domains

Data Mining: large databases may contain valuable implicit regularities that can be discovered automatically (outcomes of medical treatments, consumer preferences)

Poorly understood domains where humans might not have the knowledge needed to develop efficient algorithms (human face recognition from images)

Domains where the program must dynamically adapt to changing conditions (controlling manufacturing processes under changing supply stocks)

Autonomous Systems (robots, cars)


Types of Machine Learning

Supervised: preclassified training data
learn a mapping function f : X → Y
(machine learning is function approximation)

Unsupervised: find groups in a collection of data based on some similarity measure (clustering)

Policy Learning (Reinforcement Learning, Inductive Programming)

Types of Tasks

Classification: Y is two-valued (concept learning) or a finite set of discrete values (categories)

Regression: Y is a metric value

Generating actions

Reinforcement Learning is incremental learning; other approaches are typically batch learning.
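
A rough sketch of the task types on toy data (scikit-learn and the numbers are assumptions for illustration, not part of the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised classification: Y is a finite set of labels.
y_class = np.array([0, 0, 0, 1, 1, 1])
print(LogisticRegression().fit(X, y_class).predict([[2.5]]))   # -> [0]

# Supervised regression: Y is a metric value.
y_metric = np.array([2.1, 3.9, 6.0, 20.2, 21.8, 24.1])
print(LinearRegression().fit(X, y_metric).predict([[5.0]]))    # -> roughly [10.]

# Unsupervised clustering: no labels; groups found by similarity.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```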


Concept Learning

Humans can learn some types of concepts from very few examples, e.g., the regular past tense

Can machines also learn from very few examples? Yes, with some of the classic, symbol-level approaches.

Josh Tenenbaum
[Image: http://pinouchon.github.io/images/tufa.png]


Human Concept Learning

Human concept acquisition is very powerful:
- fast and implicit identification of relevant features and relations
- flexible classification in different contexts
- generalization from few (sometimes only one) examples

This same ability lies behind stereotyping, that is, unjustified overgeneralizations! (women cannot park cars, men cannot listen, Scots are stingy)
↪→ see inductive bias


Learning as Function Approximation

[Figure: http://en.proft.me/2015/12/24/types-machine-learning-algorithms/]


Knowledge-based vs. Learning Systems

Knowledge-based Systems: Acquisition and modeling of common-sense knowledge and expert knowledge

⇒ limited to the given knowledge base and rule set
⇒ Inference: deduction generates no new knowledge but makes implicitly given knowledge explicit
⇒ Top-down: from rules to facts

Learning Systems: Extraction of knowledge and rules from examples/experience

Teach the system vs. program the system
Learning as an inductive process

⇒ Bottom-up: from facts to rules


Learning as an Approach to Overcome the Knowledge Acquisition Bottleneck

(Feigenbaum, 1983)

Breakthrough in computer chess with Deep Blue: evaluation function of chess grandmaster Joel Benjamin. Deep Blue cannot change the evaluation function by itself!

Experts are often not able to verbalize their special knowledge.
⇒ Indirect methods: extraction of knowledge from expert behavior in example situations (diagnosis of X-rays, controlling a chemical plant, ...)


Learning as Induction

Deduction:
All humans are mortal. (Axiom)
Socrates is human. (Fact)
Conclusion: Socrates is mortal.

Induction:
Socrates is human. (Background knowledge)
Socrates is mortal. (Observation(s))
Generalization: All humans are mortal.

[Figure: Euler diagram with Socrates inside Humans inside Mortal Beings]

Induction generates hypotheses, not knowledge!

Deduction: from general to specific ⇒ proven correctness

Induction: from specific to general ⇒ (unproven) knowledge gain


Epistemological Problems

⇒ pragmatic solutions

Confirmation Theory:
A hypothesis obtained by generalization gets supported by new observations (not proven!). (Nelson Goodman)

Grue Paradox:
All emeralds are grue. Something is grue if it is green before a future time t and blue thereafter.
⇒ Not learnable from examples!


Inductive Learning Hypothesis

Inductive learning is not proven correct

The learning task is to determine a hypothesis h ∈ H identical to the target concept c for all possible instances in instance space X

(∀x ∈ X)[h(x) = c(x)]

Only training examples D ⊂ X are available

Inductive algorithms can at best guarantee that the output hypothesis h fits the target concept over D

(∀x ∈ D)[h(x) = c(x)]

Inductive Learning Hypothesis: Any hypothesis found to approximate the target concept well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples


Performance Evaluation

How good is the learned hypothesis?

It is not relevant how well a learned model performs on the training examples but how well it performs on unseen instances!

Typical approach: k-fold cross validation to detect possible overfitting
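
A minimal k-fold cross-validation sketch (assuming scikit-learn; data set and model are placeholders): in each of the k rounds the model is fitted on k−1 folds and its accuracy is measured on the held-out fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV: fit on 4/5 of the data, score on the remaining 1/5, rotate folds.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)                        # accuracy on each held-out fold
print(scores.mean(), scores.std())  # estimate of performance on unseen instances
```

A large gap between training accuracy and the cross-validated scores is a symptom of overfitting.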


There is no Bias-free Learning

Generalization learning is only possible if the learning system has an inductive bias.

- Restriction/language bias: not every model can be expressed in the given hypothesis language
- Preference/search bias: typically, learning algorithms are based on a greedy search strategy in the hypothesis space; the bias directs the search and influences which model is learned

A problem independent of the selected ML algorithm is the sampling bias: how representative are the training data for the (infinite) set of all possible instances of the domain?

Time and again, research has shown that the machines we build reflect how we see the world, whether consciously or not. For artificial intelligence that reads text, that might mean associating the word "doctor" with men more than women, or image-recognition algorithms that misclassify black people as gorillas. (August 28, 2017)
https://qz.com/1064035/google-goog-explains-how-artificial-intelligence-becomes-biased-against-women-and-minorities/


Concept and Classification Learning

Concept learning:

Objects are clustered in concepts.
- Extensional: (infinite) set X of all exemplars
- Intensional: finite characterization, e.g., T = {x | has-3/4-legs(x), has-top(x)}

Construction of a finite characterization from a subset of examples in X ("training set" D):

h : X → {0, 1}, c(x) ∈ {0, 1}

Naturally extended to classes:

Identification of the relevant attributes and their interrelations which characterize an object as a member of a class:

h : X → K, c(x) ∈ {k1, . . . , kn}


Constituents of Classification Learning

- A set of training examples D ⊂ X; each example is represented by an n-ary feature vector x ∈ X and associated with a class c(x) ∈ K: 〈x, c(x)〉
- A learning algorithm constructing a hypothesis h ∈ H
- A set of new objects, also represented by feature vectors, which can be classified according to h

Examples for features and values

Sky ∈ {sunny, rainy} ← categorical, binary
AirTemp ∈ {warm, cold}
Humidity ∈ {normal, high}
Outlook ∈ {sunny, cloudy, rainy} ← categorical, arbitrary number of values
WaterTemp ∈ [4, . . . , 40] ← numerical (natural, integer, real)


Examples of Concept Learning

↪→ can be found in the UCI Machine Learning Repository

Risk of cardiac arrest yes/no, given medical data

Credit-worthiness of a customer yes/no, given personal and customer data

Safe chemical process yes/no, given physical and chemical measurements

Days on which our friend Aldo can be found on the tennis court enjoying his favorite game (yes/no), given weather conditions (running example: playtennis)

Generalization of pre-classified example data, application for prognosis


Learning Problems which are not Concept Learning

Handwriting recognition

Play checkers

Robot driving


Designing a Learning System

1. Choosing the Training Experience
- direct or indirect feedback
- degree to which the learner controls the sequence of training examples
- representativeness of the distribution of the training examples
⇒ significant impact on success or failure

2. Choosing the Target Function
- determines what type of knowledge will be learned
- the most obvious form is some combination of feature values which can be associated with a class (word/letter)

3. Choosing a Representation for the Target Function
- e.g., a large table, a set of rules, a linear function, an arbitrary function

4. Choosing a Learning Algorithm
- decision tree, multi-layer perceptron, ...

5. Presenting Training Examples
- all at once
- incrementally


Notation

Instance Space X: set of all possible examples over which the concept is defined (possibly attribute vectors)

Target Concept c : X → {0, 1}: concept or function to be learned
Target Class c : X → {k1, . . . , kn}

Training Example x ∈ X of the form 〈x, c(x)〉

Training Set D: set of all available training examples

Hypothesis Space H: set of all possible hypotheses according to the hypothesis language

Hypothesis h ∈ H: Boolean-valued function of the form X → {0, 1} or X → K

⇒ the goal is to find an h ∈ H such that (∀x ∈ X)[h(x) = c(x)]

An induced hypothesis is often also called a model.
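
One hypothetical way to map this notation onto code (the encoding is an illustrative assumption; the slides do not prescribe one):

```python
from typing import Callable

# x ∈ X: an instance, here an attribute vector of feature values
Instance = tuple[str, ...]
# c(x) ∈ {0, 1} resp. K: the class label
Label = int
# h ∈ H (and the target concept c itself): a function X -> {0, 1}
Hypothesis = Callable[[Instance], Label]

# Training set D: pairs <x, c(x)>
D: list[tuple[Instance, Label]] = [
    (("Sunny", "Warm"), 1),
    (("Rainy", "Cold"), 0),
]
```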


Hypothesis Language

H is determined by the predefined language in which hypotheses can be formulated,

e.g., conjunctions of feature values
vs. disjunctions of conjunctions
vs. matrices of real numbers
vs. Horn clauses
...

Hypothesis language and learning algorithm are highly interdependent

Each hypothesis language implies a bias!


Properties of Hypotheses

general-to-specific ordering

- naturally occurring order over H
- learning algorithms can be designed to search H exhaustively without explicitly enumerating each hypothesis h
- hi is more general than or equal to hk (written hi ≥g hk)
  ⇔ (∀x ∈ X)[(hk(x) = 1) → (hi(x) = 1)]
- hi is (strictly) more general than hk (written hi >g hk)
  ⇔ (hi ≥g hk) ∧ (hk ≱g hi)
- ≥g defines a partial ordering over the hypothesis space H


Running Example – playtennis

example target concept Enjoy: "days on which Aldo enjoys his favorite sport"

set of example days D, each represented by a set of attributes

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | Enjoy
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

the task is to learn to predict the value of Enjoy for an arbitrary day, based on the values of its other attributes
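
The same training set as a data structure (a hypothetical Python encoding of the table above, not part of the slides):

```python
# Training set D for the playtennis/Enjoy example: pairs <x, c(x)>.
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),   # example 1
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),   # example 2
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),  # example 3
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),   # example 4
]
```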


Properties of Hypotheses - Example

h1 = Aldo loves playing tennis if the sky is sunny
h2 = Aldo loves playing tennis if the water is warm
h3 = Aldo loves playing tennis if the sky is sunny and the water is warm

⇒ h1 >g h3, h2 >g h3, h2 ≯g h1, h1 ≯g h2
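
These relations can be checked mechanically. A sketch using conjunctive hypotheses over (Sky, Water), with '?' meaning "any value" (the tuple encoding is an assumption borrowed from Mitchell's book, not stated on the slides):

```python
from itertools import product

h1 = ("Sunny", "?")      # sky is sunny
h2 = ("?", "Warm")       # water is warm
h3 = ("Sunny", "Warm")   # sky is sunny and water is warm

def satisfies(x, h):
    """h(x) = 1 iff every conjunct of h matches the instance x."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hi, hk, instances):
    """hi >=g hk iff hi classifies positive every instance hk classifies positive."""
    return all(satisfies(x, hi) for x in instances if satisfies(x, hk))

X = list(product(("Sunny", "Rainy"), ("Warm", "Cold")))  # small instance space
print(more_general_or_equal(h1, h3, X), more_general_or_equal(h3, h1, X))  # True False, so h1 >g h3
print(more_general_or_equal(h2, h3, X), more_general_or_equal(h3, h2, X))  # True False, so h2 >g h3
print(more_general_or_equal(h1, h2, X), more_general_or_equal(h2, h1, X))  # False False: incomparable
```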


Properties of Hypotheses

consistency

- a hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example 〈x, c(x)〉 in D

  Consistent(h, D) ≡ (∀〈x, c(x)〉 ∈ D)[h(x) = c(x)]

- that is, every example in D is classified correctly by the hypothesis


Properties of Hypotheses - Example

h1 is consistent with D
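
A direct transcription of Consistent(h, D) into code (a sketch; hypotheses are taken to be Boolean functions over the attribute tuples of the running example):

```python
def consistent(h, D):
    """Consistent(h, D) iff h(x) = c(x) for every pair <x, c(x)> in D."""
    return all(h(x) == c for x, c in D)

# h1: Aldo enjoys his sport iff the sky is sunny (attribute 0 in the tuples).
h1 = lambda x: x[0] == "Sunny"

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(consistent(h1, D))  # True: h1 classifies every training example correctly
```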


Learning Involves Search

Searching through a space of possible hypotheses to find the hypothesis that best fits the available training examples and other prior constraints or knowledge

Different learning methods search different hypothesis spaces

Learning methods can be characterized by the conditions under which these search methods converge toward an "optimal" hypothesis


Summary

Machine learning (ML) is automated knowledge acquisition and improvement

Typically, ML is a process of inductive reasoning. In contrast to deductive knowledge extraction, ML means the acquisition of new, generalized, hypothetical knowledge from sample experience.

The inductive learning hypothesis states that if a hypothesis approximates a target concept reasonably well over the training examples, it will also work reasonably well over unobserved examples.

Concept learning is a special case of classification learning with only two classes (belongs to concept/does not belong to concept).

Important concepts of ML are: instance space and hypothesis space, training set and target class.

Some hypothesis languages allow a general-to-specific ordering of hypotheses.

A hypothesis is called consistent with a training set if all examples are classified correctly (in many cases, we do not want to learn such overfitting hypotheses, as we will discuss later).

In general, ML can be characterized as search in hypothesis space.
