Machine Learning – Basic Concepts
Ute Schmid
Cognitive Systems, Applied Computer Science, University of Bamberg
www.uni-bamberg.de/cogsys
Last change: November 4, 2020
U. Schmid (CogSys, UniBA) ML-1-Introduction November 4, 2020 1 / 35
Artificial Intelligence (AI) and Machine Learning (ML)

If we are ever to make claims of creating an artificial intelligence, we must address issues in natural language, automated reasoning, and machine learning. (George F. Luger)
↪→ ML is one topic among many in AI
↪→ neural networks are one family of ML algorithms among many
(Timeline figure: history of AI)
- 1956: birth of AI; early AI programs: game playing, learning, problem solving, planning, text understanding, machine translation
- 1974–1980: 1st AI Winter (Lighthill Report: AI works only for toy problems)
- Knowledge-based systems: automated inference, expert systems, Lisp, Prolog
- 1987–1993: 2nd AI Winter (knowledge engineering bottleneck, failure of the 5th Generation Project)
- 1994: IBM's Deep Blue beats the chess champion
- 2000–2008: "winter without end"; AI under other names: Cognitive Systems, Intelligent Systems/Agents; machine learning: random forests, neural networks, SVM, Naive Bayes, AdaBoost
- 2008: big bang of Deep Learning
- 2011: IBM's Watson wins Jeopardy!
- 2012: Google Brain recognizes the image of a cat
- 2016: Google DeepMind's AlphaGo
- Three waves of AI: 1st wave: describe (handcrafted knowledge); 2nd wave: categorize (statistical learning); 3rd wave: explain (contextual adaptation, human-centered AI)
AI Perspective

AI: Computational principles to generate general intelligent behavior
- automated reasoning
- knowledge representation
- planning
- autonomous agents
- language understanding
- machine learning ←↩ one principle among others
Machine Learning in AI
- Arthur Samuel: programs which play checkers (started 1952, at IBM): a computer algorithm analyses data and creates a general rule it can follow, discarding unimportant data
- Donald Michie: Reinforcement Learning for Tic-tac-toe (1963)
- Gerald Dejong: Explanation-Based Learning (1981)
- Stephen Muggleton: Inductive Logic Programming (1991)
Origins and Application Domains of Machine Learning

(Venn-diagram figure: Machine Learning sits at the intersection of its origins, Artificial Intelligence, Signal Processing/Pattern Recognition, and Data Bases, and of its application fields, Data Science and Data Mining/Big Data)
ML as Multidisciplinary Field

Machine learning is inherently a multidisciplinary field:
artificial intelligence
probability theory, statistics
computational complexity theory
information theory
philosophy
psychology
neurobiology
. . .
ML and Big Data
- ML offers useful methods to analyze big/huge data sets
- more data does not in general mean better results
- learn from relevant data and save material and energy resources
- see our DFG project Dare2Del
ML and Data Science
- Data science is an interdisciplinary field combining statistics, databases, information retrieval, and machine learning
Human Learning
Learning is crucial for flexibility and adaptability (survival in changing environments)
Learning is improvement of knowledge and skills from experience (in contrast to maturation)
Types of learning
- Rote learning, learning by heart
- Concept learning (cat, table, safe behavior)
- Skill acquisition (riding a bike, playing piano)
- Strategy learning (making a mathematical proof, writing a computer program, planning a birthday party)
- Meta-learning (select a suitable learning strategy, evaluate own performance)
What is Machine Learning?
How can we make computers learn from experience?
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
(Tom Mitchell, 1997)
Experience: Data represented as feature sets (e.g. demographic data and medical history); images (bitmaps); moves in a game; observations (e.g., sequences of events); . . .
Task: Learn to classify risk/no risk of getting some illness; learn to recognize objects (cat); learn to play chess; learn to predict the next event
Performance measure: Typically some accuracy measure (percent correct, estimated for unseen data!); Michie: also comprehensibility and 'operational effectiveness' of learned models for humans
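Mitchell's E/T/P scheme can be sketched in a few lines of code. This is a hypothetical illustration, not from the lecture: the learner, the toy data, and all names are invented; the "hypothesis" is just the majority class seen in the experience E.

```python
# Hypothetical sketch of the experience/task/performance (E/T/P) scheme
# with a trivial majority-class learner; toy data invented for illustration.

def learn(experience):
    """E: a list of (features, label) pairs.
    Returns a hypothesis: always predict the most frequent label seen."""
    labels = [label for _, label in experience]
    return max(set(labels), key=labels.count)

def accuracy(prediction, test_data):
    """P: fraction of unseen examples classified correctly."""
    return sum(1 for _, label in test_data if prediction == label) / len(test_data)

# T: classify cases as risk / no risk (toy features, toy labels)
train = [((1, 0), "risk"), ((0, 1), "risk"), ((1, 1), "no risk")]
test = [((0, 0), "risk"), ((1, 0), "risk")]

h = learn(train)
print(accuracy(h, test))  # performance P measured on unseen data
```

The point is only the separation of roles: `train` plays E, the classification task plays T, and `accuracy` on held-out data plays P.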
Merit of Machine Learning
Great practical value in many application domains
Data Mining: large databases may contain valuable implicit regularities that can be discovered automatically (outcomes of medical treatments, consumer preferences)
Poorly understood domains where humans might not have the knowledge needed to develop efficient algorithms (human face recognition from images)
Domains where the program must dynamically adapt to changing conditions (controlling manufacturing processes under changing supply stocks)
Autonomous Systems (robots, cars)
Types of Machine Learning
Supervised: preclassified training data; learn a mapping function f : X → Y (machine learning is function approximation)
Unsupervised: find groups in a collection of data based on some similarity measure (Clustering)
Policy Learning (Reinforcement Learning, Inductive Programming)
Types of Tasks
Classification: Y is two-valued (concept learning) or a finite set of discrete values (categories)
Regression: Y is a metric value
Generating actions
Reinforcement Learning is incremental learning; other approaches are typically batch learning.
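Supervised learning as approximation of a mapping f : X → Y can be illustrated with a minimal sketch. A 1-nearest-neighbour rule (a standard method, used here only as an example; the data are invented) predicts the label of the closest preclassified example:

```python
# Minimal sketch of supervised learning as function approximation:
# a 1-nearest-neighbour rule approximates f: X -> Y from preclassified data.
# Toy data invented for illustration.

def nearest_neighbour(x, training):
    """Return the label of the training example closest to x
    (squared Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(training, key=lambda example: dist(example[0], x))
    return label

D = [((0.0, 0.0), "risk"), ((1.0, 1.0), "no risk")]
print(nearest_neighbour((0.9, 0.8), D))  # -> "no risk"
```

The learned "function" here is implicit in the stored examples; parametric methods instead fit an explicit function, but the type signature f : X → Y is the same.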
Concept Learning
Humans can learn some types of concepts from very few examples, e.g., the regular past tense
Can machines also learn from very few examples? Yes, some of the classic, symbol-level approaches can.
(Figure: Josh Tenenbaum's "tufa" example, http://pinouchon.github.io/images/tufa.png)
Human Concept Learning
Human concept acquisition is very powerful:
- fast and implicit identification of relevant features and relations
- flexible classification in different contexts
- generalization from few (sometimes only one) examples
This same ability lies behind stereotyping, that is, unjustified overgeneralizations! (women cannot park cars, men cannot listen, Scots are stingy)
↪→ see inductive bias
Learning as Function Approximation
http://en.proft.me/2015/12/24/types-machine-learning-algorithms/
Knowledge-based vs. Learning Systems
Knowledge-based Systems: Acquisition and modeling of common-sense knowledge and expert knowledge
⇒ limited to given knowledge base and rule set
⇒ Inference: Deduction generates no new knowledge but makes implicitly given knowledge explicit
⇒ Top-Down: from rules to facts

Learning Systems: Extraction of knowledge and rules from examples/experience
Teach the system vs. program the system
Learning as an inductive process
⇒ Bottom-Up: from facts to rules
Learning as Approach to Overcome the Knowledge Acquisition Bottleneck
(Feigenbaum, 1983)
Break-through in computer chess with Deep Blue: the evaluation function came from chess grandmaster Joel Benjamin. Deep Blue cannot change the evaluation function by itself!
Experts are often not able to verbalize their special knowledge.
⇒ Indirect methods: extraction of knowledge from expert behavior in example situations (diagnosis of X-rays, controlling a chemical plant, ...)
Learning as Induction

Deduction:
  All humans are mortal. (Axiom)
  Socrates is human. (Fact)
  Conclusion: Socrates is mortal.

Induction:
  Socrates is human. (Background knowledge)
  Socrates is mortal. (Observation(s))
  Generalization: All humans are mortal.
(Figure: the set of Mortal Beings contains the set of Humans, which contains Socrates)

Induction generates hypotheses, not knowledge!
Deduction: from general to specific ⇒ proven correctness
Induction: from specific to general ⇒ (unproven) knowledge gain
Epistemological Problems
⇒ pragmatic solutions
Confirmation Theory: A hypothesis obtained by generalization gets supported by new observations (not proven!). (Nelson Goodman)
Grue Paradox: All emeralds are grue. Something is grue if it is green before a future time t and blue thereafter.
⇒ Not learnable from examples!
Inductive Learning Hypothesis
Inductive learning is not proven correct
The learning task is to determine a hypothesis h ∈ H identical to the target concept c for all possible instances in instance space X
(∀x ∈ X )[h(x) = c(x)]
Only training examples D ⊂ X are available
Inductive algorithms can at best guarantee that the output hypothesis h fits the target concept over D
(∀x ∈ D)[h(x) = c(x)]
Inductive Learning Hypothesis: Any hypothesis found to approximate the target concept well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples
Performance Evaluation
How good is the learned hypothesis?
It is not relevant how well a learned model performs on the training examples but how well it performs on unseen instances!
Typical approach: k-fold cross validation to detect possible overfitting
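A minimal sketch of k-fold cross validation (function names and toy data are invented for illustration): split D into k folds, train on k−1 of them, evaluate on the held-out fold, and average the k scores.

```python
# Sketch of k-fold cross validation. The round-robin split, the trivial
# majority-class learner, and the toy data are invented for illustration.

def k_fold_scores(data, k, train_fn, accuracy_fn):
    """Average held-out accuracy over k folds."""
    folds = [data[i::k] for i in range(k)]  # simple round-robin split
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        hypothesis = train_fn(train)
        scores.append(accuracy_fn(hypothesis, held_out))
    return sum(scores) / k

def majority_learner(train):
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def accuracy(prediction, test):
    return sum(1 for _, label in test if prediction == label) / len(test)

data = [((i,), "yes") for i in range(6)] + [((i,), "no") for i in range(2)]
print(k_fold_scores(data, 4, majority_learner, accuracy))  # mean accuracy
```

A hypothesis that scores well on its own training folds but poorly on the held-out folds is a candidate for overfitting.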
There is no Bias-free Learning
Generalization learning is only possible if the learning system has an inductive bias.
- Restriction/language bias: not every model can be expressed by the given hypothesis language
- Preference/search bias: typically, learning algorithms are based on a greedy search strategy in the hypothesis space; the bias directs search and influences which model is learned
A problem independent of the selected ML algorithm is the sampling bias: how representative are the training data for the (infinite) set of all possible instances of the domain?
Time and again, research has shown that the machines we build reflect how we see the world, whether consciously or not. For artificial intelligence that reads text, that might mean associating the word "doctor" with men more than women, or image-recognition algorithms that misclassify black people as gorillas. (August 28, 2017)
https://qz.com/1064035/
google-goog-explains-how-artificial-intelligence-becomes-biased-against-women-and-minorities/
Concept and Classification Learning
Concept learning:
Objects are clustered in concepts.
Extensional: the (infinite) set X of all exemplars
Intensional: a finite characterization, e.g.
T = {x | has-3/4-legs(x), has-top(x)}
Construction of a finite characterization from a subset of examples in X ("training set" D).
h : X → {0, 1}, c(x) ∈ {0, 1}

Naturally extended to classes:
Identification of relevant attributes and their interrelation which characterize an object as member of a class.
h : X → K, c(x) ∈ {k1, . . . , kn}
Constituents of Classification Learning
A set of training examples D ⊂ X
Each example is represented by an n-ary feature vector x ∈ X and associated with a class c(x) ∈ K: 〈x, c(x)〉
A learning algorithm constructing a hypothesis h ∈ H
A set of new objects, also represented by feature vectors, which can be classified according to h
Examples for features and values
Sky ∈ {sunny, rainy} ← categorical, binary
AirTemp ∈ {warm, cold}
Humidity ∈ {normal, high}
Outlook ∈ {sunny, cloudy, rainy} ← categorical, arbitrary number of values
WaterTemp ∈ [4, . . . , 40] ← numerical (natural, integer, real)
Examples of Concept Learning
↪→ can be found in the UCI Machine Learning Repository
Risk of cardiac arrest yes/no, given medical data
Credit-worthiness of customer yes/no, given personal and customer data
Safe chemical process yes/no, given physical and chemical measurements
Days on which our friend Aldo can be found on the tennis court enjoying his favorite game (yes/no), given weather conditions (running example: playtennis)
Generalization of pre-classified example data, application for prognosis
Learning Problems which are not Concept Learning
Handwriting recognition
Play checkers
Robot driving
Designing a Learning System
1 Choosing the Training Experience
- direct or indirect feedback
- degree to which the learner controls the sequence of training examples
- representativity of the distribution of the training examples
⇒ significant impact on success or failure
2 Choosing the Target Function
- determine what type of knowledge will be learned
- most obvious form is some kind of combination of feature values which can be associated with a class (word/letter)
3 Choosing a Representation for the Target Function
- e.g., a large table, a set of rules, a linear function, an arbitrary function
4 Choosing a Learning Algorithm
- Decision Tree, Multi-Layer Perceptron, . . .
5 Presenting Training Examples
- all at once
- incrementally
Notation
Instance Space X: set of all possible examples over which the concept is defined (possibly attribute vectors)
Target Concept c : X → {0, 1}: concept or function to be learned
Target Class c : X → {k1, . . . , kn}
Training Example x ∈ X of the form 〈x, c(x)〉
Training Set D: set of all available training examples
Hypothesis Space H: set of all possible hypotheses according to the hypothesis language
Hypothesis h ∈ H: Boolean-valued function of the form X → {0, 1} or X → K
⇒ the goal is to find an h ∈ H such that (∀x ∈ X )[h(x) = c(x)]
An induced hypothesis is often also called a model.
Hypothesis Language
H is determined by the predefined language in which hypotheses can be formulated
e.g.: conjunctions of feature values
vs. disjunctions of conjunctions
vs. matrices of real numbers
vs. Horn clauses
...
Hypothesis language and learning algorithm are highly interdependent
Each hypothesis language implies a bias!
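One common hypothesis language, conjunctions of feature values, can be sketched concretely. The encoding below is a hypothetical illustration: a hypothesis is a tuple of required feature values, with "?" as a wildcard accepting any value.

```python
# Sketch of a conjunctive hypothesis language: a hypothesis constrains
# some features to fixed values and leaves the rest open ("?").
# Encoding and example values are illustrative assumptions.

def matches(h, x):
    """A conjunctive hypothesis h classifies instance x as positive
    iff every constrained feature agrees."""
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

# features: (Sky, AirTemp, Humidity)
h = ("Sunny", "?", "?")  # "the sky is sunny"
print(matches(h, ("Sunny", "Warm", "High")))  # True
print(matches(h, ("Rainy", "Cold", "High")))  # False
```

The restriction bias is visible immediately: no conjunction of feature values can express a disjunctive concept such as "sunny or warm", so that concept lies outside this H.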
Properties of Hypotheses
general-to-specific ordering
- naturally occurring order over H
- learning algorithms can be designed to search H exhaustively without explicitly enumerating each hypothesis h
- hi is more general than or equal to hk (written hi ≥g hk)
  ⇔ (∀x ∈ X )[(hk(x) = 1) → (hi (x) = 1)]
- hi is (strictly) more general than hk (written hi >g hk)
  ⇔ (hi ≥g hk) ∧ (hk ≱g hi )
- ≥g defines a partial ordering over the hypothesis space H
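For a small finite instance space, the ≥g relation can be checked by brute force, directly following the definition above. This is an illustrative sketch; the conjunctive encoding with "?" as wildcard and the tiny two-feature space are assumptions.

```python
# Sketch: checking hi >=g hk over a tiny finite instance space X by
# testing (forall x in X)[hk(x) = 1 -> hi(x) = 1]. Encoding is illustrative.
from itertools import product

def matches(h, x):
    """Conjunctive hypothesis, "?" = any value."""
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

X = list(product(["Sunny", "Rainy"], ["Warm", "Cold"]))  # 4 instances

def more_general_or_equal(hi, hk):
    """hi >=g hk: every instance hk accepts, hi accepts too."""
    return all(matches(hi, x) for x in X if matches(hk, x))

h1 = ("Sunny", "?")      # sky is sunny
h3 = ("Sunny", "Warm")   # sky is sunny AND water... here: air is warm
print(more_general_or_equal(h1, h3))  # True:  h1 >=g h3
print(more_general_or_equal(h3, h1))  # False: h3 is strictly more specific
```

Real learners avoid this enumeration; the point of the general-to-specific ordering is precisely that H can be searched without it.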
Running Example – playtennis
example target concept Enjoy: "days on which Aldo enjoys his favorite sport"
set of example days D, each represented by a set of attributes
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | Enjoy
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
the task is to learn to predict the value of Enjoy for an arbitrary day, based on the values of its other attributes
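The playtennis training set can be written down directly as feature vectors with class labels (a sketch; the tuple encoding and the candidate hypothesis are illustrative, the data are the four examples from the table):

```python
# The playtennis training set, encoded as (feature vector, class) pairs.
# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast.

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]

# An illustrative candidate hypothesis: "Enjoy iff Sky = Sunny"
h = lambda x: "Yes" if x[0] == "Sunny" else "No"
print(all(h(x) == c for x, c in D))  # True: h fits all four examples
```

Fitting these four examples does not, of course, guarantee correct predictions for the remaining unseen days; that is exactly the inductive leap.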
Properties of Hypotheses - Example
h1 = Aldo loves playing tennis if the sky is sunny
h2 = Aldo loves playing tennis if the water is warm
h3 = Aldo loves playing tennis if the sky is sunny and the water is warm
⇒ h1 >g h3, h2 >g h3, h2 ≯g h1, h1 ≯g h2
Properties of Hypotheses
consistency
- a hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example 〈x, c(x)〉 in D
  Consistent(h,D) ≡ (∀〈x, c(x)〉 ∈ D)[h(x) = c(x)]
- that is, every example in D is classified correctly by the hypothesis
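The Consistent(h, D) definition translates almost verbatim into a predicate (a sketch; the single-number toy concept is an invented example):

```python
# Direct transcription of Consistent(h, D): h is consistent with D
# iff h(x) = c(x) for every <x, c(x)> in D. Toy data invented.

def consistent(h, D):
    """True iff hypothesis h classifies every training example correctly."""
    return all(h(x) == c for x, c in D)

# toy concept: c(x) = (x >= 2), with x a single number
D = [(0, False), (1, False), (2, True), (3, True)]
print(consistent(lambda x: x >= 2, D))  # True
print(consistent(lambda x: x >= 3, D))  # False: misclassifies x = 2
```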
Properties of Hypotheses - Example
h1 is consistent with D
Learning Involves Search
Searching through a space of possible hypotheses to find the hypothesis that best fits the available training examples and other prior constraints or knowledge
Different learning methods search different hypothesis spaces
Learning methods can be characterized by the conditions under which these search methods converge toward an "optimal" hypothesis
Summary
Machine learning (ML) is automated knowledge acquisition and improvement
Typically, ML is a process of inductive reasoning. In contrast to deductive knowledge extraction, ML means acquisition of new, generalized, hypothetical knowledge from sample experience.
The inductive learning hypothesis states that if a hypothesis approximates a target concept reasonably well over the training examples, it will also work reasonably well over unobserved examples.
Concept learning is a special case of classification learning with only two classes (belongs to concept/does not belong to concept).
Important concepts of ML are: instance space and hypothesis space, training set and target class.
Some hypothesis languages allow a general-to-specific ordering of hypotheses.
A hypothesis is called consistent with a training set if all examples can be classified correctly (in many cases, we do not want to learn such overfitting hypotheses, as we will discuss later).
In general, ML can be characterized as search in hypothesis space.