PROBABILISTIC MODELING: INCOMPLETE HISTORY IN 0.5H

Predrag Radivojac, Indiana University



CONFLUENCE OF THREE DISCIPLINES

Probability Theory
• mathematical infrastructure for manipulating probabilities, grounded in axioms of probability
• wide choice of probabilistic models with well-understood theoretical properties

Statistics
• formulation of the process of narrowing down the solutions based on observed data and experience (knowledge and assumptions)
• leads to selection of optimal or acceptable models with respect to data and experience
• formalizes assessing confidence about those solutions

Computer Science
• provides theory, algorithms, software to manage data and compute solutions
• formalizes studies of the tradeoffs between quality of solutions and available resources (time, space, computer architecture)

PROBABILITY THEORY

Blaise Pascal (1623‐1662) Pierre de Fermat (1601‐1665)

The Problem of Points
• Old game, already known in the 15th century
• Introduced to Pascal by Chevalier de Mere, probably in 1654
• Two players agree to toss a coin until someone wins n times. The bets are placed. They play, but the game is interrupted. How should they split the money so it is fair?
• What is the solution?

Fermat to Pascal, August 29, 1654

Monsieur,
Our interchange of blows still continues, and I am well pleased that our thoughts are in such complete adjustment as it seems since they have taken the same direction and followed the same road...

Both solved the problem, but in different ways. Fermat’s approach was combinatorial. Pascal introduced an expectation function.
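The two approaches can be sketched in a few lines of Python. This is a minimal illustration assuming a fair coin, not a reconstruction of the original letters; the function names and the arguments `a` and `b` (wins still needed by each player when the game stops) are hypothetical. `share_fermat` counts outcomes of the remaining tosses, while `share_pascal` computes the expected share recursively.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def share_fermat(a, b):
    """Fair share of the stake for a player who needs `a` more wins
    when the opponent needs `b` more (a, b >= 1, fair coin).

    Combinatorial view: at most a + b - 1 further tosses settle the game,
    so count the equally likely toss sequences in which the player
    collects at least `a` wins.
    """
    n = a + b - 1
    favorable = sum(comb(n, k) for k in range(a, n + 1))
    return Fraction(favorable, 2 ** n)


@lru_cache(maxsize=None)
def share_pascal(a, b):
    """Same quantity computed as an expectation, one toss at a time."""
    if a == 0:          # the player has already won
        return Fraction(1)
    if b == 0:          # the opponent has already won
        return Fraction(0)
    # The next toss is won or lost with probability 1/2 each.
    return (share_pascal(a - 1, b) + share_pascal(a, b - 1)) / 2
```

For example, if one player needs 1 more win and the other needs 2, both functions give 3/4 of the stake to the player who is ahead.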

PROBABILITY THEORY

Jacob Bernoulli (1655‐1705)

Important quotes from Ars Conjectandi:

• “Probability, indeed, is degree of certainty, and differs from the latter as a part differs from the whole. Truly, if complete and absolute certainty, which we represent by the letter a or by 1...”

• “To predict something is to measure its probability. Therefore, we define the science of prediction or stochastics, as the art of measuring probabilities of things as accurately as possible, to the end that, in judgments and actions, we may always choose or follow that which has been found to be better, more satisfactory, safer, or more carefully considered. On this alone turns all the wisdom of the philosopher and all the practical judgment of the statesman.”

The Art of Conjecturing, 1713

Ars Conjectandi

• Discussed probability
• Introduced subjective notion of probability
• Introduced “Bernoulli trials”
• Proved weak law of large numbers
• Introduced “science of prediction”

Translation from: Encyclopedia Stochastikon

PROBABILITY THEORY

Abraham de Moivre (1667‐1754) The Doctrine of Chances, 1738

The Doctrine of Chances

• Introduced the concept of a normal distribution

• Showed that normal distribution is a limit of the binomial

• Gave the first take of the Central Limit Theorem (proved by Laplace)
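The statement that the normal distribution is a limit of the binomial can be checked numerically. The sketch below is only an illustration with hypothetical helper names: it compares the binomial probabilities for n tosses with the normal density that has the same mean np and standard deviation sqrt(np(1-p)), and reports the largest pointwise gap.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(n, k, p=0.5):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, std):
    """Density of the normal distribution with the given mean and std."""
    return exp(-((x - mean) ** 2) / (2 * std**2)) / (std * sqrt(2 * pi))


def max_abs_gap(n, p=0.5):
    """Largest difference between the binomial pmf and the matching normal pdf."""
    mean, std = n * p, sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(n, k, p) - normal_pdf(k, mean, std))
        for k in range(n + 1)
    )


for n in (10, 100, 1000):
    print(n, max_abs_gap(n))
```

The printed gap shrinks as n grows, which is the behavior the limit statement describes.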

Shafer, Vovk. The sources of Kolmogorov’s Grundbegriffe. Statistical Science (2006) 21(1): 70-98

If A and B cannot both happen, then: P(A or B) = P(A) + P(B)

PROBABILITY THEORY

Shafer, Vovk. The sources of Kolmogorov’s Grundbegriffe. Statistical Science (2006) 21(1): 70-98

• Bernoulli: “A run of a hundred [heads] may be metaphysically possible, but it is physically impossible. It has never happened and never will happen.”

• Probable: probability exceeds half of certainty, e.g. P(A) > 1/2
• Possible: event has a low degree of certainty, e.g. P(A) > 1/20 or 1/30

19th century
• Boltzmann’s second law of thermodynamics claims that dissipative processes are irreversible because the probability of a state with entropy far from the maximum is vanishingly small
• Major players in France, Germany, Russia, Britain (Borel, Frechet, Levy, Hadamard, Lebesgue, Gauss, Riemann, von Kries, Ellis, Venn, Kolmogorov, Markov)

20th century
• Lack of clarity and rigor in the probability calculus; Henri Poincare said “one can hardly give a satisfactory definition of probability”
• David Hilbert’s 6th of 23 open problems presented at the International Congress of Mathematicians in Paris (1900) was to treat probability axiomatically

PROBABILITY THEORY

Andrey Kolmogorov (1903‐1987)

Grundbegriffe der Wahrscheinlichkeitsrechnung, 1933

Grundbegriffe der Wahrscheinlichkeitsrechnung

• Introduced axioms of probability that stood the test of time
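For reference, the axioms are usually stated today as follows. The notation is the common modern one and is an assumption here, not taken from the slide: Ω is the sample space, F a σ-algebra of events, and P a function on F. Kolmogorov's own presentation used finite additivity together with a continuity axiom, which is equivalent to the countable additivity shown below.

```latex
% Modern statement of the axioms of probability (assumed notation).
\begin{align*}
&\text{(1) Non-negativity:} && P(A) \ge 0 \quad \text{for all } A \in \mathcal{F} \\
&\text{(2) Normalization:} && P(\Omega) = 1 \\
&\text{(3) Countable additivity:} && P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
  \quad \text{for pairwise disjoint } A_i \in \mathcal{F}
\end{align*}
```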

Maurice Frechet, 1938, introduced him at the colloquium at the University of Geneva with these words:

It was at the moment when Mr. Borel introduced this new kind of additivity into the calculus of probability – in 1909, that is to say – that all the elements needed to formulate explicitly the whole body of axioms of (modernized classical) probability theory came together.

It was not enough to have all the ideas in mind, to recall them now and then; one must make sure that their totality is sufficient, bring them together explicitly, and take responsibility for saying that nothing further is needed in order to construct the theory.

This is exactly what Mr. Kolmogorov did. This is his achievement. (And we do not believe he wanted to claim any others, so far as the axiomatic theory is concerned.)

Shafer, Vovk. The sources of Kolmogorov’s Grundbegriffe. Statistical Science (2006) 21(1): 70-98

STATISTICS

Graunt, John. Natural and political observations mentioned in a following Index, and made upon the Bills of Mortality, 1665.

John Graunt (1620‐1674)

Natural and Political Observations Made upon the Bills of Mortality, 1662 (1663)

Graunt discussed:

• trustworthiness of the data in the “bills” published over a 60‐year period

• description of mortality due to plague, including “imputation” of missing data

• detailed description and analysis of the gender ratio, discovered stability

• provided a “life table” in order to answer the question of how many men of fighting age live in London

Follow ups on Graunt’s work:

• John Arbuthnot tested the hypothesis that the ratio of men vs. women was 1
• Christian Huygens calculated the expected and median lifetime

STATISTICS

Fienberg. A brief history of statistics in three and one-half chapters: a review essay. Statistical Science (1992) 7(2): 208-225.


STATISTICS

Thomas Bayes (1701‐1761)

Pierre‐Simon Laplace (1749‐1827)

Inverse Probability:

• Both Bayes and Laplace understood it, but Bayes died before publishing his work

• Bayes was first, Laplace went further – he was only 25 when he repeated Bayes’s work

• In the “Bayesian” sense, both used uniform priors
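Inverse probability with a uniform prior has a simple closed form in the coin-tossing case. As a hedged sketch (function names are illustrative): after observing s successes in n trials, a uniform prior on the success probability gives a posterior mean of (s + 1) / (n + 2), the result known as Laplace's rule of succession. The second function checks this by numerically integrating prior times likelihood on a grid.

```python
from fractions import Fraction


def posterior_mean_uniform_prior(successes, trials):
    """Posterior mean of the success probability under a uniform prior
    (Laplace's rule of succession)."""
    return Fraction(successes + 1, trials + 2)


def posterior_mean_numeric(successes, trials, grid=10_000):
    """Same quantity by midpoint-rule integration of prior x likelihood.

    The uniform prior is constant, so it cancels in the ratio below.
    """
    failures = trials - successes
    numerator = denominator = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid                      # grid midpoint in (0, 1)
        likelihood = theta**successes * (1 - theta) ** failures
        numerator += theta * likelihood
        denominator += likelihood
    return numerator / denominator


print(posterior_mean_uniform_prior(7, 10))   # exact fraction
print(posterior_mean_numeric(7, 10))         # numerical check
```

With 7 successes in 10 trials, the exact posterior mean is 2/3, slightly pulled toward 1/2 relative to the observed frequency 0.7.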

Laplace’s demon:

• “We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.”

An Essay towards solving a Problem in the Doctrine of Chances, 1763

STATISTICS

Carl Friedrich Gauss (1777‐1855)

The Method of Least Squares:

• search for a statistical approach for combining observations
• The data was most frequently observations of planetary positions, orbits, geodesic arcs
• Laplace’s approach was ad hoc, people tried to improve it
• Legendre came up with the method of least squares in 1805, but quantification of uncertainty was missing
• Gauss used normally distributed error terms for a system of linear equations and then maximized posterior distribution and showed this was the same as Legendre’s method

• Laplace recognized that a normal distribution is important in itself and proved the central limit theorem

• “Gauss‐Laplace synthesis” is the foundation of modern statistics
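The link between normally distributed errors and least squares can be illustrated with a straight-line fit. The sketch below rests on stated assumptions: the data points are made up, the noise standard deviation `sigma` is fixed and known, and the prior on the coefficients is flat, so maximizing the posterior is the same as maximizing the Gaussian likelihood. Under those assumptions the log-likelihood is a constant minus the sum of squared errors divided by 2·sigma², so the least-squares line is also the most probable one.

```python
from math import log, pi


def least_squares_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gaussian_log_likelihood(xs, ys, intercept, slope, sigma=1.0):
    """Log-likelihood of the data when errors are i.i.d. normal(0, sigma^2)."""
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * len(xs) * log(2 * pi * sigma**2) - sse / (2 * sigma**2)


# Made-up observations, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = least_squares_line(xs, ys)
best = gaussian_log_likelihood(xs, ys, intercept, slope)

# Any nudge away from the least-squares coefficients lowers the likelihood.
for d_int, d_slope in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nudged = gaussian_log_likelihood(xs, ys, intercept + d_int, slope + d_slope)
    print(d_int, d_slope, nudged < best)
```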

But Bayesian statistics was not the only way:

• In 1778 Daniel Bernoulli (nephew of Jacob, son of Johann) proposed a method that can be considered a sketch of the maximum likelihood method

• Leonhard Euler wrote an appended commentary to the article criticizing this method as “arbitrary”; ML was forgotten in light of the success of Laplace and Gauss

• Maximum likelihood was later rediscovered in the 20th century, popularized by Fisher

STATISTICS

Fienberg. A brief history of statistics in three and one-half chapters: a review essay. Statistical Science (1992) 7(2): 208-225.

by Adolphe Quetelet

STATISTICS

Fienberg. A brief history of statistics in three and one-half chapters: a review essay. Statistical Science (1992) 7(2): 208-225.

COMPUTER SCIENCE

George Boole (1815‐1864)

Gottfried Wilhelm Leibniz (1646‐1716)

Binary system as basis for computing:
• Leibniz: development of formal logic; advocated binary system for performing calculations (0‐1 or on‐off)
• George Boole: published “Boolean algebra” in 1854

Prehistory:

• People have always wanted devices for counting (e.g. abacus)

• But those were not general-purpose computing machines

COMPUTER SCIENCE

Ada King (1815‐1852), Countess of Lovelace

Charles and Ada:
• Babbage had an idea to construct a machine that can compute anything (called it “Analytical Engine”)
• Ada Lovelace wrote the first program, to compute Bernoulli numbers on the Analytical Engine. Considered the first programmer.
• Computer language Ada named after Ada Lovelace

Charles Babbage (1791‐1871)

Mechanical devices:

• Pascal: built the first mechanical adding machine in 1642 (apparently described by Hero of Alexandria)

• Babbage: began constructing a “difference engine” in 1822

Part of the difference engine assembled by Babbage’s son. Actual difference engine constructed from Babbage’s design.

COMPUTER SCIENCE

David Hilbert (1862‐1943)

Computability:
• Turing machine: represents a computing machine, can do what any other computing machine can; the machine can compute any function that can be expressed as an algorithm (Church‐Turing thesis)

• Recursion, lambda calculus and Turing machines are equivalent in terms of representing a class of functions
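To make the Turing machine concrete, here is a minimal simulator in Python. Everything in it is an illustrative assumption: the rule-table format, the state names, and the example machine, which adds 1 to a binary number written on the tape. The step budget is there because, in general, there is no way to decide in advance whether a given machine will halt.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine until it reaches the state "halt".

    rules maps (state, symbol) -> (symbol_to_write, "L" or "R", next_state).
    The head starts on the leftmost cell of `tape`.
    """
    cells = dict(enumerate(tape))    # sparse tape: position -> symbol
    head, steps = 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted before halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    lo, hi = min(cells, default=0), max(cells, default=0)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: binary increment.
INCREMENT_RULES = {
    ("start", "0"): ("0", "R", "start"),   # scan right to the end of the number
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),   # step back onto the last digit
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", "L", "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", "L", "halt"),    # overflow: write a new leading 1
}

print(run_turing_machine(INCREMENT_RULES, "1011"))
```

The example turns the tape 1011 (eleven) into 1100 (twelve).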

Hilbert’s Entscheidungsproblem:

• Is mathematics decidable? Is there a mechanical method that can be applied to any mathematical assertion and eventually tell whether that assertion is true or false?

Alan Turing (1912‐1954)

Kurt Godel (“Herr Warum”):
• Limits of what could be proved and disproved; Recursion

Alan Turing:
• Halting problem is undecidable
• There is no solution to the “decision problem”

Alonzo Church:
• Lambda calculus

COMPUTER SCIENCE

Von Neumann architecture:
• General purpose computing architecture
• Keeps code and data in the same memory
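The idea of keeping code and data in the same memory can be shown with a toy stored-program machine. The instruction encoding below is entirely hypothetical (opcode × 100 + address, with 0 = HALT, 1 = LOAD, 2 = ADD, 3 = STORE) and does not correspond to any real computer; it only illustrates that instructions and operands are plain numbers living in one shared memory.

```python
def run_stored_program(memory, max_steps=1_000):
    """Execute a program stored in the same list as its data.

    Each cell holds opcode * 100 + address (hypothetical encoding):
      0 = HALT, 1 = LOAD addr, 2 = ADD addr, 3 = STORE addr.
    Returns a copy of memory as it looks when the machine halts.
    """
    mem = list(memory)               # work on a copy of the shared memory
    accumulator, pc = 0, 0           # pc = program counter
    for _ in range(max_steps):
        opcode, address = divmod(mem[pc], 100)   # fetch and decode
        pc += 1
        if opcode == 0:              # HALT
            return mem
        elif opcode == 1:            # LOAD: accumulator <- mem[address]
            accumulator = mem[address]
        elif opcode == 2:            # ADD: accumulator += mem[address]
            accumulator += mem[address]
        elif opcode == 3:            # STORE: mem[address] <- accumulator
            mem[address] = accumulator
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("step budget exhausted before HALT")


# Cells 0-3 hold instructions, cells 4-6 hold data, all in one list.
program = [104, 205, 306, 0, 2, 3, 0]
print(run_stored_program(program)[6])
```

Cells 0 to 3 are instructions and cells 4 to 6 are data; the program loads cell 4, adds cell 5, and stores the sum in cell 6.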

John von Neumann (1903‐1957)

von Neumann architecture

Konrad Zuse (1910‐1995)

• Turing‐complete electromechanical computer, Z3 (1941)

ENIAC:
• Electronic Numerical Integrator And Computer
• conceived and designed at UPenn

Wikipedia

THANK YOU!