Informatics and computing: Information and uncertainty
TRANSCRIPT
Informatics and computing
Manipulating symbols (last class)
• Typology of signs, sign systems, symbols
• Tremendously important distinctions for informatics and the computational sciences
• Computation = symbol manipulation
• Symbols can be manipulated without reference to content (syntactically), due to the arbitrary nature of convention; this allows computers to operate!
• All signs rely on a certain amount of convention, as all signs have a pragmatic (social) dimension, but symbols are the only signs which require exclusively a social convention, or code, to be understood.
Symbol manipulation
• Some symbol strings have meaning (in some language)
• The relation between symbols and meaning is arbitrary
• Example: the cut-up method for generating poetry, pioneered by Brion Gysin and William Burroughs and often used by artists such as David Bowie, or the use of samples in electronic music
aedl:
adel adle aedl aeld alde aled dael dale deal dela dlae dlea eadl eald edal edla elad elda lade laed ldae ldea lead leda
Permutations: 4! = 4 x 3 x 2 x 1 = 24
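These orderings can be enumerated directly; a minimal sketch using only the Python standard library:

```python
from itertools import permutations

# Enumerate every ordering of the four letters in "aedl".
perms = sorted("".join(p) for p in permutations("aedl"))

print(len(perms))   # 4! = 4 x 3 x 2 x 1 = 24
print(perms[:3])    # ['adel', 'adle', 'aedl']
```

Sorting the joined strings reproduces the alphabetical listing above.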
Information theory
• “A Mathematical Theory of Communication”, Claude Shannon (1948)
• Efficiency of information transmission in electronic channels
• Key concept: information as a quantity that can be measured unequivocally (objectively)
• Does not deal at all with the subjective aspects of information (semantics and pragmatics); information is defined as a quantity that depends on symbol manipulation alone
What’s an information quantity? How to quantify a relation?
• Information is a relation between an agent, a sign and a thing, rather than simply a thing.
• The most palpable element in the information relation is the sign: symbols.
• But which symbols do we use to quantify the information contained in messages?
• Several symbol systems can be used to convey the same message
• We must agree on the same symbol system for all messages!
What’s an information quantity?
Both sender and receiver must use the same code, or convention, to encode and decode symbols from and to messages.
• We need to fix the language used for communication:
• the set of symbols allowed (an alphabet)
• the rules to manipulate symbols (syntax)
• the meaning of the symbols (semantics)
A language specifies the universe of all possible messages: the set of all possible symbol strings of a given size.
Shannon information is thus defined as “a measure of the freedom from choice with which a message is selected from the set of all possible messages”.
DEAL DELA DLAE DLEA DAEL DALE EALD EADL ELAD ELDA EDLA EDAL ALDE ALED ADLE ADEL AELD AEDL LDEA LDAE LEAD LEDA LADE LAED
DEAL is 1 out of 4! = 4×3×2×1 = 24 choices.
What’s an information quantity?
Information is defined as “a measure of the freedom from choice with which a message is selected from the set of all possible messages”.
• A bit (short for binary digit) is the most elementary choice one can make between two items: “0” or “1”, “heads” or “tails”, “true” or “false”, etc.
• A bit is equivalent to the choice between two equally likely alternatives.
• For example, if we know that a coin is to be tossed, but are unable to see it as it falls, a message telling whether the coin came up heads or tails gives us one bit of information.
Decision-making
• Perhaps the most fundamental capability of human beings
• Decision always implies uncertainty:
• choice
• lack of information, randomness, noise, error
“The highest manifestation of life consists in this: that a being governs its own actions. A thing which is always subject to the direction of another is somewhat of a dead thing.” “A man has free choice to the extent that he is rational.” (St. Thomas Aquinas)
“In a predestinate world, decision would be illusory; in a world of perfect foreknowledge, empty; in a world without natural order, powerless. Our intuitive attitude to life implies non-illusory, non-empty, non-powerless decision… Since decision in this sense excludes both perfect foresight and anarchy in nature, it must be defined as choice in face of bounded uncertainty” (George Shackle)
Uncertainty-based information: original contributions
Information is transmitted through noisy communication channels: Ralph Hartley and Claude Shannon (at Bell Labs), the fathers of Information Theory, worked on the problem of efficiently transmitting information; i.e. decreasing the uncertainty in the transmission of information.
Hartley, R.V.L. (1928). “Transmission of Information”. Bell System Technical Journal, July 1928, p. 535.
Shannon, C.E. (1948). “A Mathematical Theory of Communication”. Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.
Choices: the multiplication principle
• “If some choice can be made in M different ways, and some subsequent choice can be made in N different ways, then there are M x N different ways these choices can be made in succession” [Paulos]
• 3 shirts and 4 pants = 3 x 4 = 12 outfit choices
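The multiplication principle can be checked by enumerating the pairs (the item names below are illustrative placeholders):

```python
from itertools import product

shirts = ["shirt1", "shirt2", "shirt3"]
pants = ["pants1", "pants2", "pants3", "pants4"]

# Every (shirt, pants) pair is one distinct outfit.
outfits = list(product(shirts, pants))
print(len(outfits))  # 3 * 4 = 12
```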
Hartley uncertainty
• Nonspecificity: the Hartley measure
• The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the amount of information needed to remove the uncertainty
• A type of ambiguity
• Quantifies how many yes-no questions need to be asked to establish which is the correct alternative

Hartley measure: H(A) = log2 |A|, where |A| is the number of choices in the set of alternatives A = {x1, x2, x3, ..., xn}; measured in bits.
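The Hartley measure H(A) = log2 |A| is straightforward to compute; a small sketch, applied to the 24 anagrams from the DEAL exercise:

```python
from math import ceil, log2

def hartley(n_alternatives):
    """Hartley measure H(A) = log2 |A|, in bits."""
    return log2(n_alternatives)

# 24 equally likely anagrams of DEAL:
print(round(hartley(24), 3))  # 4.585 bits
# Worst-case number of yes-no questions to pin down one anagram:
print(ceil(hartley(24)))      # 5
```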
Hartley uncertainty
H(A) = log2 |A|, where |A| is the number of choices; measured in bits.
Quantifies how many yes-no questions need to be asked to establish which is the correct alternative.
Menu choices: |A| = 16 entrees, |B| = 4 desserts.
How many dinner combinations? 16 x 4 = 64
H(A x B) = log2(16 x 4) = log2(16) + log2(4) = 4 + 2 = 6 bits
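The additivity used in the menu example, log2(|A| x |B|) = log2 |A| + log2 |B|, can be verified numerically:

```python
from math import log2

entrees, desserts = 16, 4

# The Hartley measure of a combined choice equals the sum of the parts.
print(log2(entrees * desserts))        # 6.0 bits for 64 combinations
print(log2(entrees) + log2(desserts))  # 4.0 + 2.0 = 6.0 bits
```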
Hartley uncertainty: decision trees
H(A) = log2 |A|, where |A| is the number of choices; measured in bits.
What about probability?
• Some alternatives may be more probable than others!
• A different type of ambiguity: higher-frequency alternatives require less information
• Measured by Shannon’s entropy measure
• The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the average amount of information needed to remove the uncertainty
[Figure: probability distribution of letters in English text (Orwell’s 1984, in fact)]
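A distribution like this can be estimated by counting letter occurrences; a sketch using a toy sentence as a stand-in for the full text (assumption: any English sample yields a distribution of the same kind):

```python
from collections import Counter

# Toy corpus standing in for a full English text.
text = "it was a bright cold day in april".replace(" ", "")
counts = Counter(text)

# Relative frequencies estimate the letter probabilities.
total = len(text)
probs = {letter: n / total for letter, n in counts.items()}
print(max(probs, key=probs.get))  # most frequent letter in this sample
```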
Shannon’s entropy
Shannon’s measure: the average amount of uncertainty associated with a set of weighted alternatives A = {x1, x2, x3, ..., xn} (e.g. messages) is measured by the average amount of information needed to remove the uncertainty:
H(A) = -Σ p(xi) log2 p(xi)
where p(xi) is the probability of alternative xi; measured in bits.
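Shannon’s measure can be sketched directly from the formula above:

```python
from math import log2

def shannon_entropy(probs):
    """H = -sum of p_i * log2(p_i), in bits; zero-probability terms contribute 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(shannon_entropy([1.0]))        # 0.0 bits: no uncertainty at all
print(shannon_entropy([0.25] * 4))   # 2.0 bits: four equally likely choices
```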
Entropy of a message
A message is encoded in an alphabet of n symbols, for example:
• English: 26 characters + space
• Morse code: dots, dashes and spaces
• DNA: A, T, G, C
What it measures:
• missing information: how much information is needed to establish what the symbol is, or
• uncertainty about what the symbol is, or
• on average, how many yes-no questions need to be asked to establish what the symbol is.
Entropy is zero when there is only one alternative (complete certainty) and maximal for a uniform distribution (all alternatives equally likely).
Example: Morse code (three symbols: dot, dash, space)
1) All dots: p1 = 1, p2 = p3 = 0. Take any symbol: it is a dot; no uncertainty, no question needed, no missing information. HS = -1·log2(1) = 0.
2) 50-50 dots and dashes: p1 = p2 = 1/2, p3 = 0. Given the probabilities, we need to ask one question: one piece of missing information. HS = -((1/2)·log2(1/2) + (1/2)·log2(1/2)) = -log2(1/2) = log2(2) = 1 bit.
3) Uniform: all symbols equally likely, p1 = p2 = p3 = 1/3. Given the probabilities, we need to ask as many as 2 questions: 2 pieces of missing information. HS = -log2(1/3) = log2(3) ≈ 1.585 bits.
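The three Morse-code cases can be checked numerically:

```python
from math import log2

def shannon_entropy(probs):
    # H = -sum of p*log2(p); zero-probability symbols contribute nothing.
    return sum(-p * log2(p) for p in probs if p > 0)

print(shannon_entropy([1, 0, 0]))            # 0.0: all dots, no question needed
print(shannon_entropy([0.5, 0.5, 0]))        # 1.0: one yes/no question
print(round(shannon_entropy([1/3] * 3), 3))  # 1.585: log2(3) bits
```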
Bits, entropy and Huffman codes
Given a symbol set {A, B, C, D, E} and occurrence probabilities PA, PB, PC, PD, PE, the Shannon entropy corresponds to the average minimum number of bits needed to represent a symbol.
Huffman coding: a variable-length code, for messages whose symbols have variable frequencies, that minimizes the number of bits per symbol.
Entropy:
H = -(0.375·log2(0.375) + 0.250·log2(0.250) + 0.167·log2(0.167) + 0.125·log2(0.125) + 0.083·log2(0.083)) ≈ 2.135 bits
Huffman code, average number of bits per symbol:
0.375 · 1 + 0.250 · 2 + 0.167 · 3 + 0.125 · 4 + 0.083 · 4 = 2.208
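Huffman’s algorithm repeatedly merges the two least probable nodes. A sketch that recovers code lengths for the probabilities above; the tree shape can differ from the slide’s depending on tie-breaking, but any Huffman tree gives the same 2.208-bit average:

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code."""
    # Heap entries: (probability, tie-break counter, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

freqs = {"A": 0.375, "B": 0.250, "C": 0.167, "D": 0.125, "E": 0.083}
lengths = huffman_code_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(round(avg_bits, 3))  # 2.208, just above the entropy of ~2.135 bits
```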
Critique of Shannon’s communication theory
• The entropy formula as a measure of information is arbitrary
• Shannon’s theory measures quantities of information, but it does not consider information content
• In Shannon’s theory, the semantic aspects of information are irrelevant to the engineering problem
Other forms of uncertainty
• Vagueness or fuzziness
• Simultaneously being “true” and “false”
• Fuzzy Logic and Fuzzy Set Theory
From crisp to fuzzy sets
• Fuzziness: being and not being
• The laws of contradiction and excluded middle are broken
[Figure: within the set of all people, membership in the subset “Tall People” is either 0 or 1 for a crisp set, but can take any value between 0 and 1 for a fuzzy set]
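A fuzzy membership function makes the idea concrete; the 160 cm and 190 cm thresholds below are illustrative assumptions, not values from the source:

```python
def tall_membership(height_cm):
    """Degree of membership in the fuzzy set 'Tall People', in [0, 1].
    Illustrative thresholds: 0 below 160 cm, 1 above 190 cm, linear between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30

print(tall_membership(150))  # 0.0: definitely not tall
print(tall_membership(175))  # 0.5: tall and not tall, each to degree 0.5
print(tall_membership(195))  # 1.0: definitely tall
```

A crisp set would allow only the values 0.0 and 1.0; the intermediate degrees are what break the laws of contradiction and excluded middle.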
Papers:
1) boyd, danah and Crawford, Kate (2011). “Six Provocations for Big Data”. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 21, 2011.
2) Suzuki, Ryuji, Buck, John R. and Tyack, Peter L. (2006). “Information entropy of humpback whale songs”. Journal of the Acoustical Society of America, 119(3), March 2006.
3) Huffman, David A. (1952). “A Method for the Construction of Minimum-Redundancy Codes”. Proceedings of the I.R.E., September 1952.
This week’s discussion