![Page 1: LING 408/508: Programming for Linguists](https://reader033.vdocuments.us/reader033/viewer/2022042021/5e785ba16919b87de44182a0/html5/thumbnails/1.jpg)
LING 408/508: Programming for Linguists
Lecture 2 August 26th

Today’s Topics
• continuing on from last time …
• Homework 1

Administrivia
• No class on
– Monday September 7th (Labor Day)
– Wednesday November 11th (Veterans Day)
– Week after September 11th (out of town), plus Monday 21st
– Monday October 12th

Introduction: data types
• what if you want to store even larger numbers than 32 bits?
– Binary Coded Decimal (BCD)
– 1 byte can code two digits (0-9 requires 4 bits)
– 1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D
• each nibble has bit positions 2^3 2^2 2^1 2^0:
– 0 0 0 0 = 0
– 0 0 0 1 = 1
– 1 0 0 1 = 9
• 2 0 1 4 takes 2 bytes (= 4 nibbles)
• + 2 0 1 4 takes 2.5 bytes (= 5 nibbles)
• sign nibbles:
– 1 1 0 0 = hex C: credit (+)
– 1 1 0 1 = hex D: debit (-)
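
The nibble layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `to_packed_bcd` is made up for this example, the sign nibble is placed last (a common packed-decimal convention that the slide does not specify), and a leading zero nibble pads the result to whole bytes.

```python
def to_packed_bcd(n: int) -> bytes:
    """Pack an integer as BCD digit nibbles plus one sign nibble.

    Assumptions (not from the slides): the sign nibble goes last,
    and a leading 0 nibble pads the result to a whole number of bytes.
    """
    sign = 0xC if n >= 0 else 0xD                    # C = credit (+), D = debit (-)
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    if len(nibbles) % 2:                             # odd count: pad to full bytes
        nibbles.insert(0, 0)
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((hi << 4) | lo for hi, lo in pairs)

packed = to_packed_bcd(2014)
print(packed.hex())   # 02014c
print(len(packed))    # 3 (5 nibbles, padded up from 2.5 bytes)
```

In memory the 2.5 bytes become 3 whole bytes because storage is allocated byte by byte.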

Introduction: data types
• Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision)
– c = 2.99792458 x 10^8 (m/s)
– coefficient: 52 bits (implied 1, therefore treat as 53)
– exponent: 11 bits (usually not 2’s complement, unsigned with bias 2^(11-1) - 1 = 1023)
– sign: 1 bit (+/-)
• C: float, double
• x86 CPUs have a built-in floating point coprocessor (x87) with 80 bit long registers
• e.g. probabilities
(figure source: Wikipedia)
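
To see the three fields of a double, one option is to reinterpret its 8 bytes as an unsigned integer and mask out the pieces. The sketch below uses Python's standard `struct` module; the function name `double_fields` is a hypothetical helper for illustration.

```python
import struct

def double_fields(x: float):
    """Return (sign, biased exponent, fraction) of a 64-bit double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, stored with bias 1023
    fraction = bits & ((1 << 52) - 1)      # 52 explicitly stored bits
    return sign, exponent, fraction

sign, exponent, fraction = double_fields(1.0)
print(sign, exponent - 1023, fraction)     # 0 0 0
```

Subtracting the bias 1023 from the stored exponent gives the actual power of two.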

Introduction: data types
• Next time, we'll talk about the representation of characters (letters, symbols, etc.)

Example 1
• Recall the speed of light:
• c = 2.99792458 x 10^8 (m/s)
1. Can a 4 byte integer be used to represent c exactly?
– 4 bytes = 32 bits
– 32 bits in 2’s complement format
– Largest positive number is 2^31 - 1 = 2,147,483,647
– c = 299,792,458, which is smaller, so yes
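
The range check in Example 1 can be reproduced directly. This is a small sketch; the variable names are chosen for this example only.

```python
C = 299_792_458                      # speed of light in m/s

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1                # largest 32-bit two's complement value

fits = INT32_MIN <= C <= INT32_MAX
print(INT32_MAX, fits)               # 2147483647 True

# Store c in exactly 4 bytes (signed, two's complement) and read it back.
as_bytes = C.to_bytes(4, byteorder="big", signed=True)
restored = int.from_bytes(as_bytes, byteorder="big", signed=True)
print(restored == C)                 # True
```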

Example 2
• Recall the speed of light:
• c = 2.99792458 x 10^8 (m/s)
2. How much memory would you need to encode c using BCD notation?
– 9 digits
– each digit requires 4 bits (a nibble)
– BCD notation includes a sign nibble
– 9 digit nibbles + 1 sign nibble = 10 nibbles, so the total is 5 bytes

Example 3
• Recall the speed of light:
• c = 2.99792458 x 10^8 (m/s)
3. Can the 64 bit floating point representation (double) encode c without loss of precision?
– Recall significand precision: 53 bits (52 explicitly stored)
– 2^53 - 1 = 9,007,199,254,740,991
– almost 16 digits
– c = 299,792,458 has only 9 digits, so yes

Example 4
• Recall the speed of light:
• c = 2.99792458 x 10^8 (m/s)
• The 32 bit floating point representation (float), sometimes called single precision, is composed of 1 bit sign, 8 bits exponent (unsigned with bias 2^(8-1) - 1 = 127), and 23 bits coefficient (24 bits effective).
• Can it represent c without loss of precision?
– 2^24 - 1 = 16,777,215
– Nope (c = 299,792,458 is larger and needs more than 24 significant bits)
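
Examples 3 and 4 can be checked by storing c in each format and reading it back. The sketch below uses the standard `struct` module to force a value through the 4-byte float format; `roundtrip_float32` is a hypothetical helper name.

```python
import struct

C = 299_792_458                          # speed of light in m/s

def roundtrip_float32(x: float) -> float:
    """Store x as a 4-byte float and read it back (rounds to 24 significant bits)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

as_double = float(C)                     # Python floats are 64-bit doubles
as_single = roundtrip_float32(float(C))

print(int(as_double) == C)               # True: the double holds c exactly
print(int(as_single) == C)               # False: the float does not
print(int(as_single))                    # 299792448
```

Between 2^28 and 2^29 a 32 bit float can only step in multiples of 32, so c is rounded to the nearest such multiple.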

Homework 1
• For both solutions, show your work, i.e. how you derived your answer
• Pi (π) is an irrational number
– can't be represented precisely!
(figure source: Wikipedia)

Homework 1
1. Encode Pi as accurately as possible using both the 64 and 32 bit floating point representations
Instruction: draw the diagram and fill in the 1's and 0's
2. How many decimal places of precision are provided by each of the 64 and 32 bit floating point representations?

Homework 1 Hints
• How to encode 1: (exp: bias 01111 + 0 = 2^0, frac: 1000…) = 1.000… in binary (remember: there is an implicit leading 1; the frac digits listed in these hints include it)
• How to encode 2: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1000…) = 10.00… in binary
• How to encode 3: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1100…) = 11.000… in binary
• How to encode 4: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1000…) = 100.0… in binary
• How to encode 5: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1010…) = 101.0… in binary
• How to encode 6: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1100…) = 110.0… in binary
• How to encode 7: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1110…) = 111.0… in binary
• How to encode 8: (exp: 10010 = bias 01111 + 11 = 2^3, frac: 1000…) = 1000.0… in binary
• Decimal 3.5 is 1.11 x 2^1 = 11.1 in binary
• Decimal 3.25 is 1.101 x 2^1 = 11.01 in binary
• Decimal 3.125 is 1.1001 x 2^1 = 11.001 in binary
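
The binary expansions in the hints can be reproduced by repeatedly doubling the fractional part and reading off the integer bit each time. This is a minimal sketch; `to_binary` and its `frac_bits` limit are names invented for this example, and it assumes a non-negative input.

```python
def to_binary(x: float, frac_bits: int = 8) -> str:
    """Binary digits of a non-negative number, e.g. 3.5 -> '11.1'."""
    whole = int(x)
    frac = x - whole
    digits = []
    for _ in range(frac_bits):       # stop after frac_bits digits at most
        if frac == 0:
            break
        frac *= 2                    # shift the next binary digit left of the point
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    text = bin(whole)[2:]            # integer part without the '0b' prefix
    return text + "." + "".join(digits) if digits else text

for value in (3.5, 3.25, 3.125):
    print(value, to_binary(value))
```

Moving the binary point so that exactly one 1 sits in front of it gives the normalized form used in the hints, for example 11.1 = 1.11 x 2^1.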

Homework 1
• Due Friday night
– (by midnight in my emailbox)
• Required format (for all homeworks unless otherwise specified):
– Plain text or PDF formats only
• (no .doc, .docx etc.)
– Single file only
– cut and paste into one document
• (no multiple attachments)
– Subject line: 408/508 Homework 1
– First line: your full name

Introduction: data types
• How about letters, punctuation, etc.?
• ASCII
– American Standard Code for Information Interchange
– Based on English alphabet (upper and lower case) + space + digits + punctuation + control (Teletype Model 33)
– Question: how many bits do we need?
– 7 bits + 1 bit parity
– Remember everything is in binary …
• C: char
(figure: Teletype Model 33 ASR Teleprinter, Wikipedia)

Introduction: data types
• order is important in sorting!
• 0-9: there’s a connection with BCD. Notice: code 30 (hex) through 39 (hex)
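
Both points on this slide can be checked quickly. In the sketch below, masking off the high nibble of each digit's ASCII code leaves the BCD value, and sorting strings follows character codes, so digits come before upper case, which comes before lower case.

```python
digits = "0123456789"
codes = [ord(ch) for ch in digits]

print([hex(c) for c in codes])            # 0x30 through 0x39
low_nibbles = [c & 0x0F for c in codes]   # low 4 bits = the BCD digit
print(low_nibbles)                        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Sorting compares character codes: '1' (0x31) < 'B' (0x42) < 'a' (0x61).
print(sorted(["b", "B", "a", "1"]))       # ['1', 'B', 'a', 'b']
```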

Introduction: data types
• Parity bit:
– transmission can be noisy
– parity bit can be added to ASCII code
– can spot single bit transmission errors
– even/odd parity:
• receiver understands each byte should be even/odd
– Example:
• 0 (zero) is ASCII 30 (hex) = 0110000 (7 bits)
• with the parity bit appended: even parity 01100000, odd parity 01100001
– Checking parity:
• Exclusive or (XOR): basic machine instruction
– A xor B true if either A or B true but not both
– Example:
• 0110000 followed by even parity bit 0: xor bit by bit
• 0 xor 1 = 1, xor 1 = 0, xor 0 = 0, xor 0 = 0, xor 0 = 0, xor 0 = 0, xor 0 = 0
x86 assembly language:
1. PF: even parity flag set by arithmetic ops.
2. TEST: AND (don’t store result), sets PF
3. JP: jump if PF set
Example:
MOV al,<char>
TEST al, al
JP <location if even>
<go here if odd>
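
The bit-by-bit XOR check can be sketched in Python as follows. The function names are hypothetical, and the parity bit is appended after the 7 ASCII bits, matching the worked example on the slide.

```python
from functools import reduce
from operator import xor

def parity(bits: str) -> int:
    """XOR all bits together: 0 = even number of 1s, 1 = odd number of 1s."""
    return reduce(xor, (int(b) for b in bits))

def add_even_parity(code7: str) -> str:
    """Append the bit that makes the total number of 1s even."""
    return code7 + str(parity(code7))

zero = format(ord("0"), "07b")     # 7-bit ASCII code for the character 0
sent = add_even_parity(zero)
print(zero, sent, parity(sent))    # 0110000 01100000 0

corrupted = "01100100"             # same byte with one bit flipped in transit
print(parity(corrupted))           # 1, so the receiver spots the error
```

Two flipped bits would cancel out, which is why parity only catches single bit errors.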

Introduction: data types
• UTF-8
– standard in the post-ASCII world
– backwards compatible with ASCII
– (previously, different languages had multi-byte character sets that clashed)
– Universal Character Set (UCS) Transformation Format 8-bits
(figure source: Wikipedia)

Introduction: data types
• Example:
– あ Hiragana letter A: UTF-8: E38182
– Byte 1: E = 1110, 3 = 0011
– Byte 2: 8 = 1000, 1 = 0001
– Byte 3: 8 = 1000, 2 = 0010
– い Hiragana letter I: UTF-8: E38184
• Shift-JIS (Hex): あ: 82A0, い: 82A2
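
The byte values above can be reproduced with Python's built-in codecs. In this sketch the two characters are written as escapes (`\u3042` is Hiragana A, `\u3044` is Hiragana I) so the output stays plain ASCII; the helper name `encodings` is made up for the example.

```python
def encodings(ch: str):
    """Return (UTF-8 hex, Shift-JIS hex) for a single character."""
    return ch.encode("utf-8").hex().upper(), ch.encode("shift_jis").hex().upper()

for ch in ("\u3042", "\u3044"):            # Hiragana A, Hiragana I
    utf8_hex, sjis_hex = encodings(ch)
    bits = " ".join(format(b, "08b") for b in ch.encode("utf-8"))
    print(hex(ord(ch)), utf8_hex, bits, sjis_hex)
```

The same character gets completely different bytes in the two encodings, which is why a file's encoding has to be known before it can be read correctly.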

Introduction: data types
• How can you tell what encoding your file is using?
• Detecting UTF-8
– Microsoft:
• 1st three bytes in the file are EF BB BF
• (not all software understands this; not everybody uses it)
– HTML:
• <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" >
• (not always present)
– Analyze the file:
• Find non-valid UTF-8 sequences: if found, not UTF-8…
• Interesting paper:
– http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
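
Two of the checks above are easy to sketch in Python: looking for the EF BB BF marker, and trying to decode the bytes as UTF-8. The function names are hypothetical, and this is far simpler than the detector described in the linked paper.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes contain no invalid UTF-8 sequences."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(has_utf8_bom(b"\xef\xbb\xbfhello"))   # True
print(looks_like_utf8(b"\xe3\x81\x82"))     # True: valid UTF-8 (Hiragana A)
print(looks_like_utf8(b"\x82\xa0"))         # False: Shift-JIS bytes for the same letter
```

Passing the second check does not prove a file is UTF-8, since plain ASCII text passes too; failing it does rule UTF-8 out.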

Introduction: data types
• Filesystem:
– different on different computers: sometimes a problem if you mount filesystems across different systems
• Examples:
– FAT32 (File Allocation Table): DOS, Windows, memory cards (limited to 4GB max file size)
– ExFAT (Extended FAT): SD cards (> 4GB files)
– NTFS (New Technology File System): Windows
– ext4 (Fourth Extended Filesystem): Linux
– HFS+ (Hierarchical File System Plus): Macs

Introduction: data types
• Files:
– Name (Path from / root)
– Type (e.g. .docx, .pptx, .pdf, .html, .txt)
– Owner (usually the Creator)
– Permissions (for the Owner, Group, or Everyone)
– need to be opened (to read from or write to)
– Mode: read/write/append
– Binary/Text
• in all programming languages: open command
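
As one concrete instance of the open command, here is how the modes look in Python. This is a sketch only: the file name `demo.txt` is arbitrary, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    with open(path, "w", encoding="utf-8") as f:   # write mode (text)
        f.write("first line\n")
    with open(path, "a", encoding="utf-8") as f:   # append mode (text)
        f.write("second line\n")
    with open(path, "r", encoding="utf-8") as f:   # read mode (text)
        text = f.read()
    with open(path, "rb") as f:                    # read mode (binary)
        raw = f.read()

print(repr(text))   # a str, decoded from UTF-8
print(raw)          # raw bytes, exactly as stored on disk
```

Text mode decodes bytes into characters and may translate line endings; binary mode returns the stored bytes unchanged.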

Introduction: data types
• Text files:
– text files have lines: how do we mark the end of a line?
– End of line (EOL) control character(s):
• LF 0x0A (Mac/Linux)
• CR 0x0D (Old Macs)
• CR+LF 0x0D 0x0A (Windows)
– End of file (EOF) control character:
• (EOT) 0x04 (aka Control-D)
(figure source: binaryvision.nl)
• programming languages: NUL used to mark the end of a string
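
A rough way to tell which end-of-line convention a file uses is to look for the byte sequences listed above. The sketch below is a simple heuristic with a made-up function name; it checks for CR+LF first because that sequence contains both of the other markers.

```python
def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention used in some raw file bytes."""
    if b"\r\n" in data:      # 0x0D 0x0A
        return "CR+LF"
    if b"\r" in data:        # 0x0D alone
        return "CR"
    if b"\n" in data:        # 0x0A alone
        return "LF"
    return "none"

print(detect_eol(b"one\r\ntwo\r\n"))   # CR+LF
print(detect_eol(b"one\ntwo\n"))       # LF
print(detect_eol(b"one\rtwo\r"))       # CR

# bytes.splitlines() accepts all three conventions.
print(b"one\r\ntwo\nthree\r".splitlines())   # [b'one', b'two', b'three']
```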