CHARACTERS AND UNICODE
Java Data Types
Unicode
Characters, like the letters of the alphabet and other printable symbols, are represented internally in the computer by a numerical code.
The coding system that Java relies on for characters is called Unicode.
Unicode assigns an integer value to each separate printable character.
Unicode is based in part on a previous coding system known as ASCII.
Though Unicode handles thousands of characters, this presentation covers only the first 128 codes. These include the English alphabet, the keyboard symbols, and codes that don't represent printable characters.
Control codes
The first 32 codes are non-printable control codes.
They are referred to as control codes because they originated as instructions for communicating with or controlling a device.
Most of these codes are of no use to the programmer.
But there are some, such as the line feed and carriage return codes, which may be useful for printed output.
System.out.print("\n");
The escape sequence "\n" causes the line feed control code (code 10) to be generated. The carriage return code (code 13) has its own escape sequence, "\r". Similarly, "\t" causes a horizontal tab to be generated.
Although not literally printable, these characters can affect the appearance of the output.
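As a minimal sketch of the escape sequences above, the following program prints the integer codes behind "\n", "\r" and "\t", then uses tabs and line feeds to lay out two lines of output. The class name EscapeCodes and the helper codeOf are made up for this illustration.

```java
class EscapeCodes {
    // Hypothetical helper: returns the integer code of a single character.
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\n')); // line feed
        System.out.println(codeOf('\r')); // carriage return
        System.out.println(codeOf('\t')); // horizontal tab

        // Control codes are not printable themselves,
        // but they change the layout of the printed text.
        System.out.print("Name\tCode\n");
        System.out.print("LF\t10\n");
    }
}
```

The three numbers printed first should match the LF, CR and HT rows of the control code table.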
Control codes
Code  Name  Description
0     NUL   Null
1     SOH   Start of heading
2     STX   Start of text
3     ETX   End of text
4     EOT   End of transmission
5     ENQ   Enquiry
6     ACK   Acknowledge
7     BEL   Bell
8     BS    Backspace
9     HT    Horizontal tab
10    LF    Line feed
11    VT    Vertical tab
12    FF    Form feed
13    CR    Carriage return
14    SO    Shift out
15    SI    Shift in
16    DLE   Data link escape
17    DC1   Device control 1
18    DC2   Device control 2
19    DC3   Device control 3
20    DC4   Device control 4
21    NAK   Negative acknowledge
22    SYN   Synchronous idle
23    ETB   End of transmission block
24    CAN   Cancel
25    EM    End of medium
26    SUB   Substitute
27    ESC   Escape
28    FS    File separator
29    GS    Group separator
30    RS    Record separator
31    US    Unit separator
Unicode characters
The Unicode values from 32 to 126 are printable characters (code 32 is the space). Code 127 is not printable; it stands for deletion.
The letters of the alphabet, small and capital, and the various punctuation marks are arranged as they are for historical reasons.
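The default sort order for characters or text items is based on their Unicode values. As a result, digits sort before capital letters, and all capital letters sort before all small letters.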
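The effect of sorting by Unicode value can be sketched as follows. The class name UnicodeSortOrder, the helper sorted and the sample words are assumptions made for this illustration; Arrays.sort is the standard library call, and for strings it compares character codes from left to right.

```java
import java.util.Arrays;

class UnicodeSortOrder {
    // Hypothetical helper: returns a sorted copy of the given words.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy); // natural String order, based on character codes
        return copy;
    }

    public static void main(String[] args) {
        System.out.println('A' < 'a'); // prints true (65 is less than 97)
        System.out.println('9' < 'A'); // prints true (57 is less than 65)

        String[] result = sorted("apple", "Zebra", "42", "Banana");
        System.out.println(Arrays.toString(result));
        // prints [42, Banana, Zebra, apple]
    }
}
```

Note that "Zebra" comes before "apple" because capital Z (90) has a lower code than small a (97), which is not the order a dictionary would use.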
New data type: char
We can now consider the character type, char. A char is effectively an integer type. It is possible to cast between something typed char and something typed int. The underlying value is always an integer. The type (int or char) determines whether the integer value or the associated Unicode character is displayed.
The following program illustrates the idea that an integer can be cast to a character and printed out as such.
Char
public class UnicodeChars {
    public static void main(String[] args) {
        char myChar;
        int i;
        i = 65;
        myChar = (char) i;          // cast the integer 65 to a char
        System.out.println(myChar); // Prints A
    }
}
In this program, the cast (char) converts the integer value 65 to a char, and printing it displays the character associated with that code, the capital letter A.
It isn't always necessary to use a number to assign a value to a char variable. You can assign the character you want directly:
myChar = 'A';
Note that we use single quotes instead of double quotes to denote a single character value. Double quotes signify a string, while single quotes signify a character.
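The difference between the two kinds of quotes can be sketched with a pair of overloaded methods. The class name QuoteKinds and the helper kind are made up for this illustration; the compiler picks the overload from the type of the literal.

```java
class QuoteKinds {
    // Hypothetical helpers: report which type the compiler sees.
    static String kind(char c) {
        return "char";
    }

    static String kind(String s) {
        return "String";
    }

    public static void main(String[] args) {
        char letter = 'A';  // single quotes: one character
        String text = "A";  // double quotes: a string of length 1

        System.out.println(kind(letter)); // prints char
        System.out.println(kind(text));   // prints String

        // The string holds the same character at position 0.
        System.out.println(text.charAt(0) == letter); // prints true
    }
}
```

Swapping the quotes, for example char letter = "A";, does not compile, because a String cannot be assigned to a char.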
Finding the integer value of a char
It is also possible to find the integer value of a char by casting it to an int, as shown below:
char myChar = 'A';
int i = (int) myChar;
System.out.println(i); // Prints 65
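Casting in both directions can be combined, as in this minimal sketch. The class name CharIntRoundTrip and the helpers toCode and toChar are assumptions made for this illustration; they only wrap the two casts described above.

```java
class CharIntRoundTrip {
    // Hypothetical helper: char to its integer code.
    static int toCode(char c) {
        return (int) c;
    }

    // Hypothetical helper: integer code to its char.
    static char toChar(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(toCode('a')); // prints 97
        System.out.println(toChar(97));  // prints a

        // Because the underlying value is an integer, arithmetic works:
        // the code after capital Z (90) is 91, the left square bracket.
        System.out.println(toChar(toCode('Z') + 1)); // prints [

        // List a few codes next to their characters.
        for (int code = 65; code <= 70; code++) {
            System.out.println(code + " " + toChar(code));
        }
    }
}
```

The loop prints the pairs 65 A through 70 F, one per line.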
Lab
Prepare yourselves! You are now ready for questions 39 to 42 on the assignment sheet. The first couple of questions consist of determining a character from an integer, and an integer from a character.
The last couple involve taking a value of one type (int or char) and converting it to the other type.