Chapter 8 Characters and Strings
Principle of enumeration
• Computers tend to be good at working with numeric data.
• The ability to represent an integer value, however, also makes it easy to work with other data types, as long as it is possible to represent those types using integers. For types consisting of a finite set of values, the easiest approach is simply to number the elements of the collection.
• Types that are identified by counting off the elements are called enumerated types.
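Java's enum types follow this principle directly: each constant is numbered by its position, which ordinal() returns. A minimal sketch with a hypothetical Weekday type (not from the chapter):

```java
// Hypothetical example: an enumerated type whose values are numbered 0, 1, 2, ...
enum Weekday { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY }

public class EnumDemo {
    public static void main(String[] args) {
        for (Weekday day : Weekday.values()) {
            // ordinal() returns the integer position assigned to each constant
            System.out.println(day.ordinal() + " " + day);
        }
        // The integer code maps back to the value it numbers
        Weekday wed = Weekday.values()[3];
        System.out.println(wed); // WEDNESDAY
    }
}
```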
Characters
• Computers use the principle of enumeration to represent character data inside the memory. If you assign an integer to each character, you can use that integer as a code for the character it represents.
• Character codes, however, are not particularly useful unless they are standardized.
• The first widely adopted character coding was ASCII: American Standard Code for Information Interchange.
• With at most 256 characters (standard ASCII defines only 128), the ASCII system proved inadequate to represent the many alphabets in use throughout the world.
• ASCII has been superseded by Unicode, which Java uses for its char type.
• See the table in Figure 8-1, p. 256.
Some notes
• The first thing to remember about the Unicode table is that you don't actually have to learn the numeric codes for the characters. What matters is that a character has a numeric representation, not what that representation happens to be.
• A character constant consists of the desired character enclosed in single quotation marks. Thus, the constant 'A' in a program indicates the Unicode representation of an uppercase A. That it has the value 101 in octal (65 in decimal) is an irrelevant detail.
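A short sketch confirming the codes mentioned above; codeOf is a hypothetical helper name, and the explicit cast to int is what exposes the numeric representation:

```java
public class CharCodes {
    // Returns the numeric (Unicode) code of a character via an explicit cast.
    static int codeOf(char ch) {
        return (int) ch;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));                        // 65
        System.out.println(Integer.toOctalString(codeOf('A'))); // 101
        System.out.println(codeOf('a'));                        // 97
        System.out.println(codeOf('0'));                        // 48
    }
}
```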
Important properties
• The codes for the digits 0 through 9 are consecutive, so '0' + 9 is the code for '9'.
• The codes for the uppercase letters A through Z are consecutive; the codes for the lowercase letters a through z are consecutive. Thus 'a' + 2 is the code for 'c'.
• The arithmetic operations can be used with character values just as with integers.
• Avoid using integer constants to refer to Unicode characters; write 'A' rather than 65.
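In Java, adding an int to a char produces an int, so a cast back to char is needed to see the character. A sketch of the digit conversions these properties allow (digitValue and digitChar are hypothetical helper names):

```java
public class CharArithmetic {
    // Converts a digit character such as '7' to its numeric value 7.
    // Works because '0'..'9' have consecutive codes.
    static int digitValue(char ch) {
        return ch - '0';
    }

    // Converts a value 0..9 back into the corresponding digit character.
    static char digitChar(int n) {
        return (char) ('0' + n);
    }

    public static void main(String[] args) {
        System.out.println(digitChar(9));      // 9
        System.out.println((char) ('a' + 2));  // c
        System.out.println('a' + 2);           // 99, because char + int is an int
        System.out.println(digitValue('7'));   // 7
    }
}
```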
Special characters
• Most of the characters in the table in Figure 8-1 appear on the keyboard. They are called printing characters.
• The table also includes special characters. In a program, these are written as an escape sequence, which consists of a backslash followed by a character or a sequence of digits:
\b Backspace
\f Form feed (starts a new page)
\n Newline (moves to the next line)
\r Return (moves to the beginning of the current line)
\t Tab (moves to the next tab stop)
\\ The backslash character itself
\' The character '
\" The character "
\ddd The character whose Unicode value is the octal number ddd
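A minimal sketch of escape sequences inside Java string and character literals; the PATH constant is just an illustrative value:

```java
public class EscapeDemo {
    // \" puts a double quote inside a string literal; \\ puts a backslash
    static final String PATH = "He said \"C:\\temp\"";

    public static void main(String[] args) {
        // \t inserts a tab, \n starts a new line
        System.out.println("Name:\tAda\nRole:\tProgrammer");
        System.out.println(PATH);           // He said "C:\temp"
        System.out.println(PATH.length());  // 17
        // \101 is the octal escape for code 65, which is 'A'
        System.out.println("\101");         // A
        // \' is needed to write a single quote as a char literal
        char quote = '\'';
        System.out.println(quote);          // '
    }
}
```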
Conversion
• It is better to make the conversion between int (a Unicode value) and char (a character) explicit by introducing type casts.
Example: randomly generate an uppercase letter (rgen is a RandomGenerator from the ACM library, whose nextInt(low, high) includes both endpoints).
private char randomLetter() {
    return (char) rgen.nextInt((int) 'A', (int) 'Z');
}
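The method above depends on the ACM library's rgen. Assuming only the standard library instead, a comparable sketch uses java.util.Random, whose nextInt(26) returns a value from 0 to 25 that can be added to 'A':

```java
import java.util.Random;

public class RandomLetter {
    // Assumption: java.util.Random stands in for the ACM RandomGenerator (rgen).
    private static final Random RAND = new Random();

    // Returns a random uppercase letter 'A'..'Z'.
    // nextInt(26) yields 0..25; adding it to 'A' gives the code of a letter,
    // and the explicit cast turns that int back into a char.
    static char randomLetter() {
        return (char) ('A' + RAND.nextInt(26));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.print(randomLetter());
        }
        System.out.println();
    }
}
```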
The operations that generally make sense:
• Adding an integer to a character (usually a digit).
• Subtracting one character from another.
– 'a' - 'A' gives the distance between a lowercase letter and its corresponding uppercase letter.
– (char) ('M' + ('a' - 'A')) gives 'm'. This can be used to convert uppercase letters into lowercase letters.
• Comparing two characters.
– (ch >= 'a') && (ch <= 'z') is true if ch is a lowercase letter from a to z.
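The subtraction and comparison tricks above can be combined into a hand-written lowercase conversion. This sketch (hypothetical names toLower and isLowerAscii) only handles the 26 English letters; Character.toLowerCase covers the rest of Unicode:

```java
public class CaseByArithmetic {
    // Converts an uppercase letter to lowercase by adding the distance 'a' - 'A'.
    // Any other character is returned unchanged.
    static char toLower(char ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return (char) (ch + ('a' - 'A'));
        }
        return ch;
    }

    // True if ch is a lowercase English letter (a comparison of character codes).
    static boolean isLowerAscii(char ch) {
        return (ch >= 'a') && (ch <= 'z');
    }

    public static void main(String[] args) {
        System.out.println(toLower('M'));      // m
        System.out.println(toLower('7'));      // 7
        System.out.println(isLowerAscii('q')); // true
    }
}
```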
Useful methods in the Character class
static boolean isDigit(char ch)
static boolean isLetter(char ch)
static boolean isLetterOrDigit(char ch)
static boolean isLowerCase(char ch)
static boolean isUpperCase(char ch)
static boolean isWhitespace(char ch)
static char toLowerCase(char ch)
static char toUpperCase(char ch)
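A sketch using several of these predicates to classify the characters of a string; classify and the order of its result array are illustrative choices, not part of the Character class:

```java
public class CharacterMethodsDemo {
    // Returns counts of {letters, digits, whitespace, other} in s.
    static int[] classify(String s) {
        int[] counts = new int[4];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetter(ch)) {
                counts[0]++;
            } else if (Character.isDigit(ch)) {
                counts[1]++;
            } else if (Character.isWhitespace(ch)) {
                counts[2]++;
            } else {
                counts[3]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = classify("Room 101, floor 2!");
        System.out.println("letters=" + c[0] + " digits=" + c[1]
                + " spaces=" + c[2] + " other=" + c[3]);
    }
}
```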
Strings
• The String class defines many useful methods for working with strings.
• String methods are called using the receiver syntax: you name the string, then the method, as in str.length().
• The String class is immutable: none of its methods ever changes the internal state of a string. Classes that prohibit clients from changing an object's state are said to be immutable.
• Instead, these methods return a new string on which the desired changes have been performed.
• To change the string a variable refers to, assign the result back to the variable: str = str.toLowerCase();
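A minimal demonstration of immutability, assuming nothing beyond java.lang.String: calling a method without keeping its result leaves the original string untouched.

```java
public class ImmutableDemo {
    public static void main(String[] args) {
        String str = "Hello, World";
        String lower = str.toLowerCase();  // returns a NEW string
        System.out.println(str);           // Hello, World (unchanged)
        System.out.println(lower);         // hello, world

        str.replace('l', 'L');             // result discarded: str is not modified
        System.out.println(str);           // Hello, World

        str = str.toLowerCase();           // overwrite the variable to "change" it
        System.out.println(str);           // hello, world
    }
}
```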
Strings vs. characters
• Both the String and the Character classes export a toUpperCase method.
• In the Character class, you call toUpperCase as a static method:
ch = Character.toUpperCase(ch);
• In the String class, you apply toUpperCase to an existing string:
str = str.toUpperCase();
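A sketch that uses both forms side by side; capitalize is a hypothetical helper, not a library method:

```java
public class UpperCaseDemo {
    // Capitalizes the first letter of a word using the static Character method,
    // and lowercases the rest using String methods called on the receiver.
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        char first = Character.toUpperCase(word.charAt(0)); // static method on a char
        String rest = word.substring(1).toLowerCase();      // instance method on a String
        return first + rest;                                // char + String is a String
    }

    public static void main(String[] args) {
        System.out.println(capitalize("jAVA")); // Java
        String shout = "hello".toUpperCase();
        System.out.println(shout);              // HELLO
    }
}
```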
Selecting characters from a string
• In Java, positions within a string are numbered starting from 0.
str.charAt(1) gives the second character in str.
• A substring can be extracted from a larger string. If a string variable str contains "hello, world", then
str.substring(1, 4)
returns "ell": the characters from position 1 up to, but not including, position 4.
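The selection rules above, checked on the same "hello, world" string (substring with one argument runs to the end of the string):

```java
public class SelectDemo {
    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(str.charAt(0));                // h (positions start at 0)
        System.out.println(str.charAt(1));                // e (the second character)
        System.out.println(str.substring(1, 4));          // ell (indices 1, 2, 3; 4 is excluded)
        System.out.println(str.substring(7));             // world (from index 7 to the end)
        System.out.println(str.charAt(str.length() - 1)); // d (last character)
    }
}
```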
Comparing strings
• Equality: use s1.equals(s2) rather than s1 == s2, since s1 == s2 compares the references s1 and s2 (whether they are the same object), not the characters the strings contain.
• Order: use s1.compareTo(s2), which returns a negative integer, zero, or a positive integer as s1 comes before, is equal to, or comes after s2. The comparison uses the numeric ordering imposed by the underlying character codes (lexicographic order), which differs from conventional dictionary ordering; for example, the uppercase letters A through Z all come before the lowercase letters a through z.
• For characters, c1 < c2 compares the codes of c1 and c2.
Other methods in the String class are listed in Figure 8-4, p. 266.
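A sketch of the pitfalls described above. Two identical string literals share one interned object, so the example uses new String to create a distinct object with the same characters:

```java
public class CompareDemo {
    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc");      // distinct object with the same content
        System.out.println(s1 == s2);       // false: different references
        System.out.println(s1.equals(s2));  // true: same characters

        // compareTo returns a negative, zero, or positive int
        System.out.println("apple".compareTo("banana") < 0); // true
        // Code order, not dictionary order: 'Z' (90) comes before 'a' (97)
        System.out.println("Zebra".compareTo("apple") < 0);  // true
        System.out.println('Z' < 'a');                        // true
    }
}
```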
Searching within a string
/**
 * Given a string composed of separate words, this method returns its acronym.
 * @param str Given string composed of separate words.
 * @return The acronym of the given string.
 */
private String acronym(String str) {
    String result = str.substring(0, 1);            /* get the first character */
    int pos = str.indexOf(' ');                     /* position of the first space */
    while (pos != -1) {                             /* while not the end */
        result += str.substring(pos + 1, pos + 2);  /* concatenate a letter */
        pos = str.indexOf(' ', pos + 1);            /* position of the next space */
    }
    return result;
}
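A runnable version of the same method, made static so it can be called from main. Like the original, it assumes words separated by single spaces: an empty string or a trailing space would make substring throw StringIndexOutOfBoundsException.

```java
public class Acronym {
    // Same algorithm as above; assumes single spaces and no trailing space.
    static String acronym(String str) {
        String result = str.substring(0, 1);           // first letter of the first word
        int pos = str.indexOf(' ');                    // position of the first space
        while (pos != -1) {
            result += str.substring(pos + 1, pos + 2); // letter after each space
            pos = str.indexOf(' ', pos + 1);           // position of the next space
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acronym("self contained underwater breathing apparatus")); // scuba
    }
}
```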
Simple string idioms
• Iterating through the characters in a string:
for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    // code to process each character in turn
}
• Growing a new string character by character:
String result = "";
for (whatever limits) {
    // code to determine the next ch to be added
    result += ch;
}
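Both idioms combined in one small sketch; reverse is a hypothetical example method. For long strings, a StringBuilder avoids creating a new string at every step, but the version below matches the idioms as written.

```java
public class ReverseDemo {
    // Iterates over every character of str and grows the result one character at a time.
    static String reverse(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            result = ch + result; // prepend, so the last character ends up first
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reverse("stressed")); // desserts
    }
}
```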
A case study
/*
 * File: PigLatin.java
 * -------------------
 * This file takes a line of text and converts each word into Pig Latin while
 * keeping punctuation marks.
 * The rules for forming Pig Latin words are as follows:
 * - If the word begins with a vowel, add "way" to the end of the word.
 * - If the word begins with a consonant, extract the set of consonants up
 *   to the first vowel, move that set of consonants to the end of the word,
 *   and add "ay".
 * - If the word contains no vowel, the word is unchanged.
 */
• Top-level English pseudocode:
public void run() {
    Tell the user what the program does.
    Ask the user for a line of text.
    Translate the line into Pig Latin and print it on the console.
}
• Implementation at the current level:
public void run() {
    println("This program translates a line into Pig Latin.");
    String line = readLine("Enter a line: ");
    Translate the line into Pig Latin and print it on the console.
}
• Define a method to replace the remaining English pseudocode (interface design):
public void run() {
    println("This program translates a line into Pig Latin.");
    String line = readLine("Enter a line: ");
    println(translateLine(line));
}

/**
 * Translates a line into Pig Latin.
 * @param line An English line
 * @return The Pig Latin translation of the line
 */
private String translateLine(String line)
• Next-level English pseudocode
Apply a familiar pattern, recalling the acronym method:
private String translateLine(String line) {
    String result = "";
    while (not at the end of the line) {
        Get the next word;
        Translate that word into Pig Latin;
        Append the translated word to result;
    }
    return result;
}
• As a programmer, you will often trip over some detail that the framers of the problem either overlooked or considered too obvious to mention. In some cases, the omission is serious enough that you have to discuss it with the person who assigned you the programming task. In many cases, however, you will have to choose for yourself a policy that seems reasonable.
– In this case, the specification is unclear about spaces and punctuation marks. A reasonable decision is to keep spaces and punctuation marks unchanged and translate only the words.
Implementation guideline
• Identify reusable code.
• Use the library whenever possible.
– StringTokenizer class: import java.util.*;
– A token is a sequence of characters that acts as a single unit. In this case, take each word as a token and the punctuation marks as delimiters.
– Define a DELIMITERS constant listing the delimiter characters (check Wikipedia or the keyboard for the punctuation marks).
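A sketch of the tokenizing step using the three-argument StringTokenizer constructor: with returnDelims set to true, each delimiter comes back as a one-character token, so spaces and punctuation can be kept. The DELIMITERS value and the isWord definition (a non-empty token made only of letters) are assumptions; the slides leave both to the programmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenDemo {
    // Hypothetical delimiter set: whitespace and common punctuation marks.
    static final String DELIMITERS = " \t,.;:!?\"()";

    // Splits a line into tokens; because returnDelims is true,
    // each delimiter character is returned as its own token.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, DELIMITERS, true);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    // A token counts as a word if it is non-empty and every character is a letter.
    // A for loop fits because the bound token.length() is known in advance.
    static boolean isWord(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isLetter(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String token : tokens("Hello, world!")) {
            System.out.println("[" + token + "] word? " + isWord(token));
        }
    }
}
```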
Implementation guideline (cont.)
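• Use the character methods (Figure 8-3) and the string methods (Figure 8-4).
• Use for instead of while whenever the number of iterations is known in advance.
– Use for in findFirstVowel, since the bound word.length() is known.
– Use for in isWord, since the bound token.length() is known.
• Use a table to cover all cases exhaustively.
– findFirstVowel, which is called by translateWord, returns -1, 0, or a positive integer, so translateWord must handle all three cases:
-1: the word has no vowel, so it is left unchanged.
0: the word begins with a vowel, so "way" is added to the end.
positive: the consonants before the first vowel are moved to the end, and "ay" is added.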
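Putting the pieces together, a sketch of findFirstVowel, translateWord, and translateLine under stated assumptions: vowels are a, e, i, o, u in either case (y is treated as a consonant), the DELIMITERS set is hypothetical, and a token that starts with a letter is treated as a word. This is not the textbook's own code.

```java
import java.util.StringTokenizer;

public class PigLatinSketch {
    // Hypothetical delimiter set: whitespace and common punctuation marks.
    static final String DELIMITERS = " \t,.;:!?\"()";

    // Returns the index of the first vowel in word, or -1 if there is none.
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) != -1) {
                return i;
            }
        }
        return -1;
    }

    // Handles every case that findFirstVowel can return.
    static String translateWord(String word) {
        int vp = findFirstVowel(word);
        if (vp == -1) {
            return word;                          // no vowel: unchanged
        } else if (vp == 0) {
            return word + "way";                  // starts with a vowel
        } else {
            String head = word.substring(0, vp);  // leading consonants
            String tail = word.substring(vp);     // rest of the word
            return tail + head + "ay";
        }
    }

    // Translates the words and copies spaces and punctuation through unchanged.
    static String translateLine(String line) {
        String result = "";
        StringTokenizer tokenizer = new StringTokenizer(line, DELIMITERS, true);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (Character.isLetter(token.charAt(0))) {
                result += translateWord(token);
            } else {
                result += token;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(translateLine("this is pig latin!")); // isthay isway igpay atinlay!
    }
}
```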
Summary
For each level:
• Write English pseudocode.
• Implement directly what is straightforward at the current level.
• Design methods to replace the remaining English pseudocode.
• Go to the next level and develop those methods.
Apply the implementation guidelines. The English pseudocode can be kept as comments.
Testing
• Bottom-up testing: start by testing the methods at the lowest level and move up, testing callees before their callers.
• Test normal cases.
• Test special or extreme cases (the boundaries of the input variables).
• Black-box testing: verify the input/output specifications.
• White-box testing: execute every part of the code, including every branch of the conditions in if and switch statements.
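A minimal bottom-up test driver for the lowest-level method, findFirstVowel, mixing normal and boundary cases. The findFirstVowel body (vowels a, e, i, o, u in either case) and the check helper are hypothetical; a real project would more likely use a framework such as JUnit.

```java
public class FindFirstVowelTest {
    // Lowest-level method under test (hypothetical version).
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if ("aeiouAEIOU".indexOf(word.charAt(i)) != -1) {
                return i;
            }
        }
        return -1;
    }

    // Prints PASS or FAIL for one case and returns whether it passed.
    static boolean check(String input, int expected) {
        int actual = findFirstVowel(input);
        boolean ok = (actual == expected);
        System.out.println((ok ? "PASS " : "FAIL ") + "\"" + input + "\" -> " + actual);
        return ok;
    }

    public static void main(String[] args) {
        boolean allOk = true;
        // Normal cases
        allOk &= check("string", 3);
        allOk &= check("apple", 0);
        // Boundary cases: empty word, no vowel, vowel only at the end, uppercase vowel
        allOk &= check("", -1);
        allOk &= check("rhythm", -1);
        allOk &= check("spa", 2);
        allOk &= check("Idea", 0);
        System.out.println(allOk ? "All tests passed" : "Some tests failed");
    }
}
```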