Chapter 8 Characters and Strings

Upload: bluma

Post on 11-Feb-2016

Page 1: Chapter 8 Characters and Strings

Chapter 8 Characters and Strings

Page 2: Chapter 8 Characters and Strings

Principle of enumeration

• Computers tend to be good at working with numeric data.

• The ability to represent an integer value, however, also makes it easy to work with other data types, as long as those types can be represented using integers. For types consisting of a finite set of values, the easiest approach is simply to number the elements of the collection.

• Types that are identified by counting off the elements are called enumerated types.

Page 3: Chapter 8 Characters and Strings

Characters

• Computers use the principle of enumeration to represent character data in memory. If you assign an integer to each character, you can use that integer as a code for the character it represents.

• Character codes, however, are not particularly useful unless they are standardized.

• The first widely adopted character coding was ASCII: American Standard Code for Information Interchange.

• With at most 256 characters (128 in standard 7-bit ASCII, 256 in its 8-bit extensions), the ASCII system proved inadequate to represent the many alphabets in use throughout the world.

• ASCII has been superseded by Unicode.

• Figure 8-1, p. 256, table.

Page 4: Chapter 8 Characters and Strings

Some notes

• The first thing to remember about the Unicode table is that you don’t actually have to learn the numeric code for the characters. The important observation is that a character has a numeric representation, and not what that representation happens to be.

• A character constant consists of the desired character enclosed in single quotation marks. Thus, the constant 'A' in a program indicates the Unicode representation of an uppercase A. That its value is 101 in octal (65 in decimal) is an irrelevant detail.

Page 5: Chapter 8 Characters and Strings

Important properties

• The codes for the digits 0 through 9 are consecutive.

'0' + 9 is '9'

• The codes for the uppercase letters A through Z are consecutive; the codes for the lowercase letters a through z are consecutive.

'a' + 2 is 'c'

• The arithmetic operations can be used with character values just as with integers.

• Avoid using integer constants to refer to Unicode characters.
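The properties above can be tried out with a short sketch. The class and method names here (CharArithmetic, digitToChar, shift) are made up for illustration. One detail worth noticing: in Java the sum of a char and an int is an int, so the result needs a (char) cast before it behaves as a character again.

```java
class CharArithmetic {
    // Returns the character for a single decimal digit 0..9.
    static char digitToChar(int digit) {
        // '0' + digit has type int, so cast it back to char explicitly.
        return (char) ('0' + digit);
    }

    // Returns the character that lies `offset` positions after `start`.
    static char shift(char start, int offset) {
        return (char) (start + offset);
    }

    public static void main(String[] args) {
        System.out.println(digitToChar(9));  // prints 9
        System.out.println(shift('a', 2));   // prints c
        System.out.println('0' + 9);         // prints 57: the int code, not the character
    }
}
```

Note that the code never spells out 48 or 97; it only relies on the digits and letters being consecutive.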

Page 6: Chapter 8 Characters and Strings

Special characters

• Most of the characters in the Unicode table appear on the keyboard. They are called printing characters.

• The table also includes special characters. They are indicated in the Unicode table by an escape sequence, which consists of a backslash followed by a character or sequence of digits.

\b Backspace
\f Form feed (starts a new page)
\n Newline (moves to the next line)
\r Return (moves to the beginning of the current line)
\t Tab (moves to the next tab)
\\ Backslash character itself
\' The character '
\" The character "
\ddd The character whose Unicode is the octal number ddd
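A small sketch of escape sequences inside Java literals. The class name EscapeDemo and the sample text are invented for illustration.

```java
class EscapeDemo {
    // Builds a two-line string: a tab-separated pair, then a quoted Windows-style path.
    static String sample() {
        // \t is a tab, \n a newline, \" a double quote, \\ a single backslash.
        return "name\tvalue\n\"C:\\temp\"";
    }

    // '\101' is the character whose code is octal 101, which is decimal 65.
    static char octalA() {
        return '\101';
    }

    public static void main(String[] args) {
        System.out.println(sample());
        System.out.println(octalA());  // prints A
    }
}
```

Each escape sequence counts as a single character, so the string returned by sample() is shorter than its source text suggests.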

Page 7: Chapter 8 Characters and Strings

Conversion

• It is better to make the conversion between int (Unicode) and char (character) explicit by introducing type casts.

Example: Randomly generate an uppercase letter.

private char randomLetter() {
    return (char) rgen.nextInt((int) 'A', (int) 'Z');
}
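The slide's rgen is not defined here; it appears to be a random generator whose nextInt(low, high) takes an inclusive range. As a sketch under that assumption, the same idea can be written with the standard java.util.Random, whose nextInt(bound) returns a value from 0 up to but not including bound. The class name RandomLetterDemo is hypothetical.

```java
import java.util.Random;

class RandomLetterDemo {
    private static final Random RGEN = new Random();

    // Returns a random uppercase letter in the range 'A'..'Z'.
    static char randomLetter() {
        int span = 'Z' - 'A' + 1;                  // number of letters, computed from the codes
        return (char) ('A' + RGEN.nextInt(span));  // explicit cast from int back to char
    }

    public static void main(String[] args) {
        System.out.println(randomLetter());
    }
}
```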

Page 8: Chapter 8 Characters and Strings

The operations that generally make sense:

• Adding an integer to a character (usually a digit).

• Subtracting one character from another.

'a' - 'A' gives the distance between a lowercase letter and its corresponding uppercase letter.

'M' + ('a' - 'A') gives 'm'. This can be used to convert uppercase letters into lowercase letters.

• Comparing two characters.

(ch >= 'a') && (ch <= 'z') is true if ch is a lowercase letter.
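The three operations can be combined in a few helper methods. This is only a sketch for the letters A to Z and a to z; the names CaseArithmetic, isLowerAscii, toLowerAscii and digitValue are invented for illustration.

```java
class CaseArithmetic {
    // Comparing characters: true only for 'a' through 'z'.
    static boolean isLowerAscii(char ch) {
        return (ch >= 'a') && (ch <= 'z');
    }

    // Adding the distance 'a' - 'A' turns an uppercase letter into lowercase.
    static char toLowerAscii(char ch) {
        if ((ch >= 'A') && (ch <= 'Z')) {
            return (char) (ch + ('a' - 'A'));
        }
        return ch;  // anything else is returned unchanged
    }

    // Subtracting '0' from a digit character gives its numeric value.
    static int digitValue(char ch) {
        return ch - '0';
    }

    public static void main(String[] args) {
        System.out.println(toLowerAscii('M'));  // prints m
        System.out.println(digitValue('7'));    // prints 7
    }
}
```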

Page 9: Chapter 8 Characters and Strings

Useful methods in the Character class

static boolean isDigit(char ch)

static boolean isLetter(char ch)

static boolean isLetterOrDigit(char ch)

static boolean isLowerCase(char ch)

static boolean isUpperCase(char ch)

static boolean isWhitespace(char ch)

static char toLowerCase(char ch)

static char toUpperCase(char ch)
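A brief sketch of how these static methods are called. The class CharacterMethodsDemo and its two helper methods are hypothetical examples, not part of the chapter.

```java
class CharacterMethodsDemo {
    // Counts the digit characters in s.
    static int countDigits(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) count++;
        }
        return count;
    }

    // Swaps the case of every letter and leaves other characters alone.
    static String swapCase(String s) {
        String result = "";
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isUpperCase(ch)) {
                result += Character.toLowerCase(ch);
            } else if (Character.isLowerCase(ch)) {
                result += Character.toUpperCase(ch);
            } else {
                result += ch;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("Room 101"));  // prints 3
        System.out.println(swapCase("Hello 42"));     // prints hELLO 42
    }
}
```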

Page 10: Chapter 8 Characters and Strings

Strings

• Java defines many useful methods that operate on the String class.

• The String class uses the receiver syntax when you call a method on a string.

• The String class is immutable: none of its methods ever changes the internal state of a string. Classes that prohibit clients from changing an object's state are said to be immutable.

• Instead, these methods return a new string on which the desired changes have been performed.

• To change a string variable, overwrite it with the new string:

str = str.toLowerCase();
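A minimal sketch of what immutability means in practice. The class name ImmutableDemo and the variable names are illustrative only.

```java
class ImmutableDemo {
    // Returns { original after the call, returned copy, variable after reassignment }.
    static String[] demo() {
        String str = "Hello";
        String lower = str.toLowerCase();  // returns a new string; str is untouched
        String afterCall = str;            // still "Hello"
        str = str.toLowerCase();           // overwrite the variable with the new string
        return new String[] { afterCall, lower, str };
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);  // prints Hello, hello, hello on separate lines
        }
    }
}
```

Calling str.toLowerCase() without using or assigning the result has no visible effect, which is a common source of bugs.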

Page 11: Chapter 8 Characters and Strings

Strings vs. characters

• Both the String and the Character classes export a toUpperCase method.

• In the Character class, you call toUpperCase as a static method:

ch = Character.toUpperCase(ch);

• In the String class, you apply toUpperCase to an existing string:

str = str.toUpperCase();

Page 12: Chapter 8 Characters and Strings

Selecting characters from a string

• In Java, positions within a string are numbered starting from 0.

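str.charAt(1) gives the second character in str.

• A substring can be extracted from a larger string. If a string variable str contains "hello, world", then

str.substring(1, 4);

returns "ell". The first argument is the index of the first character to include; the second is the index just past the last one.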
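The indexing rules can be checked with a short sketch. The class SelectDemo and its wrapper methods are hypothetical; charAt and substring are the standard String methods.

```java
class SelectDemo {
    // Index 1 is the second character because numbering starts at 0.
    static char secondChar(String s) {
        return s.charAt(1);
    }

    // substring(start, end) includes start and excludes end.
    static String slice(String s, int start, int end) {
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(secondChar(str));   // prints e
        System.out.println(slice(str, 1, 4));  // prints ell
        System.out.println(str.substring(7));  // prints world (one-argument form runs to the end)
    }
}
```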

Page 13: Chapter 8 Characters and Strings

Comparing strings

• Equality: Use s1.equals(s2) instead of s1 == s2. The expression s1 == s2 compares the references s1 and s2 (whether they are the same object), not the contents of the objects.

• Order: Use s1.compareTo(s2). It compares two strings s1 and s2 using the numeric ordering imposed by the underlying character codes (lexicographic order), which differs from conventional dictionary ordering. For example, every uppercase letter comes before every lowercase letter.

• For characters, c1 < c2 compares the codes of c1 and c2.

Other methods in the String class, Figure 8-4, p. 266.
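A sketch of the difference between == and equals, and of the ordering that compareTo uses. The class CompareDemo and its helper names are made up for illustration.

```java
class CompareDemo {
    // True only if a and b refer to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True if a and b contain the same sequence of characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    // True if a precedes b in the ordering given by the character codes.
    static boolean comesBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = new String("hello");  // a distinct object with the same content
        System.out.println(sameObject(s1, s2));            // prints false
        System.out.println(sameContent(s1, s2));           // prints true
        System.out.println(comesBefore("Zebra", "apple")); // prints true: 'Z' has a smaller code than 'a'
    }
}
```

A dictionary would list "apple" before "Zebra"; compareToIgnoreCase gives an ordering closer to that.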

Page 14: Chapter 8 Characters and Strings

Searching within a string

/**
 * Given a string composed of separate words, this method returns its
 * acronym.
 * @param str Given string composed of separate words.
 * @return The acronym of the given string.
 */
private String acronym(String str) {
    String result = str.substring(0, 1);             /* get the first character */
    int pos = str.indexOf(' ');                      /* position of the first space */
    while (pos != -1) {                              /* while not the end */
        result += str.substring(pos + 1, pos + 2);   /* concat a letter */
        pos = str.indexOf(' ', pos + 1);             /* position of next space */
    }
    return result;
}

Page 15: Chapter 8 Characters and Strings

Simple string idioms

• Iterating through the characters in a string

for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    code to process each character in turn . . .
}

• Growing a new string character by character

String result = "";
for (whatever limits) {
    code to determine next ch to be added . . .
    result += ch;
}
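Both idioms appear together in the following sketch, which reverses a string. The class IdiomDemo and the method reverse are illustrative names, not taken from the chapter.

```java
class IdiomDemo {
    // Builds the reverse of str by walking it from the last character to the first.
    static String reverse(String str) {
        String result = "";                       // growing idiom: start with the empty string
        for (int i = str.length() - 1; i >= 0; i--) {
            char ch = str.charAt(i);              // iteration idiom: select one character
            result += ch;                         // append it to the result
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reverse("stressed"));  // prints desserts
    }
}
```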

Page 16: Chapter 8 Characters and Strings

A case study

/*
 * File: PigLatin.java
 * ------------------------
 * This file takes a line of text and converts each word into Pig Latin while
 * keeping punctuation marks.
 * The rules for forming Pig Latin words are as follows:
 * - If the word begins with a vowel, add "way" to the end of the word.
 * - If the word begins with a consonant, extract the set of consonants up
 *   to the first vowel, move that set of consonants to the end of the word
 *   and add "ay".
 * - If the word contains no vowel, the word is unchanged.
 */

Page 17: Chapter 8 Characters and Strings

• Top level English pseudo code

public void run() {
    Tell the user what the program does.
    Ask the user for a line of text.
    Translate the line into Pig Latin and print it on the console.
}

• Implementation at the current level

public void run() {
    println("This program translates a line into Pig Latin.");
    String line = readLine("Enter a line: ");
    Translate the line into Pig Latin and print it on the console.
}

Page 18: Chapter 8 Characters and Strings

• Define a method to replace the English (interface design)

public void run() {
    println("This program translates a line into Pig Latin.");
    String line = readLine("Enter a line: ");
    println(translateLine(line));
}

/**
 * Translates a line into Pig Latin.
 * @param line An English line
 * @return The Pig Latin translation of the line
 */
private String translateLine(String line)

Page 19: Chapter 8 Characters and Strings

• Next level English pseudo code

Apply a pattern, recalling the acronym pattern.

private String translateLine(String line) {
    String result = "";
    while not end {
        Get the next word;
        Translate that word into Pig Latin;
        Append the translated word to result;
    }
    return result;
}

Page 20: Chapter 8 Characters and Strings

• As a programmer, you will often trip over some detail that the framers of the problem either overlooked or considered too obvious to mention. In some cases, the omission is serious enough that you have to discuss it with the person who assigned you the programming task. In many cases, however, you will have to choose for yourself a policy that seems reasonable.

– In this case, the specification is unclear about spaces and punctuation marks. A reasonable decision is: keep spaces and punctuation marks, and translate words only.

Page 21: Chapter 8 Characters and Strings

Implementation guideline

• Identify reusable code.

• Use library classes whenever possible.

StringTokenizer class

import java.util.*;

A token is a sequence of characters that acts as a single unit.

– In this case, take a word as a token and punctuation marks as delimiters.

Define DELIMITERS: check Wikipedia or the keyboard for the punctuation marks to include.
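One way the tokenizer could drive the translateLine loop is sketched below. The delimiter list, the class name TokenizerSketch, and the helper transformWords are assumptions made for illustration; the slides do not give them. To keep the sketch independent of the Pig Latin rules, the word translation is passed in as a function, and the demo simply uppercases each word.

```java
import java.util.StringTokenizer;
import java.util.function.UnaryOperator;

class TokenizerSketch {
    // Assumed delimiter set; the slides leave the exact list to the reader.
    static final String DELIMITERS = " \t\n.,;:!?()[]{}\"'-";

    // A token counts as a word only if it is nonempty and every character is a letter.
    static boolean isWord(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isLetter(token.charAt(i))) return false;
        }
        return token.length() > 0;
    }

    // Applies wordTranslator to each word and copies every other token unchanged.
    static String transformWords(String line, UnaryOperator<String> wordTranslator) {
        String result = "";
        // The third argument (true) makes the tokenizer return delimiters as tokens too,
        // so spaces and punctuation survive in the output.
        StringTokenizer tokenizer = new StringTokenizer(line, DELIMITERS, true);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (isWord(token)) token = wordTranslator.apply(token);
            result += token;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(transformWords("Hi, there!", String::toUpperCase));  // prints HI, THERE!
    }
}
```

In the Pig Latin program, the function passed in would be the word translator instead of toUpperCase.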

Page 22: Chapter 8 Characters and Strings

Implementation guideline (cont.)

• Use the character methods, FIGURE 8-3, and string methods, FIGURE 8-4.

• Use for instead of while whenever possible.

– Use for in findFirstVowel, since we can get word.length()
– Use for in isWord, since we can get token.length()

• Use a table to exhaust the cases.

– findFirstVowel, which is called by translateWord, returns -1, 0, or a positive integer. Thus translateWord must handle all three cases.
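The three cases can be laid out as follows. This is a sketch that follows the rules stated in the PigLatin.java header comment, not necessarily the book's exact code; the helper isEnglishVowel and the class name PigLatinWord are assumptions.

```java
class PigLatinWord {
    // Assumed helper: treats a, e, i, o, u (either case) as vowels.
    static boolean isEnglishVowel(char ch) {
        switch (Character.toLowerCase(ch)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return true;
            default:
                return false;
        }
    }

    // Returns the index of the first vowel in word, or -1 if there is none.
    static int findFirstVowel(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (isEnglishVowel(word.charAt(i))) return i;
        }
        return -1;
    }

    // Handles every possible result of findFirstVowel.
    static String translateWord(String word) {
        int vp = findFirstVowel(word);
        if (vp == -1) {
            return word;                                          // no vowel: unchanged
        } else if (vp == 0) {
            return word + "way";                                  // begins with a vowel
        } else {
            return word.substring(vp) + word.substring(0, vp) + "ay";  // move leading consonants
        }
    }

    public static void main(String[] args) {
        System.out.println(translateWord("apple"));  // prints appleway
        System.out.println(translateWord("scram"));  // prints amscray
        System.out.println(translateWord("hmm"));    // prints hmm
    }
}
```

| findFirstVowel result | Action in translateWord |
| -1 | return the word unchanged |
| 0 | append "way" |
| positive | move the leading consonants to the end and append "ay" |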

Page 23: Chapter 8 Characters and Strings

Summary

For each level:

• English pseudo code
• Straight implementation at the current level
• Design methods to replace English pseudo code
• Go to next level methods

Apply the implementation guidelines.

English pseudo code can be used as comments.

Page 24: Chapter 8 Characters and Strings

Testing

• Bottom-up testing (start with testing methods at the lowest level and move up; test callees before the caller)

• Test normal cases

• Test special or extreme cases (boundaries of input variables)

• Black-box testing (verify input/output specifications)

• White-box testing (execute every part of the code, including each condition in if and switch statements)
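As a small illustration of testing normal and boundary cases, the sketch below checks a lowest-level method before anything that would call it. The method under test is the lowercase check from earlier in the chapter; the class BoundaryTests and the check helper are hypothetical, and a real project would more likely use a testing framework.

```java
class BoundaryTests {
    // Method under test: true if ch is a lowercase letter 'a'..'z'.
    static boolean isLowerCaseLetter(char ch) {
        return (ch >= 'a') && (ch <= 'z');
    }

    static int failures = 0;

    // Records and reports a failure when actual differs from expected.
    static void check(String label, boolean actual, boolean expected) {
        if (actual != expected) {
            failures++;
            System.out.println("FAILED: " + label);
        }
    }

    // Runs all cases and returns the number of failures.
    static int runAll() {
        failures = 0;
        check("normal case m", isLowerCaseLetter('m'), true);
        check("lower boundary a", isLowerCaseLetter('a'), true);
        check("upper boundary z", isLowerCaseLetter('z'), true);
        check("just below a", isLowerCaseLetter((char) ('a' - 1)), false);
        check("just above z", isLowerCaseLetter((char) ('z' + 1)), false);
        check("uppercase M", isLowerCaseLetter('M'), false);
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " failure(s)");
    }
}
```

The cases just outside each boundary exercise both halves of the && condition, which is the white-box part of the test.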