the disambiguation of keyboard of mobile and error correction

Upload: sibongile-chawe

Post on 09-Apr-2018


  • 8/8/2019 The Disambiguation of Keyboard of Mobile and Error Correction


    THE INVESTIGATION OF ERROR CORRECTION FOR A REDUCED

    KEYBOARD DISAMBIGUATING SYSTEM

    by

    SIMANGELE SIBONGILE PRUDENCE CHAWE

    A mini-dissertation submitted for the partial fulfilment of the requirements for the

    degree

    BACCALAUREUS INGENERIAE

    in

    ELECTRICAL AND ELECTRONIC ENGINEERING SCIENCE

    at the

    UNIVERSITY OF JOHANNESBURG

    STUDY LEADER: REOLYN HEYMANN

    JUNE 2010


    SUMMARY

    In this mini-dissertation a predictive text disambiguating system for mobile phones is designed. The system is able to correct a spelling mistake within a word while a message is being typed on a mobile phone. A single error per word is considered for correction. If a word contains more than one error, the system only acknowledges this and informs the user of the errors in that word. The errors to be corrected are inversion errors. They are detected using the Levenshtein distance and probability theory. The text messaging application of a phone is simulated in NetBeans, a Java-based programming tool. The simulation serves to confirm in practice the results expected at the initial stage of the project. It also evaluates the system's ability to correct errors, which is confirmed through functionality and performance experiments. Words of all lengths were investigated in the simulation; the longest word in the dictionary used was 28 letters long. Although such long words can be corrected, running the system on them requires more memory (RAM) to be available, about 2 GB.
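The Levenshtein distance named in the summary can be illustrated with a short sketch. This is the generic textbook definition written in Python, not the dissertation's Java code; the function name `levenshtein` and the example inputs are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn sequence a into sequence b."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and the first j symbols of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # delete x
                               current[j - 1] + 1,       # insert y
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# A key sequence with one wrongly pressed key is at distance 1
# from the intended sequence.
print(levenshtein("4653", "4663"))
```

A distance of 1 between an entered key sequence and a dictionary word's key sequence is consistent with a single mistyped key, although an inserted or omitted key would also give a distance of 1.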


    Table of Contents

    Chapter 1: INTRODUCTION
    1.1 BACKGROUND
    1.2 PROBLEM STATEMENT
    1.3 PROJECT OBJECTIVE
        1.3.1 THE ORIGINAL SYSTEM
        1.3.2 THE EXPANDED VERSION OF THE ORIGINAL SYSTEM
    1.4 PROJECT SCOPE
    1.5 METHODOLOGY OVERVIEW
    1.6 ECSA OUTCOMES
    1.7 PROJECT OVERVIEW
    1.8 CONCLUSION

    Chapter 2: INFORMATION THEORY
    2.1 INTRODUCTION
    2.2 GENERAL TERMS AND CONCEPTS
    2.3 INVERSION ERRORS
    2.4 THEORY ON TYPING ERRORS
        2.4.1 POSSIBLE TYPING ERRORS
    2.5 ERROR DETECTION AND CORRECTION SCHEMES
        2.5.1 HAMMING DISTANCE
        2.5.2 LEVENSHTEIN DISTANCE
    2.6 PROBABILITY THEORY
        2.6.1 EXPERIMENTS, OUTCOMES AND EVENTS
        2.6.2 PROBABILITY DISTRIBUTION
        2.6.3 TYPES OF DISTRIBUTION
    2.7 CONCLUSION

    Chapter 3: DISAMBIGUATING SYSTEMS
    3.1 INTRODUCTION
    3.2 TEXT ENTRY
    3.3 THEORY OF DISAMBIGUATION
    3.4 MEASURING AMBIGUITY
        3.4.1 MODEL
        3.4.2 DICTIONARY
        3.4.3 CASE SENSITIVITY
        3.4.4 TYPES OF KEYBOARDS
        3.4.5 KEYSTROKES PER WORD (KSPW)
        3.4.6 ADJUST TIME AND TIMEOUT KILL
    3.5 PREDICTIVE TEXT
        3.5.1 T9
        3.5.2 iTAP
        3.5.3 MULTITAP
        3.5.4 LETTERWISE
        3.5.5 DISAMBIGUATION FAILURE & MISSPELLING
    3.6 MULTITAP vs LETTERWISE
    3.7 CONCLUSION

    Chapter 4: DESIGN
    4.1 INTRODUCTION
    4.2 DESIGN PROCESS
        4.2.1 MULTITAP
        4.2.2 T9
        4.2.3 SBO (SIMPLE BUT OPERATIONAL input system)
        4.2.4 SBONGIE (SIMPLE BUT OPERATIONAL N GROUNDS INVERSION ERRORS)
    4.3 DETAILED DESIGN
    4.4 WORD ERROR RATE
    4.5 CONCLUSION

    Chapter 5: EXPERIMENTAL DESIGN
    5.1 INTRODUCTION
    5.2 EXPERIMENTAL OVERVIEW
    5.3 EXPERIMENTAL DESIGN
        5.3.1 EXPERIMENT ONE: The screen and the keyboard
        5.3.2 EXPERIMENT TWO: Dictionary
        5.3.3 EXPERIMENT THREE: Functionality
        5.3.4 EXPERIMENT FOUR: Performance
    5.4 CONCLUSION

    Chapter 6: IMPLEMENTATION OVERVIEW
    6.1 INTRODUCTION
    6.2 COMPONENT IMPLEMENTATION
    6.3 INTEGRATION ISSUES
    6.4 CONSTRUCTION ISSUES
    6.5 CONCLUSION

    Chapter 7: RESULTS AND ANALYSIS
    7.1 INTRODUCTION
    7.2 EXPERIMENTAL RESULTS
    7.3 ANALYSIS OF RESULTS
    7.4 CONCLUSION

    Chapter 8: CONCLUSIONS
    8.1 INTRODUCTION AND OVERVIEW
    8.2 RESTATEMENT OF OBJECTIVES
    8.3 ACHIEVEMENT OF OBJECTIVES
    8.4 IMPLEMENTATION ISSUES
    8.5 SHORTCOMINGS
    8.6 RECOMMENDATIONS
    8.7 ACHIEVEMENT OF ECSA OUTCOMES
    8.8 FUTURE WORK


    LIST OF FIGURES

    Figure 1-1: Mobile phone Industry growth
    Figure 2-1: Venn diagrams showing two events A and B in a sample space S
    Figure 2-2: Probability function of the Poisson distribution for various values of its parameter
    Figure 3-1: The model of a mobile phone
    Figure 3-2: The ASCII table and Description
    Figure 3-3: Time to adjust (tA) and timeout kill (tK) processes as a function of practice for Multitap
    Figure 4-1: Demonstration of Multitap
    Figure 4-2: Demonstration of T9
    Figure 4-3: The number of keystrokes in the same word using T9 and Multitap respectively
    Figure 4-4: The Process of the input System
    Figure 4-5: The SBO input System
    Figure 4-6: The process of input system plus the error correcting system
    Figure 4-7: The SBONGIE system
    Figure 4-8: Simulated mobile phone
    Figure 4-9: The input sequence entered by the user
    Figure 4-10: The input sequence presented to the system
    Figure 4-11: Disambiguating process
    Figure 4-12: The Tree Diagram Structure of the Dictionary set-up
    Figure 4-13: Error Detection Process
    Figure 4-14: Error Correction process
    Figure 4-15: The information displayed on the screen
    Figure 4-16: WER Curve
    Figure 5-1: The Theoretical Overall System
    Figure 5-2: The screen and the Keyboard
    Figure 5-3: The setup of the experiment
    Figure 5-4: Dictionary Structure
    Figure 5-5: The Overall system
    Figure 5-6: Simulated mobile phone using NetBeans
    Figure 5-7: The expected results
    Figure 5-8: The expected graph of the time response
    Figure 5-9: The expected graph of the quantity response
    Figure 6-1: Component One: The simulated mobile phone
    Figure 6-2: The translation of the given buttons to the numbered keypads
    Figure 6-3: Tree Diagram structure of the dictionary
    Figure 6-4: An array containing all the possible combinations
    Figure 6-5: The diagram showing words accessing in the dictionary
    Figure 6-6: The Folder (abc) of length four
    Figure 6-7: The filtering of non English words
    Figure 6-8: List of English dictionary words from the input sequence and accounting for all possible single inversion errors in the sequence
    Figure 7-1: Results
    Figure 7-2: The graph of the time response
    Figure 7-3: The graph of the quantity response
    Figure 8-1: Array in the library
    Figure 8-2: The Keyboard


    LIST OF TABLES

    Table 2-1: Linear Combination
    Table 4-1: Multitap vs. T9
    Table 7-1: Results


    LIST OF SYMBOLS

    Symbol: Description:

    n Number of successive source symbols

    Length of binary codewords

    t number of inversion errors

    w(vi) Hamming weight

    vi number of codewords

    dij Hamming distance

    BER Bit error rate

    WER Word error rate

    SER Symbol error rate

    W word

    MW ways of mistyping the word

    SW Sensitivity of the word

    Exclusive OR

    C1 and C2 Binary words

    LD Levenshtein Distance

    S Sample space

    Wi input word

    P Probability

    q Probability of failure

    Variance

    x variable

    KSPW Keystroke per word

    tA Adjust time

    tK Timeout kill

    Lj set of dictionary words

    wpm word per minute

    SBO Simple But Operational

    SBONGIE Simple But Operational N Ground Inversion Errors


    CHAPTER 1: INTRODUCTION

    1.1 BACKGROUND

    People have communicated since the earliest days of scratching pictures on cave walls and parchment and sending smoke signals [1]. Communication has since evolved and become increasingly electronic. Even so, the idea remains the same: to convey a message from point A to point B. The message is carried from the sender to the receiver by a communication channel [2]. Devices such as laptops and mobile phones are media used every day to send and receive emails, text messages, multimedia messages and so on. Communication plays a major role in modern society, as it has entered every aspect of daily living, from the professional and educational to the social [2].

    Nevertheless, there are groups of people who cannot take advantage of most technological advances because of disability. This affects everyone negatively, as it divides our community. Technology is taking over our lives at home as well as at work [2]. The ability to use and understand technological devices is crucial, as it opens up greater social and professional opportunities. About 10% of the world's population suffers from some kind of disability [3]. These people's social and economic lives have been compromised by the fact that technology in the past did not accommodate their needs [3]. All this is now changing.

    The study of ambiguous keyboards began in the early 1980s, driven by the desire to help deaf people communicate over the telephone network [3], as they cannot hear another person speaking over a telephone line. In those days a dial was used to select the numbers on the phone. A message was encoded by dialling two numbers to represent a single letter, and the user had to look up the numbers to dial on a chart. That was the beginning of text messaging. The dial was later replaced with keys, as we still have today. Although text messaging was initially invented to assist the deaf, anyone capable of it uses it today. We also have video calling, which was intended to let deaf people communicate via the mobile phone by means other than text messaging. This, too, has attracted users worldwide.


    A great number of people use mobile phones, ranging from primary school children to business men and women. Whatever their reason for using a mobile phone, whether for status, keeping in touch or advertising to consumers, about 97% of them use the messaging application on their phones [11]. The number of people subscribed to mobile phone services has been increasing ever since mobile phones were introduced [11][12]. Figure 1-1 shows the growth of the mobile phone industry, which continues to grow [11].

    Figure 1-1: Mobile phone Industry growth

    The main purpose of communication is to break the limits on access to information by allowing people all over the world to communicate from wherever they are. Today this kind of communication usually takes place over network communication lines. These lines are not perfect and are therefore subject to technical as well as economic problems, ranging from crossed lines and line congestion to time consumption and cost. Perfectly reliable communication does not exist in the real world; qualities are traded off against one another.

    If a cheaper line is desired, delivery might take longer or the message could be sent to the wrong person. Hence what counts as a reliable system differs for every individual; it depends on what the individual is willing to trade off. The choice of a preferred medium of communication is greatly affected by the problems experienced on the communication lines. In this project, the main focus is on text messaging and the accessories involved in texting a message on a mobile phone.


    1.2 PROBLEM STATEMENT

    We live in fast times, where time is money. No one enjoys spending time texting a message; it feels like a waste of time, especially when the message is very detailed. The problem is that people cannot type as fast as they think or would like to, especially when faced with the constraints of mobile devices. Numerous approaches are used to address this problem. One is voice recognition, where the phone produces the text automatically from words spoken to it. Judging by recent technological milestones, this is the direction in which technology is heading; the challenge for an efficient and reliable communication system is time. Voice recognition on a mobile phone nevertheless has to be complemented by the traditional way of text messaging, because the voice cannot be used in a noisy environment or in a meeting, so text input will still be necessary on phones with voice recognition. Hence, although texting a message manually is tedious at times, advanced technologies will not eliminate it completely. Text messaging will continue to have a great influence on our everyday lives.

    The main focus of this project is on the traditional way of text messaging. Another approach to the constraints of mobile technology is the predictive text software called T9. T9 has proven efficient and reliable, and it comes as a standard option on most mobile phones. The software lets the user enter the sequence of number keys on which the desired letters are located, and it works out what the word is. If the sequence matches more than one word, the list of words is presented in some order, either alphabetically or in descending order of each word's probability. The question asked in this project is: what happens if the user's sequence of keys is mistyped? Can T9 still work out the word after a single inversion error, that is, after the user presses a key adjacent to the intended key? For a T9 system the answer is "sometimes", and that answer is not good enough.

    Predictive text software such as T9 focuses mainly on reducing the number of key presses per word, in the hope of speeding up text messaging on a mobile phone. While it succeeds in doing that, it accommodates spelling errors only to a very small degree: a single error in the key sequence can divert it completely from the intended word. This project deals with the inversion errors that occur when texting a message.
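The behaviour described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the T9 product's algorithm and not the dissertation's Java simulation: the names `KEYPAD`, `word_to_keys`, `build_index` and `disambiguate`, and the five-word toy dictionary, are assumptions. The letter layout is the common 12-key phone layout, and matches are returned alphabetically rather than by probability.

```python
from collections import defaultdict

# Common 12-key letter layout; keys 1, 0, * and # carry no letters here.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def word_to_keys(word):
    """Key sequence a user presses (one press per letter) to enter the word."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(words):
    """Map each key sequence to every dictionary word that shares it."""
    index = defaultdict(list)
    for word in words:
        index[word_to_keys(word)].append(word.lower())
    return index


def disambiguate(keys, index):
    """Words matching the exact key sequence, in alphabetical order."""
    return sorted(index.get(keys, []))


# Hypothetical toy dictionary, for illustration only.
INDEX = build_index(["home", "good", "gone", "hood", "hello"])

print(disambiguate("4663", INDEX))  # correctly typed: several candidates
print(disambiguate("4653", INDEX))  # third key mistyped (5 instead of 6): no match
```

With an exact-match lookup of this kind, one wrong key is enough to lose the intended word, which is the gap the project sets out to close.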


    1.3 PROJECT OBJECTIVE

    The purpose of the project is to implement the tools that already exist in predictive text software, to modify them where possible, and to apply error correction schemes to an erroneous key sequence. To achieve this, two systems are required: an original system that is intended to reduce the number of keystrokes, and an improved version of the original system that is also able to correct errors. Hence the project is divided into two parts: the implementation of the original system, and the expanded version of the original system, which achieves the overall objective of correcting inversion errors in text messaging on a mobile phone.

    1.3.1 THE ORIGINAL SYSTEM

    This system makes it possible for the user to enter a key sequence intended to spell a word. The system lists all possible character combinations resulting from that sequence. The objective of this part of the system is therefore to let the user enter the sequence without having to tap any key more than once per letter, which saves time.
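As a rough illustration of listing every character combination for a key sequence, the following Python sketch expands the sequence with a Cartesian product and then keeps only the combinations found in a word list. The function names, the toy dictionary and the choice of Python are assumptions for illustration; the dissertation's own implementation is Java-based and may be organised differently.

```python
from itertools import product

# Common 12-key letter layout; only keys 2-9 carry letters.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def letter_combinations(keys):
    """Every letter string the key sequence could spell, one press per letter.

    Raises KeyError for keys without letters (1, 0, * and #).
    """
    return ["".join(combo) for combo in product(*(KEYPAD[k] for k in keys))]


def filter_words(keys, dictionary):
    """Keep only the combinations that are dictionary words (case-insensitive)."""
    words = {w.lower() for w in dictionary}
    return [combo for combo in letter_combinations(keys) if combo in words]


# Hypothetical toy dictionary, for illustration only.
WORDS = ["Home", "good", "gone", "hood", "hello"]

print(len(letter_combinations("4663")))  # 3 * 3 * 3 * 3 combinations
print(filter_words("4663", WORDS))
```

The number of combinations grows multiplicatively with word length, which gives a feel for why very long words demand much more memory in a generate-and-filter design.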

    1.3.2 THE EXPANDED VERSION OF THE ORIGINAL SYSTEM

    The objective of this part of the system is to accommodate an error that may occur in the sequence of keystrokes entered by the user. In that case the original system cannot work out the intended word, even if there is only a single error, because none of the character combinations produced by the sequence will match the intended word, owing to the incorrect keystroke. The expanded system considers all possible combinations, including the possibility of an incorrect key in the sequence.

    1.4 PROJECT SCOPE

    A computer will be used to simulate the predictive text disambiguating system. The system will be in the same class as existing disambiguating systems such as T9, Multitap and LetterWise. If the project succeeds, the result will be a more efficient system. The focus is on correcting a single inversion error per word; the system will also be assessed for speed, but this is not the focus of the project. The differences and similarities, as well as the advantages and disadvantages, of the different predictive text disambiguating systems will be considered when the systems are compared. Some existing software will be examined but not used in the project, simply to be aware of and to appreciate work similar to this project. The following are the boundaries of this project:

    A 12-key numeric keyboard, shown in Figure 3-1, is used throughout the project.

    An English dictionary is used, hence only English words are considered.

    Only one inversion error per word is to be corrected.

    The memory used to store the dictionary is assumed to be sufficient and will not be investigated.

    The first letter of the input sequence is always taken to be correct. According to the research done, the mistakes users make in spelling a word, especially a long word, come from mistyping rather than from a lack of spelling knowledge. Treating the first letter as correct introduces some degree of certainty into the system.

    The system is not case sensitive, meaning that upper-case and lower-case letters are treated as equivalent.

    Some systems, such as iTap, perform word completion. To avoid deviating from the actual problem of this project, such systems will not be implemented.

    The above-mentioned limitations have proven satisfactory and will contribute to simplifying the project

    and maintaining the focus on the main objective of the project, which is to correct a single inversion error

    per word in a text message on a mobile phone.

    1.5 METHODOLOGY OVERVIEW

    The first phase of the project consists of extensive reading on the field of communication, the relevant

    technological devices and recent developments in technology. The research is mainly done using

    the internet as well as library textbooks and magazines. The

    knowledge of previously presented projects on topics related to mobile phone predictive text and

    disambiguating systems gives a summary and overview of what already exists and provides a guideline for this

    project.

    The second phase is structuring the project and identifying its focus. The structure sets

    boundaries to ensure that the objectives of the project are attained and that any diversions that may occur

    are quickly spotted and eliminated. The use of the text messaging application on mobile

    phones is to be investigated. Since this investigation has already been conducted by other researchers, their

    findings are used as part of this project. Problems faced by mobile phone users are part of these

    findings, and the objective of this project is to help minimise some of these problems.

    The investigation forms part of both the first and second phases and introduces the third phase, where

    the solution is designed and analysed. The candidate solutions will be assessed in depth and then

    implemented. Deciding which solution is optimal will be a step-by-step process of

    comparing each block to already existing software. The advantages and disadvantages will determine

    why the chosen solution is the most suitable.

    After a solution has been selected, it will be implemented to fit the specification of this project and help achieve

    the project objective. Program code will be written in a suitable language to efficiently

    detect an error, select a way to resolve it and finally correct it, if possible. The comparison of a text

    word to the dictionary words, error detection and error correction will be carried out as three successive

    steps; this constitutes the final phase. The above-mentioned phases are to be

    integrated to achieve the main objective of an efficient and reliable system, capable of correcting a

    single inversion error per word in a text message.

    1.6 ECSA OUTCOMES

    Throughout the project the first five (5) outcomes of the Engineering Council of South Africa (ECSA) are to

    be achieved. These are the following:

    Engineering problem solving

    Application of fundamental and specialist knowledge

    Engineering design and synthesis

    Investigation, experiments and data analysis

    Engineering methods, skills, tools and information technology

    In this project, ECSA outcomes will be achieved. Starting with the objective, which is identified and

    explained in Section 1.3, this objective is an engineering problem which requires fundamental and specialist

    knowledge in the field of telecommunication as well as information theory. Mathematical and statistical

    calculations will be used extensively, as the probability of an error/mistake occurring in a text word will be

    required to show the need for this project to be conducted, as well as the probability of more than one

    error/mistake in a word occurring. The information theory will be used to calculate and analyse the overall

    system.

    The design and simulation of the predictive text disambiguating system fulfils the third ECSA outcome,

    which is engineering design and synthesis, as well as the fourth, which is investigation, experiments and

    data analysis. This is shown through the research done to gain knowledge about relevant topics, as

    explained in Section 1.5. To fulfil the requirements of the project, engineering methods, skills, tools and

    information technology are assets that one must acquire. By the end of the project all these assets

    will have been attained, fulfilling the final ECSA outcome.

    1.7 PROJECT OVERVIEW

    In this chapter the problem of sending incorrectly spelled messages and its implications have been

    addressed, together with the importance of solving this problem. The constraints of mobile devices contribute

    greatly to the introduction of errors. These errors result in inefficient performance of mobile phones.

    In the following chapters, the solutions to the problem statement will be addressed.

    Chapter 2 focuses on the information theory behind the overall project. This theory is obtained through extensive

    research in the field of error detection and correction, as well as probability theory.

    Research is done on the available predictive text software, and a comparison is made to appreciate the

    contribution of each to the ever-developing technology of mobile phones. The different algorithms are briefly stated

    to familiarise the reader with the concepts that will be used to help solve the problem at hand. These algorithms

    are explained in depth in Chapter 3, where the theory of Chapter 2 is narrowed down to the point of view of

    the project and explored further in terms of how it contributes to

    the project solution. Chapter 3 takes us through the process of using a mobile phone in the

    message texting application. The concept of ambiguity, types of keyboards and the algorithms used in

    predictive text are explained in depth.

    In Chapter 4, the actual design of the disambiguating system is discussed and the design process is

    given step by step. This is done progressively to help the reader fully comprehend the process and the design

    itself. To further assist the reader's comprehension of the project, block diagrams are used for each

    progressive step. In Chapter 4 two designs are considered and compared to each other. The advantages and

    disadvantages of the designs are put into perspective and the most efficient design is chosen and used

    in the completion of the overall project. The positive aspects of each design are used to obtain a

    suitable system to be implemented in this project, and the designed system is then expanded to fulfil the

    requirements of the project. The design is explained theoretically in Chapter 4; in Chapter 5, the

    functionality of the design is tested, evaluated and analysed. The limitations as well as expectations of

    the system are also explained in that chapter.

    We could imitate the T9 or Multitap software exactly and improve on it to obtain the project objectives,

    but the main focus of this software is speed, whereas speed is only a complementary element in this

    project. Using the existing software as is would therefore complicate and confuse the focus of the project.

    Therefore a simplified version of the already existing software is designed, called the SBO input system; this

    is done to ensure that the focus is on detecting and correcting inversion errors. The implementation is

    explained in Chapter 6. Chapter 7 presents the results obtained from the tests

    described in Chapter 5; we discuss how the results compare to the expected results and specifications of the

    project and evaluate how the project objectives have been attained. Finally,

    we conclude the project as a whole in Chapter 8.

    1.8 CONCLUSION

    Communication using text messaging has expanded since it was first introduced to allow deaf or disabled

    people to use the telephone line. The system is now used by all kinds of people from

    all over the world, making efficiency a top priority. The mobile phone has a reduced keyboard which has

    proven to be efficient but introduces ambiguities which lead to spelling errors, due to mis-keying or a lack of

    spelling knowledge. It is time consuming for the user to check for spelling mistakes manually. In this project a

    system is designed to automatically detect and correct a single type of error, relieving the user of

    having to worry about spelling mistakes. This system, with all its specifications, will be achieved

    at the completion of this project. The aim is to achieve the objectives described in Section 1.3 while working

    within the boundaries set to ensure that the focus of the project is maintained at all times, without

    compromising or complicating the already existing system in the process.

    CHAPTER 2: INFORMATION THEORY

    2.1 INTRODUCTION

    An overview of information theory as a whole, as well as information on error correction, is presented in this

    chapter. This is to familiarise the reader with the theoretical terms used throughout this

    mini-dissertation. The field is broad, hence only the relevant and important topics will be covered in this chapter.

    In the following chapter, the focus is on expanding on the topics that make up this project. In the current

    chapter we begin with the general terms and concepts, which introduce the theory of

    communication. This gives the reader a comprehensive understanding of terms such as block code and

    code book, as they make up a wide section of the field of telecommunication.

    We look into existing research on predictive text and different techniques, as well as the bit error rate (BER)

    and the word error rate (WER). Working with bits rather than words or symbols is easier to understand and

    explain, so we introduce the concept of BER first and then WER, which will be used to assess the performance

    of the overall system. From the information on errors, especially typing errors, we discuss the possible errors as well as

    how they come about. This is done to make the reader aware of the constraints presented by the mobile

    phone. We then move on to the error detecting and correcting schemes, where we broadly explain

    existing concepts and theories of error detection and error correction. Examples are given to ensure that the

    theories are fully understood and that their application to the solution, together with other information,

    will be easily understood.

    We conclude by introducing the theory of probability. This section is broadly discussed and some of the

    theory will not be used directly; it will remain in the background of the solution but will make the reader

    aware of the writer's perspective. It serves as a reference point from which the writer and the reader

    view this project. The term probability is explained and the laws are stated along with

    Venn diagrams to explain the probability concept. Finally we look into probability distributions and

    the different types of distribution.

    2.2 GENERAL TERMS AND CONCEPTS

    When discussing information theoretic concepts, it is useful to consider blocks rather than symbols, with

    each block consisting of n successive source symbols. The term memory-less is often used when

    talking about communication channels and sources [24]. The output of a memory-less channel depends only on the

    present input and is independent of the previous inputs and outputs [24], [20]. The same concept applies to a

    memory-less source: its output is independent of its previous emissions.

    Before we can introduce some of the concepts in this section, some terms need to be defined first:

    Code word: A block of, say, k bits of information representing a sample signal. At the receiver, the code

    word is compared with the received signal to reconstruct the sample after errors have

    occurred during transmission.

    Code book: A list of all code words that can be received at the receiver.

    Super word: A code word received at the receiver with insertion errors.

    Sub word: A code word received at the receiver with deletion errors.

    We have already defined the meaning of inversion errors in Section 1.2. Now the terms defined above are

    combined to form new concepts. An insertion error is one where there is an extra unwanted symbol within a

    code word (successive symbols making up a word or piece of information) [24]. A deletion error occurs when

    a symbol is missing from a code word. These errors happen when the code word is subject to some sort of noise

    in a channel. A code word with an insertion error is called a super word, and a sub word is a

    code word resulting from a deletion error. There is also the inversion error, where a symbol has been replaced

    by another symbol. Inversion errors are of great importance in this project, therefore it is important that we

    distinguish them from other errors.

    In binary form, an inversion error is when a 1 is changed into a 0 and vice versa, but this concept is more

    involved when dealing with words and symbols, because an inversion of a letter could be any other letter.

    That is the reason we will first explain the concept of BER then move on to the concept of WER.

    The performance of an error correction scheme can be measured by the bit error rate (BER). The BER

    measures the probability that a bit is incorrect after decoding [7]. Another measure of the

    performance of an error correction scheme is the word error rate (WER) [17], also known as the symbol

    error rate (SER). The WER measures the probability that a block has been incorrectly decoded.
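
    The two rates can be estimated empirically by counting wrong bits and wrong blocks. The sketch below is

    only an illustration: the function names and the sample bit blocks are hypothetical and not from the project.

```python
def bit_error_rate(sent_blocks, decoded_blocks):
    """Estimate the BER: fraction of bits that are wrong after decoding."""
    total_bits = 0
    wrong_bits = 0
    for sent, decoded in zip(sent_blocks, decoded_blocks):
        if len(sent) != len(decoded):
            raise ValueError("blocks must have equal length")
        total_bits += len(sent)
        wrong_bits += sum(a != b for a, b in zip(sent, decoded))
    return wrong_bits / total_bits


def word_error_rate(sent_blocks, decoded_blocks):
    """Estimate the WER: fraction of blocks with at least one wrong bit."""
    wrong_blocks = sum(s != d for s, d in zip(sent_blocks, decoded_blocks))
    return wrong_blocks / len(sent_blocks)


# Hypothetical data: four 4-bit blocks, two of which are decoded wrongly.
sent = ["1010", "1100", "0011", "1111"]
decoded = ["1010", "1101", "0011", "0110"]
print(bit_error_rate(sent, decoded))   # 3 wrong bits out of 16
print(word_error_rate(sent, decoded))  # 2 wrong blocks out of 4
```

    A single wrong bit is enough to make a whole block wrong, which is why the WER is never smaller than the BER.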

    2.3 INVERSION ERRORS

    If one bit of the transmitted data is inverted from a 0 to a 1 or vice versa, this type of error is called an

    inversion error. This is the most common type of error experienced by digital communication

    systems. A large number of projects have been carried out in the past to minimise inversion errors. In this

    project we consider inversion errors in words, which are more complex and can be difficult to detect or

    correct. In the next section, we discuss the types of error that are common on mobile phones and how they

    could occur while texting a message.

    2.4 THEORY ON TYPING ERRORS

    In physics, problems involving complicated correlations are often explored using mean field theory. In mean

    field theory, correlations between sites are decoupled. The result is a good approximation provided

    correlations are small. In the present case, correlations between typing errors are small if the probability of

    making a single typing error is small. If so, then the probability of making more than one keystroke error

    within a given word is small and there can be little correlation. For single keystroke error rates in the range

    (0% to 10%), we expect the mean field approximation to be very good.

    The procedure for building a mean field approximation is as follows:

    Construct the set of possible 1-keystroke typing errors.

    Weight all possible 1-keystroke typing errors equally.

    Determine the average effect of a one-keystroke error.

    Use this average effect to calculate the expected amplification of typing errors.

    This theory is interesting but it is beyond the scope of this project. It is important to note that the speed at

    which a word is entered into a mobile phone using the 12-key keypad also contributes to the

    efficiency of predictive text.

    2.4.1 POSSIBLE TYPING ERRORS

    To model the way typing errors are made on a mobile phone keypad, we assume the following:

    Typing errors are due to hitting keys adjacent to the intended key, either horizontally or vertically.

    All ways of making typing errors occur with equal probability. We will also consider double typing

    errors (a keystroke is mistakenly repeated), inversion errors (an incorrect keystroke), insertion or

    deletion errors (a keystroke is spuriously inserted or omitted), etc. Inclusion of these types of errors

    complicates the analysis, but does not change the conclusions.

    Given these assumptions, for each word W there are $M_w$ ways of mistyping the word with a

    one-keystroke error. For instance, the word "so" is typed using the standard ambiguous code with the key

    sequence 76. There is one key vertically adjacent to the 7 key: the 4 key, which corresponds to the letters g,

    h, and i. There is one key horizontally adjacent to the 7 key: the 8 key, corresponding to the letters t, u, and

    v. Similarly, there are two keys vertically adjacent to the 6 key: the 3 key, corresponding to the letters d, e,

    and f; and the 9 key, corresponding to the letters w, x, y, and z. There is one key horizontally adjacent to the

    6 key: the 5 key, corresponding to the letters j, k, and l. Each adjacent key might be mistakenly hit.

    The possible key combinations in a mistyping are 46 and 86, where the first keystroke is in error, and 73,

    75, and 79, where the second keystroke is in error. In the T9 system, which is explained in Chapter 3, these

    keystroke combinations give rise to the letter combinations "in", "to", "re", "pl", and "ry" respectively. The

    difference between the letters intended and the letters displayed will be referred to as display errors. The

    number of display errors for these mistypings is 2, 1, 2, 2, and 2, respectively. The average display error over

    all of these mistypings is 1.8. We call this average number the sensitivity of the word "so" under T9, which is

    discussed in the following chapter.
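
    The worked example for the word "so" can be reproduced with a short sketch. It assumes the standard 12-key

    layout and the adjacent-key error model described above; the function names are hypothetical, and the T9

    words are taken from the text rather than computed from a dictionary.

```python
# Standard 12-key layout, row by row; only keys 2-9 carry letters.
KEYPAD_ROWS = ["123", "456", "789", "*0#"]
LETTER_KEYS = set("23456789")


def adjacent_keys(key):
    """Letter keys directly above, below, left or right of `key`."""
    row = next(r for r, keys in enumerate(KEYPAD_ROWS) if key in keys)
    col = KEYPAD_ROWS[row].index(key)
    neighbours = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(KEYPAD_ROWS) and 0 <= c < 3:
            candidate = KEYPAD_ROWS[r][c]
            if candidate in LETTER_KEYS:
                neighbours.append(candidate)
    return neighbours


def one_keystroke_mistypings(sequence):
    """All key sequences that differ from `sequence` by one adjacent-key slip."""
    results = []
    for i, key in enumerate(sequence):
        for wrong in adjacent_keys(key):
            results.append(sequence[:i] + wrong + sequence[i + 1:])
    return results


def display_errors(intended, displayed):
    """Number of positions where the displayed letters differ from the intended ones."""
    return sum(a != b for a, b in zip(intended, displayed))


def sensitivity(intended, displayed_words):
    """Average display error over all one-keystroke mistypings."""
    errors = [display_errors(intended, word) for word in displayed_words]
    return sum(errors) / len(errors)


print(sorted(one_keystroke_mistypings("76")))  # the five mistypings of "so"
t9_words = ["in", "to", "re", "pl", "ry"]      # T9 output quoted in the text
print(sensitivity("so", t9_words))             # (2 + 1 + 2 + 2 + 2) / 5
```

    The keys *, 0 and # are skipped because they carry no letters, which is why the 7 key has only two neighbours.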

    To discuss the corresponding calculation for Multitap, which is also discussed in Chapter 3, we indicate the

    unambiguous shifted letters in bold. Thus, the word "so" is written so. The mistypings for "so" are "ho",

    "to", "sf", "sk", and "sw". In each case, there is only one display error [15]. Thus the average is 1, and the

    sensitivity of the word "so" under Multitap is $S_w = 1$ [3], [19]. Computing the sensitivity of all

    words in the same way for T9 and Multitap, we obtain the distributions of word sensitivity. Word sensitivities for

    T9 peak around 4, while most words have a sensitivity at or near 1 for Multitap. Since the average word

    length for English is about 5.5 letters [18], these data imply that under T9 single keystroke errors often cause

    display errors throughout the word.

    2.5 ERROR DETECTION AND CORRECTION SCHEMES

    The capability to detect when a word has been incorrectly spelled is called error detection and the ability to

    correct such an error is called error correction. In this section, error detection and correction schemes are

    explained.

    2.5.1 HAMMING DISTANCE

    The Hamming distance is named after Richard Hamming, who introduced it in his fundamental paper on

    Hamming codes, "Error detecting and error correcting codes", in 1950 [9]. This distance represents the number

    of differing bits between two fixed-length binary words. This distance is used as an estimate of error.

    More generally, the Hamming distance between two strings of equal length is the number of positions at which the

    corresponding symbols are different. Table 2-1 will be used as a reference for linear combination (i.e.

    exclusive OR and OR).

    Table 2-1: Linear Combination

    $0 \oplus 0 = 0$    $0 \oplus 1 = 1$    $1 \oplus 1 = 0$    $1 \oplus 0 = 1$

    000 =+ 111 =+ 010 =+ 001 =+

    A broader view of Hamming distance is one where the distance between code words is taken into account. A

    length-N code word means that the receiver must decide, among $2^N$ possible data words, which of

    the $2^K$ code words was actually transmitted. The Hamming distance between binary words $c_1$ and $c_2$ is

    $d(c_1, c_2) = c_1 \oplus c_2$ (2.1)

    where the distance is taken as the number of ones in the modulo-2 sum $c_1 \oplus c_2$.

    The number of inversion errors that a code is capable of correcting is represented by t and the length of a binary

    code word is represented by n. The symbol v is used to represent the transmitted code word and r is used to

    denote the received code word. In the context of this project, the transmitted word is the word entered by the

    user and the received word is the dictionary (correct) word.

    The number of errors that can be detected in a code word is denoted by e and the number of synchronization

    errors (between the text word and the correct dictionary word) that a code can correct is represented by s.

    Since synchronization errors are modelled as deletion and insertion errors, s is the sum of the

    deletion and insertion errors that can be corrected by a given code [5]:

    $s = s_i + s_d$ (2.2)

    Where:

    $s_i$ = The total number of insertion errors that are corrected

    $s_d$ = The total number of deletion errors that are corrected

    The intentional use of redundancy to an advantage is called error-control coding [6]. The concepts of

    code word separation and distance are then introduced to define the principles of error-control coding.

    The Hamming weight $w(v_i)$ is the total number of ones in a code word $v_i$, and the Hamming distance

    $d_{ij} = d(v_i, v_j)$ represents the total number of positions in which the code words $v_i$ and $v_j$ differ [2].

    Hamming weight analysis of bits is used in several disciplines including information theory, coding theory,

    and cryptography [9]. However, for comparing strings of different lengths or strings where not just

    substitutions but also insertions or deletions have to be expected, the Levenshtein distance is more

    appropriate in this regard.
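
    The definitions of Hamming weight and Hamming distance can be illustrated with a minimal sketch. The

    function names and example strings are hypothetical; binary words are represented as strings of 0s and 1s.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def xor_bits(a, b):
    """Bitwise exclusive OR of two equal-length binary strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def hamming_weight(bits):
    """Total number of ones in a binary string."""
    return bits.count("1")


c1, c2 = "10110", "11100"
print(hamming_distance(c1, c2))          # the words differ in two positions
print(hamming_weight(xor_bits(c1, c2)))  # same value via the weight of c1 XOR c2

# The same function works on key sequences: one wrong keystroke gives distance 1.
print(hamming_distance("76", "46"))
```

    The length check reflects the limitation noted above: strings of different lengths need a different

    measure, such as the Levenshtein distance.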

    2.5.2 LEVENSHTEIN DISTANCE

    The metric is named after Vladimir Levenshtein, who considered this distance in 1965 [20]. It is often used in

    applications that need to determine how similar, or different, two strings are, such as spell checkers.

    The Levenshtein distance (LD) is a measure of similarity between two strings, denoted here by s1 and s2:

    it is the least number of edit operations (deletions, insertions or substitutions) necessary to

    transform s1 into s2 [23], [24]. The greater the distance, the more different the strings are [20]. The algorithm

    employs a proximity matrix, which denotes the distances between substrings of the two given strings.

    For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits

    change one into the other, and there is no way to do it with fewer than three edits [15]:

    kitten → sitten (substitution of 's' for 'k')

    sitten → sittin (substitution of 'i' for 'e')

    sittin → sitting (insertion of 'g' at the end).
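
    A minimal sketch of the matrix-based calculation follows. It is the standard dynamic-programming

    formulation, not necessarily the implementation used in this project; the names are hypothetical.

```python
def levenshtein_distance(s1, s2):
    """Least number of insertions, deletions and substitutions turning s1 into s2."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # dist[i][j] holds the distance between the first i letters of s1
    # and the first j letters of s2.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i letters
    for j in range(cols):
        dist[0][j] = j  # insert all j letters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[-1][-1]


print(levenshtein_distance("kitten", "sitting"))  # 3, as in the example
```

    A single inversion error corresponds to a distance of 1 between two words of the same length.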

    2.6 PROBABILITY THEORY

    The probability of an event is a measure of the chance that a specific outcome of an experiment will occur. The

    probability is given by Equation (2-3).

    $P(A) = \dfrac{\text{Number of outcomes in } A}{\text{Number of outcomes in } S}$ (2-3)

    $P(S) = 1$ (2-4)

    Equation (2-4) follows from Equation (2-3). In this project the number of events in the sample

    space S is two, defined as A for the input word

    $W_i$ (2-5)

    and B for the dictionary

    $L_j$ (2-6)

    where j = {1, 2, ..., M} and M is the number of words in the dictionary.

    In connection with the probability laws, the concepts about events of a sample space are given as follows:

    $A \cup B$ (2-7)

    Equation (2-7) is called the union of events and consists of all the outcomes that are in A or B or both.

    $A \cap B$ (2-8)

    Equation (2-8) is called the intersection of events, and consists of all the outcomes that are in both events (A

    and B). When:

    $A \cap B = \emptyset$ (2-9)

    where $\emptyset$ is the empty set, A and B are mutually exclusive or disjoint. This occurs when the

    outcomes in A exclude those in B. The concept is further illustrated in Figure 2-1, below.

    Figure 2-1: Venn diagrams showing two events A and B in a sample space S

    Part one of Figure 2-1 illustrates the intersection concept and part two the union concept, defined in Equations

    (2-8) and (2-7) respectively. Let:

    $A = \{W_i\}$ (2-10)

    $B = \{L_1, L_2, \ldots, L_M\}$ (2-11)

    To determine whether the input word ($W_i$) is equivalent to one of the words of the dictionary ($L_j$), the rule of

    intersection, Equation (2-8), is applied. The outcomes are yes or no: the former occurs when event A

    has something in common with event B, and the latter occurs when Equation (2-9) is fulfilled. In this way

    the correctness of the word is determined. Before the probability distribution is introduced,

    there are terms which need to be put in perspective, namely experiments, outcomes and events. It is

    important to fully understand this section, as it greatly influences and builds the overall solution to the

    project.
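
    The intersection test can be sketched with sets. The function name and the small sample dictionary are

    hypothetical; the comparison ignores case, in line with the project scope.

```python
def is_dictionary_word(input_word, dictionary):
    """Return True when A = {input word} and B = dictionary have a non-empty intersection."""
    event_a = {input_word.lower()}
    event_b = {word.lower() for word in dictionary}
    return len(event_a & event_b) > 0


sample_dictionary = ["so", "to", "in", "home", "good"]  # illustrative only
print(is_dictionary_word("Home", sample_dictionary))  # intersection is {"home"}
print(is_dictionary_word("hime", sample_dictionary))  # intersection is empty
```

    An empty intersection is the case in which the word is passed on for error detection and correction.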

    2.6.1 EXPERIMENTS, OUTCOMES AND EVENTS

    The theory of probability has the purpose of providing mathematical models of situations governed by the

    effects of chance, for instance the chance that a word is spelled incorrectly or that a correctly spelled word is

    in the wrong context. Another example is the chance of failing or passing a module: there is a process

    involved which finally leads to an outcome.

    An experiment is a process of measurement or observation [23]. It can be in a laboratory, in the street, or

    generally in nature. A single performance of an experiment is called a trial and results in an outcome. All

    possible outcomes of an experiment make up a sample space, which is usually denoted by S. The subsets of S

    are called events. In the context of this project the sample space is made up of event A, which consists of

    the input word ($W_i$), and event B, which consists of all the words within the dictionary.

    2.6.2 PROBABILITY DISTRIBUTION

    The probability distribution describes the range of possible values that a random variable can attain and the

    probability that the value of the random variable is within any (measurable) subset of that range. In this

    project, the experiment is text messaging, where a word is entered and then checked to determine whether it

    is a correct English dictionary word. The probability associated with an incorrect word is crucial, as it

    determines whether this incorrect word can be corrected or not. For instance, a probability of 0.9 is considered to be

    close enough, therefore that word should be considered for error correction.

    The probability distributions explored in the next subsection are mainly intended to familiarise the reader with this

    concept and the different types of distribution. The different types of probability distributions are listed and

    explained below:

    2.6.3 TYPES OF DISTRIBUTION

    Each distribution is different, and a suitable distribution for a particular experiment is needed. Listing them

    will help in the selection of a suitable distribution for the sub-section of the project.

    Bernoulli Distribution

    The Bernoulli distribution is a discrete probability distribution whose outcome can take on only one of

    two results, success or failure. It takes on the value 1 for success and the value 0 for failure [17]. This value

    will be denoted by the symbol n. The probability of success is:

    $P$ (2-12)

    The probability of failure is:

    $q = 1 - P$ (2-13)

    Binomial Distribution

    The binomial distribution is the discrete probability distribution of the number of successes in a sequence of

    n independent success/failure experiments, each of which yields success with probability P, as defined in

    Equation (2-3). Such a success/failure experiment is also called a Bernoulli experiment, and was explained

    in section 2.5.3.1. In fact, when n = 1, the binomial distribution becomes a Bernoulli distribution. The

    binomial distribution is the basis for the popular binomial test of statistical significance [23], [24], [20].
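
    As an illustration, the binomial distribution can model the number of keystroke errors in a word, assuming

    that each keystroke is mistyped independently with the same probability. This independence assumption, the

    function names and the example values (5 keystrokes, 5% error rate) are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_more_than_one_error(n, p):
    """Probability that a word of n keystrokes contains two or more errors."""
    return 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)


n_keys, p_error = 5, 0.05  # assumed word length and per-keystroke error rate
print(binomial_pmf(0, n_keys, p_error))            # no error in the word
print(binomial_pmf(1, n_keys, p_error))            # exactly one error
print(prob_more_than_one_error(n_keys, p_error))   # more than one error
```

    With n = 1 the function reduces to the Bernoulli case: binomial_pmf(1, 1, p) equals p.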

    Poisson Distribution

    The Poisson distribution is the discrete distribution with infinitely many possible values and the probability

    function [23] given by the following equation.

    $f(x) = \dfrac{\mu^x}{x!} e^{-\mu}$ (2-14)

    The effect of $\mu$ is shown in Figure 2-2, below.

    Figure 2-2: Probability function of the Poisson distribution for various values of $\mu$
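
    The Poisson probability function can be evaluated for a few parameter values to see the behaviour the

    figure describes. This is a sketch only: the function name is hypothetical and the distribution parameter

    is written as mu here.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Poisson probability of observing the value x (x = 0, 1, 2, ...) for parameter mu."""
    return mu**x / factorial(x) * exp(-mu)


# Print the first few probabilities for several values of the parameter.
for mu in (0.5, 1.0, 2.0, 5.0):
    row = [round(poisson_pmf(x, mu), 4) for x in range(6)]
    print(mu, row)
```

    For small values of the parameter most of the probability sits at x = 0; as the parameter grows, the

    peak moves to larger values of x.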

    2.7 CONCLUSION

    The main focus of this chapter was to give a global view of the concepts and theory in the fields of

    information theory and probability theory. Understanding the information given in this

    chapter equips the reader to follow the different paths that will be taken

    while trying to fulfil the objectives of the project. This knowledge is broad and may

    lead the reader to consider alternative methods to the solution that will be presented in this

    document. Chapter 3 is similar to this chapter in that it provides the reader with knowledge that puts

    the reader and the writer on the same page, but it differs in that it focuses mainly on technical issues that

    are related to this project and other projects on the same topic.


    CHAPTER 3: DISAMBIGUATING SYSTEMS

    3.1 INTRODUCTION

    The main issue addressed by this project is the disambiguation of text messaging on mobile phones. This
    chapter shows how the information acquired in Chapter 2 is put to use in achieving the objectives of
    the project. The theory is extended to the mobile phone by exploring the action of text entry, which
    highlights the importance of implementing predictive systems within a mobile phone. The theory of
    disambiguation then explains the process used to remove the uncertainty introduced by the hardware of
    the mobile phone. Finally, the different types of predictive software are explored and explained.

    3.2 TEXT ENTRY

    On a typical telephone keypad, groups of letters in alphabetical order are associated with key numbers.
    For example, "a", "b" and "c" are typically associated with the number 2. Thus, any single press of a
    key is ambiguous, as it may represent any of the associated sets of three or four letters.
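    The ambiguity can be made concrete by mapping words onto their key sequences. The following Java
    sketch is illustrative only; the class name KeypadEncoder and its methods are assumptions, and the
    letter groups are those of the standard keypad layout.

```java
final class KeypadEncoder {
    // Letter groups printed on keys 2..9 of a standard keypad; index 0 is key 2.
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Returns the digit key that carries the letter, or '?' for anything outside a-z.
    static char keyFor(char letter) {
        char c = Character.toLowerCase(letter);
        for (int i = 0; i < GROUPS.length; i++) {
            if (GROUPS[i].indexOf(c) >= 0) {
                return (char) ('2' + i);
            }
        }
        return '?';
    }

    // Maps a word to its key sequence, one key press per letter.
    static String encode(String word) {
        StringBuilder keys = new StringBuilder();
        for (char ch : word.toCharArray()) {
            keys.append(keyFor(ch));
        }
        return keys.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("cab")); // prints 222
        System.out.println(encode("dog")); // prints 364
        System.out.println(encode("fog")); // prints 364, the same keys as "dog"
    }
}
```

    Because several letters collapse onto one key, different words can share one key sequence.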

    Early text-entry methodology concentrated on explicitly disambiguated entries: two-key input (chording)

    and multi-pressing. Two-key entry methods activate a combination of keys, simultaneously or in sequence,

    to encode each symbol unambiguously. Such systems require the user to press the key associated with the

    desired letter, and then follow with a second key to specify the position of the letter on the first key. The

    second key is usually one of the keys on the top row: for example, to enter "g", a user would press 4
    followed by 1 [5], [10].

    Multi-press works differently. It requires multiple taps on the same key to disambiguate an entry: the user

    taps the key the number of times corresponding to the position of the letter in the standard ordering. For


    example, on the number 2 key, the user taps once for "a", twice for "b" and three times for "c" [2], [3].

    Although these methods offer perfectly unambiguous entries, they involve the significant disadvantage of

    requiring more than one keystroke per letter, which results in cumbersome and laborious typing.
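    Both explicit methods can be sketched in a few lines. The Java fragment below is an illustration
    under stated assumptions: the class name ExplicitEntry is hypothetical, and the two-key method is
    assumed to use the keys 1 to 4 as position selectors, as in the "g" example above.

```java
final class ExplicitEntry {
    // Letter groups printed on keys 2..9 of a standard keypad; index 0 is key 2.
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Two-key method: the key carrying the letter, then the letter's 1-based position on it.
    static String twoKey(char letter) {
        for (int i = 0; i < GROUPS.length; i++) {
            int pos = GROUPS[i].indexOf(letter);
            if (pos >= 0) {
                return "" + (char) ('2' + i) + (pos + 1);
            }
        }
        throw new IllegalArgumentException("not a lowercase letter: " + letter);
    }

    // Multi-press method: the key carrying the letter, tapped once per position.
    static String multiPress(char letter) {
        for (int i = 0; i < GROUPS.length; i++) {
            int pos = GROUPS[i].indexOf(letter);
            if (pos >= 0) {
                StringBuilder taps = new StringBuilder();
                for (int t = 0; t <= pos; t++) {
                    taps.append((char) ('2' + i));
                }
                return taps.toString();
            }
        }
        throw new IllegalArgumentException("not a lowercase letter: " + letter);
    }

    public static void main(String[] args) {
        System.out.println(twoKey('g'));     // prints 41
        System.out.println(multiPress('c')); // prints 222
    }
}
```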

    In view of this disadvantage, and since ease of use and typing speed are essential to effective text
    entry, methods enabling one keystroke per letter emerged as early as the 1970s, using the standard
    ambiguous code and a database of stored responses represented by their numerical sequences. Early
    testing and implementations of such systems established their accuracy [16] and ease of use [13]. In
    recent years, dictionary-based disambiguation mechanisms have appeared in various forms, often aided by
    N-gram frequencies, syntactical information, or other statistics on letter and word frequencies
    [1], [9], [11], [14]. Such systems compare the numerical code of an entry, which is treated as a
    discrete unit, with those found in a database, and guess the intended letters or words. However, many
    dictionary words share the same numerical code, and in these cases the system presents the alternatives
    in a list (a query). The user selects the intended word from the list, which requires extra taps for
    the word entry to be correct and complete.

    Nevertheless, even if dictionary words are correctly disambiguated with just a few extra taps, a greater

    problem is created in practice because it is necessary to allow entry of non-dictionary words, which are in

    everyday use (e.g., proper names, slang, abbreviations, and technical and professional terms). For
    example, a collection of text from the 1988 Wall Street Journal containing 20,691,239 words was found
    by James Raymond Davis [4] to contain not only 8,633,941 ambiguous words, but also 4,007,375 words that
    were not in Webster's Seventh dictionary, to which it was compared.

    With a perfect dictionary, multi-press methods and word guessing methods are points on a continuum. For

    both, extra keystrokes beyond one per intended letter are required for a word to appear correctly. In the case

    of multi-press, the extra keystrokes are entered throughout the word to select each intended letter, while in a

    word-guessing method the extra keystrokes all occur at the end of the word. Delaying the extra keystrokes

    until the end of the word has the advantage of reducing the total number of keystrokes which must be

    entered. It has the disadvantage of causing the display to be unstable as algorithms typically present their

    current best guess based on partial information after each keystroke is entered.

    With an imperfect dictionary, word-guessing methods fail catastrophically on many words which are not in

    the dictionary. Since a word guessing algorithm cannot match a word not in its dictionary, it resorts to


    default rules which do not involve complete words in order to make a guess. When the underlying code is
    highly ambiguous, these default rules will typically render letter sequences which have little
    relationship with the intended letter sequence.

    In some implementations of word-guessing systems, such as Tegic Communications' implementation on the

    Ericsson model 280, a failure of the word-guessing algorithm places the phone into multi-press mode so that

    the word can be re-entered unambiguously using multi-press. This solution enables users to enter any letter

    sequence, but slows typing speed significantly.

    3.3 THEORY OF DISAMBIGUATION

    An ambiguous keyboard is one where the number of possible selections is less than the number of
    possible characters, i.e. the number of keys on the keyboard is less than the number of letters in the
    alphabet. Ambiguous keyboards have recently entered the popular domain through the widespread use of
    mobile phones for SMS messages. Text typed on an ambiguous keyboard would most likely appear as
    gibberish on the screen of the phone; a disambiguation process is relied upon for it to make sense.
    Disambiguation looks at the sequence of keys pressed (for example, on a standard keypad: 3 "def",
    6 "mno", 4 "ghi") and works out which word was meant; for instance, this key sequence gives the word
    "dog". It does this by having some knowledge of the language and using it to guess the most likely
    word. The ambiguity is completely removed when the user looks at the word and confirms that it is
    correct, or selects an alternative word from the presented list of words.
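    A dictionary-based disambiguator of this kind can be sketched as a lookup table from key sequences
    to candidate words. The Java code below is a minimal illustration and not the implementation used
    in this project or in T9; the class name DictionaryDisambiguator and the word frequencies are
    invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictionaryDisambiguator {
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    private final Map<String, List<String>> byCode = new HashMap<>();
    private final Map<String, Integer> frequency = new HashMap<>();

    // Key sequence of a word; characters outside a-z map to '?'.
    static String encode(String word) {
        StringBuilder keys = new StringBuilder();
        for (char ch : word.toLowerCase().toCharArray()) {
            char key = '?';
            for (int i = 0; i < GROUPS.length; i++) {
                if (GROUPS[i].indexOf(ch) >= 0) {
                    key = (char) ('2' + i);
                }
            }
            keys.append(key);
        }
        return keys.toString();
    }

    // Stores a word under its key sequence, keeping each list sorted most frequent first.
    void add(String word, int count) {
        frequency.put(word, count);
        List<String> words = byCode.computeIfAbsent(encode(word), k -> new ArrayList<>());
        words.add(word);
        words.sort((a, b) -> Integer.compare(frequency.get(b), frequency.get(a)));
    }

    // Candidate words for a key sequence; the first entry is the system's guess.
    List<String> candidates(String keys) {
        return byCode.getOrDefault(keys, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        DictionaryDisambiguator d = new DictionaryDisambiguator();
        d.add("fog", 10); // illustrative frequencies (assumption)
        d.add("dog", 50);
        System.out.println(d.candidates("364")); // prints [dog, fog]
    }
}
```

    The user either accepts the first candidate or steps through the rest of the list.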

    This is the disambiguation process commonly used on mobile phones; the most common method is called T9
    and was patented by Tegic in 1985. There are other ways of disambiguating, and it is also possible to
    use any number of keys down to 3 rather than the 12 keys normally found on a mobile phone. These
    topics are discussed later in this chapter. Disambiguation relies on the fact that language has a
    certain amount of redundancy, about 50% in English [23]. This is because the way letters are used is
    not random; it is partially defined by the structure of the language. For example, in English no words
    contain the string "pq", not many contain "dr" and many contain "er". These features of language were
    first identified by Shannon and Weaver in 1963, who described the information entropy of language.


    Within the context of this paper, we are discussing disambiguation for character entry, in other words

    typing. We should not ignore, however, that although this paper will mostly discuss letters and keyboards,

    the process is applicable to any alphabet and language including those represented by symbols.

    3.4 MEASURING AMBIGUITY

    A system's ambiguity, or its efficacy in disambiguating entries, can be measured in at least two ways:
    the query rate and the lookup error rate. Both are easy to measure given a complete list of words from
    a language, their numeric equivalents, and their probabilities. The query rate measures how often the
    same keystroke pattern yields multiple words. It is calculated as the reciprocal of the sum of the
    probabilities of all the words that share their code with another word. Using the standard ambiguous
    code, a query occurs, on average, every three words, which at an average typing speed of 20 wpm means
    every 9 seconds [4], [7].

    The lookup error rate measures how often the desired word is not the first in the list of alternatives
    in a query. It is calculated as the reciprocal of the sum of the probabilities of the words in
    queries, excluding the first word of each query. Using the standard ambiguous code, a lookup error
    occurs once in every 28 words [7].
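    Both measures can be computed directly from a word list with probabilities. The Java sketch below
    is illustrative: the class name AmbiguityMetrics is hypothetical, the five-word lexicon and its
    probabilities are invented, and the first word of a query is assumed to be the most probable word
    for that key sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AmbiguityMetrics {
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Key sequence of a lowercase word on the standard keypad.
    static String encode(String word) {
        StringBuilder keys = new StringBuilder();
        for (char ch : word.toCharArray()) {
            for (int i = 0; i < GROUPS.length; i++) {
                if (GROUPS[i].indexOf(ch) >= 0) {
                    keys.append((char) ('2' + i));
                }
            }
        }
        return keys.toString();
    }

    // Returns {words per query, words per lookup error} for a lexicon of word -> probability.
    // A result is infinite when no key sequence in the lexicon is shared.
    static double[] measure(Map<String, Double> lexicon) {
        Map<String, List<Double>> byCode = new HashMap<>();
        for (Map.Entry<String, Double> e : lexicon.entrySet()) {
            byCode.computeIfAbsent(encode(e.getKey()), k -> new ArrayList<>()).add(e.getValue());
        }
        double queryMass = 0.0; // probability of typing a word whose code is shared
        double errorMass = 0.0; // probability that such a word is not listed first
        for (List<Double> probs : byCode.values()) {
            if (probs.size() < 2) {
                continue; // unique code, no query needed
            }
            double sum = 0.0;
            double max = 0.0;
            for (double p : probs) {
                sum += p;
                max = Math.max(max, p);
            }
            queryMass += sum;
            errorMass += sum - max;
        }
        return new double[] {1.0 / queryMass, 1.0 / errorMass};
    }

    public static void main(String[] args) {
        Map<String, Double> lexicon = new HashMap<>(); // invented probabilities (assumption)
        lexicon.put("good", 0.30);
        lexicon.put("home", 0.20);
        lexicon.put("gone", 0.05);
        lexicon.put("cat", 0.25);
        lexicon.put("dog", 0.20);
        double[] m = measure(lexicon);
        System.out.println("a query every " + m[0] + " words");
        System.out.println("a lookup error every " + m[1] + " words");
    }
}
```

    In this toy lexicon "good", "home" and "gone" share the key sequence 4663, so they carry the whole
    query mass of 0.55, and the lookup error mass is the 0.25 contributed by "home" and "gone".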

    Clearly, queries slow typing speed by demanding attention and cognitive processing time from the user.

    Lookup errors exact the same demands, in addition to requiring extra taps to select the right word from the

    list. In this document, we present an optimized system, which reduces the number of queries and
    lookup errors significantly, thus eliminating most of the distractions that slow typing speed. This is
    done by ensuring that the words are listed from the most probable word through to the least probable.
    Where words with the same keystroke pattern are equally probable, the most recently and commonly used
    words take precedence over seldom-used words.


    3.4.1 MODEL

    The model of a mobile phone shown in Figure 3-1 demonstrates the reduced keyboard used throughout this
    project. The figure clearly shows the ambiguity presented by the reduced keyboard: the key 2 represents
    not only the number two but also the letters "a", "b" and "c". This means that with a stroke of the
    key 2 the user could be entering the letter "a", "b" or "c", or the number 2. As the objective of the
    project is to correct spelling, numbers are ignored unless they are spelled out in words.

    Figure 3-1: The model of a mobile phone

    Nine of the keys displayed in Figure 3-1 are used: the keys 2 to 9, with letters allocated as shown
    above, and the key 0, which inserts a space between words. A space confirms that a word has been
    completed and that error detection and correction can be applied to the entered word.

    3.4.2 DICTIONARY

    The most commonly used dictionary words are stored in the memory of the phone. People tend to shorten
    words on purpose to keep the number of characters sent to a minimum, saving money and time in the
    process. For simplicity, better management of the memory space and efficiency of the system, the
    following terms are adhered to when implementing the system:


    The dictionary will not be expanded in any way, i.e. if the required word is not in the dictionary it
    cannot be corrected.

    Slang or non-English words will be deemed as unknown or subject to correction by the system.

    The error correction is limited to a single error per word.

    Word completion systems, such as iTap, will not be dealt with in this project.

    3.4.3 CASE SENSITIVITY

    Figure 3-2: The ASCII table and Description


    In the character encoding used by the system, the upper-case and lower-case forms of each of the 26
    letters of the alphabet are separated by a fixed distance of 32. This is illustrated in Figure 3-2, the
    ASCII table and description. In the last two columns of the figure, towards the bottom, the lower-case
    character "z" is represented by 122 and the upper-case "Z" by 90. The difference is 32, which is the
    same for all 26 letters, as shown in Figure 3-2. The case of the letters in a word presents another
    ambiguity, which is not dealt with in this project because the main objective is the correction of
    spelling errors. The system is therefore not sensitive to the case of the letters in a word; that is,
    "A" = "a", and likewise for all letters of the alphabet.
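    The fixed offset makes case folding a one-line operation. The following Java sketch is an
    illustration only; the class name CaseFolding and its methods are assumptions and are not taken from
    the project code.

```java
final class CaseFolding {
    // Fixed distance between the lower-case and upper-case ASCII codes of a letter.
    static final int CASE_OFFSET = 'a' - 'A'; // 97 - 65 = 32

    // Folds an upper-case ASCII letter to lower case; other characters are unchanged.
    static char toLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + CASE_OFFSET) : c;
    }

    // Compares two words while ignoring the case of their letters.
    static boolean sameIgnoringCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (toLower(a.charAt(i)) != toLower(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println((int) 'z' - (int) 'Z');          // prints 32
        System.out.println(sameIgnoringCase("Dog", "dOG")); // prints true
    }
}
```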

    3.4.4 TYPES OF KEYBOARDS

    Communication aids are devices developed or specially adapted for people with severe communication
    impairments. There is a wide variety of communication aids because these people have a large variety
    of skills, needs, and problems [13]. Some people with severe motor disabilities can use their hands;
    others cannot, and have to use alternatives such as mouth-sticks, head-sticks, switches, or
    eye-pointing devices [2], [13].

    In general, most communication aids for people with severe motor disabilities are designed to work
    with or to emulate a keyboard. Switches can be operated with the head, hands, arms, knees, feet, legs,
    shoulders or any other body part over which the user has muscular control [2]. Other kinds of switches
    work by detecting a movement such as the tilting of an arm or the head, a sound, or the breaking of a
    beam of light. There is also a special kind of switch, called a "Sip-Puff device", which works with
    breath [13].

    There are different types of keyboards used to present letters and symbols to the user for texting;
    these are intended for users without such impairments. The layout of these keyboards is such that the
    user finds it quick and easy to adapt to them [8]. For typewriters, laptops and portable computers,
    QWERTY is normally used, while a numeric keypad is implemented on mobile phones and most handheld
    devices. The QWERTY keyboard is not ambiguous for letters, since every letter has its own key; only
    numbers and symbols share keys, and this ambiguity is resolved by the use of a shift key. The numeric
    keypad, by contrast, does not have enough keys to represent all letters, numbers and symbols
    separately. All of these therefore share the twelve available keys, which makes message texting
    ambiguous.


    Typing errors on mobile phone keypads can be expected to be more frequent than on QWERTY keyboards.
    Numeric keypads are conducive to typing errors because of their small size and their
    difficult-to-operate, ambiguous keys [7], [8]. A typing error is also more problematic on a mobile
    phone than on a QWERTY keyboard. On a QWERTY keyboard, a one-keystroke typing error results in a
    one-letter difference in the displayed text. By contrast, on an ambiguous keyboard such as the
    standard mobile phone keypad, a one-keystroke typing error may change many letters in the typed text
    and produce a completely different word, largely because of the disambiguation system in use.

    Disambiguation works by using context to choose the letter to display. If the context is altered in one place,

    it can generally affect the letters displayed in many places. Potentially, a single keystroke error can affect the

    entire word. Hence spelling errors are a very serious problem on mobile phones.

    3.4.5 KEYSTROKES PER WORD (KSPW)

    Keystrokes per word (KSPW) is a useful metric for characterising the overall text entry behaviour of a
    system. KSPW is the number of keystrokes, on average, required to produce a word using a given input
    method. As a baseline, consider KSPW = 1 [7], [10]. This is a reasonable measure for a QWERTY keyboard,
    because each letter has a dedicated key. KSPW < 1 is possible, for example, with word prediction
    techniques. KSPW > 1 is likely if the keyboard has fewer keys than symbols in the target language
    [6], [7].

    3.4.6 ADJUST TIME AND TIMEOUT KILL

    The time to adjust is defined as the time from the first correct keystroke for a character until the
    character is actually obtained, through further presses of the same key or presses of the NEXT key.
    The timeout is the time that must pass before the same key can be pressed again to enter a second
    consecutive letter from that key. The timeout slows down message texting, so a timeout kill key,
    normally the NEXT key, is provided to avoid the wait. The timeout kill time is the time from the
    keystroke that produced the correct character to the correct press of the timeout kill key.


    3.5 PREDICTIVE TEXT

    Predictive text is an input technology that is mostly used on mobile phones. The technology allows
    some common words to be entered with a single key press for each letter, as opposed to the multiple
    key press approach used as standard on mobile phones. For example, a simple word like "cook" requires
    eleven keystrokes with the multiple key press approach: the 2 key three times, the 6 key three times,
    a wait for the timeout, the 6 key three times again, and finally the 5 key twice. With predictive text
    software such as T9, the keys in the sequence 2665 are each pressed once and a list of words is
    displayed; with one or two extra keystrokes the correct word is selected.
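    The keystroke counts in the "cook" example can be verified with a short calculation. The Java sketch
    below is illustrative; the class name KeystrokeCount is hypothetical, timeouts are not counted, and
    the predictive count excludes any NEXT presses needed to pick a word from the list.

```java
final class KeystrokeCount {
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Taps needed by the multiple key press approach: each letter costs its
    // 1-based position on its key. Waiting for the timeout is not a keystroke.
    static int multiPressTaps(String word) {
        int taps = 0;
        for (char ch : word.toLowerCase().toCharArray()) {
            for (String group : GROUPS) {
                int pos = group.indexOf(ch);
                if (pos >= 0) {
                    taps += pos + 1;
                    break;
                }
            }
        }
        return taps;
    }

    // Predictive entry: one press per letter, before any NEXT presses.
    static int predictiveTaps(String word) {
        return word.length();
    }

    public static void main(String[] args) {
        System.out.println(multiPressTaps("cook")); // prints 11
        System.out.println(predictiveTaps("cook")); // prints 4
    }
}
```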

    The intent is to simplify the writing of text messages, e-mail, entries into an address book or
    calendar, etc. Theoretically, the number of keystrokes per word, on average, is comparable to that of a
    full, unambiguous keyboard, that is KSPW = 1, provided that all words used, including all slang, proper
    nouns, abbreviations, URLs, foreign-language words and so on, are in the dictionary, that symbols and
    punctuation are ignored, and that no spelling or typing mistakes are made. This is the ideal case; in
    practice, however, these factors are found to be crucial to speed and accuracy.

    The following are the well-known software packages and methods used to disambiguate mobile phone
    texting:

    3.5.1 T9

    T9 is an abbreviation for Text on 9 keys. T9 is software that comes preloaded on most mobile phones
    and other mobile devices. It is meant to make typing on numeric keypads faster and easier. The
    software combines the groups of letters found on each phone key with a fast-access dictionary of words
    and recognizes the intended text as the user types. Words matching the entered key sequence are
    offered to the user, who can access other choices with one or more presses of the NEXT key.

    T9 also has an option for users to add their own words, whether a URL, slang, a password or an
    invented word. When T9 does not recognize the typed word, the user is presented with an option to ADD
    or SPELL the word. T9 also recognizes and completes words that have been typed in before.


    3.5.2 iTAP

    This software was developed as competition to T9. It was designed as a replacement for the old letter
    mappings on phones to help with word entry. This makes some features of modern mobile phones, such as
    text messaging and note-taking, easier to use. The system is especially useful for longer words.

    Once three or more characters have been entered in a row, iTap guesses the rest of the word. For
    example, entering "prog" will suggest "program". If a different word is desired, such as "progress",
    or a word formed with different letters but requiring the same key presses, such as "prohibited" or
    "spoil", pressing the NEXT key shows other words in a menu for selection, in descending order of how
    commonly they are used. A space is automatically inserted after the word. If the phone does not
    recognize a word, it stores the word as an optional choice. When the memory space is full, the phone
    deletes the oldest word to make space for the new word.

    iTap is also able to complete words and phrases. iTap guesses the best match based on a built-in
    dictionary, including words sharing the typed prefix. This dictionary also contains phrases and
    commonly used sentences. In this way the predictions iTap offers are enhanced by the context of the
    word being typed.
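    Completion on key prefixes can be sketched as a filter over the dictionary. The Java code below is
    an illustration and not iTap's actual algorithm; the class name PrefixCompletion is hypothetical and
    the dictionary is assumed to be stored with the most common words first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixCompletion {
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Key sequence of a lowercase word on the standard keypad.
    static String encode(String word) {
        StringBuilder keys = new StringBuilder();
        for (char ch : word.toCharArray()) {
            for (int i = 0; i < GROUPS.length; i++) {
                if (GROUPS[i].indexOf(ch) >= 0) {
                    keys.append((char) ('2' + i));
                }
            }
        }
        return keys.toString();
    }

    // Words whose key sequence starts with the typed keys, in dictionary order.
    static List<String> complete(String typedKeys, List<String> dictionary) {
        List<String> matches = new ArrayList<>();
        for (String word : dictionary) {
            if (encode(word).startsWith(typedKeys)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Toy dictionary, assumed to be ordered from most to least common.
        List<String> dictionary = Arrays.asList("program", "progress", "prohibited", "spoil", "dog");
        // The keys for "prog" are 7764.
        System.out.println(complete("7764", dictionary)); // prints [program, progress, prohibited, spoil]
    }
}
```

    The first match plays the role of the suggested word; the NEXT key would step through the others.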

    3.5.3 MULTITAP

    Multitap is the oldest method and comes as standard on all mobile phones and handheld devices; the
    user presses each key one or more times to specify the desired letter on that key. For example, the 2
    key is pressed once for the letter "a", twice for "b" and three times for "c" [2], [5], [7]. Besides
    requiring multiple keystrokes for many letters, Multitap requires a mechanism to segment consecutive
    letters on the same key. This is known as timeout, which is discussed in Section 3.4.6.


    3.5.4 LETTERWISE

    LetterWise was developed to avoid the problems experienced by the methods mentioned above. It is not
    dictionary based; it works with a stored database of probabilities of prefixes [11]. For example, if
    the user presses 3 after the prefix "th", the most likely next letter is "e", because "the" in English
    is far more probable than either "thd" or "thf" [7]. The most significant departure from the other
    methods is that LetterWise does not use a dictionary of stored words. Instead, a priori analysis of a
    dictionary is used to distil probability information about letter sequences in the language. This
    allows efficient entry of words and, unlike dictionary-based approaches, generalizes to non-words.
    LetterWise occasionally guesses the wrong letter, and in these cases the user must press a special NEXT
    key to choose the next most likely letter for the given key and context.
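    Prefix-based letter prediction can be sketched as a ranking of the letters on the pressed key. The
    Java code below is an illustration and not the LetterWise implementation; the class name
    PrefixLetterModel is hypothetical and the probabilities are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PrefixLetterModel {
    static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Probability of a letter sequence, keyed by prefix followed by the next letter.
    private final Map<String, Double> probability = new HashMap<>();

    void put(String prefix, char next, double p) {
        probability.put(prefix + next, p);
    }

    // Letters on the pressed key (a digit '2'..'9'), ordered from most to least
    // probable after the prefix. The first entry is the guess; each press of
    // NEXT would move on to the following entry.
    List<Character> rank(String prefix, char key) {
        List<Character> letters = new ArrayList<>();
        for (char c : GROUPS[key - '2'].toCharArray()) {
            letters.add(c);
        }
        letters.sort((a, b) -> Double.compare(
                probability.getOrDefault(prefix + b, 0.0),
                probability.getOrDefault(prefix + a, 0.0)));
        return letters;
    }

    public static void main(String[] args) {
        PrefixLetterModel model = new PrefixLetterModel();
        model.put("th", 'e', 0.90); // invented probabilities (assumption)
        model.put("th", 'f', 0.02);
        model.put("th", 'd', 0.01);
        System.out.println(model.rank("th", '3')); // prints [e, f, d]
    }
}
```

    Because the table stores letter sequences rather than whole words, the same ranking also works for
    words that are not in any dictionary.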

    3.5.5 DISAMBIGUATION FAILURE & MISSPELLING

    As mentioned above, spelling mistakes introduce a bigger problem: they result in a list of possible
    words that are far from the intended word, making it difficult for the predictive text software to
    perform effectively and efficiently. An incorrect keystroke pattern may result in a list of words that
    does not contain the intended word at all. This results in disambiguation failure. When mistyping or
    misspelling occurs, the keystroke sequence is very unlikely to be recognized correctly by a
    disambiguation system.

    3.6 MULTITAP VS LETTERWISE

    A study has been conducted to show the performances of LetterWise and Multitap. The separate effects of tA(time to adjust) and tK( time kill) are shown in Figure 3-3 fo