string searching and matching

Data Engineering and Cloud Computing Department Reva University Novel Approach for String Searching and Matching using American Standard Code for Information Interchange Value


Post on 11-Apr-2017


TRANSCRIPT

Page 1: String Searching and Matching

Data Engineering and Cloud Computing Department, Reva University

Novel Approach for String Searching and Matching using American Standard Code for Information Interchange Value

Page 2: String Searching and Matching


Outline

Abstract
Introduction
Hash Table
String Matching
Algorithm Design Proposed
Algorithm of the Proposed Work
Results and Discussion
Conclusion
Reference

Page 3: String Searching and Matching

Abstract

String matching algorithms generally search a database for a search string and find all of its occurrences. This paper introduces a novel approach for string searching and matching that identifies the correct occurrence of a given search string. The proposed work calculates the sum of the ASCII values of the characters in the search string and compares the search string only with the names in the database that have the same ASCII sum. This is implemented using hashing, which limits the search to only a few name strings. After the block for the corresponding ASCII sum is located, string matching is done by comparing the first and last characters of the search string with those of each name string. If they match, two random positions in the search string are compared. If all four positions match, the whole string is compared; otherwise the name string is skipped from further comparisons. This method identifies the search string efficiently and reduces the number of comparisons.

Umma Khatuna Jannat
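As a small illustration of the ASCII-sum idea, the sketch below computes the sum for a few sample names. The function name `ascii_sum` and the sample names are assumptions made for this example, not taken from the slides. It also shows why the later character checks are needed: names that contain the same letters in a different order produce the same sum.

```python
def ascii_sum(s):
    """Sum of the ASCII codes of the characters in s, after lowercasing."""
    return sum(ord(ch) for ch in s.lower())


# 'a'=97, 'l'=108, 'i'=105, 'c'=99, 'e'=101
print(ascii_sum("Alice"))  # 510
print(ascii_sum("Alcie"))  # 510, same letters in a different order
print(ascii_sum("Bob"))    # 307
```

Because the sum ignores character order, it can only narrow the search to a block of candidates; the positional comparisons decide the actual match.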

Page 4: String Searching and Matching

Introduction

In general, string searching and matching form an important class of algorithms that try to locate the positions at which patterns occur in a larger text or string.

String matching is a basic and important research subject in computer science and plays a crucial role in text processing. String matching algorithms are implemented in a large number of software applications. They generally help in finding one or all occurrences of a search string, also called a pattern, in an input string. When more than one search string is matched against the text simultaneously, it is called multiple pattern matching; otherwise it is called single pattern matching.
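To make the terms concrete, here is a minimal brute-force sketch of single and multiple pattern matching. It is not the proposed method; the function names `find_all` and `match_many` are hypothetical, and returning an empty list for an empty pattern is a choice made for this example.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text (brute force)."""
    if not pattern:
        return []  # assumption: an empty pattern has no occurrences
    last_start = len(text) - len(pattern)
    return [i for i in range(last_start + 1)
            if text[i:i + len(pattern)] == pattern]


def match_many(text, patterns):
    """Multiple pattern matching: run the single-pattern search for each pattern."""
    return {p: find_all(text, p) for p in patterns}


print(find_all("abracadabra", "abra"))             # [0, 7]
print(match_many("abracadabra", ["abra", "cad"]))  # {'abra': [0, 7], 'cad': [4]}
```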


Page 5: String Searching and Matching

Hash Table

A hash table is a data structure that stores elements and allows insertions, lookups, and deletions to be performed

A hash table is an alternative method for representing a dictionary

In a hash table, a hash function is used to map keys into positions in a table. This act is called hashing

Hash Table Operations
Search: compute f(k) and see if a pair exists
Insert: compute f(k) and place it in that position
Delete: compute f(k) and delete the pair in that position

In ideal situation, hash table search, insert or delete
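The three operations can be sketched as follows. This is a minimal illustration, assuming integer keys, a modulo hash function f(k) = k mod size, and collision handling by chaining; the slides do not say how collisions are resolved, and the class name `HashTable` is hypothetical.

```python
class HashTable:
    """Chained hash table for integer keys with a modulo hash function."""

    def __init__(self, size=11):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def f(self, k):
        # Hash function: maps key k to a position in the table.
        return k % self.size

    def search(self, k):
        # Compute f(k) and see whether a pair with key k exists there.
        for key, value in self.buckets[self.f(k)]:
            if key == k:
                return value
        return None

    def insert(self, k, value):
        # Compute f(k) and place the pair there, replacing an existing key.
        bucket = self.buckets[self.f(k)]
        for i, (key, _) in enumerate(bucket):
            if key == k:
                bucket[i] = (k, value)
                return
        bucket.append((k, value))

    def delete(self, k):
        # Compute f(k) and delete the pair with key k, if present.
        bucket = self.buckets[self.f(k)]
        for i, (key, _) in enumerate(bucket):
            if key == k:
                del bucket[i]
                return True
        return False


table = HashTable(size=11)
table.insert(3, "a")
table.insert(14, "b")    # 14 % 11 == 3, so this shares a position with key 3
print(table.search(14))  # b
```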


Page 6: String Searching and Matching

Hash Table


Page 7: String Searching and Matching

String Matching

String matching algorithms are an important class of string algorithms that try to find a place where one or several strings (also called patterns) occur within a larger string or text.


Page 8: String Searching and Matching

Algorithm Design Proposed

Figure 1: Work flow diagram

Page 9: String Searching and Matching

Algorithm of the Proposed Work

1) Start
2) Store the input search string in ‘s’
3) S ← s.toLowerCase()
4) Get the ASCII value of each character in the search string and compute their sum.
5) Using the modulo hash function, navigate to the corresponding ASCII index.
6) F ← getFirstcharacter(S)
7) Using the modulo hash function, go to the index ‘F’
8) L ← getlastcharacter(S)
9) Match L with the last character of the name string in that block. If true, then do steps 10-12; else skip that string and go on to the next string in that block
10) Get 2 random positions: M ← getmthcharacter(S), N ← getnthcharacter(S)
11) If S[M] = namestring[M] and S[N] = namestring[N] is true
12) Then compare all places of the string excluding positions F, L, M, N.
13) Else go to step 8 and continue the process until a match is found or until the end of the block
14) Stop
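The steps above can be sketched roughly as follows. This is one possible reading, not the author's implementation. The assumptions are: the table size of 101, the helper names `build_index` and `search`, the use of a single dictionary keyed by (ASCII sum mod table size, first character) to stand in for the two index lookups, and an added length check that keeps the random positions valid for every candidate.

```python
import random
from collections import defaultdict

TABLE_SIZE = 101  # assumed bucket count; the slides do not state one


def ascii_sum(s):
    return sum(ord(ch) for ch in s)


def build_index(names, table_size=TABLE_SIZE):
    """Group lowercased names by (ASCII sum mod table size, first character)."""
    index = defaultdict(list)
    for name in names:
        key = name.lower()
        if key:
            index[(ascii_sum(key) % table_size, key[0])].append(key)
    return index


def search(index, query, table_size=TABLE_SIZE, rng=random):
    s = query.lower()                                         # step 3
    if not s:
        return None
    # Steps 4-7: hash the ASCII sum, then select by first character.
    block = index.get((ascii_sum(s) % table_size, s[0]), [])
    for name in block:
        if len(name) != len(s):                               # added assumption
            continue
        if name[-1] != s[-1]:                                 # steps 8-9
            continue
        m, n = rng.randrange(len(s)), rng.randrange(len(s))   # step 10
        if name[m] != s[m] or name[n] != s[n]:                # step 11
            continue
        checked = {0, len(s) - 1, m, n}
        # Step 12: compare the remaining positions only.
        if all(name[i] == s[i] for i in range(len(s)) if i not in checked):
            return name
    return None                                               # end of block


index = build_index(["Alice", "Alcie", "Bob", "Celia"])
print(search(index, "ALICE"))  # alice
print(search(index, "aclie"))  # None
```

In this sketch "alice" and "alcie" fall into the same block because they have the same ASCII sum and first character, so the positional checks are what separate them.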

Page 10: String Searching and Matching

Results and Discussion

This algorithm is implemented using a database of 5500 names, and it greatly reduces the number of comparisons. Only 0.218% of the 5500 strings need to be compared to find the correct match.
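The slide gives the percentage but not the resulting count. As a rough check, and taking the stated 0.218% at face value, the implied number of compared strings can be worked out as below; the variable names are illustrative only.

```python
total_names = 5500        # database size stated on the slide
fraction_percent = 0.218  # percentage stated on the slide

compared = fraction_percent / 100 * total_names
print(round(compared, 2))  # 11.99, i.e. about 12 name strings
```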

Figure 2: Performance with various numbers of name strings

Page 11: String Searching and Matching

Results and Discussion

In Figure 3, large, medium and small represent the length of the search string used, and the chart shows the maximum number of comparisons made by the various algorithms.

Figure 3: Boyer Moore, Brute Force and the proposed algorithm based on the number of comparison operations.

Page 12: String Searching and Matching

Conclusion

In this work, a novel approach for string searching and matching is proposed based on the American Standard Code for Information Interchange (ASCII) value. The number of comparisons is reduced significantly, to a maximum value of the worst case and the best case. It has a maximum time complexity . In the future, this algorithm can be enhanced by dividing the names in the block into four quarters and performing the match so that we can reduce the factor .


Page 13: String Searching and Matching

Reference

[1] A. Hume and D. Sunday, "Fast String Searching," Journal of Software: Practice and Experience, Vol. 21, No. 11, pp. 1221-1248, 1991.
[2] A. M. Alshahrani and M. I. Khalil, "Exact and Like String Matching Algorithm for Web and Networks Security," Computer and Information Technology (WCCIT), pp. 1-4, 2013.
[3] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, "Introduction to Algorithms."
[4] R. S. Boyer and J. S. Moore, "A Fast String Searching Algorithm," Comm. ACM, Vol. 20, No. 10, pp. 762-772, 1977.
[5] D. Knuth, J. H. Morris and V. Pratt, "Fast Pattern Matching in Strings," SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350, 1977.
[6] R. Cole, "Tight Bounds on the Complexity of the Boyer-Moore String Matching Algorithm," Proc. ACM-SIAM Symposium on Discrete Algorithms, pp. 224-233, 1991.
[7] Z. Galil, "On Improving the Worst-case Running Time of the Boyer-Moore String Matching Algorithm," Comm. ACM, Vol. 22, No. 9, pp. 505-508, 1979.
[8] V. Gupta, M. Singh and K. B. Vinod, "Pattern Matching Algorithms for Intrusion Detection and Prevention System: A Comparative Analysis," Intl. Conf. Advances in Computing, Communications and Informatics, pp. 50-54, 2014.

Page 14: String Searching and Matching

Thank you!