Dua-Khety: Hieroglyphics Detection, Classification and Transliteration

The American University in Cairo - Department of Computer Science and Computer Engineering (CSCE)

Ahmed Agha, Malak Sadek, Mohamed Badreldin, Mohamed Ghoneim

Supervised by Dr. Amr Goneid

20th of May, 2018


Contents

1 Introduction
2 Related Work
3 Dataset
4 Implementation
  4.1 Pre-Processing and Segmentation
  4.2 Classification
  4.3 The Mobile Applications
5 Results
6 Conclusion
Appendix A iOS Application Screenshots


Abstract

This paper introduces Dua-Khety, a mobile application available on both Android and iOS devices that is capable of localizing and segmenting hieroglyphs within images, classifying them, and assigning each a Gardiner code to provide the user with its transliteration and meaning. The entire process takes place offline, meaning that the service is convenient, fast, lightweight, and can be accessed anywhere at any time. The application also offers other features, such as the ability to browse other users' photographs and access a hieroglyphics dictionary, to provide an engaging and educative experience.

Keywords: Machine Learning, Computer Vision, Mobile Apps, Hieroglyphs, Egyptology

1 Introduction

Egyptian hieroglyphs are one of the most elusive pictographic languages in the world. Pictographic languages do not form full and proper sentences; rather, they use a series of pictures and symbols to convey the semantics of a sentence. This can sometimes be literal (e.g. a symbol of a man refers to a man) and other times metaphorical (e.g. some symbols of animals refer to certain Egyptian gods) [1]. Hieroglyphs also present an additional challenge in that there is no standardized reading order: they can be written top to bottom, left to right, or right to left, and sometimes in multiple orientations with several overlaps between the symbols.

Figure 1: Examples of hieroglyphs read in different orders [2]

What is more, over the years most of the temples and documents where these hieroglyphics can be found have become eroded, worn down, fractured, or vandalized, making any digital visual processing all the more difficult. Because of these obstacles, much of the way of life of the ancient Egyptians has been lost, as no one was able to properly decipher the meanings behind most of their symbols. Even though some advancements were made [3], they could not truly unlock the key to Egypt's past. Eventually, Sir Alan Gardiner documented and classified each hieroglyphic symbol with a code that could then be used to search for more information about its meaning. This hieroglyphic dictionary ended up being thousands of pages long, and searching for symbols became a tedious task, especially as one had to know a symbol's code to search for it. Even with the emergence of the internet speeding up manual searches, there still remained the problem of not knowing a hieroglyph's 'Gardiner code' in order to search for it.

Figure 2: The contents page of Sir Alan Gardiner's massive hieroglyphic dictionary [4]

To help facilitate the decoding of hieroglyphs and extend that knowledge beyond archaeologists and field experts into the hands of the general public, Dua-Khety was created. In ancient Egypt, only very few of the elite citizens could read and write. As a result, scribes used to read and write letters for common people, and it was considered a very prestigious profession. One scribe named Dua-Khety famously wrote a letter to his son titled The Satire of the Trades [5], in which he dismisses other professions and talks about the beauty of being a scribe, saying:

"So I would have you love writing more than your mother and have you recognize its beauty. For it is greater than any profession, and there is none like it on earth."

The project honors the importance of Egyptian scribes in preserving their people's knowledge and way of life for centuries by naming the application after Dua-Khety, with the aim of essentially placing a digital Egyptian scribe in people's pockets.

2 Related Work

There are currently no widely accepted models for hieroglyph detection and classification, and none of the existing approaches are implemented on mobile devices. The most comprehensive work is by Morris Franken and Jan van Gemert [3], who provide an extensive dataset of hieroglyphic symbols organized by their Gardiner codes, which is utilized in this project. Their method entails essentially treating the symbols as text to localize individual symbols, and using different methods, such as the bag-of-words (BoW) approach [6] and a combination of HOG [7] and Shape Context [8] techniques, to classify them. Their dataset is also the most realistic when compared to the noisy and irregular real-life context that hieroglyphs are found in, as opposed to other papers that manually extract and clean individual hieroglyphs [7-9]. Similar work was done by Mark-Jan Nederhof at the University of St Andrews [10], who uses optical character recognition (OCR) on handwritten transcriptions of hieroglyphs, essentially treating them as text as well, instead of using the hieroglyphs in their natural form. While there is a large amount of work on Western and Asian languages [11-14], what little work exists on pictographic languages usually focuses on Meso-American and Mayan languages rather than Egyptian hieroglyphs [6-9]. These papers also mainly rely on computer-vision techniques such as Gradient Field Histograms and Histograms of Orientation Shape Context (HOSC).

The project thus presents a new approach to localizing and detecting hieroglyphics, using a mixture of computer vision and machine learning techniques by combining pre-processing and edge detection with a neural network that is then used to classify the hieroglyphics. The entire process occurs offline on a mobile device, introducing a truly novel application.

3 Dataset

The dataset used was provided by Morris Franken and Jan C. van Gemert from the University of Amsterdam [3]. It contains 4,210 manually annotated images of Egyptian hieroglyphs found in the Pyramid of Unas in Egypt. Within the dataset there are around 200 different classes (e.g. humans, flowers, etc.), with the number of images per class varying significantly from 1 to 300. This was undesirable for the classification step, as the dataset was severely unbalanced due to the varying frequency of appearance across different symbols. To overcome this, only the 22 largest classes were used. The number of images per class still varied, but with a minimum of around 10 images.

Figure 4: Example symbols from the dataset


4 Implementation

4.1 Pre-Processing and Segmentation

OpenCV was used to pre-process and segment the images taken by users, isolating individual hieroglyphs. First, the images are converted to gray-scale and blurred with a radius of three, which was found to reduce noise better than various other filters. Following this, the average pixel value of the image is computed and the image is thresholded with a minimum value of the computed average and a maximum value of 255. Afterwards, Canny edge detection is applied with low and high thresholds of 0.66 × imageAverage and 1.33 × imageAverage, and an aperture size of three. These values were found through experimentation to best balance removing edges that cause noise against rendering the shapes indistinguishable.
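The average-based thresholds can be sketched as follows (a minimal NumPy sketch of the arithmetic only; the actual pipeline uses OpenCV's `cv2.threshold` and `cv2.Canny`, and the function names here are illustrative):

```python
import numpy as np

def binarize_at_mean(gray):
    """Threshold a gray-scale image at its own mean intensity
    (minimum = image average, maximum = 255), as described above."""
    avg = gray.mean()
    return np.where(gray >= avg, 255, 0).astype(np.uint8)

def canny_thresholds(gray):
    """Derive the Canny low/high thresholds from the image average."""
    avg = gray.mean()
    return 0.66 * avg, 1.33 * avg
```

Tying the thresholds to the image's own mean makes the edge detector adapt to the varying exposure of photographs taken in dim tombs versus bright outdoor walls.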

The image is then dilated and eroded, and then segmented based on vertical columns or horizontal rows to determine the orientation of the hieroglyphics, using the Hough line transform to detect the largest lines (which separate the rows or columns).

The final step is to use connected-components labeling to create bounding boxes around individual hieroglyphics based on the determined order. These bounding boxes are placed on the original image and used to crop out and extract the individual symbols for classification and presentation to the user. The cropped symbols from the original image are only converted to black and white and thresholded in the same way (without Canny edge detection, as this significantly lowers the classification accuracy).
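The bounding-box step can be pictured with a simple connected-components pass (an illustrative pure-Python version; the application itself relies on OpenCV, which provides equivalent functionality such as `cv2.connectedComponentsWithStats`):

```python
from collections import deque

def bounding_boxes(binary):
    """Label 4-connected foreground regions in a binary image (rows of 0/1)
    and return one (min_row, min_col, max_row, max_col) box per region."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                # flood-fill this component, tracking its extent
                queue = deque([(r, c)])
                seen[r][c] = True
                r0, c0, r1, c1 = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    r0, c0 = min(r0, y), min(c0, x)
                    r1, c1 = max(r1, y), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes
```

Each box can then be sorted along the orientation recovered from the Hough lines and used to crop the corresponding symbol out of the original image.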

Figure 6: A summary of the pre-processingand segmentation steps

4.2 Classification

As a first step the Xception network was used; however, it appeared that different classes needed different weight matrices, as some classes required classifiers that pay close attention to detail, while others only needed to consider the outer shape. For example, class G symbols (which represent birds) achieved a 69% accuracy with Xception, as the classifier needed to search for small details to differentiate classes. On the other hand, class N symbols (which represent elements like the sun, moon, and sea) have far less detail and did not reach a good validation accuracy with Xception. Those symbols did reach 65%, however, with a two-layer CNN (using max pooling, followed by dropout, then flattening the output, then one dense layer of 128 neurons, followed by another dropout and a final dense layer whose number of neurons equals the number of classes to be distinguished). As there is no way of knowing in advance which class a hieroglyph in a user's picture belongs to, CNNs were used instead of Xception.

The initial classifier consists of a 32-filter convolutional layer with a kernel size of 3x3 and ReLU as its activation function, followed by an identical 64-filter convolutional layer. This is followed by max pooling with a pool size of 2x2 and a dropout of 0.25. The model is then flattened, followed by a dense layer of 40 neurons with a ReLU activation function and a dropout of 0.5. Finally, there is a dense layer of 22 neurons, representing the 22 classes used to categorize the hieroglyphics. This layer's activation function is softmax, and the model uses an RMSProp optimizer.

Figure 7: Summary of the initial machinelearning model
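The stack summarized in Figure 7 can be sketched in Keras (an illustrative reconstruction from the description above, not the authors' exact training script; the function name and input shape default are assumptions):

```python
from tensorflow.keras import layers, models

def build_initial_classifier(num_classes=22, input_shape=(150, 150, 1)):
    """CNN matching Figure 7: two conv layers, max pooling, dropout,
    a 40-neuron dense layer, and a softmax output over the classes."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(40, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The 150x150 input matches the resizing performed by the application before classification.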

The resulting .h5 file was optimized and frozen into a .pb file that is used within the TensorFlow framework on Android devices, and is also fed into the CoreML conversion tools to fit within the CoreML framework on iOS devices. The classifier accepts images of dimensions 150x150 (any incoming images from the application are resized to these dimensions). Initially, classifier results were poor; however, they swiftly improved after binarizing the images before feeding them to the classifier. While these results reached around 70% accuracy, it was found that the trained model was over-fitting on both the training and testing data. The root of the problem turned out to be the dataset, which was extracted from one place (the Pyramid of Unas), in one go, by one person, using one device, and was in fact captured from a book of images. The testing data was simply not diverse enough, as the aim of the system is to classify hieroglyphs in different temples and on different stones.

Moving forward, hieroglyphic scriptures from the Louvre, the British Museum, and the Museum of Alexandria were obtained as new testing data, as they proved to be more diverse. In addition, the previous dataset from the Pyramid of Unas was augmented and cleaned to remove misleading symbols. Augmentation used a shear range of 0.2, a zoom range of 0.2, and a rotation range of 0.2.
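These augmentation settings map directly onto Keras's `ImageDataGenerator` (a sketch under the assumption that a Keras-style augmenter was used; the paper does not name the exact tool):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# shear, zoom, and rotation ranges as stated above
augmenter = ImageDataGenerator(
    shear_range=0.2,     # shear intensity
    zoom_range=0.2,      # random zoom in [0.8, 1.2]
    rotation_range=0.2,  # range for random rotations
)
```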

With some research, it was found that Siamese neural networks would be more suitable for the problem at hand. These networks differ from normal ones in that they take pairs of images as input, along with a label representing whether the two images are from the same class or not (a 1 or a 0). In other words, half of what is fed to the network are pairs of images from the same class labeled 1, and the other half are pairs of images from two different classes labeled 0. The images are chosen randomly. The model consists of four separable convolutional layers with varying kernel sizes and ReLU as an activation function, with max pooling between each pair. The model is flattened at the end to obtain the features at that stage. For training, two copies of this exact stack are merged, and another dense layer with a single sigmoid output is added at the end to represent whether the pair of images is from the same class or not.
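The balanced pair-generation scheme described above can be sketched as follows (an illustrative helper; the function name and signature are assumptions, not the authors' code):

```python
import random

def make_pairs(images_by_class, n_pairs, seed=0):
    """Build a balanced Siamese training set: alternating same-class pairs
    labeled 1 and different-class pairs labeled 0, chosen at random."""
    rng = random.Random(seed)
    classes = list(images_by_class)
    pairs, labels = [], []
    for i in range(n_pairs):
        if i % 2 == 0:  # same-class pair -> label 1
            c = rng.choice(classes)
            pairs.append((rng.choice(images_by_class[c]),
                          rng.choice(images_by_class[c])))
            labels.append(1)
        else:           # different-class pair -> label 0
            c1, c2 = rng.sample(classes, 2)
            pairs.append((rng.choice(images_by_class[c1]),
                          rng.choice(images_by_class[c2])))
            labels.append(0)
    return pairs, labels
```

Each pair is fed through the two merged copies of the convolutional stack, and the sigmoid output is trained against the 0/1 label.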

Figure 9: The architecture of the SiameseNetwork used


4.3 The Mobile Applications

Applications were created for both Android devices (using Java) and iOS devices (using Swift and Objective-C). The appropriate OpenCV libraries were used for both applications, and CoreML was used to integrate the machine learning model into the iOS application, while TensorFlow was used for the Android application. The iOS version of the application was uploaded to Apple's App Store, while the Android version was uploaded to Google's Play Store.

The applications also offer a number of other features. Users can look at their own search history or browse a social feed showing other users' recent photographs (which they can then analyze on their own devices). A symbol dictionary provides users with a digitized version of Sir Gardiner's dictionary, which they can browse through or use to search for specific symbols. Users can also send new pictures they take to the development team in order to gradually increase the size of the dataset, thus improving the quality of the classifier through crowd-sourcing. All the features except the social feed work fully offline. This was an important goal for the project, as most temples, tombs, etc. are out in the desert, where there is no proper cellular connection or WiFi. Firebase was used to store the database backing the symbol dictionary, as well as user submissions for the history and social feed. It was also used for account management (which was optional for transliterating but necessary to access the social feed).

Finally, after taking a picture, the application presents the user with a list of the hieroglyphics found within it. When the user selects a symbol, they can see its Gardiner code, its literal meaning, its semantic meaning, and links for more information. This is considered a transliteration as opposed to a translation, as hieroglyphics are not grammar-based but rather work by stringing together words to form larger semantic contexts (as mentioned earlier in the paper) [1]. Screenshots of the iOS application's user interface can be found in Appendix A.
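The per-symbol presentation can be pictured as a dictionary keyed by Gardiner code (the codes below are real Gardiner sign-list entries, but the abbreviated entries and the function name are illustrative, not the application's actual data):

```python
# Hypothetical, abbreviated version of the on-device symbol table.
GARDINER = {
    "A1": {"literal": "seated man", "semantic": "man, person"},
    "G1": {"literal": "Egyptian vulture", "semantic": "the sound aleph"},
    "N5": {"literal": "sun", "semantic": "sun, day; the god Ra"},
}

def lookup(code):
    """Return the dictionary entry for a classified symbol's Gardiner code."""
    return GARDINER.get(code, {"literal": "unknown", "semantic": "unknown"})
```

A classified bounding box thus resolves to a code, and the code resolves to the literal and semantic meanings shown to the user.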

5 Results

Classification began at 2% accuracy when using a deep network with 100+ layers and the raw images provided by Franken and van Gemert. Binarizing the images increased the accuracy to 10%, as the dataset became gray-scale while the actual images taken from the application were still colored. Augmenting the images in the dataset increased accuracy by a further 3%. Using a smaller neural network increased the accuracy to 55%, and increasing the kernel size to 30x30 reached an accuracy of 70%, though it suffered from over-fitting. The final classification accuracy reached with Siamese networks was 66% when predicting the top match, and 82% when predicting the top five matches.

Figure 11: A graph of the classifier’s accuracyas the project progressed

6 Conclusion

Dua-Khety has been presented as an iOS and Android mobile application capable of detecting, segmenting, and classifying hieroglyphics completely offline. Providing other features also makes the applications more comprehensive and engaging. By utilizing computer-vision pre-processing and segmentation techniques, as well as a Siamese neural network for classification, the application presents a novel way of solving a mystery as old as civilization itself.


References

[1] M. Halliday, "Spoken and Written Language," Deakin University Press, p. 19, 1985.

[2] Bibliotheca Alexandrina, "Hieroglyphs step by step." https://www.bibalex.org/learnhieroglyphs/lesson/LessonDetails_En.aspx?l=57. Accessed: 2018-02-30.

[3] M. Franken and J. C. van Gemert, "Automatic Egyptian hieroglyph recognition by retrieving images as texts," in ACM MM '13, 2013.

[4] A. Gardiner, Egyptian Hieroglyphic Dictionary. John Murray, London, 1920.

[5] J. Mark, "The Satire of the Trades." https://www.ancient.eu/article/1074/the-satire-of-the-trades/. Accessed: 2018-03-02.

[6] E. Roman-Rangel, C. Pallan, J.-M. Odobez, and D. Gatica-Perez, "Searching the past: an improved shape descriptor to retrieve Maya hieroglyphs," ACM MM, 2011.

[7] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, 2005.

[8] S. Belongie and J. Malik, "Shape matching and object recognition using shape contexts," TPAMI, 24(4), 2002.

[9] Y. Frauel, O. Quesada, and E. Bribiesca, "Detection of a Mesoamerican symbol using a rule-based approach," Pattern Recognition, 39, 2006.

[10] M.-J. Nederhof, "OCR of handwritten transcriptions of ancient Egyptian hieroglyphic text," in Altertumswissenschaften in a Digital Age: Egyptology, Papyrology and Beyond, proceedings of a conference and workshop in Leipzig, 2015.

[11] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR, pp. 2963-2970, 2010.

[12] S. Karaoglu, J. C. van Gemert, and T. Gevers, "Object reading: text recognition for object recognition," in ECCV-IFCVCR Workshop, 2012.

[13] S. Karaoglu, J. C. van Gemert, and T. Gevers, "Con-text: text detection using background connectivity for fine-grained object classification," ACM MM, 2013.

[14] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified quadratic discriminant functions and the application to Chinese character recognition," PAMI, 1987.


Appendix A iOS Application Screenshots

(a) Main Menu (b) Symbol Dictionary


(a) Symbols recognized in photo (b) Analyzing Progress Screen


(a) Social Feed Screen (b) Symbol Prediction Screen Result
