
Improving Text Recognition in Tor darknet with Rectification and Super-Resolution techniques

Pablo Blanco-Medina 1,2, Eduardo Fidalgo 1,2, Enrique Alegre 1,2, Francisco Jáñez-Martino 1,2

1 Universidad de León, Spain, {pblanm, efidf, ealeg, fjanm}@unileon.es
2 Researcher at INCIBE (Spanish National Cybersecurity Institute), León, Spain

Keywords: Text Spotting, Text Recognition, Super-Resolution, Tor Darknet.

Abstract

Text recognition can be used to retrieve textual information embedded in images. This task can be complex due to the low resolution of the images and the orientation of the text, two problems commonly found in Tor darknet images. In this work, we combine three different super-resolution algorithms with a rectification network to increase the performance of the text recognition step. We evaluated these combinations on four state-of-the-art datasets and on TOICO-1K, a Tor-based image dataset that was semi-automatically labelled for the task of Text Spotting in Tor darknet. We obtained the highest performance increase on the ICDAR 2015 dataset, with an improvement of 3.77% when combining the Residual Dense and rectification networks. On TOICO-1K, we achieved an improvement of 3.41% when combining Deep CNN and the rectification network. We conclude that rectification performs slightly better than super-resolution when they are applied standalone, while their combination obtains the best results on the datasets evaluated.

1 Introduction

The term darknet refers to those networks within the Dark Web that require specific browsers to be accessed. The Onion Router (Tor) network is one such darknet, offering anonymity to its users thanks to its layered domains, also known as hidden services. Due to this high level of privacy and anonymity, Tor is commonly used by journalists to protect their sources, by IT professionals to test the security of their networks, and by other users who wish to maintain their anonymity.

However, Tor darknet also hosts suspicious content and services, such as traders selling different kinds of unregulated products. Al-Nabki et al. [1] concluded that 29% of the active domains crawled from Tor darknet during their study hosted suspicious or potentially illegal activities. Figure 1 shows four images representative of the content that can be found on Tor darknet, where domains selling weapons and drugs, Child Sexual Exploitation Material [2] or counterfeit personal identification are easily found.

Figure 1. Images crawled from Tor darknet. Samples of a dismantled weapon (a), a fake ID (b), fake money (c) and credit cards (d).

The literature presents different efforts to monitor Tor domains and extract their insights in order to supervise suspicious activities, offering solutions ranging from Natural Language Processing [3, 4] to Computer Vision [5, 6] and Graph Theory [7, 8]. While some of the Computer Vision approaches on Tor darknet work with images [9], they omit the analysis of the text found within them, which could be an important source of information. In Tor, depending on the context or activity hosted in the hidden service, images may contain anything from drugs to seller names. Figure 1 illustrates the names of different sellers.

Text Spotting, a task that combines text detection and recognition to transcribe information found within images, could be an approach to retrieve this information. To the best of our knowledge, there are only two works that focus on the application of Text Spotting to Tor darknet. In the first, Blanco et al. [10] obtained a 57% F-Measure in the text detection task, but their performance on Text Recognition with state-of-the-art techniques was considerably lower. In their next work [11], they used dictionaries and string-matching techniques, improving the precision of the recognition stage to 39.70%.

The low scores obtained in these works [10, 11] can be attributed to different factors, such as color distribution, brightness, and partial occlusion. Figure 2 illustrates images with some of these common problems.


Figure 2. Common problems found in Tor images. Orientation (left, up-middle and right) and low resolution (middle).

Although these problems are usual, in the Tor-based image context low resolution and orientation are the most frequent ones. Orientation issues commonly arise when several documents are fitted into a single picture, while low resolution can be worsened after cropping the images following the detection stage.

In this work, we continue the application of Text Spotting to Tor darknet [10, 11], focusing on improving text recognition performance on non-horizontal and low-resolution text. To improve the former, we use rectification [12, 13], which helps correct an image's orientation to reduce transcription mismatches. For the latter, we selected three state-of-the-art super-resolution techniques, which can enhance the original quality of the images. We combined both rectification and super-resolution techniques with a well-known text recognition algorithm, ASTER [12], trying to increase the accuracy of the transcription. We evaluated the influence of these two approaches, i.e. rectification and super-resolution, on four state-of-the-art datasets and on TOICO-1K, a Tor-based image dataset we release and make publicly available.

The rest of the paper is organized as follows. Section 2 presents a brief summary of text recognition and text-based super-resolution approaches. Section 3 presents the selected methods and the datasets used, while Section 4 details our experiments and discusses the obtained results. Finally, we present our conclusions and future lines of work to further improve text recognition.

2 Related Works

Text Spotting is a task that can be used in different applications, such as automatic navigation and document analysis. Researchers have studied the most relevant problems in both detection and recognition, with varying orientation and the segmentation step as the most common ones [14, 15].

Currently, most approaches focus on the combination of Convolutional Neural Networks with Recurrent Neural Networks. Shi et al. [16] proposed the first end-to-end recognition system based on this combination, creating a framework that unifies sequence modeling, feature extraction and transcription. This proposal was able to handle sequences of arbitrary length with a smaller model while achieving good performance in both lexicon-based and lexicon-free recognition.

The combination of text detection and text recognition into a single system is also a strong focus of research. Liu et al. [17] proposed a unified trainable network that performs better than separate approaches by sharing features between both stages through a RoIRotate operator, reducing computational cost. While these approaches often achieve good performance on most datasets, they have difficulties handling irregular text, most notably in the recognition stage. Recently, this kind of text has been of interest to both detection and recognition tasks due to its varying sizes, patterns and shapes. Text can also appear inside images with low resolution, which can cause a performance drop.

Two approaches can be found for enhancing the recognition step: bottom-up and top-down [13]. The former attempts to search for each separate character before joining them and transcribing the complete sequence [18], while the latter focuses on matching the different shapes of the text and correcting orientation and size differences before transcribing it. Luo et al. [13] followed a top-down approach by developing a rectification network combined with an attention-based sequence recognition network. This rectification network helps correct the orientation of distorted images, reducing the impact of irregular text on text recognition. This approach is similar to the one found in [12], but without the use of a bidirectional decoder to improve transcription accuracy.

Alongside irregular text, images with low resolution can pose challenges for text recognition approaches. The goal of Single Image Super-Resolution (SISR) techniques is to create an enhanced version of a low-resolution input image. This task is used in a wide variety of fields, most notably surveillance for identification purposes as well as medical imaging. The resulting high-quality images can be used as input for text recognition algorithms, improving transcription results.

The ICDAR 2015 Competition on Text Image Super-Resolution [19] reported an improvement of over 15% when applying Optical Character Recognition to super-resolution-enhanced images. The highest-scoring method, SRCNN [20], is based on a deep convolutional neural network. It extracts patches from the original image after applying bicubic interpolation and maps their feature representations onto each other to reconstruct the high-resolution patch. Following this approach, super-resolution research has focused on improving performance by increasing network depth with more convolutional layers, or on reducing architecture complexity for real-time applications [21].
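As a reference point, the SRCNN pipeline can be summarized in a few lines. The following is a minimal PyTorch sketch, not the authors' code: the 9-1-5 filter sizes and 64/32 channel widths follow the original paper [20], and single-channel (luminance) input is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),  # patch extraction
            nn.Conv2d(64, 32, 1), nn.ReLU(),            # non-linear mapping
            nn.Conv2d(32, 1, 5, padding=2),             # reconstruction
        )

    def forward(self, low_res, scale=2):
        # SRCNN operates on a bicubically pre-upscaled image.
        x = F.interpolate(low_res, scale_factor=scale, mode="bicubic",
                          align_corners=False)
        return self.net(x)

print(SRCNN()(torch.randn(1, 1, 16, 48)).shape)  # torch.Size([1, 1, 32, 96])
```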

However, these approaches often miss high-frequency image details. Recent works have focused on obtaining a more accurate match between the original image and its super-resolution variant by implementing sub-pixel convolutional layers or using residual learning [22]. The SRGAN method [23] proposes the use of generative adversarial networks (GANs), implemented with a deep residual network alongside skip connections. The resulting work is able to generate photorealistic images at up to 4x scaling, establishing a new state-of-the-art reference.


3 Methodology

3.1 ASTER’s Rectification Network

We selected ASTER [12] due to its state-of-the-art performance as well as its rectification network, which makes it possible to correct an image's orientation before performing text recognition. This technique is relevant in the context of Tor-based images, due to the common presence of multiple oriented items in the same picture used to express quantity. Furthermore, this algorithm can also complement text detectors by correcting the area obtained from the detection stage in text spotting systems, making it suitable for building end-to-end systems.

ASTER is composed of two stages: the rectification network and the recognition network. When an image is fed to the algorithm, it is rectified using the Thin-Plate-Spline (TPS) transformation [24], which corrects oriented and perspective text. After the rectified image is obtained, the recognition network processes it and outputs a character sequence. This output is generated using a bidirectional decoder to improve transcription accuracy, choosing as the final result the sequence with the highest recognition score.
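To make the two-stage layout concrete, the sketch below mirrors its structure in PyTorch. It is an illustration, not ASTER itself: a plain affine spatial transformer stands in for the TPS warp, and the toy recognizer emits per-column logits instead of using ASTER's attentional bidirectional decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RectifyThenRecognize(nn.Module):
    """Minimal sketch of ASTER's rectify-then-recognize layout."""
    def __init__(self, num_classes=97, hidden=256):
        super().__init__()
        # Localization net: predicts 6 affine parameters (TPS stand-in).
        self.loc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6),
        )
        # Start from the identity transform so training begins un-warped.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))
        # Toy recognizer: CNN features -> BiLSTM -> per-column logits.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # Stage 1: rectification (affine warp here; TPS in ASTER).
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        rectified = F.grid_sample(x, grid, align_corners=False)
        # Stage 2: recognition over the rectified image.
        feats = self.backbone(rectified).squeeze(2).transpose(1, 2)
        return self.head(self.rnn(feats)[0])

model = RectifyThenRecognize()
print(model(torch.randn(2, 3, 32, 100)).shape)  # torch.Size([2, 50, 97])
```

In ASTER, the localization network instead predicts a set of fiducial control points for the TPS warp, and decoding is attentional and bidirectional; the skeleton of rectify-then-recognize is the same.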

3.2 Super-Resolution Approaches

Because of the high presence of low-resolution images in our Tor-based dataset, we selected three super-resolution approaches to try to improve ASTER's performance: (i) Residual Dense Networks (RDN) [22], (ii) Deep CNN with Skip Connection (DCSCN) [21] and (iii) Neural Enhance (NE). We chose RDN for its focus on exploiting the hierarchical features of all convolutional layers, which is valuable on images with differently scaled objects and aspect ratios; DCSCN for its focus on a smaller architecture that lowers the computational cost; and NE for the four different models it offers to enhance images.

Residual Dense Networks [22] exploit hierarchical features through residual dense blocks. This structure is made of densely connected layers combined with local feature fusion and local residual learning, resulting in a contiguous memory mechanism that passes the state of preceding blocks to each layer of the current one. After extracting these local features, dense feature fusion processes the hierarchical global features. Combining both local and global features, the network obtains the final high-resolution image after upscaling.
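The core building block can be sketched as follows. This is a hedged PyTorch illustration: the channel width, growth rate and layer count are illustrative defaults, not the values used in the paper.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Dense connectivity: each conv sees the block input plus the
            # outputs of all preceding layers, concatenated channel-wise.
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # Local feature fusion: a 1x1 conv back to the block's channel width.
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        # Local residual learning: fused features added to the block input.
        return x + self.fuse(torch.cat(feats, dim=1))

block = ResidualDenseBlock()
print(block(torch.randn(1, 64, 24, 24)).shape)  # torch.Size([1, 64, 24, 24])
```

In the full RDN, the outputs of all such blocks are concatenated and fused again (global feature fusion) before the upscaling layer produces the high-resolution output.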

The Deep CNN with Skip Connection algorithm [21] proposes a fully convolutional neural network designed to decrease power consumption and processing time, focusing on a smaller model with faster and more efficient computation that is suitable for less powerful systems. It comprises a feature extraction and a reconstruction network. The extraction network obtains both local and global features using a cascade of CNNs combined with skip connection layers, with fewer layers than other similar approaches. After joining all of the features, DCSCN uses parallelized CNNs to reduce the input dimension before creating the enhanced image.

Lastly, the open-source approach called Neural Enhance (https://github.com/alexjc/neural-enhance) is based on multiple super-resolution techniques, attempting to combine all of them in a single implementation. This approach offers three different enhancement modes, repair, deblur and default, with default allowing x2 and x4 scaling to obtain higher-quality images. Combining all of the chosen super-resolution methods, we obtain a total of six possible configurations for resolution enhancement.
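For clarity, these six configurations, each later evaluated with and without rectification in Section 4, can be enumerated as follows; the labels in this snippet are ours, chosen for the sketch.

```python
from itertools import product

SR_CONFIGS = ["RDN", "DCSCN",
              "NE repair", "NE deblur", "NE default x2", "NE default x4"]
# Each super-resolution configuration is run with and without
# ASTER's rectification network.
RUNS = list(product(SR_CONFIGS, (False, True)))
print(len(SR_CONFIGS), len(RUNS))  # 6 configurations, 12 enhancement runs
```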

3.3 Datasets

To test both the rectification and super-resolution approaches, we used four state-of-the-art datasets and TOICO-1K, a custom dataset manually labelled for the tasks of Text Detection and Text Recognition on images crawled from Tor darknet.

The SVT dataset [25] comprises images from Google's Street View, containing 647 text regions. From the International Conference on Document Analysis and Recognition (ICDAR) competition, we selected the ICDAR 2013 [26] and 2015 [27] datasets, which hold 1093 and 2096 images, respectively. Finally, we selected the IIIT5K-Words dataset [28], which contains a total of 3000 cropped regions from both scene-text and born-digital images.

3.4 TOICO-1K

To demonstrate the effectiveness of our approach, we used our own custom dataset [10], which we name TOICO-1K in this paper and make publicly available on our group website (http://gvis.unileon.es/dataset/tor images in context-1k/).

The creation process was semi-automatic. We took 101 images from the TOIC dataset [29, 30] and generated a first version of TOICO-1K using a text spotting approach, separating the detection and recognition stages. We used the former to assist the task of text region labelling and generate most of the text bounding boxes, and the latter to obtain an initial transcription of the detected text.

Figure 3. TOICO-1K sample images. Original image (a), cropped text regions (b) and labelling examples (c).


Then, we manually inspected the 101 images, and we (i) added missing text regions that were not automatically detected, (ii) corrected the predictions made by the text recognition algorithm, and (iii) added further information per text region, inspired by [26, 27]. We exported all the information to a JSON file, which contains: the type of text found (handwritten or machine-based), whether or not the text is legible, the language, the number of text regions per image, the image dimensions and the bounding box locations.
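As an illustration of this schema, a single entry might look like the following Python sketch. The key names and values here are hypothetical; the released file defines its own layout.

```python
import json

# Hypothetical TOICO-1K entry; field names are illustrative, not the
# dataset's actual schema.
annotation = {
    "image": "toic_0001.jpg",
    "image_size": {"width": 800, "height": 600},
    "num_text_regions": 1,
    "regions": [{
        "bbox": [120, 45, 310, 90],        # x1, y1, x2, y2 in pixels
        "transcription": "SELLER NAME",
        "type": "machine-based",           # or "handwritten"
        "legible": True,
        "language": "en",
    }],
}
print(json.dumps(annotation, indent=2))
```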

The resulting dataset consists of 1101 documented text regions. For our experimentation, we only chose the cropped areas labelled as "legible", reducing the number of cropped regions to 675.

4 Experimental Results and Discussion

4.1 Experimental Setup

We evaluated all the methods on an Intel Xeon E5 v3 computer with 128 GB of RAM and an NVIDIA Titan Xp GPU. We measured text recognition performance using the accuracy metric, i.e. the percentage of recognized words that match the documented labels of each cropped image.
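Under this definition, the metric reduces to exact-match word accuracy over the cropped regions. A minimal sketch, assuming case-insensitive comparison (the paper does not specify its normalization):

```python
def word_accuracy(predictions, labels):
    """Fraction of cropped regions whose transcription exactly matches
    the documented label (case-insensitive by assumption)."""
    assert len(predictions) == len(labels)
    hits = sum(p.lower() == g.lower() for p, g in zip(predictions, labels))
    return hits / len(labels)

print(word_accuracy(["guns", "1D"], ["guns", "ID"]))  # 0.5
```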

As a baseline, we used ASTER with the rectification network disabled and without using lexicons. Then, we separated the text regions that were not correctly recognized by ASTER under the baseline configuration, and we enhanced them with super-resolution and rectification. Finally, we took the super-resolution-enhanced images and fed them into ASTER with two different configurations, with and without the rectification network enabled, obtaining the final transcriptions. For the NE method, we used each separate technique (repair, deblur and default) without combining them.
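The protocol above can be summarized in a short sketch. This is our paraphrase, not the authors' code: `recognize` and `enhance` are caller-supplied stand-ins for ASTER and for one super-resolution configuration.

```python
def evaluate(crops, labels, recognize, enhance, use_rectification):
    """Baseline ASTER pass, super-resolution on the failures only, then a
    second recognition pass with or without rectification."""
    hits, retry = 0, []
    for img, gt in zip(crops, labels):
        if recognize(img, rectify=False) == gt:  # baseline: no rectification
            hits += 1
        else:
            retry.append((enhance(img), gt))     # enhance misrecognized crops
    hits += sum(recognize(img, rectify=use_rectification) == gt
                for img, gt in retry)
    return hits / len(labels)                    # overall accuracy
```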

4.2 Results and Discussion

The recognition results of the proposed methods and their combinations on each dataset are summarized in Table 1. The improvements achieved by each approach are summarized in Table 2.

We found that the super-resolution and rectification techniques are very close in terms of performance, with rectification obtaining slightly higher results than super-resolution. Out of all the possible combinations, when both techniques are applied separately, rectification outperforms all super-resolution approaches on every dataset except in two particular cases. On the ICDAR 2015 dataset, RDN and both Neural Enhance Default modes obtain higher results than rectification. Under the repair configuration, Neural Enhance also outperforms rectification on the IIIT5K dataset.


Method                             TOICO-1K   SVT      IC15     IC13     IIIT5K
ASTER (Baseline)                   0.2830     0.8825   0.7235   0.8829   0.8413
ASTER (Baseline) + Rectification   0.3052     0.9042   0.7424   0.8984   0.8540
RDN                                0.2993     0.8934   0.7453   0.8957   0.8483
RDN + Rectification                0.3096     0.9042   0.7612   0.9122   0.8557
DCSCN                              0.3037     0.8995   0.7414   0.8930   0.8470
DCSCN + Rectification              0.3170     0.9104   0.7574   0.9076   0.8560
NE Repair                          0.2933     0.8964   0.7351   0.8939   0.8543
NE Repair + Rectification          0.3007     0.9042   0.7487   0.9021   0.8620
NE Deblur                          0.2933     0.9011   0.7380   0.8911   0.8540
NE Deblur + Rectification          0.3007     0.9057   0.7477   0.9039   0.8613
NE Default x2                      0.2993     0.8964   0.7453   0.8939   0.8477
NE Default x2 + Rectification      0.3126     0.9042   0.7598   0.9058   0.8580
NE Default x4                      0.3022     0.8949   0.7438   0.8893   0.8493
NE Default x4 + Rectification      0.3096     0.9042   0.7593   0.9039   0.8590

Table 1. Results of combining ASTER's rectification network with super-resolution on five different datasets. RDN, DCSCN and NE combinations are applied over the ASTER baseline.

Method                  TOICO-1K   SVT      IC15     IC13     IIIT5K-Words   Avg
Rectification           2.22%      2.16%    1.88%    1.56%    1.27%          1.82%
RDN                     1.63%      1.08%    2.17%    1.28%    0.70%          1.37%
RDN + Rectification     2.67%      2.16%    3.77%    2.93%    1.43%          2.59%
DCSCN                   2.07%      1.70%    1.79%    1.01%    0.57%          1.43%
DCSCN + Rectification   3.41%      2.78%    3.38%    2.47%    1.47%          2.70%
NE                      1.93%      1.85%    2.17%    1.10%    1.30%          1.67%
NE + Rectification      2.96%      2.32%    3.62%    2.29%    2.07%          2.65%
Avg                     2.44%      1.98%    2.82%    1.85%    1.26%          /

Table 2. Overall text recognition precision increases, measured as the difference between each method's best-case result (for NE, its best configuration) and the ASTER baseline. Avg presents the average improvement per dataset (columns) and per method (rows).


Combining rectification and super-resolution improves the results on all combinations and datasets, with two exceptions. Combining the Neural Enhance Deblur and Repair models with the rectification network results in a lower precision score than using the rectification network alone. These are the only instances in which combining both tasks scores lower than simply applying rectification, which suggests that these methods introduce larger alterations to the high-resolution images.

Furthermore, the combination of both tasks often results in a lower score than the sum of each task applied separately, which suggests that some images can be properly recognized with the use of only one technique. This may be because orientation is a more difficult challenge than resolution, as indicated by the higher results achieved with rectification.

Without rectification, DCSCN scores highest on the TOICO-1K dataset, with an initial improvement of 2.07% that increases to 3.41% when adding rectification. When images are not corrected, the Neural Enhance approach under the deblur and default x2 configurations obtains the best performance on average. When combined with rectification, DCSCN also scores as the best method, with an average improvement of 2.70% across all datasets.

5 Conclusions

We have analyzed the performance of three different super-resolution techniques, as well as the effect of ASTER's rectification network, on both Tor-based images and four state-of-the-art datasets, without the use of any lexicon.

Our results demonstrate that the combination of both tasks is highly desirable for Tor-based images as well as natural-scene datasets. When analyzed separately, rectification slightly outperforms super-resolution. The best results are obtained by combining both tasks, although the scores are lower than the addition of each separate task, except in the case of combining RDN and rectification on the ICDAR 2013 dataset. This result suggests that specific images can be recognized with the use of either technique.

We obtained the maximum improvement on the ICDAR 2015 dataset, 3.77%, combining RDN and the rectification network. On TOICO-1K, DCSCN combined with rectification improved the baseline results by 3.41%. Based on these results, we recommend the use of DCSCN for Tor-specific images and RDN for general image-resolution improvements.

Among the super-resolution approaches, Neural Enhance obtains the best results when no rectification is involved, with the deblur and default x2 scaling parameters as the best configurations. However, when combined with the rectification network, the Neural Enhance approach under the repair and deblur configurations is the only method that obtains lower results on Tor-based images than the rectification baseline.

With the obtained results, we conclude that text recognition approaches might benefit more from rectification than from super-resolution, while low resolution remains a relevant challenge for text recognizers when compared to curved text.

Our future work will focus on studying text-based super-resolution approaches and their computational efficiency, as well as other Tor-based image problems, such as aspect ratio and partial occlusion, in order to retrieve information from Tor images that can be used to prevent illegal activities.

Acknowledgements

This research has been supported by the grant "Universidad de León 2018" and by the framework agreement between Universidad de León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01. We acknowledge NVIDIA Corporation for the donation of the Titan Xp GPU used for this research.

References

[1] Mhd Wesam Al-Nabki, Eduardo Fidalgo, Enrique Alegre, and Laura Fernandez-Robles. ToRank: Identifying the most influential suspicious domains in the Tor network. Expert Systems with Applications, 123:212–226, 2019.

[2] A. Gangwar, E. Fidalgo, E. Alegre, and V. Gonzalez-Castro. Pornography and child sexual abuse detection in image and video: A comparative evaluation. In 8th International Conference on Imaging for Crime Detection and Prevention (ICDP 2017), pages 37–42, Dec 2017.

[3] Siyu He, Yongzhong He, and Mingzhe Li. Classification of illegal activities on the dark web. In ACM International Conference Proceeding Series, volume Part F1483, pages 73–78, New York, New York, USA, 2019. ACM Press.

[4] Akanksha Joshi, E. Fidalgo, E. Alegre, and Laura Fernandez-Robles. SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders. Expert Systems with Applications, 129:200–215, 2019.

[5] Xiangwen Wang, Peng Peng, Chun Wang, and Gang Wang. You are your photographs: Detecting multiple identities of vendors in the darknet marketplaces. In Proceedings of the 2018 Asia Conference on Computer and Communications Security, pages 431–442. ACM, 2018.

[6] R. Biswas, E. Fidalgo, and E. Alegre. Recognition of service domains on Tor dark net using perceptual hashing and image classification techniques. IET Conference Proceedings, pages 7–12(5), 2017.

[7] Scott W. Duxbury and Dana L. Haynie. The Network Structure of Opioid Distribution on a Darknet Cryptomarket. Journal of Quantitative Criminology, 34(4):921–941, Dec 2018.

[8] Mhd Wesam Al-Nabki, Eduardo Fidalgo, Enrique Alegre, and Victor Gonzalez-Castro. Detecting Emerging Products in TOR Network Based on K-Shell Graph Decomposition. pages 24–30, 2017.


[9] Xiangwen Wang, Peng Peng, Chun Wang, and Gang Wang. You are your photographs: Detecting multiple identities of vendors in the darknet marketplaces. In ASIACCS 2018 - Proceedings of the 2018 ACM Asia Conference on Computer and Communications Security, pages 431–442, New York, New York, USA, 2018. ACM Press.

[10] P. Blanco-Medina, Eduardo Fidalgo, Enrique Alegre, and Mhd Wesam Al-Nabki. Detecting textual information in images from onion domains using text spotting. Actas de las XXXIX Jornadas de Automatica, pages 975–982, 2018.

[11] Pablo Blanco-Medina, Eduardo Fidalgo, Enrique Alegre, Mhd Wesam Al-Nabki, and Deisy Chaves. Enhancing text recognition on Tor darknet images. In XL Jornadas de Automatica, pages 828–835. Universidade da Coruna, Servizo de Publicacions, 2019.

[12] Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. ASTER: An attentional scene text recognizer with flexible rectification. Transactions on Pattern Analysis and Machine Intelligence, 2018.

[13] Canjie Luo, Lianwen Jin, and Zenghui Sun. MORAN: A multi-object rectified attention network for scene text recognition. Pattern Recognition, 90:109–118, 2019.

[14] Qixiang Ye and David Doermann. Text detection and recognition in imagery: A survey. Transactions on Pattern Analysis and Machine Intelligence, 37(7):1480–1500, 2014.

[15] Yingying Zhu, Cong Yao, and Xiang Bai. Scene text detection and recognition: Recent advances and future trends. Frontiers of Computer Science, 10(1):19–36, 2016.

[16] Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304, 2016.

[17] Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, and Junjie Yan. FOTS: Fast oriented text spotting with a unified network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5676–5685, 2018.

[18] Zhanzhan Cheng, Yangliu Xu, Fan Bai, Yi Niu, Shiliang Pu, and Shuigeng Zhou. AON: Towards arbitrarily-oriented text recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5571–5579, 2018.

[19] Clement Peyrard, Moez Baccouche, Franck Mamalet, and Christophe Garcia. ICDAR2015 competition on text image super-resolution. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pages 1201–1205. IEEE, 2015.

[20] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pages 184–199. Springer, 2014.

[21] Jin Yamanaka, Shigesumi Kuwashima, and Takio Kurita. Fast and accurate image super resolution by deep CNN with skip connection and network in network. In International Conference on Neural Information Processing, pages 217–225. Springer, 2017.

[22] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018.

[23] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.

[24] Fred L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. Transactions on Pattern Analysis and Machine Intelligence, 11(6):567–585, 1989.

[25] Kai Wang and Serge Belongie. Word spotting in the wild. In European Conference on Computer Vision, pages 591–604. Springer, 2010.

[26] Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez i Bigorda, Sergi Robles Mestre, Joan Mas, David Fernandez Mota, Jon Almazan Almazan, and Lluis Pere De Las Heras. ICDAR 2013 robust reading competition. In 2013 12th International Conference on Document Analysis and Recognition, pages 1484–1493. IEEE, 2013.

[27] Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu, et al. ICDAR 2015 competition on robust reading. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pages 1156–1160. IEEE, 2015.

[28] Anand Mishra, Karteek Alahari, and C.V. Jawahar. Scene text recognition using higher order language priors. In BMVC - British Machine Vision Conference. BMVA, 2012.

[29] Eduardo Fidalgo, Enrique Alegre, Laura Fernandez-Robles, and Victor Gonzalez-Castro. Classifying suspicious content in Tor darknet through Semantic Attention Keypoint Filtering. Digital Investigation, 30:12–22, Jun 2019.

[30] E. Fidalgo, E. Alegre, L. Fernandez-Robles, and V. Gonzalez-Castro. Early fusion of multi-level saliency descriptors for image classification. Revista Iberoamericana de Automatica e Informatica Industrial, 16(3):358, Dec 2019.