Automatic Perspective Correction of Manuscript Images

Ryan Baumann 1, Christopher Blackwell 2, and W. Brent Seales 1

1 University of Kentucky
2 Furman University

1 Introduction

Frequently, images of rare documents must be taken under strict time constraints, when a chance opportunity arises, and with equipment that is less than ideally suited for precise digitization. This can often result in uncalibrated images whose contents are nonetheless qualitatively useful. Due to the logistics of document imaging, perspective distortion is a common artifact which can manifest itself in images taken under these constrained circumstances. We propose an automated approach for correcting this perspective distortion, even for uncalibrated images of documents with irregular page edges containing no regular text.

2 Background

In order to frame the problem and our need for a new approach, it may be helpful to first describe the images we were working with, the circumstances under which they were taken, as well as why perspective correction was desirable. In November of 2011, C. Blackwell and A. Hackney Blackwell photographed the botanical specimens collected in the Sloane Herbarium of the Natural History Museum, London. This was a digitization project of opportunity. Blackwell and Hackney Blackwell were in London working on another project, and were able to negotiate permission for this digitization at the last minute; the botanical images would complement their work on digital codicology (Blackwell) and environmental science and herbarium collections (Hackney Blackwell). The pair had one and a half working days to take as many images as possible. They had a portable conservation copystand, a tripod, and two digital cameras; the working space at the Museum was small, with limited working surfaces. The botanical specimens are pasted onto the pages of large quarto- or folio-sized volumes. An initial process of triage divided the volumes into one group that would fit on the copystand, and one group that was too large. The former were photographed with a Nikon D700 camera mounted in good alignment on the copystand, which included lightbars for even illumination. The other volumes presented a challenge.

These were photographed with a Nikon D3x camera on a tripod; the volumes were supported by foam wedges.

Fig. 1. Sample original image

The tripod was weighted with exercise weights for stability, and the book, tripod, and camera were situated on an L-shaped bench. There was limited time to experiment, but the team devised an arrangement that resulted in sufficient light and good safety for the volumes being photographed, but less-than-ideal alignment of camera to page. Knowing that this alignment was imperfect, the team elected to use the camera with the higher resolution (the D3x) for these volumes, in the hope that the greater resolution would afford greater potential for post-processing the images.

The team captured over 1,800 images in the time allotted. These books had never been photographed, and having digital access (under an Open Content License) to these 300-year-old botanical specimens is invaluable. Nevertheless, the inevitable misalignment of the images of some of the volumes is regrettable. It in no way precludes identification and study of the plant specimens, but does preclude the finest level of comparative measurement. This project resulted in a quantity of data that would not otherwise exist, but whose quality stands in need of enhancement.

3 Related Work

A substantial body of work exists for performing document perspective correction by exploiting the presence of common text features, such as linear text baselines in printed texts [1–21]. As a result, many of these techniques need to be tuned to the morphological features of the written language of the document in question. However, many of our pages had no such features, consisting only of handwritten text and physical plant specimens. Other techniques use extensive calibration and data acquisition to correct for geometric distortion [22–27]. However, as detailed above, there was not sufficient time to perform this sort of data acquisition. As a result, our approach is unusual in that it uses only the physical page boundaries of uncalibrated images to perform the perspective correction. An advantage of this is that it can handle blank pages or pages consisting only of images or haphazard writing. In addition, because the manuscript pages we are dealing with have cockled, uneven edges, our boundary detection needs to be relatively robust to the noise of these uneven edges.

4 Perspective Correction Algorithm

The perspective correction algorithm functions by applying a few steps in sequence:

1. Foreground/background segmentation. (Figure 2a)
2. Canny edge detection followed by dilation to turn the mask image into a line image. (Figure 2b)
3. Hough line detection to detect lines. (Figure 2c)
4. Line classification to classify the detected lines as top or bottom page edge lines. (Figure 2d)
5. Line averaging to find the average line for the top and bottom edge lines. (Figure 3a)
6. Computation of a homography for transforming the slanted top and bottom page lines into straight lines.
7. Application of the resulting perspective warp. (Figure 3b)

For our images, most were made with a fairly consistent backdrop, so the foreground/background segmentation operates by performing a flood fill on the corners of a Gaussian-smoothed version of the image. This exploits the assumption that the page will not fill the full image all the way around; in such a case, our perspective correction would likely be unable to accurately detect all the page edges which it uses to compute the homography. We then remove blobs whose area is less than half the image, as we assume the page still makes up most of the image.
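
A minimal sketch of how this segmentation step could be implemented with OpenCV is shown below; the smoothing kernel size, flood-fill tolerance, and helper name (segment_page) are illustrative assumptions rather than the exact parameters used in our pipeline.

import cv2
import numpy as np

def segment_page(image_bgr, blur_ksize=9, tolerance=12):
    """Flood-fill the background from the image corners and keep the page blob."""
    smoothed = cv2.GaussianBlur(image_bgr, (blur_ksize, blur_ksize), 0)
    h, w = smoothed.shape[:2]
    # floodFill requires a mask two pixels larger than the image; with
    # FLOODFILL_MASK_ONLY the image is left untouched and filled pixels are
    # marked with the value encoded in bits 8-15 of the flags (here, 1).
    flood = np.zeros((h + 2, w + 2), np.uint8)
    flags = 4 | cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE | (1 << 8)
    for seed in ((0, 0), (w - 1, 0), (0, h - 1), (w - 1, h - 1)):
        cv2.floodFill(smoothed, flood, seed, (0, 0, 0),
                      (tolerance,) * 3, (tolerance,) * 3, flags)
    # Pixels never reached by the corner fills are treated as foreground.
    foreground = np.where(flood[1:-1, 1:-1] == 0, 255, 0).astype(np.uint8)
    # Discard blobs whose area is less than half the image; the page itself
    # is assumed to make up most of the frame.
    count, labels, stats, _ = cv2.connectedComponentsWithStats(foreground)
    mask = np.zeros_like(foreground)
    for i in range(1, count):
        if stats[i, cv2.CC_STAT_AREA] >= 0.5 * h * w:
            mask[labels == i] = 255
    return mask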

This gives us a mask image as seen in Figure 2a. Since we want to use Hough line detection, we first need to transform the mask edges into lines, which we do using a Canny edge detector and dilation, the results of which can be seen in Figure 2b.
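
A short sketch of this step follows; the Canny thresholds and dilation kernel are illustrative assumptions.

import cv2

def edges_from_mask(mask, dilate_iterations=2):
    """Canny edges of the binary page mask, dilated into thicker lines."""
    edges = cv2.Canny(mask, 50, 150)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    return cv2.dilate(edges, kernel, iterations=dilate_iterations)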

We can then perform Hough line detection, using a probabilistic Hough transform which gives us line segments for every line detected in the image. We classify these lines as belonging to either the top or the bottom of the page by checking which half of the image height the start and end points fall in, and by restricting consideration to lines within ±45° of horizontal. This also filters out edges detected along the left and right sides of the page.
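
The following sketch illustrates this step with OpenCV's probabilistic Hough transform; the detector thresholds and minimum segment length are assumptions, and segments straddling the image midline are simply dropped here.

import cv2
import numpy as np

def detect_and_classify(edge_image):
    """Return (top, bottom) lists of near-horizontal segments (x1, y1, x2, y2)."""
    h, w = edge_image.shape[:2]
    segments = cv2.HoughLinesP(edge_image, rho=1, theta=np.pi / 180,
                               threshold=100, minLineLength=w // 4,
                               maxLineGap=20)
    top, bottom = [], []
    if segments is None:
        return top, bottom
    for x1, y1, x2, y2 in segments[:, 0]:
        # Fold the segment angle into [0, 90] and keep only segments within
        # 45 degrees of horizontal, discarding the left and right page edges.
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if min(angle, 180 - angle) > 45:
            continue
        if y1 < h / 2 and y2 < h / 2:        # both endpoints in the upper half
            top.append((x1, y1, x2, y2))
        elif y1 >= h / 2 and y2 >= h / 2:    # both endpoints in the lower half
            bottom.append((x1, y1, x2, y2))
    return top, bottom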

We can then average these lines to get the average line for the top and bottom page edges. We extend these lines to the image edges, and construct a source quadrilateral using them for the top and bottom. We then compute a destination quadrilateral using the assumption that the top edge should be brought up to its highest point to form a straight line (while maintaining line length), and similarly that the bottom edge should be brought down, forming a rectangle. We can then use these two quadrilaterals to compute a perspective transform which will warp our source quadrilateral into our destination rectangle. We then apply the resulting perspective warp to the whole image, resulting in a dewarped image, as seen in Figure 3b.
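
A simplified sketch of steps 5 through 7 follows. The line averaging is approximated here by a least-squares fit through the segment endpoints, and the destination rectangle is built directly from the extreme y-coordinates of the extended edge lines, so it does not reproduce every detail of our implementation (such as maintaining line length).

import cv2
import numpy as np

def average_line(segments):
    """Approximate the average line by a least-squares fit through all endpoints."""
    xs = [float(v) for x1, _, x2, _ in segments for v in (x1, x2)]
    ys = [float(v) for _, y1, _, y2 in segments for v in (y1, y2)]
    slope, intercept = np.polyfit(xs, ys, 1)
    return slope, intercept

def correct_perspective(image, top_segments, bottom_segments):
    h, w = image.shape[:2]

    def extend(line):
        # Evaluate the averaged line at the left (x = 0) and right (x = w - 1)
        # image borders.
        m, b = line
        return (0.0, b), (w - 1.0, m * (w - 1) + b)

    (lx_t, ly_t), (rx_t, ry_t) = extend(average_line(top_segments))
    (lx_b, ly_b), (rx_b, ry_b) = extend(average_line(bottom_segments))
    src = np.float32([[lx_t, ly_t], [rx_t, ry_t], [rx_b, ry_b], [lx_b, ly_b]])
    # Destination: the top edge is brought up to its highest point and the
    # bottom edge down to its lowest point, forming an axis-aligned rectangle.
    top_y, bottom_y = min(ly_t, ry_t), max(ly_b, ry_b)
    dst = np.float32([[0, top_y], [w - 1, top_y],
                      [w - 1, bottom_y], [0, bottom_y]])
    homography = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, homography, (w, h))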

Because the process currently operates with no knowledge of the camera's field of view or focal length parameters, the aspect ratio of the final dewarped image must still be corrected. This is because, although we can bring the edges parallel, we do not know their actual physical depth and thus the scale needed in one direction, which is affected by the camera's field of view. This can be corrected using a priori knowledge of the document's aspect ratio; in our case, we have a small ruler on the page in the image which we can use, after perspective correction, to correct the aspect ratio.
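
The sketch below illustrates this final rescaling; the pixels-per-centimetre figures would come from measuring the ruler by hand and are purely hypothetical here.

import cv2

def fix_aspect_ratio(dewarped, px_per_cm_x, px_per_cm_y):
    """Rescale the vertical axis so a physical centimetre spans the same number
    of pixels in both directions."""
    h, w = dewarped.shape[:2]
    new_h = int(round(h * px_per_cm_x / px_per_cm_y))
    return cv2.resize(dewarped, (w, new_h), interpolation=cv2.INTER_CUBIC)

# Hypothetical usage: suppose the ruler measures 150 px/cm horizontally but
# 138 px/cm vertically in the dewarped image (made-up numbers), so the image
# is stretched vertically by 150/138.
# corrected = fix_aspect_ratio(dewarped, 150.0, 138.0)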

5 Results

In processing the hundreds of herbarium images which required perspective correction, almost all succeeded with no need for manual intervention. Occasionally, an image with some discontinuous feature in the background would fail the foreground/background segmentation step, but this is fairly easy to correct with some minor image pre-processing. In any case, the perspective correction enables much more accurate comparative measurements to be made on the specimens contained in our page images.

In order to verify the system, we also applied it to control 3D manuscript image data we had previously acquired. In this case, we had accurate, metrically calibrated 3D surfaces and corresponding texture data for an ancient manuscript. By placing synthetic 1 cm measurement fiducials on the surface and rendering it in a scene with perspective distortion, we can in turn apply our automated perspective correction algorithm to the rendered image to verify its accuracy. For the image shown in Figure 4, the 1 cm fiducials are 144 px and 111 px before correction, and both measure 153 px after correction.

6 Conclusions and Future Work

The method described here is relatively robust and fast to perform on a large number of high-resolution images. For different images where the assumptions exploited by this method might not hold, for example in foreground/background segmentation, a different approach or altered parameters may need to be used for a particular step.

Future work could be done in automating the aspect ratio correction after perspective correction. We plan to investigate whether camera metadata is sufficient for using focal length and field of view parameters to perform this correction automatically, or whether a first pass for refinement could be performed.

7 Acknowledgments

We would like to thank the Natural History Museum, London for permission and access in imaging items from their collection. We would also like to thank the Chapter of Lichfield Cathedral for their permission in imaging the Chad Gospels, which are used for the synthetic image results and distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

This material is based in part upon work supported by the National Science Foundation under Grant Numbers IIS-0916421 (other grants go here). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

1. Clark, P., Mirmehdi, M.: Estimating the orientation and recovery of text planes in a single image. Proceedings of the 12th British Machine Vision Conference (2001) 421–430

2. Clark, P., Mirmehdi, M.: On the recovery of oriented documents from single images. Proceedings of ACIVS (2002) 190–197

3. Clark, P., Mirmehdi, M.: Rectifying perspective views of text in 3D scenes using vanishing points. Pattern Recognition 36(11) (2003) 2673–2686

4. Cao, Y., Wang, S., Li, H.: Skew detection and correction in document images based on straight-line fitting. Pattern Recognition Letters 24(12) (August 2003) 1871–1879

5. Lu, Y., Lim Tan, C.: A nearest-neighbor chain based approach to skew estimation in document images. Pattern Recognition Letters 24(14) (October 2003) 2315–2323

6. Zhang, L., Tan, C.: Warped image restoration with applications to digital libraries. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. (2005) 192–196 Vol. 1

7. Monnier, C., Ablavsky, V., Holden, S., Snorrason, M.: Sequential correction of perspective warp in camera-based documents. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. (2005) 394–398

8. Liang, J., DeMenthon, D., Doermann, D.: Flattening curved documents in images. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. (2005) 338–345

9. Ulges, A., Lampert, C.H., Breuel, T.: Document image dewarping using robust estimation of curled text lines. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. (2005) 1001–1005 Vol. 2

10. Ezaki, H., Uchida, S., Asano, A., Sakoe, H.: Dewarping of document image by global optimization. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. (2005) 302–306

11. Pollard, S., Pilu, M.: Building cameras for capturing documents. International Journal on Document Analysis and Recognition 7(2) (2005) 123–137

12. Lu, S., Chen, B.M., Ko, C.C.: Perspective rectification of document images using fuzzy set and morphological operations. Image and Vision Computing 23(5) (May 2005) 541–553

13. Avila, B.T., Lins, R.D.: A fast orientation and skew detection algorithm for monochromatic document images. In: DocEng '05: Proceedings of the 2005 ACM Symposium on Document Engineering (November 2005)

14. Zhang, W., Li, X., Ma, X.: Perspective Correction Method for Chinese Document Images. Intelligent Information Technology Application Workshops, 2008. IITAW '08. International Symposium on (2008) 467–470

15. Liang, J., DeMenthon, D., Doermann, D.: Geometric Rectification of Camera-Captured Document Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(4) (2008) 591–605

16. Bukhari, S., Shafait, F.: Dewarping of document images using coupled-snakes. Proc. of 3rd Int. Workshop on ... (2009)

17. Beusekom, J., Shafait, F., Breuel, T.M.: Combined orientation and skew detection using geometric text-line modeling. International Journal of Document Analysis and Recognition (IJDAR) 13(2) (2010) 79–92

18. Luo, S., Fang, X., Zhao, C., Luo, Y.: Text Line Based Correction of Distorted Document Images. In: Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on. (2011) 1494–1499

19. Rahnemoonfar, M., Antonacopoulos, A.: Restoration of Arbitrarily Warped Historical Document Images Using Flow Lines. Document Analysis and Recognition (ICDAR), 2011 International Conference on (2011) 905–909

20. Golpardaz, M., Nezamabadi-Pour, H.: Perspective Rectification and Skew Correction in Camera-Based Farsi Document Images. Machine Vision and Image Processing (MVIP), 2011 7th Iranian (2011) 1–5

21. Yang, P., Antonacopoulos, A., Clausner, C., Pletschacher, S.: Grid-based modelling and correction of arbitrarily warped historical document images for large-scale digitisation. In: HIP '11: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing (September 2011)

22. Pilu, M.: Undoing paper curl distortion using applicable surfaces. Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on 1 (2001) I-67–I-72 vol. 1

23. Brown, M., Seales, W.: Document restoration using 3D shape. Proceedings of ICCV '01 (2001)

24. Brown, M., Seales, W.: Image restoration of arbitrarily warped documents. Pattern Analysis and Machine Intelligence (2004)

25. Brown, M., Pisula, C.: Conformal deskewing of non-planar documents. Computer Vision and Pattern Recognition (2005)

26. Brown, M., Sun, M., Yang, R., Yun, L., Seales, W.: Restoring 2D Content from Distorted Documents. Pattern Analysis and Machine Intelligence (2007)

27. Zhang, L., Zhang, Y., Tan, C.L.: An Improved Physically-Based Method for Geometric Restoration of Distorted Document Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(4) (2008) 728–734

(a) Masked image (b) Canny edge lines

(c) Hough line detection (d) Classified lines

Fig. 2. Pre-processing steps

(a) Averaged lines (b) Unwarped image

Fig. 3. Final perspective processing

(a) Before perspective correction

(b) After perspective correction

Fig. 4. Verifying system accuracy with synthetic page images rendered using 3D scan data. The white fiducial markers can be seen on the left and right margins of the page.