538 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 5, MAY 2000

A 3-D Reconstruction System for the Human Jaw Using a Sequence of Optical Images

Sameh M. Yamany, Member, IEEE, Aly A. Farag*, Senior Member, IEEE, David Tasman, and Allan G. Farman

Abstract—This paper presents a model-based vision system for dentistry that will assist in diagnosis, treatment planning, and surgical simulation. Dentistry requires an accurate three-dimensional (3-D) representation of the teeth and jaws for diagnostic and treatment purposes. The proposed integrated computer vision system constructs a 3-D model of the patient's dental occlusion using an intraoral video camera. A modified shape from shading (SFS) technique, using perspective projection and camera calibration, extracts the 3-D information from a sequence of two-dimensional (2-D) images of the jaw. Data fusion of range data and 3-D registration techniques develop the complete jaw model. Triangulation is then performed, and a solid 3-D model is reconstructed. The system performance is investigated using ground truth data, and the results show acceptable reconstruction accuracy.

Index Terms—Camera calibration, data fusion, dental occlusion, image sequence analysis, registration, shape representation.

I. INTRODUCTION

DENTISTRY requires accurate three-dimensional (3-D) representation of the teeth and jaw for diagnostic and treatment purposes. For example, orthodontic treatment involves the application, over time, of force systems to teeth to correct malocclusion. In order to evaluate tooth movement progress, the orthodontist monitors this movement by means of visual inspection, intraoral measurements, fabrication of casts, photographs, and radiographs; this process is both costly and time-consuming. Moreover, repeated acquisition of radiographs may result in untoward effects. Obtaining a cast of the jaw is a complex operation for the dentist, an unpleasant experience for the patient, and may not provide all of the necessary details of the jaw.

Oral and maxillofacial radiology can provide the dentist with abundant 3-D information of the jaw. Current and evolving methods include computed tomography (CT), tomosynthesis [1], tuned-aperture CT (TACT) [2], and localized, or "cone-beam," CT [3]. Although oral and maxillofacial radiology is now widely accepted as a routine technique for dental examinations, the equipment is rather expensive and the

Manuscript received August 15, 1999; revised December 15, 1999. This work was supported by the Whitaker Foundation, under Grant 94-0228, and the National Science Foundation, under Grant ECS-9505674. The Associate Editors responsible for coordinating the review of this paper and recommending its publication were F. J. Beekman and M. A. Viergever. Asterisk indicates corresponding author.

S. M. Yamany and *A. A. Farag are with the Computer Vision and Image Processing Laboratory, Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292 USA (e-mail: [email protected]).

D. Tasman and A. G. Farman are with the School of Dentistry, University of Louisville, Louisville, KY 40292 USA.

Publisher Item Identifier S 0278-0062(00)05302-7.

resolution is frequently too low for 3-D modeling of dental structures. Furthermore, the radiation dose required to enhance both contrast and spatial resolution can be unacceptably high.

Much effort has focused recently on computerized diagnosis in dentistry [4]. Bernard et al. [5] developed an expert system in which cephalometric measurements are acquired manually from the analysis of radiographs and plaster models. Laurendeau and Possart [4] presented a computer-vision technique for the acquisition of jaw data from inexpensive dental wafers. That system was capable of obtaining imprints of the teeth. Usually, most of the 3-D systems for dental applications found in the literature rely on obtaining an intermediate solid model of the jaw (cast or teeth imprints) and then capturing the 3-D information from that model. User interaction is needed in such systems to determine the 3-D coordinates of fiducial reference points on a dental cast. Other systems that can measure the 3-D coordinates have been developed using either mechanical contact [6] or a traveling light principle [7], [8]. Goshtasby et al. [9] designed a range scanner based on white light to reconstruct the cast. The scanner used the subtractive light principle to create very thin shadow profiles on the cast.

We are developing a system for dentistry to replace traditional approaches in diagnosis, treatment planning, surgical simulation, and prosthetic replacements [10]–[12]. Specific objectives of our dental imaging research (the Jaw Project) are as follows:

1) to design a data acquisition system that can obtain sequences of calibrated video images, with respect to a common reference in 3-D space, of the upper/lower jaw using a small intraoral camera;

2) to develop methods for accurate 3-D reconstruction from the acquired sequence of intraoral images; this involves using a new algorithm for shape from shading (SFS) that incorporates the camera parameters;

3) to develop a robust algorithm for the fusion of data acquired from multiple views, including the implementation of an accurate and fast 3-D data registration;

4) to develop a specific object segmentation and recognition system to separate and recognize individual 3-D tooth information for further analysis and simulations;

5) to develop tools to study and simulate tooth movement based on the finite element method and deformable model approaches.

This research has immense value in various dental practices, including implants, tooth alignment, and craniofacial surgery. The research also has wide applications in teledentistry, dental education, and training.

This paper describes the project's first phase concerning the development of a 3-D model of the jaw, not from a cast, but

0278–0062/00$10.00 © 2000 IEEE


Fig. 1. Reconstructing the 3-D model of the human jaw starts by capturing a sequence of video images using a small intraoral CCD camera. These images are preprocessed to remove specularity. Reference points are obtained using the CMM system. The range data are fused to the SFS output, and then registration takes place. A cloud of points representing the jaw is obtained, and by triangulation, a solid digital model is formed. This model is reproduced using a rapid prototype machine. Further analysis and orthodontic applications can be performed on the digital model.

from the actual human jaw. The work reported here is original and novel in the following aspects: data acquisition is performed directly on the human jaw using a small off-the-shelf charge-coupled device (CCD) camera, and the acquisition time is relatively short and less discomforting to the patient compared with current practices. Also, the acquired digital model can be stored with the patient data and can be retrieved on demand. These models can also be transmitted over a communication network to different remote practitioners for further assistance in diagnosis and treatment planning. Using these models, dental measurements and virtual restoration can be performed and analyzed.

II. SYSTEM OVERVIEW

As shown in Fig. 1, our approach to reconstruct the human jaw consists of the following stages. The first stage is data acquisition. A small, intraoral CCD camera, with a built-in white light, mounted on a five-link 3-D digitizer arm, is calibrated and then placed inside the oral cavity. The camera acquires a set of overlapping images of different parts of the jaw such that, together, the images cover the entire jaw. The images are preprocessed to reduce noise, sharpen edges, and remove specularity. Removing specularity is important because it affects the accuracy of the surfaces reconstructed using SFS. We used an approach similar to the Tong and Funt method to remove specularity [13]. In this approach, changes in the reflection map are removed from the luminance image by calculating the logarithmic gradient of the image and thresholding at locations of abrupt chromaticity change. In addition, we used median filtering to remove speckle noise from the images. Using a modified SFS algorithm, which accounts for the camera perspective projection, sets of 3-D points are then computed. The SFS technique is described in detail in Section III. To obtain accurate metric measurements, range data are obtained using the five-link digitizer arm. These data consist of reference points on the jaw. Fusion of the range data and the SFS output provides accurate metric information that can be used later for orthodontic
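As an illustration of the preprocessing step, the following sketch dampens specular highlights by thresholding the logarithmic gradient where the chromaticity also changes abruptly, then median-filters the result. The function name and threshold values are hypothetical; this is only a rough stand-in for the Tong and Funt method, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

def suppress_specularities(luminance, chroma, grad_thresh=0.5, noise_kernel=3):
    """Illustrative preprocessing sketch (not the paper's exact method):
    flag pixels where a strong logarithmic gradient coincides with an
    abrupt chromaticity change, replace them by a local median estimate,
    then median-filter to remove speckle noise."""
    log_im = np.log(luminance + 1e-6)          # logarithmic image
    gy, gx = np.gradient(log_im)               # logarithmic gradient
    grad_mag = np.hypot(gx, gy)
    cy, cx = np.gradient(chroma)               # chromaticity change
    chroma_change = np.hypot(cx, cy)
    # candidate specular pixels: strong log-gradient plus chromaticity jump
    specular = (grad_mag > grad_thresh) & (chroma_change > chroma_change.mean())
    cleaned = luminance.copy()
    med = median_filter(luminance, size=noise_kernel)
    cleaned[specular] = med[specular]          # fill flagged highlights
    return median_filter(cleaned, size=noise_kernel)  # speckle removal
```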


Fig. 2. Functional block diagram for the integration process of SFS and range data. The surface approximation process is done using neural networks (NN). An example of using this system on the image of a vase is demonstrated.

measurements and implant planning. This stage is described in Section IV. A fast registration technique is required to merge the resulting 3-D points to obtain a complete 3-D description of the jaw [14], [15]. The final stage transforms this model into patches of free-form surfaces using a triangulation technique. This step enables the development of a 3-D solid model for visualization. A cast can be fabricated from this model via rapid prototyping. Further processing that can be carried out on the digital model includes tooth separation, force analysis, implant planning, and surgical simulation.

III. SFS USING PERSPECTIVE PROJECTION AND CAMERA CALIBRATION

Among the tools used in shape extraction from a single view is the SFS technique. SFS has been primarily studied by Horn and Brooks [16] and their colleagues at the Massachusetts Institute of Technology (Cambridge, MA). There have been many developments in the algorithms (e.g., [17]–[21]). SFS assumes that the surface orientation at a point P on a surface S is determined by the unit vector n perpendicular to the plane tangent to S at P. Under the assumption of orthographic projection, the elemental change in the depth z at an image point (x, y) can be expressed as dz = (∂z/∂x) dx + (∂z/∂y) dy. The partial derivatives p = ∂z/∂x and q = ∂z/∂y are called surface gradients. The normal to a surface patch is related to the surface gradients by n ∝ (−p, −q, 1). Assuming that surface patches are homogeneous and uniformly lit by distant light sources, the brightness E(x, y) seen at the image plane often depends only on the orientation of the surface. This dependence of brightness on surface orientation can be represented as a function defined on the Gaussian sphere. Thus, the SFS problem is formulated as finding a solution to the brightness equation E(x, y) = R(p, q, s), where R is the surface reflectance map and s is the illuminant direction.
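For a Lambertian surface, the reflectance map in the brightness equation reduces to the cosine of the angle between the surface normal and the illuminant direction. A minimal sketch, using the common convention n ∝ (−p, −q, 1) (the paper's exact convention is not recoverable from the text):

```python
import numpy as np

def lambertian_reflectance(p, q, s):
    """Reflectance map R(p, q) for a Lambertian surface: the cosine of
    the angle between the unit surface normal, taken proportional to
    (-p, -q, 1), and the unit illuminant direction s. The value is
    clamped at zero for self-shadowed orientations."""
    n = np.array([-p, -q, 1.0])
    n = n / np.linalg.norm(n)
    s = np.asarray(s, dtype=float)
    s = s / np.linalg.norm(s)
    return max(0.0, float(n @ s))

# Brightness equation E(x, y) = R(p, q, s): a gradient (p, q) is
# consistent with an observed brightness E when R(p, q, s) matches E.
```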

A number of algorithms were developed to estimate the illuminant direction (e.g., [20]). Because the white light beam in our design is built into the CCD camera, the assumption that the illuminant direction is known beforehand is valid. However, the assumption of orthographic projection is not adequate for the dental application because the camera is very close to the object. Some SFS approaches using perspective projection were found in the literature (e.g., [22]–[24]). However, most of these

approaches ignore the camera extrinsic parameters; hence, they cannot provide metric information of the depth. In our approach, the CCD camera is calibrated and the camera parameters are used in the SFS algorithm to obtain a metric representation of the teeth and gum surfaces. To calibrate the camera, the relation between the 3-D point P = (X, Y, Z)^T and the corresponding image coordinates m = (u, v)^T is written as s m̃ = M P̃, where s is a scalar, m̃ = (u, v, 1)^T and P̃ = (X, Y, Z, 1)^T are the extended vectors of m and P, and M is called the camera calibration matrix. In general, M = A [R t], where A is a 3 × 3 matrix containing all of the camera intrinsic parameters and R, t are the rotation matrix and translation vector. The matrix M has 12 elements, but it has only 11 degrees of freedom because it is defined up to a scale factor (e.g., [25]).

The standard method of calibration is to use an object with known size and shape and extract the reference points from the object image. It can be shown that given n points (n ≥ 6) in general positions, the camera can be calibrated [25]. The perspective projection matrix can be decomposed as M = [B b], where B is a 3 × 3 matrix and b is a 3 × 1 vector such that

s m̃ = B P + b (1)

or

P = s B^{-1} m̃ − B^{-1} b. (2)

This last equation represents a line in the 3-D space corresponding to the visual ray passing through the optical center and the projected point m. By finding the scalar s, (2) will define a unique 3-D point P on the object. The surface normal n at P is defined to be the cross product of the two gradient vectors ∂P/∂x and ∂P/∂y. The surface reflectance becomes a function of the scalar s defined in (1); i.e.,

E(x, y) = R(s). (3)
The formulation of the SFS problem becomes finding the scalar s that solves the brightness equation. This can be solved using a Taylor's series expansion and applying the Jacobi iterative method [21]. After n iterations, for


Fig. 3. The experimental setup of the proposed system consists of a CCD camera mounted on a five-link 3-D digitizer. A DC-regulated light source is connected to a fiber-optic bundle surrounding the camera. The setup is connected to an SGI Indigo2 machine.

each point (u, v) in the image, the value s^n is given as follows (see the Appendix for more details):

s^n = s^{n-1} − f(s^{n-1}) / (df(s^{n-1})/ds) (4)

where f(s) = E(u, v) − R(s) is the residual of the brightness equation of (3); the expanded expressions of f and its derivative df/ds in terms of the camera parameters are given in the Appendix.
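The iterative solution of the brightness equation along a visual ray can be sketched numerically as follows. The helper names are hypothetical, and the derivative is taken by finite differences rather than the paper's analytic expansion; this is a generic Newton-style stand-in for the update above.

```python
import numpy as np

def solve_depth_scalar(E, reflectance_of_s, s0=1.0, iters=20, eps=1e-4):
    """Illustrative sketch of solving f(s) = E - R(s) = 0 for the ray
    scalar s by Newton-style steps. `reflectance_of_s` maps s to the
    predicted brightness R(s); both names are hypothetical stand-ins
    for the paper's expanded expressions."""
    s = s0
    for _ in range(iters):
        f = E - reflectance_of_s(s)            # brightness residual f(s)
        # central-difference estimate of dR/ds
        dR = (reflectance_of_s(s + eps) - reflectance_of_s(s - eps)) / (2 * eps)
        if abs(dR) < 1e-12:                    # flat reflectance: no update possible
            break
        s = s + f / dR                         # Newton step s - f/f', with f' = -dR
    return s
```

Applied per pixel, the converged scalar substituted into (2) yields the 3-D point on the visible surface.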

Even though camera parameters are used in the SFS implementation, accurate metric information cannot be deduced from the resulting shape because only one image is used. The following section describes a novel approach to perform the necessary calibration.

IV. FUSION OF SFS AND RANGE DATA

The most important information for reconstructing an accurate 3-D visible surface, which is missing in SFS, is the metric measurement. SFS also suffers from the discontinuities caused by highly textured surfaces and different albedo. The integration of the dense depth map obtained from SFS with sparse depth measurements obtained from a coordinate measurement machine (CMM) for the reconstruction of 3-D surfaces with accurate metric measurements has two advantages [26]. First, it


Fig. 4. Illustration of the camera self-calibration. The coordinate measuring system is used to find the transformation matrix T between the optical center M of the camera at the calibration time and the new location while acquiring the images. This transformation matrix is used to obtain the new camera perspective projection matrix.

helps to remove the ambiguity of the 3-D visible surface discontinuities produced by SFS. Second, it compensates for the missing metric information in the SFS. The integration process, as depicted in Fig. 2, includes the following stages. First, we calculate the error difference in the available depth measurements between the two sets of sensory data. Then, we approximate a surface that fits this error difference. Finally, the approximated surface is used to correct the SFS.

A multilayer neural network is used for the surface approximation process because neural networks were shown to be more robust in terms of complexity, speed, and accuracy [26] than were other computational approaches (e.g., regularization techniques). The learning algorithm applied is the extended Kalman-filter learning technique because of its fast computation of weights. The x- and y-coordinates of the data points are the input to the network, whereas the error in the depth value at the point (x, y) is the desired response. The error difference between the SFS and the range measurements and their x–y coordinates are used to form the training set. The input to the network is the x–y coordinates, and the output is the error difference at that coordinate. Once training is accomplished, the neural network provides the approximated smooth surface that contains information about the errors in the SFS at the locations with no range data. This approximated surface is then added to the SFS. The result is the 3-D surface reconstruction that contains accurate metric information about the visible surface of the sensed 3-D object. An example performed on the image of a vase is shown in Fig. 2. The output of the fusion algorithm for each image is a set of 3-D points describing the teeth surfaces in this segment. To compensate for some digitizer inaccuracy in determining the camera location in space and the occasional patient movement, we have developed a fast and accurate 3-D registration technique to link the 3-D points of all segments to produce one set of 3-D points describing the whole jaw surface [14]. The technique is a modification of the ICP technique by Besl and McKay [27], which uses a grid transform and genetic algorithms to reduce the registration time.
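The fusion idea (fit a smooth surface to the sparse range-versus-SFS errors, then add that correction surface to the whole depth map) can be sketched with a Gaussian radial-basis-function interpolator standing in for the paper's neural network. Everything here (names, kernel, width) is an illustrative assumption.

```python
import numpy as np

def fuse_sfs_with_range(z_sfs, sample_xy, z_range, sigma=5.0):
    """Sketch of SFS/range fusion: fit a smooth Gaussian-RBF surface to
    the sparse depth errors (range minus SFS) at the sampled pixel
    coordinates, then add the interpolated correction surface to the
    whole SFS depth map. The RBF is a stand-in for the paper's NN."""
    xs, ys = sample_xy[:, 0], sample_xy[:, 1]
    err = z_range - z_sfs[ys, xs]                       # sparse depth errors
    # Gaussian RBF system on the sample points (small ridge for stability)
    d2 = ((sample_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    weights = np.linalg.solve(K + 1e-8 * np.eye(len(err)), err)
    # evaluate the smooth correction surface on the full pixel grid
    h, w = z_sfs.shape
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d2g = ((grid[:, None, :] - sample_xy[None, :, :]) ** 2).sum(-1)
    correction = (np.exp(-d2g / (2 * sigma ** 2)) @ weights).reshape(h, w)
    return z_sfs + correction
```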

V. EXPERIMENTAL SETUP

The experimental setup shown in Fig. 3 consists of the following.

1) A MicroScribe 3-D digitizer: It can digitize a working space up to 1.27 m (sphere) with a sampling rate of 1000 points/s.

2) A 1/3-in Ultra-Micro CCD color camera (ELMO UN411E). The camera has 768 (H) × 494 (V) effective picture elements. We used a 5.5-mm lens for this camera.

3) A 150-W DC-regulated white light source. Through a fiber-optic bundle that surrounds the CCD camera, this light source illuminates the oral cavity with a stable white light. The light intensity can be adjusted manually to control the shading effect.

4) An SGI Indigo2 machine that hosts the software required for the data processing, reconstruction, and visualization of the 3-D jaw model.

The CCD camera is mounted on the stylus of the 3-D digitizer, and its focal distance is adjusted such that the image will be in focus only when the stylus tip touches the tooth surface. The image plane is normal to the stylus, and the stylus tip is at pixel (0, 0). The CCD camera is then calibrated as shown in Fig. 4. Camera calibration is done once before using the camera, and if the camera is stationary, we do not have to recalibrate again. Yet in the proposed system, the camera will be moving; this implies the recalculation of the perspective projection matrix. However, as the camera will be mounted on a coordinate measuring system, the location of the optical center can be tracked as the camera moves, and the camera perspective projection can be recalculated.
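The recalculation of the projection matrix under a tracked camera motion can be sketched as a matrix composition. The assumption that the digitizer reports the rigid motion as a rotation R and translation t applied to world points is mine; the paper only states that the optical center is tracked.

```python
import numpy as np

def update_projection(M_cal, R, t):
    """Recompute the perspective projection matrix when the camera moves.
    Assuming the tracked motion is a rigid transform (R, t) expressed as
    a 4x4 homogeneous matrix T acting on world points, the new matrix is
    the calibrated matrix composed with T. This composition is a standard
    identity, not the paper's exact bookkeeping."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return M_cal @ T
```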

The five degrees of freedom provided by the arm enable the acquisition of a sequence of intraoral images covering the upper/lower jaw. Also, with each image, the camera location in the 3-D space is measured. The perspective projection matrix is readjusted, and the location and direction of the first pixel in the image are included. This information is used in the data fusion and registration phase to reference the image plane in the workspace.

VI. RESULTS

Fig. 5. Snapshots of the software used. The reconstructed segments are displayed in real time with full color texture. The dentist can rotate and manipulate the displayed 3-D model to check what parts still need to be scanned or rescanned. Fusion with range data from the digitizer is performed next, and a complete jaw model with metric information is obtained. This model can be used to measure any orthodontic parameter and can be reproduced via rapid prototyping with the same original scale.

We designed a software interface to provide the dentist with real-time visualization of the reconstructed segments before registration and fusion, as shown in Fig. 5. The 3-D reconstructed segments are displayed with full color texture reflecting the colors of the teeth and gum. This texture is saved with the scanned model and can be used later for diagnosis, especially for gum diseases. This interface also enables the dentist to know which parts of the jaw are scanned and which parts still need to be scanned. The interface then prompts the dentist to digitize some range points for each image segment. These range data are used in the fusion stage. Finally, 3-D registration is performed and a complete metric model is obtained. This model is converted into a format suitable for rapid prototyping such that a solid cast can be constructed.

Although data acquisition is relatively fast, referencing the location and orientation of the patient's head is very crucial, especially to register different acquisitions taken at different times, or during the same session when the patient is allowed to move the head. For this reason, four points (see Fig. 6) on the patient's head are captured before starting the data acquisition stage and between the different reconstruction stages. The points define the positions of the upper and lower jaws relative to a common coordinate system.
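Recovering the head motion from such corresponding landmark points is a classical rigid alignment problem. A minimal Kabsch-style sketch (my own generic version, not the paper's method) follows; four non-collinear landmarks are enough to determine the rigid transform.

```python
import numpy as np

def rigid_align(P, Q):
    """Estimate the rotation R and translation t mapping point set P
    onto point set Q in the least-squares sense (Kabsch/Procrustes):
    center both sets, take the SVD of the cross-covariance, and fix
    the determinant to exclude reflections."""
    Pc, Qc = P.mean(0), Q.mean(0)
    H = (P - Pc).T @ (Q - Qc)                  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = Qc - R @ Pc
    return R, t
```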

Experiments were conducted on different pediatric subjects at the Orthodontics Department, University of Louisville, Kentucky. The system was used to reconstruct an accurate model of an individual tooth or teeth segments. Fig. 7(a) shows an example of two images taken of a patient's tooth. The complete tooth


Fig. 6. Four points are used to reference the upper and lower jaws to the same 3-D coordinate system. This allows for patient head movements between acquisition stages. Also, these points are used to register different acquisitions taken at different times for the same patient.

Fig. 7. (a) Intraoral intensity images with range data marked as cross signs. (b) Three-dimensional visible surfaces obtained from the SFS and the fusion process. (c) A visible surface mesh obtained from registering the two views.

surface is covered in these two images. Using the reference data, we applied the fusion algorithm. This resulted in the 3-D visible surfaces shown in Fig. 7(b). Using the registration procedure on these two data sets produced the complete surface model of the tooth as shown in Fig. 7(c).

More results of applying the reconstruction algorithms on another subject are shown in Fig. 8. The resulting jaw models have acceptable accuracy and are faithful enough to show all of the information about the patient's actual jaw in a metric space. Both the time and convenience for the patient must be considered when comparing these results with the results from scanning a cast. The reconstructed digital model can be of enormous help to the clinician, in addition to numerous other advantages. A few of the suggested advantages include helping the clinician inspect subtle problems, assisting in indirect operative procedures, improving communication with patients because they can now see a 3-D realization of their jaw, and achieving more thorough examinations, which in turn leads to more complete diagnosis, documentation of dental diseases, and treatment. Yamany et al. [28] performed further processing on the digital jaw model. Fig. 8(c) shows the result of the teeth segmentation and identification stage. The color code is used to visualize individual teeth segments.

Experimental Validation: The accuracy of the reconstructed surfaces is of major importance in dental research. Four sources of inaccuracy exist in the developed approach.

1) The Camera Calibration Process: In the proposed system, we used the camera calibration technique developed in our laboratory [29]. The approach is based on the neurocalibration technique, in which the calibration problem is mapped into a learning problem of a multilayered feed-forward neural network. The key advantage of this technique, as opposed to others, is that it starts the nonlinear minimization process from a random initial point without sacrificing the calibration accuracy. The calibration accuracy, measured in terms of 3-D reconstruction error, is found to be mm.

2) The 3-D digitizer used has a reported accuracy of mm.

3) The registration technique used has a reported accuracy of mm [14]. It should be noticed that the registration process is used to compensate for the digitizer accuracy and the occasional patient's head movements.

4) The SFS and Data Fusion Process: To validate the SFS and data fusion methods and to find the accuracy of the proposed system, we have used a ground-truth dense depth map obtained from a laser range scanner. The range scanner also provided intensity images registered with the depth map. Although laser scanners have accuracy limitations, their output can be considered as ground truth because they are now widely accepted in dental practice. The root mean square (rms) error between the integrated surface and the ground truth is used as a measure of the system performance. Two different types of surfaces, a smooth and a free-form surface, were investigated. The neural network topology that best suited


Fig. 8. (a) Some intensity images of a patient's teeth. (b) and (d) The upper and lower digital jaw models of this patient. (c) The result of the teeth segmentation and identification stage: different shades are used to show teeth separation.

Fig. 9. The rms error in millimeters between the integrated surface and the ground truth is used as a measure of the system performance. Two different types of surfaces, a sphere as a smooth surface and a scanned tooth (free-form) surface, were investigated.

the data fusion was analyzed. The best performance (rms of 0.24 mm for the free-form surface and 0.132 mm for the smooth surface) has been obtained with the network topology 2-7-3-1, that is, two neurons in the input layer (the pixel coordinates in the image), two hidden layers with seven and three neurons, respectively, and one neuron in the output layer (the depth measurement). Fig. 9 shows the estimated rms error with different sampling rates. In general, the rms error for the smooth surface is smaller than that for the free-form surface. This is expected because the surface approximation process tends to smooth the surface where range data are not available, producing a large rms error in the case of free-form surfaces. The results show that a higher sampling rate increases the accuracy in the case of free-form surfaces, yet this increases the time and complexity of acquiring the data. However, the sampling rate has minimal effect in the case of smooth surfaces. In the dental application, most of the teeth surfaces can be considered smooth. Rough edges are found only in the top portion of the teeth surface. In this case, most of the reference points are obtained in this area. The above analysis shows that the system can achieve acceptable accuracy with only a small number of reference points.
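The performance measure used above is straightforward once the reconstructed surface and the ground-truth depth map are registered to a common frame; a one-line sketch:

```python
import numpy as np

def rms_error(z_reconstructed, z_ground_truth):
    """Root-mean-square difference between a reconstructed depth map
    and a registered ground-truth depth map (both in millimeters)."""
    diff = np.asarray(z_reconstructed, dtype=float) - np.asarray(z_ground_truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```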

From the above analysis of the four sources of accuracy limitation, it can be concluded that the system achieves overall submillimeter accuracy in reconstructing the jaw model. Fig. 10 shows the superposition of a jaw model reconstructed using the proposed techniques and the laser-scanned model of a cast of the same subject. The above analysis shows that the system can approach the accuracy of the laser range scanner. Another validation experiment compared the reconstructed model to the actual subject's jaw: orthodontic measurements made on the actual jaw were compared with measurements made on the reconstructed model. Forty measurements were collected and compared with their corresponding distance measurements on the digital model, as shown in Fig. 11. The average error in distance calculation was found to be 0.74 mm, which is an acceptable resolution for many orthodontic and maxillofacial applications.

Fig. 10. (a) The jaw model obtained using our proposed approach. (b) The laser-scanned model of the cast of the same jaw. (c) The two models superimposed (the dark surface represents the laser-scanned model).

Fig. 11. Corresponding distance measurements on the actual subject's jaw and the reconstructed model.
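The validation metric used above, the rms depth error between an integrated surface and its registered ground truth, can be computed as in the following sketch; the arrays here are synthetic stand-ins for the scanner data.

```python
import numpy as np

# Sketch of the validation metric: rms error between a reconstructed
# depth map and a registered ground-truth range map, in shared units (mm).
def rms_error(reconstructed, ground_truth):
    """Root-mean-square depth difference between two registered maps."""
    diff = np.asarray(reconstructed, float) - np.asarray(ground_truth, float)
    return float(np.sqrt(np.mean(diff ** 2)))

gt = np.zeros((4, 4))      # flat ground-truth patch (synthetic)
rec = gt + 0.132           # uniform 0.132 mm offset
err = rms_error(rec, gt)   # rms error of a uniform offset equals the offset
```

A uniform offset is the degenerate case; on real data the per-pixel differences vary and the rms aggregates them into the single figure reported in Fig. 9.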

VII. CONCLUSIONS AND FUTURE EXTENSIONS

The 3-D reconstruction of the human jaw has tremendous applications. The model can be stored along with the patient data and retrieved on demand. The model can also be used in telemedicine; that is, it can be transmitted over a communication network to remote dentists for further assistance in diagnosis. Dental measurements and virtual restoration can be performed and analyzed. This work enables much orthodontic and dental imaging research to be applied directly to a digital jaw model, rather than to a cast, using computer vision and medical imaging tools.

This paper describes the result of the first phase in a project aimed at replacing current dental modeling practices. The next phase includes the analysis and simulation of dental operations, including tooth alignment, implant planning, restoration, and measurement of the distances and orientations of teeth with respect to each other. We have some preliminary results in this second phase [28].

Fig. 12. Considering the surface to be a set of triangular patches, each with a normal N, the dot product of the light direction L and the normal N is called the reflectance, which determines the image intensity at the corresponding pixel location m.

APPENDIX

MODIFIED SFS ALGORITHM

From Fig. 12 and using (1), we can define the following:

(8)

(9)

(10)

The unit normal to the patch formed by , , and can be calculated as follows:

(11)


The reflection function defined by the SFS is written as

Thus, the SFS brightness equation becomes

(12)
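The geometry behind the patch normal of (11) and the brightness equation (12) can be illustrated with a short sketch: the normal comes from the cross product of two edge vectors of the triangular patch, and the Lambertian reflectance is its dot product with the unit light direction. The clamping to [0, 1] is an assumption for illustration, not taken from the paper.

```python
import numpy as np

# Illustrative sketch, assuming a Lambertian surface as in Fig. 12:
# the unit patch normal N dotted with the light direction L gives the
# reflectance that sets the image intensity at the corresponding pixel.
def patch_reflectance(p0, p1, p2, light):
    """Reflectance of the triangle (p0, p1, p2) under direction `light`."""
    p0, p1, p2 = (np.asarray(p, float) for p in (p0, p1, p2))
    n = np.cross(p1 - p0, p2 - p0)          # patch normal via cross product
    n /= np.linalg.norm(n)                  # unit normal N
    l = np.asarray(light, float)
    l /= np.linalg.norm(l)                  # unit light direction L
    return float(np.clip(n @ l, 0.0, 1.0))  # N . L, clamped to [0, 1]

# A patch in the z = 0 plane lit from directly above reflects fully:
r = patch_reflectance([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1])  # → 1.0
```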

The solution to the SFS problem is to find such that is minimized. Using a Taylor expansion and the Jacobi iterative method, can be found by iteration as follows:

(13)

(14)

Let , where

(15)

(16)

Algorithm: The steps involved in the implementation are as follows.

Step 1) Read image , light direction , and camera parameters .

Step 2) Initialize .
Step 3) Get and as shown in (9) and (10).
Step 4) Calculate as shown in (11).
Step 5) Obtain the error using the brightness equation (12).
Step 6) Estimate the new using (13)–(16).
Step 7) Repeat Steps 3)–7) until , where is a predefined positive threshold.
Step 8) Recover the surface 3-D points using (8).
Step 9) Construct triangular patches as shown in Fig. 12.
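The loop in the steps above can be sketched as follows. This is an illustrative simplification, not the paper's exact update (13)–(16): reflectance is taken as Lambertian, surface normals come from forward depth differences, and a fixed-step brightness-error correction stands in for the Taylor/Jacobi update.

```python
import numpy as np

# Hedged sketch of the SFS iteration: render a reflectance image from the
# current depth map, compare it with the observed image, and correct the
# depth until the brightness error falls below a threshold (Step 7).
def sfs_iterate(image, light, n_iter=200, step=0.1, eps=1e-6):
    """Recover a depth map whose rendered brightness approaches `image`."""
    h, w = image.shape
    z = np.zeros((h, w))                         # Step 2: initialize depth
    l = np.asarray(light, float)
    l = l / np.linalg.norm(l)                    # unit light direction
    for _ in range(n_iter):
        # Step 3: surface gradients p = dz/dx, q = dz/dy (forward differences)
        p = np.diff(z, axis=1, append=z[:, -1:])
        q = np.diff(z, axis=0, append=z[-1:, :])
        denom = np.sqrt(p**2 + q**2 + 1.0)
        # Steps 4-5: Lambertian reflectance N . L and brightness error
        r = (-p * l[0] - q * l[1] + l[2]) / denom
        err = image - r
        if np.abs(err).max() < eps:              # Step 7: threshold test
            break
        z += step * err                          # crude stand-in for (13)-(16)
    return z

img = np.full((8, 8), 1.0)                       # flat patch lit head-on
z = sfs_iterate(img, [0.0, 0.0, 1.0])            # converges to a flat depth map
```

A flat image under head-on light is the trivial case; the initial flat depth map already satisfies the brightness equation, so the loop exits on its first threshold test.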

REFERENCES

[1] P. F. van der Stelt and S. M. Dunn, "3D-imaging in dental radiography," in Advances in Maxillofacial Imaging, A. G. Farman, Ed. New York: Elsevier Science, 1997, pp. 367–372.

[2] R. L. Webber, R. A. Horton, D. A. Tyndall, and J. B. Ludlow, "Tuned-aperture computed tomography (TACT). Theory and application for three-dimensional dento-alveolar imaging," Dentomaxillofac. Radiol., vol. 26, pp. 51–62, 1997.

[3] P. F. van der Stelt, S. M. Dunn, O. Tokuota, L. Jurgens, and J. H. Slater, "Comparison of two localized CT-techniques for detecting periodontal angular bone defects," J. Dent. Res., vol. 75, p. 128, 1996.

[4] D. Laurendeau and D. Possart, "A computer-vision technique for the acquisition and processing of 3D profiles of dental imprints: An application in orthodontics," IEEE Trans. Med. Imag., vol. 10, pp. 453–461, Sept. 1991.

[5] C. Bernard, A. Fournier, J. M. Brodeur, H. Naccache, and R. Guay, "Computerized diagnosis in orthodontics," in Proc. 66th Gen. Session Int. Assoc. Dental Res., Montreal, 1988.

[6] F. P. van der Linden, H. Boersma, T. Zelders, K. A. Peters, and J. H. Raben, "Three-dimensional analysis of dental casts by means of optocom," J. Dent. Res., vol. 51, no. 4, p. 1100, 1972.

[7] S. N. Bhatia and V. E. Harrison, "Operational performance of the traveling microscope in the measurement of dental casts," Br. J. Orthod., vol. 14, pp. 147–153, 1987.

[8] K. Tanaka, A. A. Lowe, and R. DeCou, "Operational performance of the reflex monograph and its applicability to the three-dimensional analysis of dental casts," Amer. J. Orthod., vol. 83, pp. 304–305, 1983.

[9] A. A. Goshtasby, S. Nambala, W. G. deRijk, and S. D. Campbell, "A system for digital reconstruction of gypsum dental casts," IEEE Trans. Med. Imag., vol. 16, pp. 664–674, Oct. 1997.

[10] M. Ahmed, S. M. Yamany, E. E. Hemayed, S. Roberts, S. Ahmed, and A. A. Farag, "3D reconstruction of the human jaw from a sequence of images," in Proc. IEEE Comput. Vision Pattern Recognit. Conf. (CVPR), Puerto Rico, June 1997, pp. 646–653.

[11] S. M. Yamany, A. A. Farag, D. Tasman, and A. G. Farman, "A robust 3-D reconstruction system for human jaw modeling," in 2nd Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI'99), Cambridge, U.K., Sept. 1999, pp. 778–787.

[12] S. M. Yamany and A. A. Farag, "A system for human jaw modeling using intra-oral images," in Proc. IEEE Eng. Med. Biol. Soc. (EMBS) Conf., vol. 20, Hong Kong, Oct. 1998, pp. 563–566.

[13] F. Tong and B. V. Funt, "Removing specularity from color images for shape from shading," in Computer Vision and Shape Recognition, A. Krzyzak, T. Kasvand, and C. Y. Suen, Eds. Singapore: World Scientific, 1989, vol. 14, Computer Science, pp. 275–290.

[14] S. M. Yamany, M. N. Ahmed, and A. A. Farag, "A new genetic-based technique for matching 3D curves and surfaces," Pattern Recognit., vol. 32, p. 1817, 1999.

[15] S. M. Yamany and A. A. Farag, "Free-form surface registration using surface signatures," in Proc. IEEE Int. Conf. Comput. Vision (ICCV), vol. 2, Greece, Sept. 1999, pp. 1098–1104.

[16] B. K. P. Horn and M. J. Brooks, Shape from Shading. Cambridge, MA: MIT Press, 1989.

[17] R. Kimmel and A. Bruckstein, "Tracking level sets by level sets: A method for solving the shape from shading problem," Comput. Vision Image Understanding, vol. 62, pp. 47–58, 1995.

[18] G.-Q. Wei and G. Hirzinger, "Learning shape from shading by a multilayer network," IEEE Trans. Neural Networks, vol. 7, pp. 985–995, July 1996.

[19] R. Zhang, P. Tsai, J. Cryer, and M. Shah, "Analysis of shape from shading techniques," in Proc. IEEE Comput. Vision Pattern Recognit. Conf. (CVPR), Seattle, WA, June 1994, pp. 377–384.

[20] A. Pentland, Extract Shape From Shading, 2nd ed. New York: Academic Press/MIT Media Lab, 1988.

[21] P. S. Tsai and M. Shah, "A fast linear shape from shading," in Proc. IEEE Comput. Vision Pattern Recognit. Conf. (CVPR), Urbana, IL, June 1992, pp. 734–736.

[22] K. M. Lee and J. Kuo, "Shape from shading with perspective projection," Comput. Vision, Graphics, Image Process.: Image Understanding, vol. 59, pp. 202–212, 1994.

[23] J. K. Hasegawa and C. L. Tozzi, "Shape from shading with perspective projection and camera calibration," Comput. Graphics, vol. 20, no. 3, pp. 351–364, 1996.

[24] K. M. Lee and J. Kuo, "Shape from shading with generalized reflectance map model," Comput. Vision Image Understanding, vol. 67, pp. 143–160, Aug. 1997.

[25] G. Xu and Z. Zhang, Epipolar Geometry in Stereo, Motion and Object Recognition. Amsterdam, The Netherlands: Kluwer Academic, 1996.

[26] M. G.-H. Mostafa, S. M. Yamany, and A. A. Farag, "Integrating shape from shading and range data using neural networks," in Proc. IEEE Comput. Vision Pattern Recognit. Conf. (CVPR), Fort Collins, CO, June 1999, pp. 15–20.

[27] P. J. Besl and N. D. McKay, "A method for registration of 3-D shapes," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 239–256, 1992.

[28] S. M. Yamany, A. A. Farag, and N. A. Mohamed, "Orthodontics measurements using computer vision," in Proc. IEEE Eng. Med. Biol. Soc. (EMBS) Conf., vol. 20, Hong Kong, Oct. 1998, pp. 536–539.

[29] M. T. Ahmed, E. E. Hemayed, and A. A. Farag, "Neurocalibration: A neural network that can tell camera calibration parameters," in Proc. IEEE Int. Conf. Comput. Vision (ICCV), vol. 1, Greece, Sept. 1999, pp. 463–468.