

Chemical Images: Technical Approaches and Issues

Don Clark* and Slobodan Šašić*
Pharmaceutical Sciences, Pfizer Global R&D, Sandwich, Kent, United Kingdom

*Correspondence to: Don Clark. E-mail: don.a.clark@pfizer.com or Slobodan Šašić. E-mail: slobodan.sasic@pfizer.com

Received 29 September 2005; Revision Received 20 February 2006; Accepted 1 March 2006

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cyto.a.20275

Cytometry Part A 69A:815–824 (2006)

Chemical mapping techniques using Raman microscopy are introduced, and using an example of a pharmaceutical tablet, the practical aspects of data collection and processing to produce a chemical image of the sample are detailed. Issues related to data processing, instrument standards, chemical image reportable errors, and the interpretation of chemical images are presented to encourage debate, develop solutions, and promote use in other challenging scientific applications. © 2006 International Society for Analytical Cytology

Key terms: chemical images; pharmaceuticals; Raman mapping microscopy; chemometrics; principal component analysis (PCA)

The focus of this article is to introduce the Journal's audience to chemical mapping (the technique) and chemical images (the visual result of the experiment) using pharmaceutical samples as illustrations. The link to cytometry at this point is probably not clear. However, fluorescence, for example, is regularly used in biological tissue analysis to identify the location of specific tissue types or disease states, and to monitor pH and protein–protein interactions. Linked to a visual image, this is a type of sample mapping. Chemical mapping takes this approach a step further by providing chemical information at every point across the measured sample, and not just a yes/no response as typically occurs with fluorescence measurement. The resulting chemical image has the potential to identify not only the material present at any point, but its form too. In pharmaceutical analysis, these forms are typically polymorphs and hydrates.

In basic terms, chemical mapping data is obtained from a sample using an optical microscope coupled to a spectrometer. The spectrometer provides the analytical light source and the spectral detection elements, while the microscope provides focussing and data collecting optics for both spectral and white light image information from the sample. The instrumentation for chemical mapping has been extensively reported; the listed references provide a good starting point for those unfamiliar with these techniques (1,2). Mapping systems should not be confused with imaging systems (3,4). In a mapping experiment, a spectrum is obtained at a point. Once the spectrum collection is complete, the sample is moved (typically 5–25 µm) and another spectrum is taken. This is continued until the analytical area has been covered. In imaging, the microscope field of view is irradiated and an array detector collects the intensity of a selected wavelength over the whole area to give a true image. If required, another wavelength can be chosen and the process can be repeated. Both imaging and mapping data can be expressed as a hyperspectral data cube, and using chemometrics, information is extracted from the data set to produce a 2D grey-scale image of each component. The most frequently used spectroscopies are dispersive Raman and FT-NIR (Fourier transform near infrared), and these techniques can provide data with a spatial resolution of nominally 1–25 µm, although the actual spatial resolution achieved will be discussed later.

After acquisition, the data requires processing to produce the chemical image of the sample. When the sample contains few components that each have strong and unique bands in their respective spectra, univariate analysis can be used to identify each component uniquely (3,4). The more complex the sample, the greater the probability that more bands overlap in a single pixel, making spectral interpretation based on one band difficult or impossible to achieve. Similarly, some components do not have strong responses to the excitation radiation. These weak signals, particularly if from a minor component, are difficult to observe using univariate analysis. To assist in analyzing these difficult data sets, principal component analysis (PCA) is used and for many applications is now the data processing method of choice.

PCA is a multivariate data technique that is frequently used for unraveling complex spectroscopic data. It is based on the assumption that the signal collected is linearly related to the concentrations of the components of the sample via the equation X = CA, where X is the matrix of experimental spectra (the size of which equals the number of samples × number of wavenumbers (reciprocal of wavelength) at which the data is collected), C is the concentration matrix of the components (number of samples × number of components), and A is the matrix of their spectra (number of components × number of wavenumbers). In effect, PCA decomposes the data into sections of signal and noise. By removing noise, one effectively and significantly improves the signal-to-noise (s/n) ratio in the data. The major results that PCA provides are the number of components that have detectable signal, the features that are related to the spectra (known as loadings), and the features that relate to the concentrations (known as scores). PCA is a commonly used technique and more in-depth descriptions can be found in the numerous books on this subject (5–7).
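As a minimal sketch of this decomposition, assuming NumPy and a small synthetic data set (the band positions, noise level, and all variable names are illustrative and are not taken from the paper), PCA can be computed through a singular value decomposition of the mean-centred spectra. Keeping only the leading components gives scores, loadings, and a noise-reduced reconstruction of X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for X = C A: 200 spectra, 2 components, 120 wavenumbers.
wavenumbers = np.linspace(900, 1400, 120)
A = np.vstack([
    np.exp(-0.5 * ((wavenumbers - 1000) / 8.0) ** 2),   # hypothetical band 1
    np.exp(-0.5 * ((wavenumbers - 1090) / 12.0) ** 2),  # hypothetical band 2
])
C = rng.uniform(0.0, 1.0, size=(200, 2))                # hypothetical concentrations
X_clean = C @ A
X = X_clean + rng.normal(scale=0.02, size=X_clean.shape)


def pca(X, n_components):
    """Return scores, loadings, percent variance, and the mean spectrum."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]     # relate to concentrations
    loadings = Vt[:n_components]                         # relate to spectra
    explained = 100.0 * s**2 / np.sum(s**2)
    return scores, loadings, explained, mean


scores, loadings, explained, mean = pca(X, n_components=2)

# Rebuilding the data from the retained components discards most of the noise.
X_denoised = scores @ loadings + mean
```

The number of retained components is fixed by hand here; choosing it from the data is the subject of the eigenvalue discussion later in the article.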

Although every mapping data set is a 3D data structure (two spatial dimensions and one spectral), it can be temporarily rearranged by combining the two spatial coordinates to produce a 2D data structure (one spatial dimension with mixed x and y coordinates and one spectral dimension) (8). PCA is applied to this provisional 2D data structure, and once the 2D results are complete, the scores are folded back into the original 3D form, from which the chemical images are then produced. The procedure described addresses general problems in analyzing complex spectral data and shows how the results obtained are used to produce chemical images; it is employed in nearly all multivariate studies of mapping spectra.
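The unfold, analyze, and refold sequence can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes NumPy, a data cube stored as (ny, nx, wavenumbers), and hypothetical function names.

```python
import numpy as np


def unfold(cube):
    """Reshape (ny, nx, n_wavenumbers) into (ny * nx, n_wavenumbers)."""
    ny, nx, n_wn = cube.shape
    return cube.reshape(ny * nx, n_wn)


def refold(scores, ny, nx):
    """Reshape (ny * nx, n_components) scores into (ny, nx, n_components)."""
    return scores.reshape(ny, nx, -1)


def score_images(cube, n_components):
    """PCA on the unfolded cube; returns score images and loadings."""
    ny, nx, _ = cube.shape
    X = unfold(cube)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    return refold(scores, ny, nx), Vt[:n_components]


# Random numbers stand in for a small mapping data set.
rng = np.random.default_rng(1)
cube = rng.random((6, 8, 40))
images, loadings = score_images(cube, n_components=3)
# images[:, :, k] is the score image that belongs to loadings[k]
```

Because the same row-major ordering is used for unfolding and refolding, each pixel of a score image corresponds to the spectrum measured at that position.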

The following sections describe in detail the experimental parameters, subsequent data analysis, and interpretation of chemical images obtained from a standard pharmaceutical tablet. This should serve as a practical introduction to the technique and the typical issues encountered in this type of analysis. In this example, the sample is mapped using Raman microscopy, though FT-NIR microscopy could also have been used. The spectra from Raman measurements contain sharper bands and are visually more distinct than those from FT-NIR experiments and provide clearer figures to illustrate our experimental approach.

ANALYSIS OF A PHARMACEUTICAL TABLET USING CHEMICAL IMAGES

Experimental

For good chemical images, a cross section of the sample is prepared to give a flat sample; this avoids the need to refocus the microscope during data collection. Samples are fixed to a microscope slide using an acrylonitrile adhesive, and when set, a cross section is cut using a Leica EM Trim (Leica Microsystems (UK), Milton Keynes) to provide the flat surface required for the analysis. The surface is viewed through the microscope to ensure that the section is of appropriate quality, and then a region is selected for analysis.

Raman spectra were collected on a Renishaw Ramascope System 1000 using Wire V.2.0 software (Renishaw, Gloucestershire, UK). This is a grating-based system with a slit arrangement that allows confocal measurement if required. The spectra were obtained by exciting the sample with a 782 nm diode laser that produced a 27 µm laser line on the sample surface. The mapping capability is provided by an automated xy stage on which the prepared sample is placed. All the samples were viewed and Raman data were collected through a 20× objective. The data collection conditions were set to cover the 900–1400 cm⁻¹ range where the strongest Raman bands of interest were known to exist. The spatial resolution in the x dimension was set at 25 µm by the experiment step size. The 3.8 × 1.4 mm² area mapped corresponds to 151 × 51 pixels, because the spectra collected along the laser line are fully binned. Full binning actually makes this line acquisition equivalent to acquisition with a point laser source. If the spectra collected along the laser line (y dimension) were not binned but separately saved and analyzed, the number of spectra along y would increase significantly and much better spatial resolution would theoretically be achieved for this dimension. However, we have found that this set-up did not improve the overall quality of the chemical images, and since the fully binned spectra have a much better s/n ratio than the single-pixel spectra, and thus allow for shorter collection times, we chose to bin the spectra. Of course, if the regions of interest had been of a smaller size, then spectral binning would not have been used. The acquisition time was 10 s per sampling point across the mapped area, giving a total collection time of about 30 h.

All the original spectra were baseline corrected prior to further analysis (Fig. 1). This operation is followed by mean normalization, in which every spectrum is divided by its mean, an operation that equalizes the integral area of all spectra. It is very important to normalize the spectra because the Raman scattering coefficients of the components within the tablet can be very different, and thus the overall intensity of the spectra may vary significantly. In addition, the morphology of the sample can also affect the response. By normalization, the spectra are rendered fully comparable as the intensity dependence is minimized. Both pretreatments are carried out in ChemAnalyze (ChemImage, Pittsburgh, PA), a software package that is also used for creating the chemical images. PCA and self-modelling curve resolution analysis are carried out in Matlab (ver. 6.0, Natick, MA).

Except for trimming to ensure that a flat surface of the tablet is available, no other pretreatment is used for preparing the sample. The approximate composition of the tablet is as follows: Avicel (microcrystalline cellulose) 48%, API (active pharmaceutical ingredient) 24%, di-calcium phosphate (DCP) 24%, Explotab (sodium starch glycolate) 3%, and Mg stearate 1%.
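The mean normalization applied to the baseline-corrected spectra can be illustrated with a short sketch. It assumes NumPy and a hypothetical function name; the authors carried out this step in ChemAnalyze, so this is only an equivalent illustration of the arithmetic, and baseline correction is assumed to have been done beforehand.

```python
import numpy as np


def mean_normalize(spectra):
    """Divide each spectrum (one per row) by its own mean intensity."""
    spectra = np.asarray(spectra, dtype=float)
    means = spectra.mean(axis=1, keepdims=True)
    if np.any(means == 0):
        raise ValueError("a spectrum with zero mean cannot be normalized")
    return spectra / means


# Two spectra with the same shape but a tenfold difference in intensity.
spectra = np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]])
normalized = mean_normalize(spectra)
# Both rows now have a mean of 1, so only differences in shape remain.
```

After this step, a weak and a strong scatterer with identical band shapes give identical rows, which is the sense in which the spectra become comparable.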

Results and Discussion

The Raman spectra collected vary significantly in intensity and shape. To comprehensively analyze all the spectra and ultimately produce chemical images of the components, one needs to address two key questions. The first is how many components contribute to the signal detected. Although it is known how many components were used to produce the tablet, there is no guarantee that all will have an identifiable signal (e.g. the signal of minor components may be below the respective limit of detection (LOD)) or that new species will not have formed during manufacture or storage (e.g. a different API polymorph or hydrate). The second task is to determine the spectral variability among the experimental spectra. Ideally, one would expect to detect the pure component spectra of all the constituents of the tablet. It would then be relatively easy to produce the corresponding chemical images through the simplest, univariate approach, whereby the chemical images are represented as the spatially dependent intensity variation of nonoverlapped wavenumbers or bands, i.e. one per component. This approach is based on knowledge of the chemical identity, and thus the Raman spectra, of the components of a tablet (Fig. 2). If the pure component spectra are not known, it is much more difficult, although still possible, to produce chemical images that are based on the least overlapped features of the underlying spectra. Awareness of the extent of spectral variability also helps in understanding physical features (such as domain size) in the chemical images obtained.

FIG. 1. Some of the 5050 experimental Raman mapping spectra. The varying baselines and significant differences in the shape and intensity of the spectra are easily perceived.

FIG. 2. Left: The pure component spectra of Avicel (solid line), DCP (□), Explotab (x), and Mg stearate (○). Right: The spectrum of pure API. This figure shows that the overlap is substantial and that imaging via wavenumbers that are unique to the components present (one wavenumber for each component) is therefore very difficult.

The data analysis tool most frequently employed to determine the number of components present is PCA. There are many ways to find out from PCA how many spectrally independent species contribute to the total signal. The basic approaches are the analysis of eigenvalues and monitoring the shapes of the associated scores and loadings. PC eigenvalues are scalars obtained during algebraic decomposition of the covariance matrix of the original spectra, which also yields eigenvectors that are called 'scores' and 'loadings'. The PC eigenvalues monotonically decrease with the number of components and level off when the complete signal is taken into account. For the data collected here, the eigenvalues (80.72; 7.21; 2.50; 1.74; 0.50; 0.46; 0.25; 0.24; 0.20; 0.16 . . . in percentage of accounted variance) decrease sharply after the first PC (Fig. 3), and based on this behavior, one concludes that it is quite likely that there are only two components present. This somewhat atypical conclusion (where one PC refers to two chemical species) is the consequence of the pretreatment and the intrinsic structure of the data. Mathematically, because the data are normalized and mean-centred, the first PC covers the features of the two strongest spectral contributors, and in this particular case these are so strong that the rest of the spectral data is completely overwhelmed. In another case, where a sample contains numerous components of comparable Raman scattering coefficients, the dependence of the eigenvalues may be significantly different, and this is an important point. Using chemometrics alone, components at low levels may be missed; it becomes a much more valuable tool when used with spectroscopic information, since pure component spectra can often be matched to the PCA results. Therefore, a PC with a low eigenvalue can sometimes be used with confidence, for example when the spectral contribution is from a minor component.
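To make the eigenvalue argument concrete, the sketch below tabulates the percentages quoted above and shows one way such percentages could be computed from a spectral matrix. It assumes NumPy; the helper name variance_percent and the random demonstration data are hypothetical, and only the ten reported percentages come from the text.

```python
import numpy as np


def variance_percent(X):
    """Percent of total variance carried by each PC of the data matrix X."""
    cov = np.cov(X, rowvar=False)                 # covariance of mean-centred X
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigvalsh is ascending; reverse
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negatives
    return 100.0 * eigvals / eigvals.sum()


# Percentages reported in the text for the first ten PCs.
reported = np.array([80.72, 7.21, 2.50, 1.74, 0.50, 0.46, 0.25, 0.24, 0.20, 0.16])
cumulative = np.cumsum(reported)                  # running total of variance
drop_ratio = reported[:-1] / reported[1:]         # how sharply each step falls

# Check of the helper on random data (illustrative only).
rng = np.random.default_rng(2)
demo = variance_percent(rng.normal(size=(50, 6)))
```

The first two PCs together account for 87.93% of the variance and the first ten for 93.98%, with by far the largest drop between the first and second PC. As the text notes, a small eigenvalue on its own does not prove that a PC carries only noise.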

FIG. 3. Variation of eigenvalues with the principal components. The plot indicates the presence of only two components (see the text for details) since the first PC is far more influential than the remaining PCs.

FIG. 4. The PC loadings 1, 2, 3, 4, and 9 (top to bottom). All the loadings apart from the bottom one are offset for clarity. The stars mark the peaks that can be associated with the peaks of the pure components. In the first loading, one recognizes peaks of API at 1000 cm⁻¹ and Avicel at 1090 cm⁻¹ (originally this is a negative peak). The second loading features the strongest peak of DCP at 985 cm⁻¹. The third one contains peaks of all components and is thus inadequate, while the fourth one has a peak at 940 cm⁻¹ and features at the very end of the wavenumber region that can be related to Explotab. Finally, the 9th loading clearly shows three negative peaks that match the positions of the peaks of Mg stearate. This will cause the corresponding score image of Mg stearate to be negative (see Fig. 6).

Similarly to the analysis of the eigenvalues as described above, the scores (the features that relate to chemical concentrations or, in this case, chemical images) and loadings (the features that relate to the pure component spectra) should become noticeably different as the PCs start to describe noise. The scores and loadings referring to noise are ideally shapeless, whereas those relating to signal are of defined shapes that can often be associated with the pure component spectra. Since we only have information about the pure component spectra, we have to visually compare the loadings with the shapes of the pure component spectra. Figure 4 shows several loadings that demonstrate that the separation of the loadings on the basis of their appearance is not necessarily linked to chemical identity. The first loading separates features of the API and Avicel, the second one contains a clear spectral indication of the second most abundant excipient, DCP, while the fourth one features some of the bands of the minor excipient Explotab. The third loading is shown to illustrate that it is not of a random shape. Quite surprisingly, the 9th loading clearly shows the three peaks of Mg stearate, the least abundant excipient. Although the loading position for Mg stearate is a surprise, it has a scientific rationale based on its use in pharmaceutical tablets as a lubricant. As such it should be finely dispersed in, or coated onto the surfaces of, the other components that are present as identifiable domains. Detection of a component at 1% is generally not a problem for mapping systems as most materials retain a detectable particle size throughout tabletting processes. However, as a lubricant, Mg stearate is spread throughout the sample and its particle size is reduced such that it falls below its LOD. Put simply, take the example of a 10 × 10 matrix in which just one point is a specific pure component (i.e. 1% of total). Now take that component and distribute it over all 100 matrix points. At any one point, the component concentration is now 1% (a fraction of 0.01) rather than 100%. The concentration of the component in the complete matrix is still 1%, but its detection at any one point is now not possible because it is below the LOD. This principle explains the low loading value for Mg stearate, as it is never present at a high concentration in any data point in the full data set.
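The dilution argument can be checked numerically. The sketch below assumes NumPy and a purely hypothetical per-pixel detection limit (lod = 0.05, expressed as a fraction); only the 10 × 10 grid and the 1% overall content come from the text.

```python
import numpy as np

n_side = 10

# Case 1: one pixel is pure component (fraction 1.0); all others contain none.
localized = np.zeros((n_side, n_side))
localized[0, 0] = 1.0

# Case 2: the same total amount spread evenly over all 100 pixels.
dispersed = np.full((n_side, n_side), localized.sum() / localized.size)

lod = 0.05  # hypothetical per-pixel limit of detection, for illustration only

overall_localized = localized.mean()               # overall content, case 1
overall_dispersed = dispersed.mean()               # overall content, case 2
detected_localized = int((localized > lod).sum())  # pixels above the LOD
detected_dispersed = int((dispersed > lod).sum())
```

Both arrangements hold the same overall fraction of 0.01, but the dispersed one places a fraction of only 0.01 in every pixel, so with this assumed limit no single pixel registers the component.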

Returning to the chemometric analysis, the PCA provides a somewhat inconclusive result. The analysis of eigenvalues does not indicate whether more than two or three components are present in the sample (it is very unlikely that the difference between the 4th and 5th eigenvalue is meaningful), while the PC loadings are of structured shape and are not at all indicative of the threshold between signal and noise. For the analyzed tablet, it is known that there are five distinct chemical species, which implies that the maximum number of signal-related PCs should be five, and yet the 9th PC loading is obviously associated with one of the species. The intermediate PC loadings are associated with spectral noise or unassignable experimental or sample contributions. However, despite these somewhat ambiguous results of PCA, the PC loadings are of tremendous help for producing chemical images of the components of the tablet. The features of Mg stearate that are clearly present in PC loading no. 9 allow a chemical image of that component to be produced; this component is invisible using a univariate approach. Similarly, there is a loading associated with Explotab, which is also very difficult to image via characteristic wavenumbers because of the weakness of its signal. Therefore, PCA is indispensable for producing chemical images, despite several key aspects of it not being entirely in line with the theoretical expectations.

To better understand the significance of the PCA result, we will now address the second question regarding the extent of spectral variation across the data matrix; in this example, it consists of several thousand spectra.
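One way to find the most mutually dissimilar spectra in such a matrix is a determinant-based selection in the spirit of the orthogonal projection approach (OPA) discussed in the following paragraph. The sketch below is a simplified illustration under assumptions, not the published algorithm or the authors' Matlab code: it assumes NumPy, uses unit-length spectra, takes a fixed number of spectra to select instead of an angle-based stopping rule, and all names and the synthetic data are hypothetical.

```python
import numpy as np


def opa_select(X, n_select):
    """Pick n_select mutually dissimilar rows of X (simplified OPA-style)."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0)
    refs = [mean / np.linalg.norm(mean)]      # step 1 compares with the mean
    selected = []
    for step in range(n_select):
        dissim = np.empty(len(unit))
        for i, x in enumerate(unit):
            Y = np.vstack(refs + [x])
            # Gram determinant: for two unit vectors this is sin^2 of the angle.
            dissim[i] = np.linalg.det(Y @ Y.T)
        best = int(np.argmax(dissim))
        selected.append(best)
        if step == 0:
            refs = [unit[best]]               # the mean is replaced by pick 1
        else:
            refs.append(unit[best])
    return selected


# Synthetic example: three overlapping "pure" bands plus four mixtures.
axis = np.linspace(0.0, 1.0, 50)
pure = np.vstack([np.exp(-0.5 * ((axis - c) / 0.1) ** 2) for c in (0.2, 0.5, 0.8)])
coeffs = np.array([[0.5, 0.3, 0.2],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.2, 0.5],
                   [0.34, 0.33, 0.33]])
X = np.vstack([pure, coeffs @ pure])          # rows 0-2 are the pure spectra
picked = opa_select(X, n_select=3)
```

In this synthetic case the three pure spectra are present in the data, so they are the rows selected. In real mapping data the selected spectra are usually mixtures.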

To simultaneously consider such a large number of spectra, we will use the orthogonal projection approach (OPA) (9), an advanced linear algebra method for determining the most unique spectra in the spectral matrix. OPA belongs to the family of self-modelling curve resolution (SMCR) (10) methods that are based on the principle that the spectra can be represented as vectors in an n-dimensional space (where n is the number of wavenumbers), and that therefore the similarity or uniqueness of the spectra is equivalent to the angles among the vectors (a small angle means high similarity, a large angle very different spectra). In the first step of the algorithm, a series of determinants is calculated to identify the spectrum that is geometrically most distant from the mean spectrum of the complete data matrix. In the next step, the spectrum most orthogonal to that one is extracted, and then in the third step, the spectrum most orthogonal to the space spanned by the first two spectra found is identified, and so on. The calculations stop when the angle between the spectra and the spaces spanned becomes quite small. This means that the remaining spectra can be expressed as linear combinations of the previously extracted spectra. The spectra extracted in this way represent the boundaries inside which all other spectra can be found. For example, if the experimental spectra included pure component spectra of all the components of the tablet, these five spectra would be serially extracted and all other spectra in the set would represent a linear combination of these five pure component spectra. However, this is a very rare occurrence in practice because of spectral mixing during data collection. The Raman spectra generally appear thoroughly overlapped with each other, and unique Raman signals of minor components are generally undetectable as they are below the experimental limit of detection. OPA is used in this work to determine the most dissimilar (unique) spectra in the analyzed mapping data and to assess the prospects of univariate imaging based on these spectra. The univariate images produced via OPA-selected spectra can then be compared with the multivariate images from PCA.

The most unique spectra selected by OPA are shown in Figure 5. The spectral signatures of API and Avicel are dominant, while only the weak shoulder at 980 cm⁻¹ indicates the presence of an additional component (DCP). There is no explicit indication of Mg stearate or Explotab, the detection of the latter being additionally complicated by the strong overlap with the spectrum of Avicel. Thus, judging from Figure 5, two out of the three most abundant components are easy to image (API and Avicel) via the peaks at 1000 and 1090 or 1130 cm⁻¹, respectively, while the Raman signal for DCP is rather weak and it is hence risky to univariately image DCP at 980 cm⁻¹ because of possible interference from the strongest API peak at 1000 cm⁻¹. In short, the OPA selection of the most unique spectra shows that the prospect for univariate imaging of all five components is poor because only the two components accounting for about 75% of the total mass can be easily imaged, while the rest are either not imageable from the original data set (no unique band) or require additional calculations (e.g. to prove the identity of the univariate DCP image). If the univariate chemical images of all the components are produced only on the basis of knowing the spectra from Figure 2 and without inspection of the experimental mapping spectra, one can wrongly conclude that the minor components are homogeneously dispersed throughout, while in fact there is no detectable signal from those components and the subsequent univariate images would be based on noise. However, as exemplified above, PCA clearly opens the possibility to image these components.

The chemical images obtained as refolded PC scores and via univariate wavenumbers (where possible) are shown in Figure 6. The similarity between the score image and the univariate image of DCP confirms that 980 cm⁻¹ can be used as the univariate point despite the fact that this is only a shoulder to the very strong band of the API at 1000 cm⁻¹. Proof for this is in the 2nd PC loading, which shows clear spectral features of DCP (Fig. 4), so that the score image of DCP is very reliable. The univariate and score images of the API and the Avicel are very similar, while for Explotab and Mg stearate it was possible to produce only the score images and not the univariate ones. The image of Explotab is rather tentative because the 4th PC loading is considered to be correlated with the spectrum of Explotab but some of the features of that loading do not correlate with the bands of Explotab. The chemical image of Mg stearate appears to be more reliable (because the Mg stearate spectrum is nicely reproduced in the 9th PC loading), although it is obtained from PC no. 9, which is expected to be associated with noise.

A closer inspection of the chemical images in Figure 6 reveals that it is very difficult to produce a composite image in which all the components are displayed simultaneously. This is due to the heavy overlap of the Raman spectra from which the chemical images are produced. The excitation laser light diffuses deep into the sample and the effective excitation (sampling) volume actually far exceeds the size of the particles from which the tablet is made. Thus, the Raman spectra obtained are always mixtures of the Raman spectra from a number of different particles, and pure component spectra are not detected. Figure 5 illustrates that the API and Avicel bands are present in every single spectrum of the whole data set. As a consequence, if the images of these components (Fig. 6) are binarized (i.e. each pixel is thresholded to be either on or off with respect to the presence of that component) and added to each other, all the pixels will be covered by either the API or the Avicel, and there will be no pixels left for other components although these account for 25% of the total mass. One option to tackle this problem is to manually increase the binarization threshold for API or Avicel so that only high intensity points of these two are included in the chemical images, 'freeing' pixels that can be assigned to DCP and Explotab. This approach needs to be taken with care because of the arbitrary handling of the binarization thresholds, but undoubtedly it is very helpful for displaying all the components in a single image (Fig. 7). By giving each component a single color in the image, relationships between one component and another can be visualized with ease.
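The effect of raising a binarization threshold can be sketched as follows. This assumes NumPy and tiny made-up intensity maps. The rule used to resolve pixels where more than one component is 'on' (keep the component furthest above its own threshold) is an assumption of this sketch and is not described in the text, and the function name is hypothetical.

```python
import numpy as np


def composite_labels(images, thresholds):
    """Assign each pixel to at most one component from thresholded images.

    images:     dict of name -> 2D intensity array (score or band images)
    thresholds: dict of name -> binarization threshold for that component
    Returns the component names and an integer label image (-1 = unassigned).
    """
    names = list(images)
    stack = np.stack([np.asarray(images[n], dtype=float) for n in names])
    thr = np.array([thresholds[n] for n in names])[:, None, None]
    on = stack > thr                                  # binarized images
    # Assumed tie-break: keep the component furthest above its threshold.
    margin = np.where(on, stack - thr, -np.inf)
    labels = np.argmax(margin, axis=0)
    labels[~on.any(axis=0)] = -1
    return names, labels


# Made-up 2 x 2 maps: the strong component is present in every pixel.
images = {
    "API": np.array([[0.9, 0.4], [0.2, 0.8]]),
    "DCP": np.array([[0.1, 0.6], [0.7, 0.1]]),
}
names, low = composite_labels(images, {"API": 0.1, "DCP": 0.5})
_, high = composite_labels(images, {"API": 0.5, "DCP": 0.5})
```

With the low API threshold only one pixel is left for DCP; raising the API threshold frees a second pixel. The result depends directly on the thresholds chosen, which is why the text advises care.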

FIG. 5. The most unique spectra in the entire set of 5050 Raman mapping spectra. The legend indicates the component to which each spectrum relates. For example, the green spectrum is taken at a spatial position rich in API, and hence the API peaks are very strong in the Raman spectrum at that position. Note that the signal of API and Avicel can be detected in each of these three spectra, while the strongest indication of DCP is just a shoulder to the API peak at 1000 cm⁻¹.

FIG. 6. Left: The PC score chemical images of API, DCP, Explotab, and Mg stearate. Right: The univariate chemical images of API, Avicel, and DCP. The score image of Avicel could not be produced because the corresponding loading features the peaks of both API and Avicel (with opposite signs). The loading that corresponds to Mg stearate features three negative bands of Mg stearate, so that the negative (black) pixels in the corresponding chemical image correspond to the highest content of Mg stearate.

OTHER ISSUES ASSOCIATED WITH INFORMATION OBTAINED FROM CHEMICAL IMAGES

Production of a chemical image is very useful for describing the composition of a pharmaceutical sample (e.g. what is in the formulation, how each component is distributed, whether there is any association between two or more components, and whether any materials have changed form during processing). It is also used for identifying differences between two samples that have different performance characteristics (e.g. dissolution profile, friability, stability, etc.). However, there are a number of issues with chemical images after image generation that users should be aware of, as these may have implications that could affect the conclusions drawn from these complex data sets.

INSTRUMENT ACCURACY AND PRECISION

Checking System Calibration

In general, the spectral calibration of mapping systems is adequate, but it could be improved. Raman systems use either silicon or emission lines from light emitting diodes. These standards are excellent for confirming instrument performance, as the bands are strong, sharp, and well resolved. The emission lines are collected through a defocused system, while silicon provides an optically flat surface from which to collect the single sharp band at 520 cm⁻¹ Raman shift. In both cases, these calibration samples and collection methods are not typical of the product measured on a daily basis. Real samples may have weak or broad Raman bands, or a surface morphology that is not ideal for chemical imaging (i.e. it is rough or stepped, etc., and gives alignment/focus problems). For the NIR systems, there is a standard, made from three inorganic materials and designed for bulk analysis NIR instruments, that can also be used on mapping systems. Unfortunately, the standard is fabricated from a mixture of coarse rare earth oxide particles, so when it is used as a calibrant on a microscope-based system, the resultant data consist of individual spectra from each component rather than a composite spectrum with contributions from each component. Although not perfect, these standards ensure that the spectral resolution of the instruments can be confirmed.

Of much more concern is confirming the spatial resolution of any system; the value generally quoted is based on the optical magnification available on the instrument. Quoted spatial resolutions of 1 µm are not uncommon, and one Raman system even boasts a spatial resolution of 0.25 µm, but these can only be used as a guide because, to date, no spatial standard based on spectrally different materials has been fabricated.

Production of a spectral and spatial standard is a real challenge, and is one being considered by the Raman and NIR subcommittees of the ASTM. If one considers light microscopes, spatial calibration is performed using an etched graticule with divisions of known absolute size; these are used to determine accurately the actual magnifying power of a lens (e.g. 19.9× rather than 20×). They are relatively easy to produce as only one parameter is being measured, namely length. For mapping applications, the standard must enable both length and the spectral response of the instrument to be calibrated. This requires the standard to be fabricated from two spectrally different materials that are accurately separated by visually distinct distances appropriate for the measurements being made (e.g. >1–25 µm). Ideally, the standard would provide the spatial discrimination of the USAF 1951 target (Fig. 8), with target elements and background being made of spectrally different materials. This format unequivocally demonstrates the point at which two small adjacent optical features appear as a single feature in a chemical image. This “gold” standard is a long way from becoming reality. To our knowledge, the best that has been achieved to date is the measurement of latex/polystyrene spheres in a cellulose matrix. While this allows an approximation of a spatial calibration to be obtained, its use as a universal standard is limited, particularly as there is no positional referencing of any specific sphere in the sample (though this could be overcome using referenced sample mounts (11)).
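To give a sense of the feature sizes a USAF 1951 style standard would offer, the short Python sketch below uses the conventional group/element relationship for that target, in which the resolution in line pairs per millimetre is 2^(group + (element − 1)/6). The function names are illustrative, and the spectrally contrasting version of the target discussed above remains hypothetical.

```python
def usaf1951_line_pairs_per_mm(group, element):
    """Spatial frequency (line pairs per mm) of a USAF 1951 group/element."""
    if not 1 <= element <= 6:
        raise ValueError("USAF 1951 elements are numbered 1 to 6")
    return 2.0 ** (group + (element - 1) / 6.0)


def usaf1951_line_width_um(group, element):
    """Width of a single bar in micrometres (one line pair = one bar + one gap)."""
    return 1000.0 / (2.0 * usaf1951_line_pairs_per_mm(group, element))


# Bar widths for a few elements spanning the 1-25 um range mentioned in the text.
for group, element in [(4, 3), (6, 1), (7, 6)]:
    width = usaf1951_line_width_um(group, element)
    print(f"group {group}, element {element}: {width:.2f} um bars")
```

The smallest element that still appears as separate bars in a chemical image would then give a direct, if approximate, estimate of the achievable spatial resolution.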

FIG. 8. The USAF 1951 target.

FIG. 7. The composite image of the tablet components (a different tablet from the one used in Figure 6). The size of this image is approximately 2 × 2 mm². The spatial resolution is as described in the experimental section. The image was obtained after suitably raising the binarization threshold for the API and Avicel, which reduced the number of pixels assigned to these two components to values that correspond to the actual mass percentages. The color bar only helps to distinguish the different components; the numbers do not imply any quantitative measure.


System Focus

The discussion above assumes that spatial resolution is only an issue in the xy plane, i.e. the image plane of the sample. For that to be true, the data must be collected from an infinitely thin sample. As samples are quite thick, there will be spectral contributions from the Z axis as well as the x and y axes. Recent Raman mapping studies using polymer systems have demonstrated spectral contributions from components 5 mm below the sample surface (12). Chemical mapping (as opposed to global chemical imaging) is, in our experience, the best method of reducing depth contributions in a chemical image. In the imaging system the excitation energy is defocused, which leads to a relatively large area of the sample being illuminated and to greater penetration into the sample. This makes the Raman signal of minor components very difficult to detect, as it is overwhelmed by the signal from the major components. These minor components seem only to be imageable on the mapping system (Fig. 6). Further improvements to the mapping experiment may be made by measuring the sample in a confocal mode or by using a higher magnification microscope objective to further reduce the depth of spectral contributions; however, both of these approaches will lead to a significant increase in the data collection time.

A further consideration is that our samples comprise a number of different components. Each of those components has an irregular 3D shape, which leads to different degrees of reflection, penetration, and absorption of the excitation energy and of the returning analytical signal. A material with a strong response buried in a sample can be observed at a specific pixel in a chemical image where the surface comprises a material with a weaker response. As the sample morphology at the surface and throughout the bulk is not optically perfect, it is possible for a component outside of the excitation area to contribute to the analytical signal, and again this is a significant issue for global imaging systems. In our experience, when continuous solid matrix samples (e.g. compressed powders) are being studied, reliable chemical images can only be obtained if each of the components has minimum dimensions of 50–100 µm. If, however, the sample consists of discrete particles or visually discrete areas, the information from imaging systems is without equal (13).

In summary, there are complex interactions between the excitation energies used in chemical mapping instruments and the samples being studied. As a result, the analytical information obtained from any point on a sample is primarily from the sample surface, but there will be a variable spectral contribution from the surrounding sample volume that is dependent on the optical and chemical properties of the materials involved. To assist those using chemical images to solve problems or understand complex systems, each image should contain

i. the spatial resolution of the system and, therefore, the accuracy of any absolute measurement taken from the chemical image (e.g. 12 ± 3 µm).

ii. a corrected chemical image, which excludes spectral contributions from outside the nominal data collection volume surface, with a “reliability factor” to account for errors in the computation of the image.

In reality, a spectral and spatial chemical mapping standard is achievable but requires scientific motivation and/or commercial feasibility to make it a reality. The corrected chemical images described above are much more challenging and require significantly more knowledge about the components in every sample, their interactions with each other and with the excitation energies, and the development of robust algorithms to reprocess the large mapping data sets. At this time this ideal is unfeasible. However, binarization thresholding offers a simple but crude approach to this problem, and has been successful in improving the information content of chemical images.

CHEMICAL IMAGE ANALYSIS

Assuming that the errors associated with producing chemical images are known, there are still a number of considerations to be made when using chemical images. As stated earlier, our samples are evaluated at 1–25 µm step sizes. At this degree of spatial resolution, all chemical images from the same sample are different with respect to % composition, particle size, and absolute component distribution. Put simply, each sampled area is not the same as the others, despite the fact that the bulk analysis of the sample (e.g. a whole tablet) meets its specification and is equivalent to any other sample from the same origin. A challenging question is how to define the minimum sampling size that gives a chemical image consistent with the bulk sample composition. How big should a sampled area be, and at what spatial step size should measurements be taken, given knowledge about the ingoing materials and the manufacturing processes used? Similarly, how many discrete areas should be sampled, and what are their spatial relationships to the bulk sample, to ensure the data is representative of the whole sample? There is currently no easy answer, if any, to any of these questions. As a result, chemical images at best can be used to identify trends in a sample’s microscopic composition, which can then be correlated with sample performance (e.g. dissolution/potency/processability).

Interpretation of chemical images should also be performed with care. The issues associated with absolute particle dimensions have already been presented. Another factor to consider is the validity of any particle/domain measurement made from a chemical image. In all our experiments, the sample is presented with a flattened surface when powders are being studied, or as a carefully cut cross section of solid matrix samples such as tablets. Consideration should be given to the particle size/distribution of the spherical components as illustrated in Figure 9. From this hypothetical chemical image of the sample cross section, one would deduce that there were a number of circular domains of different sizes within the sample. The reality of the situation is that the circular domains come from a number of spheres of equal size that have been sectioned at different depths. Therefore, for a spherical feature in a continuous matrix, chemical images will consistently underestimate the true diameter of this component. Conversely, cut a cube obliquely or across the diagonal of one or more of its faces and some of the dimensions will be overestimated. Thus any single chemical image can only ever be an estimate of the particle shapes, sizes, and dimensions in the sample.
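The underestimation of sphere diameters can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any real tablet: all spheres have the same diameter, and the cutting plane is assumed to fall at a uniformly random distance from the centre of each sphere it intersects. Under those assumptions the mean apparent diameter is π/4 (about 79%) of the true diameter.

```python
import math
import random


def apparent_diameters(true_diameter, n_sections, seed=0):
    """Diameters of the circles seen when equal spheres are cut at random depths."""
    rng = random.Random(seed)
    radius = true_diameter / 2.0
    result = []
    for _ in range(n_sections):
        # Distance of the cutting plane from the sphere centre (assumed uniform).
        offset = rng.uniform(0.0, radius)
        result.append(2.0 * math.sqrt(max(radius ** 2 - offset ** 2, 0.0)))
    return result


diameters = apparent_diameters(true_diameter=100.0, n_sections=100_000)
mean_apparent = sum(diameters) / len(diameters)
print(f"mean apparent diameter: {mean_apparent:.1f} um (true: 100.0 um)")
print(f"analytical expectation: {math.pi / 4 * 100.0:.1f} um")
```

No sectioned circle can exceed the true diameter, so every error is in the same direction, which is why a single cross section gives a biased rather than merely noisy estimate.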

There are two potential solutions. One is to take a number of cross sections of a single sample. As the cross sections are at different depths, a better measurement of particle/domain shape can be made. This, however, is a time-consuming process.

Alternatively, a depth profiling experiment can be attempted to obtain better quality Z axis data. Again this is time consuming, and the further into the sample the measurements are taken, the greater the probability of errors in determining the true penetration of the excitation energy (14). Care should also be taken to ensure that the sampling spot size and step size are known accurately. This information is not taken into account during the data processing; all that is required there is which pixel follows which, and what the spectral content of each pixel is. Figure 10 shows an example where the sampling spot size is the same as the sampling step size. From this, one could deduce that the blue component has bigger domains than the red component, which in turn has bigger domains than the green component. The software used to generate chemical images treats the data as adjacent points with no spacing, so although scale bars can be produced, the images do not show areas from which data has not been collected. This is not an issue when the sampling spot size and step size are equivalent. The second chemical image in Figure 10 maps exactly the same area, but the spot size is now smaller than the sampling step size. As not all of the sample surface has been measured, the true chemical image on the right clearly shows where no data has been collected, and as a result gives a different assessment of particle size when compared with the image on the left. It is of great importance that the experimental conditions used to create chemical images are known; if they are not, the conclusions drawn from the images may be erroneous.

Chemical images are commonly used to see if the microscopical composition of samples can be linked to, for example, product performance. Currently, much comparison of chemical images is done by eye. Where differences are large, it is possible from visual observation to identify what is actually different between two chemical images (e.g. different size domains or distributions of one or more components, a favored spatial adjacency of two or more components, or the presence of a component unique to one image). As images become more similar, identifying these differences becomes more and more subjective. There is currently an unmet need for an indexing system that can be used to show how different two images are. This will probably require the use of a chemometrics or image analysis approach, but it is expected that the solution will not be trivial!
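Returning to the spot size versus step size point illustrated in Figure 10, the following Python sketch shows how little of the surface may actually be measured when the spot is smaller than the step. It assumes square sampling spots on a square grid (a simplification chosen so that equal spot and step sizes give full coverage, as in the left image of Figure 10); the function names and the single-letter labels are illustrative only.

```python
def sampled_fraction(spot_size, step_size):
    """Fraction of the mapped area actually measured (square spots assumed)."""
    if spot_size <= 0 or step_size <= 0:
        raise ValueError("spot and step sizes must be positive")
    ratio = min(spot_size / step_size, 1.0)
    return ratio * ratio


def render_with_gaps(labels, spot, step, blank="."):
    """Expand a label grid so each point fills spot x spot cells of a step x step block."""
    if spot > step:
        raise ValueError("this sketch only covers spot <= step")
    out = []
    for row in labels:
        for dy in range(step):
            out.append("".join(
                label if (dx < spot and dy < spot) else blank
                for label in row
                for dx in range(step)
            ))
    return out


labels = ["BR", "GB"]  # B, R, G stand for the blue, red, and green components
print(sampled_fraction(1, 2))          # spot is half the step size
for line in render_with_gaps(labels, spot=1, step=2):
    print(line)
```

Rendering the unmeasured cells explicitly, rather than drawing the points as adjacent pixels, is one simple way of keeping the spot and step information attached to the image.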

CONCLUSIONS

This description of chemical mapping has focussed on our approaches to producing chemical images and the issues associated with instrumentation, data processing, and image information and interpretation. The experimental and chemometrics considerations involved in producing reliable chemical images are not trivial, and this article could be viewed as a reason for not pursuing and developing these technologies as credible problem-solving tools. However, we are strong supporters of mapping and imaging methodologies for obtaining unique information from solid-state mixtures. Examples from Pfizer’s global laboratories, presented both in the scientific literature and at conferences, demonstrate how chemical images solve problems or provide better understanding of our products during development (11,13,15,16). The ongoing use of vibrational spectroscopy to understand biological systems and to detect disease noninvasively is also encouraging and well documented in the literature. These applications are reliant on the use of chemometrics (including PCA) to unravel the complex data that are inherent to them, and chemical images offer an approach to visualizing these samples. By addressing the issues that we have raised in this article, more confidence and credibility will be added to the information that can be obtained from chemical images, which will hopefully encourage their use in new application areas.

FIG. 9. Chemical image from a hypothetical sample cross section showing the distribution of spheres through a matrix.

FIG. 10. Effects on the information content of a chemical image where (on left) the sampling spot size is equal to the sampling step size, and (on right) the sampling spot size is less than the sampling step size.

LITERATURE CITED

1. Dhamelincourt P. In: Chalmers JM, Griffiths PR, editors. Handbook of Vibrational Spectroscopy. Chichester: Wiley; 2002. Vol. 2. p 1419.
2. Hammond SV, Clarke FC. In: Chalmers JM, Griffiths PR, editors. Handbook of Vibrational Spectroscopy. Chichester: Wiley; 2002. Vol. 2. p 1405.
3. Treado PJ, Nelson MP. In: Chalmers JM, Griffiths PR, editors. Handbook of Vibrational Spectroscopy. Chichester: Wiley; 2002. Vol. 2. p 1429.
4. Kidder LH, Haka AS, Lewis EN. Instrumentation for FT-IR imaging. In: Chalmers JM, Griffiths PR, editors. Handbook of Vibrational Spectroscopy. Chichester: Wiley; 2002. Vol. 2. p 1386.
5. Vandenginste BGM, Massart DL, Buydens LMC, de Jong S, Lewi PJ, Smeyers-Verbeke J. Handbook of Chemometrics and Qualimetrics B. Amsterdam: Elsevier; 1998.
6. Malinowski ER. Factor Analysis in Chemistry. New York: Wiley; 1991.
7. Geladi P, Grahn H. Multivariate Image Analysis. New York: Wiley; 1996.
8. Wang J-H, Hopke PK, Hancewicz TM, Zhang SL. Application of modified alternating least squares regression to spectroscopic image analysis. Anal Chim Acta 2003;476:93–109.
9. Cuesta Sanches F, Toft J, van den Bogaert B, Massart DL. Orthogonal projection approach applied to peak purity assessment. Anal Chem 1996;68:79–85.
10. Tauler R, Smilde A, Kowalski BR. Selectivity, local rank, 3-way data analysis and ambiguity in multivariate curve resolution. J Chemom 1995;9:31–58.
11. Clarke FC, Jamieson MJ, Clark DA, Hammond SV, Jee RD, Moffat AC. Chemical image fusion. The synergy of FT-NIR and Raman mapping microscopy to enable a more complete visualization of pharmaceutical formulations. Anal Chem 2001;73:2213–2220.
12. Schulmerich MV, Finney WF, Fredricks RA, Morris MD. Subsurface Raman spectroscopy and mapping using a globally illuminated non-confocal fiber-optic array probe in the presence of Raman photon migration. Appl Spectrosc 2006;60(2):109–114.
13. Sasic S, Clark DA, Mitchell JC, Snowden MJ. Univariate versus multivariate Raman imaging: a simulation with an example from pharmaceutical practice. Analyst 2004;129:1001–1007.
14. Everall N, Hahn T, Matousek P, Parker AW, Towrie M. Photon migration in Raman spectroscopy. Appl Spectrosc 2004;58:591–597.
15. Sasic S, Clark DA, Mitchell JC, Snowden MJ. Analysing Raman images of pharmaceutical products by sample-sample 2D correlation. Appl Spectrosc 2005;59:630–638.
16. Zhang L, Henson MJ, Sekulic SS. Multivariate data analysis for Raman imaging of a model pharmaceutical tablet. Anal Chim Acta 2005;545:262–278.
