towards root-cause analysis in compression ... - ibm research

Machine Learning group

© Copyright IBM Corporation 2006

IBM Haifa Labs

Towards root-cause analysis in compression-based similarity

Justin Wong, Yiheng Xu, Elad Yom-Tov & Michal Rosen-Zvi

December 2006



Similarity by compression

• If the tree built to compress one signal also compresses another signal well, the two signals are likely similar.

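• Therefore, the compression-based similarity measure:

$$S = \frac{\operatorname{size}(A \mid \mathrm{Tree}_B)}{\min\left(\operatorname{size}(A \mid \mathrm{Tree}_A),\ \operatorname{size}(B \mid \mathrm{Tree}_B)\right)}$$

where size(X | Tree_Y) is the size of signal X after compression with the tree built for signal Y.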
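The slide does not say which compressor's "tree" is meant. A minimal sketch, assuming Huffman coding trees with add-one smoothing (so symbols unseen in one signal still get a code) and the ratio size(A | Tree_B) / min(size(A | Tree_A), size(B | Tree_B)). The names `huffman_lengths`, `size_given_tree` and `tree_similarity` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_lengths(seq, alphabet):
    """Code length of every symbol in `alphabet` under a Huffman tree built from
    the counts of `seq`. Counts get +1 (add-one smoothing, an assumption) so that
    symbols missing from `seq` still receive a code."""
    counts = Counter(seq)
    # heap entries: (weight, unique tie-breaker, symbols under this node)
    heap = [(counts[s] + 1, i, (s,)) for i, s in enumerate(alphabet)]
    heapq.heapify(heap)
    lengths = dict.fromkeys(alphabet, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, g1 = heapq.heappop(heap)
        w2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:  # merging two nodes pushes their symbols one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, g1 + g2))
        next_id += 1
    return lengths


def size_given_tree(seq, lengths):
    """size(X | Tree_Y): bits needed to encode `seq` with the code lengths of Tree_Y."""
    return sum(lengths[s] for s in seq)


def tree_similarity(a, b):
    """size(A | Tree_B) / min(size(A | Tree_A), size(B | Tree_B))."""
    alphabet = sorted(set(a) | set(b))
    tree_a = huffman_lengths(a, alphabet)
    tree_b = huffman_lengths(b, alphabet)
    return size_given_tree(a, tree_b) / min(size_given_tree(a, tree_a),
                                            size_given_tree(b, tree_b))


A = "aaaaaaabbc" * 50
B = "aaaaaabbbc" * 50   # similar symbol statistics
C = "dddddddccb" * 50   # very different statistics
print(tree_similarity(A, B), tree_similarity(A, C))
```

Under this reading, values near 1 mean B's tree encodes A about as compactly as the signals' own trees do; larger values mean a worse fit. A single-symbol alphabet gives zero-length codes and a division by zero, so real signals would first be quantized into several levels.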




Similarity by compression

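• The compression-based similarity measure: normalized compression distance (NCD)

$$S = \frac{\operatorname{size}(AB) - \min\left(\operatorname{size}(A), \operatorname{size}(B)\right)}{\max\left(\operatorname{size}(A), \operatorname{size}(B)\right)}$$

where size(X) is the compressed size of X and AB is the concatenation of A and B. Values near 0 indicate very similar inputs; values near 1 indicate unrelated ones.

• Compression-based measures have been proposed and applied in:
  – DNA clustering
  – Language hierarchies
  – Optical character recognition
  – Analysis of literature
  – Plagiarism detection
  – Classification of music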

R. Cilibrasi and P. M. B. Vitanyi, “Clustering by Compression,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1523–1545, April 2005.
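A minimal sketch of the normalized compression distance, using Python's standard `bz2` module as the general-purpose compressor (any compressor could be swapped in; the helper names are ours):

```python
import bz2
import os


def csize(data: bytes) -> int:
    """Compressed size in bytes, with bzip2 as the general-purpose compressor."""
    return len(bz2.compress(data))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance; size(AB) is the compressed concatenation."""
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


text = b"the quick brown fox jumps over the lazy dog " * 50
noise = os.urandom(len(text))
print(f"{ncd(text, text):.2f}")   # expected to be small: the second copy is nearly free
print(f"{ncd(text, noise):.2f}")  # expected to be near 1: nothing is shared
```

Because real compressors have header overhead and are not ideal, NCD of an input with itself is small but not exactly 0, and values can slightly exceed 1.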




Example

• 10 noisy signals from two sensors; one sensor's signals have 5% added to the basic signal in samples 15–20.

[Figure: the 10 noisy signals, samples 0–100 on the horizontal axis, amplitude −1.5 to 1.5]
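A minimal sketch of how signals like these could be simulated and turned into bytes for a general-purpose compressor. The sine-wave base signal, the noise level, and reading "5% addition" as adding 5% of the basic signal are all assumptions; `make_signals` and `quantize` are hypothetical names.

```python
import numpy as np

N_SAMPLES = 100                      # matches the 0-100 axis of the figure
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES)
base = np.sin(2 * np.pi * t / 50)    # hypothetical "basic signal"


def make_signals(n_per_sensor=5, noise=0.2, bump=0.05):
    """Sensor 0: base + noise. Sensor 1: the same, plus bump * base in samples 15-20."""
    signals, labels = [], []
    for sensor in (0, 1):
        for _ in range(n_per_sensor):
            x = base + rng.normal(0.0, noise, N_SAMPLES)
            if sensor == 1:
                x[15:21] += bump * base[15:21]   # samples 15..20 inclusive
            signals.append(x)
            labels.append(sensor)
    return np.array(signals), np.array(labels)


def quantize(x, lo=-1.5, hi=1.5, levels=256):
    """Map real values onto 0..levels-1 and return one byte per sample."""
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * (levels - 1))
    return q.astype(np.uint8).tobytes()


signals, labels = make_signals()
data = [quantize(x) for x in signals]   # one byte string per signal, ready to compress
```

Quantization is needed because compressors operate on bytes; the number of levels trades off noise sensitivity against resolution.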




Feature space

[Figure panels: "Simple PCA" and "1st PCA on compression features"]
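A minimal sketch of PCA on compression features, assuming each item's feature vector is its row of pairwise NCD values (computed with bzip2) and that PCA is done by SVD of the centered feature matrix. The slide does not specify the feature construction; `ncd_matrix` and `pca` are hypothetical helpers.

```python
import bz2
import numpy as np


def csize(data: bytes) -> int:
    return len(bz2.compress(data))


def ncd_matrix(items):
    """Row i holds NCD(items[i], items[j]) for every j; each row is a feature vector."""
    sizes = [csize(x) for x in items]
    n = len(items)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            cab = csize(items[i] + items[j])
            D[i, j] = (cab - min(sizes[i], sizes[j])) / max(sizes[i], sizes[j])
    return D


def pca(X, k=2):
    """Project the rows of X onto their first k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]   # equals Xc @ Vt[:k].T


# Example: five near-identical periodic strings and five random byte strings.
group1 = [b"abcd" * 100 + bytes([i]) for i in range(5)]
group2 = [np.random.default_rng(i).integers(0, 256, 400, dtype=np.uint8).tobytes()
          for i in range(5)]
Z = pca(ncd_matrix(group1 + group2), k=2)
```

The items could equally be quantized sensor signals; the NCD rows give every item a fixed-length feature vector even when the raw inputs differ in length.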




Compression via general-purpose compression software

• Pros:
  – Simple to do (“black box”)
  – Elegant
  – Works! (Author detection, DNA analysis, image processing, …)

• Cons:
  – No root-cause analysis
  – Computationally expensive




IBM’s East Fishkill 300mm Fab




Residual Gas Analysis process

• RGA uses a quadrupole mass spectrometer for process monitoring and fault detection in semiconductor manufacturing.

• The main purpose of RGA systems is to detect contamination residue on the wafer and undesirable process variation.

• During degas process monitoring, the RGA measures the intensity of each mass within a predefined mass range.

• One full spectrum scan, from mass 1 to mass 100, usually takes around 4–5 seconds.

• For a typical degas process, about 20 scans are recorded per wafer, and each scan consists of measurements of 100 masses. The intensity of each mass represents the concentration (partial pressure) of the corresponding chemical species.



RGA data



Residual Gas Analysis (RGA) data

• Two datasets:
  – 211 wafers, 80 defective (37.9%)
  – 135 wafers, 12 defective (8.9%)

• Each wafer's data consists of 19 to 25 spectral scans taken at 4-second intervals. Compression allows such variable-length data to be used directly.

• Preprocessing:
  – All measurements below 10⁻⁶ were set to 10⁻⁶.
  – All measurements were then log-transformed (because of the large span of values).

• Processing:
  – Compression using bzip2
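A minimal sketch of the preprocessing and compression steps for one wafer. The floor of 10⁻⁶ and bzip2 come from the slide; the base-10 logarithm and the text serialization (one line per scan, values rounded to two decimals) are assumptions, and the function names are hypothetical.

```python
import bz2
import numpy as np

FLOOR = 1e-6  # intensities below this are set to it, as on the slide


def preprocess(scans: np.ndarray) -> np.ndarray:
    """scans: (n_scans, n_masses) intensities -> log10 of the floored values.
    Base 10 is an assumption; the slide only says 'logarithmically'."""
    return np.log10(np.maximum(scans, FLOOR))


def serialize(logged: np.ndarray, decimals: int = 2) -> bytes:
    """Hypothetical encoding: one text line per scan, values rounded."""
    rows = (" ".join(f"{v:.{decimals}f}" for v in scan) for scan in logged)
    return "\n".join(rows).encode("ascii")


def compressed_size(scans: np.ndarray) -> int:
    """bzip2 size of a wafer's preprocessed scans; works for any number of scans."""
    return len(bz2.compress(serialize(preprocess(scans))))


rng = np.random.default_rng(0)
print(compressed_size(rng.random((19, 100))), compressed_size(rng.random((25, 100))))
```

Because the compressor sees a byte stream, wafers with 19 and 25 scans need no padding or truncation; the flip side is that the stream length itself influences the compressed size.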




Results

• Slightly over 88% of the wafers in dataset 1 and 97% of the wafers in dataset 2 were correctly classified.
  – Results were validated with 10-fold cross-validation.
  – Classifier: AdaBoost, 100 iterations, linear weak learner.
  – Features: the first 10 eigenvectors.

• The feature space of the first dataset (right) still shows clusters. These clusters correlate strongly with the number of scans taken (each cluster represents wafers with a certain number of scans).

• Therefore, the actual spectral values appear to have a smaller effect than the number of scans.

[Figure: feature space of dataset 1, defective vs. working wafers]
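A minimal NumPy sketch of the classification setup: top eigenvectors of a similarity matrix as features, discrete AdaBoost for 100 rounds, and 10-fold cross-validation. Taking the eigenvectors of the symmetrized compression-similarity matrix and using a weighted least-squares linear classifier as the "linear weak learner" are assumptions; all function names are hypothetical.

```python
import numpy as np


def spectral_features(S, k=10):
    """Top-k eigenvectors (by eigenvalue) of the symmetrized similarity matrix."""
    S = 0.5 * (S + S.T)             # compression similarities need not be symmetric
    vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:k]]


def _with_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])


def adaboost_fit(X, y, rounds=100):
    """Discrete AdaBoost with labels y in {-1, +1}. Weak learner: weighted
    least-squares linear classifier (a stand-in for the slide's linear learner)."""
    Xb = _with_bias(X)
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)[0]
        pred = np.where(Xb @ coef >= 0, 1, -1)
        err = w[pred != y].sum()
        if err >= 0.5:              # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)       # avoid log(0) when the learner is perfect
        alpha = 0.5 * np.log((1 - err) / err)
        model.append((alpha, coef))
        w *= np.exp(-alpha * y * pred)  # up-weight the misclassified samples
        w /= w.sum()
    return model


def adaboost_predict(model, X):
    score = sum(a * np.where(_with_bias(X) @ c >= 0, 1, -1) for a, c in model)
    return np.where(score >= 0, 1, -1)


def cv_accuracy(X, y, folds=10, seed=0):
    """k-fold cross-validated accuracy of the boosted classifier."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        model = adaboost_fit(X[train], y[train])
        correct += (adaboost_predict(model, X[test]) == y[test]).sum()
    return correct / len(y)
```

In this sketch the eigenvectors are computed once from the full similarity matrix, before the folds are split; recomputing them per fold would be the stricter choice.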




Receiver Operating Characteristic Curve

[Figure panels: Dataset 1, Dataset 2]




Identify the first scan at which a wafer can be identified as defective

• Let N be the maximal scan number for the specific wafer.

• For i = 1 to N:
  – Check the similarity of wafers using scans 1 through i.
  – If defective wafers are significantly different from working wafers, and the specific wafer is more similar to the defective wafers, stop.

• For each i, we generated a leave-one-out ROC curve using an SVM (libsvm) on scans 1 through i. If the area under the ROC curve is large, defective and working wafers are significantly different.
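A minimal sketch of the stopping rule. To stay self-contained it replaces the libsvm SVM with a leave-one-out nearest-centroid score, and it takes a caller-supplied function `features_upto(i)` that builds per-wafer features from scans 1 through i (for example, rows of an NCD matrix). The AUC threshold of 0.9 and all function names are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


def loo_scores(X, y):
    """Leave-one-out 'defectiveness' score: positive when a wafer lies closer to
    the centroid of the other defective wafers than to that of the working ones.
    Needs at least two wafers in each class."""
    n = len(y)
    s = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k
        Xt, yt = X[keep], y[keep]
        d_def = np.linalg.norm(X[k] - Xt[yt].mean(axis=0))
        d_ok = np.linalg.norm(X[k] - Xt[~yt].mean(axis=0))
        s[k] = d_ok - d_def
    return s


def first_defective_scan(features_upto, y, wafer, n_scans, auc_min=0.9):
    """Smallest i at which scans 1..i separate the classes (LOO AUC >= auc_min)
    and `wafer` scores as defective; None if no such i exists."""
    y = np.asarray(y, dtype=bool)
    for i in range(1, n_scans + 1):
        s = loo_scores(features_upto(i), y)
        if auc(s, y) >= auc_min and s[wafer] > 0:
            return i
    return None


# Synthetic check: defective wafers show a jump in mass 0 from scan 3 onward.
rng = np.random.default_rng(1)
y = np.array([True] * 10 + [False] * 10)
spectra = rng.normal(0.0, 0.1, (20, 6, 3))   # (wafers, scans, masses)
spectra[y, 2:, 0] += 5.0
feats = lambda i: spectra[:, :i, :].mean(axis=1)
print(first_defective_scan(feats, y, wafer=0, n_scans=6))
```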




Results




Identify the variable which indicates that a specific wafer is defective

• Can compression be used?
  – No, because here the amplitudes matter most.

• We identify the most indicative variable by measuring, for each mass, the ratio of its maximal amplitude (in any scan) to the median (across scans) of its maximal height. This ratio is the severity level of the mass.

• On the defective wafers in dataset 2, the current software and the compression-based similarity were in full agreement.
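A minimal sketch of the severity level for one wafer, reading the definition as each mass's maximum intensity over scans divided by its median intensity over scans (for a single wafer each scan has one value per mass, so this reading is an assumption). `severity` and `most_indicative_mass` are hypothetical names.

```python
import numpy as np


def severity(scans: np.ndarray) -> np.ndarray:
    """scans: (n_scans, n_masses) intensities for one wafer.
    Severity per mass = max over scans / median over scans."""
    return scans.max(axis=0) / np.median(scans, axis=0)


def most_indicative_mass(scans: np.ndarray, masses=None):
    """Index (or label from `masses`) of the mass with the highest severity."""
    i = int(np.argmax(severity(scans)))
    return i if masses is None else masses[i]


scans = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 8.0, 3.0]])   # mass index 1 spikes in the last scan
print(severity(scans), most_indicative_mass(scans))
```

A ratio against the median rather than the mean keeps a single spike from inflating its own baseline. Raw intensities floored at 10⁻⁶ avoid division by zero; on log-transformed values the ratio would need a different form.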




Identifying problematic manufacturing stages in lithographic tools

Prime bake → Photoresist spin coating → Soft bake (PAB) → Exposure → Post-exposure bake (PEB) → Puddle resist development → Post-develop bake (PDB)

Available measures:

• Critical dimension (CD) SEM measure
• Critical dimension (CD) scatterometry pre-etching measures

Critical dimension (CD) is measured to assess performance.




Finding relevant features using regression in the baking process

Clear correlation between pre-exposure bake timing and CDs.
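A minimal sketch of finding relevant timing features by correlation and ordinary least-squares regression against CD. The feature names and the synthetic data are hypothetical; the slide does not say which regression was used.

```python
import numpy as np


def rank_by_correlation(X, cd, names):
    """Pearson correlation of each timing column of X with CD, strongest first."""
    r = np.array([np.corrcoef(X[:, j], cd)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))
    return [(names[j], float(r[j])) for j in order]


def fit_linear(X, cd):
    """OLS fit CD ~ intercept + timings; returns [intercept, coef_1, ...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, cd, rcond=None)[0]


# Synthetic example: CD depends only on the (hypothetical) soft-bake time.
names = ["prime_bake_s", "soft_bake_s", "peb_s"]
rng = np.random.default_rng(0)
X = rng.normal(60.0, 5.0, (200, 3))
cd = 50.0 + 2.0 * X[:, 1] + rng.normal(0.0, 0.5, 200)
print(rank_by_correlation(X, cd, names)[0])
print(fit_linear(X, cd))
```

Correlation ranks single features quickly, while the joint regression separates effects when timings of different stages are themselves correlated.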




Prediction of out of spec

Compression-based prediction: the following categories were provided:

• Baking
  – In module (not in process)
  – In process
• Between modules + in transition modules
• Other modules (coating/exposure)

5 modules x 2

2

The prediction benefits from concentrating on the first part of the sequence, which is the part that causes variations in CDs.




Summary

• Compression is a useful method for measuring similarity between many data types without worrying about feature extraction.

• Root-cause analysis remains a challenge, though it is possible in some cases.

• In some cases, compression may also miss salient features of the data.
