assignment vs. thesis - 浙江大学数学研究中心 · 2007-07-04 · eventually most or...
TRANSCRIPT
-
Assignment vs. Thesis
-
What is research all about?
-
How to do a decent research?
DefineWhat do you want to solve?Literature Review
First solutionBest solution
What will you do?Problem Identification/FormulationProblem SolutionPresentation/MarketingApplications
ExampleLinear Model
-
How to be famousCriterion (AIC, R2)Algorithm (EM)Methodology (Response Surface)Thinking Process & Philosophy (Bayesian) Intelligent ApplicationsNo, No, No.Case Studies/Extension(except Review Paper)
-
:
-
(Atmosphere) (Taste)Culture
-
Dennis Lins Lectures
04 to 08 July [email protected]
-
Potential TopicsWhats Statistics All AboutBIG StatisticsResponse Surface Methodology
Multiple Response OptimizationDesign of ExperimentQuality Assurance & Six SigmaStatistical Data Mining & Machine Learning
Classification & Support Vector MachineFuture Opportunities
Search Engine & RFID
-
Whats StatisticsAll About?
Making Sense Out of Numbers
-
+
=
() ()
+= )...( 1 kxxfy
-
I love noise ! I love noise !I love noise !
-
E.F. SchumacherWhen the Lord created the world and people to live in itan enterprise which, according to modern science, took a very long timeI could well imagine that He reasoned with Himself as follows:If I make everything predictable, these human beings, whom I have endowed with pretty good brains, will undoubtedly learn to predict everything, and they will thereupon have no motive to do anything at all, because they will recognize that the future is totally determined and cannot be influenced by any human action.On the other hand, if I make everything unpredictable, they will gradually discover that there is no rational basis for any decision whatsoever and, as in the first case, they will thereupon have no motive to do anything at all.Neither scheme would make sense.I must therefore create a mixture of the two. Let some things be predictable and let others be unpredictable. They will then, amongst many other things, have the very important task of finding out which is which.
-
Inference about Population(Assuming Population will never change)
Sample Population
-
Inference about Population(Assuming Population will change)
Before After
-
Statistics in China (1960-1990)Highly influenced by RussiaMathematics is the leading disciplineMore Probability than StatisticsMore Theoretical than Applied
Cultural Revolution (1966-1976)Wang Yuan ()
There are two ways for (pure) mathematicians to do applied work(a) Treat applied problem as an pure mathematical problem, or(b) Adapt existing methodologies from outside
-
BIG Statistics
Dennis LinUniversity Distinguished ProfessorThe Pennsylvania State University
atZhejiang University
04 July, 2005
-
Knowledge Discovery in Sciences
Deduction ()PhysicsMathematics
Induction ()BiologyChemistry
Statistics is more powerful/useful in empirical study & experience accumulation, namely induction type.
-
Where have all the Data gone?No need for data (Theoretical Development)Survey Sampling and Design of ExperimentComputer SimulationData from Internet
On-line auctionSearch Engine
-
BIG Statistics
Business StatisticsIndustrial StatisticsGovernmental/Official Statistics
-
, (True)
, (Practical)
-
BIG Statistics
Business StatisticsIndustrial StatisticsGovernmental/Official Statistics
-
Business Statistics
-
Component of Business Research
ManagementBehaviorQuantitative
-
Business SchoolsAccounting/(Statistics)FinanceMarketingManagement & OrganizationInsurance & Real EstateManagement Science & Information SystemsSupply Chain ManagementBusiness LogisticInternational BusinessBusiness AdministrationEconomics
-
Business Intelligent
What is the same?What is new?
-
Elements of the exchange process
A buyerA sellerBuyer and seller can find each other (with some way to authenticate the other party)
Each has something of value to offer the otherThey can voluntarily complete the exchange
What is the Same?
-
Whats the Same?
Do the right thing!Do the thing right!
Right or Wrong Win or Lose
-
What is New?
(e-business)
(Globalization)
-
TechnologyInformation Systeme-BusinessSupply ChainValue NetGlobalization
What is New?
-
Example of a Supply Chain
RAW MATERIALSRAW MATERIALS PRODUCTIONPRODUCTION
STORAGESTORAGEDISTRIBUTIONDISTRIBUTION
-
Information FlowsGoods Flow
What is new? e-Supply Chain
Customers
Manufacturers
Wholesale Distributors
Suppliers
Retailers
Contract Manufacturers
ConsortiaExchanges
LogisticsExchanges
Logistics Providers
Virtual Manufacturers
Virtual Distributors
-
Whats different about the electronic exchange process?Buyers have more control in the exchange process (To some extent, the online medium also helps sellers find buyers)Customers behave differently online The exchange process for digital products is being completely redefined The exchange process for non-digital products is being transformed It redefines the way marketing information is gathered, analyzed, and deployedThe medium alters the nature of competition
-
What Kind of Job?
AgricultureIndustryServiceWhats next?High-tech and high-touch
-
Business StatisticsisTime Series ???
-
Time Series AnalysisTime Series Analysis (Univariate)
Linear, ARIMANon-linear
Time Series Analysis (Multi-variate)ARCH Model
AutoRegressive Conditional HeteroscedasticityGARCH, GARCH-M, Expo-GARCH etc
Long-Memory, high-frequency dataItos Lemma and Black-Scholes Differential EquationFinancial Time SeriesValue at Risk
-
The Newsboy ProblemAn international newsstand must decide how many copies, Q, of the Toronto Star to stock.
The owner can purchase papers wholesale at $0.80 each, and sell them for $1.20.Leftover paper are sold to a recycling at $0.05 each.
Profit for the day=Min (Q,X) * $1.20
+ Max (0, Q-X) * $0.05Q* $0.80where X is the Demand for the day.
-
The Newsboy ProblemSolve for Q which maximize the Profit
Suppose a historical demands are availableOptimization problem:
X is known (or use x-bar)
Statistical Problem: X follows Normal distribution (say), orDensity Estimation for X
Eventually most OR problems can be (more or less) converted into Stat problems
-
Newsboy Problem
The newsboy needs to determine how many newspapers to order for the next day.
The optimal number of newspapers to order is given by:
+
= cost overage cost underage
cost underageLevelOrder Optimal 1F
-
Implosion & the Inverse NewsboySuppose that the supply of c1 is given by x.
How many units of product A should we commit to sell?
Assume that:Orders must be satisfied in a FCFSEach order is for one unit of product A
Let SA(x) be the quantity of A that can be satisfied from a supply of x units of c1.
c1
A( ) )(~:max1
xHxYnxS An
iinA
= =
-
Sales & Operations Planning Manufacturing
sales marketing accounting finance manufacturing
Suppliers
Sales targets explode
implodeCommitment to Sales
Configure-To-Order (FTO) System
-
Unfortunately, Life is Not SimpleMultiple configurable productsComponent commonalityMulti-period rolling planning horizonMulti-level Bill-of-Materials
A B C
Stochastic BOM Layer
Deterministic BOM Layer
-
Problem StatementGiven:
A set of m products and n componentsA set of sales targets over T weeksA set of component supply commitments over T weeks
Determine:A set of commitments to sales over T weeks
Subject to:Supply constraints
Objective:To maximum expected profit over T weeks while minimizing the penalty for deviating from the sales targets
-
a 2
a 3
s
a 1R e c e iv ere q u e s t
V e r ifyc re d i tc a rd
A s k fo rr e su b m is s io n
o f r e q u e s t
a 5F in d n om a tc h
a 4T im e
o u t
a 6
C o n ta c th o te l 1
a 7
C o n ta c th o te l 2
a 8C o n ta c th o te l 3
a 1 5
E v a lu a ter e p l ie s
a 9
a 1 1R e c e iv e r e p ly 2
a 1 3a 1 4
a 1 0
a 1 2T im e o u t h o te l 2
T im e o u t h o te l 1
R e c e iv e r e p ly 3
T im e o u t h o te l 3
R e c e iv e r e p ly 1
a 1 6a 1 7
C h a rg ec re d i t c a rd
a 1 8
N o tifyc u s to m e r
F in d ah o te l e
a 1 9
B o o k h o te l
c 1
c 2
c 3
c 4
c 5
c 7
c 1 1c 1 0
c 9c 8
c 6
c 1 6c 1 4 c 1 5
c 1 3c 1 2
How do you know your Flow-Chart is correct?
A business process example is used to show how to use process graphs to represent and analyze process modelsAn online name-your-price hotel reservation process modelIdentify blocks that are to be modeled as subprocesses
Two blocks are identified in this model
a 2
a 3
s
a 1R e c e iv ere q u e s t
V e r ifyc re d i tc a rd
A s k fo rr e su b m is s io n
o f r e q u e s t
a 5F in d n om a tc h
a 4T im e
o u t
a 6
C o n ta c th o te l 1
a 7
C o n ta c th o te l 2
a 8C o n ta c th o te l 3
a 1 5
E v a lu a ter e p l ie s
a 9
a 1 1R e c e iv e r e p ly 2
a 1 3a 1 4
a 1 0
a 1 2T im e o u t h o te l 2
T im e o u t h o te l 1
R e c e iv e r e p ly 3
T im e o u t h o te l 3
R e c e iv e r e p ly 1
a 1 6a 1 7
C h a rg ec re d i t c a rd
a 1 8
N o tifyc u s to m e r
F in d ah o te l e
a 1 9
B o o k h o te l
c 1
c 2
c 3
c 4
c 5
c 7
c 1 1c 1 0
c 9c 8
c 6
c 1 6c 1 4 c 1 5
c 1 3c 1 2
Search hotels
Find a match
-
What is Supply Chain Scheduling?
Classical Scheduling Problem
M1
M2
J1
J1
J2 J3
J2 J3
Supply Chain Scheduling Problem
M1
M2
J1
J1
J2 J3
J2 J3
MachinesM1 and M2are operatedby the samedecision maker.
MachinesM1 and M2are operatedby differentdecision makers.
Supplier
Manufacturer
Manufacturer
Distributor
-
What is Supply Chain Scheduling?
Classical Scheduling Problem
M1
M2
J1
J1
J2 J3
J2 J3
Supply Chain Scheduling Problem
J1
J1
J2 J3
J2 J3
Supplier
Manufacturer
-
Inventory: Lead Time
Lead time of order 1
Effective lead time of order 1
Lead time of order 2
Effective lead time of order 2
Order 1 is placed
Order 1 is received
Order 2 is placed
Order 2 is received
Time t
-
SCHEME
Demand Rate Deterministic
Demand Rate Stochastic
Lead Time Deterministic
Model 0 Model 1
Lead Time Stochastic
Model 2 Model 3
Common Assumption: Exponential Distribution
-
What is Research: ExampleTravel Salesman Problem (TSP)Chinese Postman Problem (CPP)Lawn Mowing Problem (LMP)
Vehicle Routing Problem (VRP)m-TSP
-
Travel Salesman Problem (TSP)Given a set of points and all distances between them, find the tour that visits all points and cover minimum distances.
-
Travel Salesman Problem (TSP)
Given a set of points and all distances between them, find the tour that visits all points and cover minimum distances.Problem Formulation
Minimize
Subject To
-
Chinese Postman Problem (CPP)Lawn Mowing Problem (LMP)
-
Vehicle Routing Problem (VRP)What if there are m salesmen? What if there are m salesmen with logistical realism?
m vehicles with some capacities (possibly different capacities by vehicle type, time windows etc.
-
{ }1,0
xxQxopt
st
Basic Idea
{ }1,0
,...1
==x
mibxaj
ijij
=j
jj xcxfopt )(
Linear to Non-linear!!!
-
Link Analysis
-
What are the issues here?How do you quantify (measure) your observations (people or dog)?How do you characterize your observations?How do you classify (match) them?Others?
-
SVM:Support Vector Machine
Dimension ReductionDimension Expansion
-
Plane Bisect Closest Points
dc
-
Dual of Closest Points Method is Support Plane Method
( )
2 2
,1
1 1
1 12 2
. . 1 1 . . 1
0 1,..,
|| || || ||min mini i iw bi
i i i ii i
y x w
s t s t y x w b
i
=
= =
=
l
l
Solution only depends on support vectors: 0i>
1
1 1:
1 1i i i ii
i Classw y x y
i Class
=
= = l
-
Linearly Inseparable Case:Supporting Plane Method
( )1
1 22, ,
min || ||
1.
1,..,0
w b z
i i
ii
i
i
w C z
zy x w bs t
iz
=
+
+
=+
l
l
Just add non-negative errorvector z.
-
Dual of Closest Points Method is Support Plane Method
( )
2 2
,1 1
1 1
1 12 2
. . 1 1 . . 0
0 0 1,..,
|| || || ||min mini i i iw bi i
i i i i ii i
i i
y x w C z
s t s t y x w b z
D z i
= =
+
= = +
=
l l
l
Solution only depends on support vectors: 0i >
1i i i
iw y x
=
=l
-
Marron
-
Marron
-
Marron
-
What is Hot in B-Stat?Supply Chain ManagementFinancial Statisticse-Business (real time business)Marketing analysisData MiningValue NetSecurityAuction (On-Line)Supply Chain Event Management (Sense & Response)More Examples? (RFID/Search Engine)
-
A Specific Example: RFID
-
Statistics (Fundamental)Population vs Sample
Distribution AssumptionParameter EstimationStatistical Inference
Observations of Entire PopulationEmpirical Distribution FunctionDensity EstimationPopulation DescriptionForecasting
-
Low-Storage Single-Pass Density Estimation
UnivariateCertain (say, 20) representative quantiles(Cubic Smoothing) Spline fitting to these quantilesStatistical inference
MultivariateConvex Hulls PeelingCertain (say, 20) representative convex hullsThin Plate Splines fitting to these quantilesStatistical inference
-
Convex Hull Peeling Depth Contour
-
Astronomy Dataset: Near infra-red vs. blue-green(subset of 11355 for illustration/comparison)
-
Astronomy Dataset: Depth Contour
-
Astronomy Dataset: Thin Plate Splines
-
Secure Communication (PAIN)Privacy
message is secret
Authenticationmessage originates/reaches from a particular sender/receiver
Integritymessage is not altered
Non-repudiationsender can not deny having sent the message
-
Industrial Statistics
-
Industrial Statistics (Lin)
Process Capacity Index (PCI)Chao and Lin (2005)
Control ChartYeh and Lin (2002, 2005)
ReliabilityHighly reliable products
Design of ExperimentLins recent work
Quality Assurance & Six-Sigma Development Data Management & Value NetOthers
-
Whats Next?
Quality inService Industry(Service Science)
-
Process Capacity IndexCpCpkCpmCpmkCp(u,v)Cpwetc.
-
Process capability
Over time, a typical process will shift and drift by approximately 1.5
-
As a bio-statistician, you must knowp-valueAs an industrial-statistician, you must knowCp value
Common values:t, Z, 2, F etc.
-
Basic Idea:Process yield = 2(3*Cy)-1
Thus, we (Chao and Lin, 2005) have
-
Shewhart Control Charts
0Subgroup 5 10 15 20 2573.985
73.995
74.005
74.015
Sam
ple
Mea
n
X=74.00
3.0SL=74.01
-3.0SL=73.99
0.00
0.01
0.02
Sam
ple
StD
ev
S=0.009240
3.0SL=0.01930
-3.0SL=0.00E+0
Xbar and S Chart for Piston Ring Example
-
Yeh and Lin (2003)
-
Design of Experiment
How to collect useful information?
-
AgriculturalExperiments
IndustrialExperiments
Number of factors Small LargeNumber of Runs Large SmallReproducibility Large SmallTime taken Long ShortBlocking Nature Not obviousMissing values Often SeldomRandomization Important OKOther
Designing industrial experiments is very different From designing agricultural experiments
-
What is a
good design???
-
Design of Experiment (Lin)Multiple Response Problems
Optimization: Kim and Lin (JRSS-C, 2000)Design: Chang, Lo, Lin & Young (JSPI, 2001)
Computer ExperimentBeattie and Lin (1998)
Dispersion EffectMcGrath and Lin (Technometrics, 2002)
Foldover PlanLi and Lin (Technometrics, 2003)
Supersaturated DesignsLin (Technometrics, 1993, 1995, 2001) and others
Uniform DesignsFang, Lin, Winker & Yang (Technometrics, 1999)
-
Supersaturated Designs
Lin (UTK Technical Report, 1991)Lin (Technometrics, 1993, 1995, 2001) and many others
-
Supersaturated Design
# of experimental runs is less than # of experimental variables
# X1 X2 . X24 Resp1 2 . . . 14
y1 y2 . . . y14
-
Half Fraction of William's (1968) DataFactor
Run 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 y1 + + + - - - + + + + + - + - - + + - - + - - - + 1332 + - - - - - + + + - - - + + + + - + - - + + - - 623 + + - + + - - - - + - + + + + + + - - - - + + - 454 + + - + - + - - - + + - + - + + - + + + - - - - 525 - - + + + + - + + - - - + - + + + - - + - + + + 566 - - + + + + + - + + + - - + + - + + + + + + - - 477 - - - - + - - + - + - + + + - + + + + + + - - + 888 - + + - - + - + - + - - - - - - - - + - + + + - 1939 - - - - - + + - - - + + - - + - + + - - - - + + 32
10 + + + + - + + + - - - + - + + - + - + - + - - + 5311 - + - + + - - + + - + - - + - - - + + - - - + + 27612 + - - - + + + - + + + + + - - + - - + - + + + + 14513 + + + + + - + - + - - + - - - - - + - + + - + - 13014 - - + - - - - - - - + + - + - - - - - + - + - - 127
Lin (1993, Technometrics)
Supersaturated Design Example
-
How can we study k parameterswith n(
-
Run FactorsNo. I 1 2 3 4 5 6 7 8 9 10 (11)1 + + + - + + + - - - + -2 + + - + + + - - - + - +3 + - + + + - - - + - + +4 + + + + - - - + - + + -5 + + + - - - + - + + - +6 + + - - - + - + + - + +7 + - - - + - + + - + + +8 + - - + - + + - + + + -9 + - + - + + - + + + - -
10 + + - + + - + + + - - -11 + - + + - + + + - - - +12 + - - - - - - - - - - -
Supersaturated Design From Hadamard Matrix of Order 12(Using 11 as the branching column)
-
X = H RH
orthogonal orthogonal
not orthogonalThus Permute rows of RHto minimize , say.E s( )2
-
UD OD SSD
Fang, Lin & Ma (2000)
-
SSD: Looking AheadSupersaturated Design
SSD is much more mature than ever.
Micro-Array Design and Analysis
Computer Experiment: Model Building (using SSD)
Higher Level SSD
Spotlight Interaction Effects (Lin, 1998, QE)
Combination Designs: Rotated FFD & SSD
-
Computer Experiment
Beattie and Lin (1998)
-
Statistics vs. Engineering Models
Statistical Model
y=0+ixi+ijxixj+
+= ),(xfy
-
Statistical Simulation Research
Random Number GeneratorsDeng and Lin (1997, 2001)
Robustness of transformation(Sensitivity Analysis)
From Uniform random numbers to other distributions
-
A Typical Engineering Model (page 1 of 3, in Liao and Wang, 1995)
-
Irrelevant Issues
ReplicatesBlockingRandomization
Question: How can a computer experiment be run in an efficient manner?
-
Rotated Factorial Designs
Computer experiments are gaining in popularity
main research area of the next 10 years
Rotated factorial designsgood factorial design properties(orthogonality and structure)good Latin hypercube properties(unique and equally-spaced projections)easy to constructcomparable by Bayesian criteriavery suitable for computer experiments
-
Reliability
More on Highly reliable Products
-
Quality Assurance & Six-Sigma: The Past & The Future
Later
-
Define
Improve Analyze
MeasureControl
Six Sigma Project Process
Rigorous, data-driven and customer-focusedmethodology
-
Governmental Statistics
(Official Statistics)
-
Statistics
Information about the state
-
Official Statistics
-- (390 B.C.)
Index ManagementSampling
-
Index ManagementWhat to measure?How to measure?
(population): Age, geographical, etc and distribution (Wealth):
Economics IndexSocial Index
-
What to Measure: IndexAccidental Rate Example
Number of Car AccidentsDivided by
???* Population -- China* Total Number of cars -- Japan* Total mileages driven -- USA
-
6.68
3.06
3.934.30
3.384.77
1.2
5.86
5.424.57
6.16.42
7.17
3.98 4.22
-2.18
-4
-2
0
2
4
6
8
82 83 84 85 86 87 88 89 90 91Q1 91Q2 91Q3 91Q4 92Q1 92Q2 92Q3 92Q4
% 90-2.18
-
Government/Official StatisticsIndex ManagementFrom Census to SamplingSampling in modern technology(Web, Palm)
Error in VariablesMissing valuesData Mining
-
Send $500 toDennis Lin483 Business BuildingSupply Chain &Information SystemsPenn State University
+1 814 865-0377 (phone)+1 814 863-7076 (fax)[email protected]
(Customer Satisfaction or your money back!)
-
Potential TopicsWhats Statistics All AboutBIG StatisticsResponse Surface Methodology
Multiple Response OptimizationDesign of ExperimentQuality Assurance & Six SigmaStatistical Data Mining & Machine Learning
Classification & Support Vector MachineFuture Opportunities
Search Engine & RFID
-
Assignment #1Suppose thaty1=0+1x+11x2+1 and y2=0+1x+11x2+2Find x=x* such that both Yis are maximized.
Suppose thaty1=f1(x,1)+1 and y2=f2(x,2)+2Find x=x* such that both Yis are maximized.What will you do?