assignment vs. thesis - 浙江大学数学研究中心 · 2007-07-04 · eventually most or...

124

Upload: dangliem

Post on 15-Jul-2018

279 views

Category:

Documents


0 download

TRANSCRIPT

  • Assignment vs. Thesis

  • What is research all about?

  • How to do a decent research?

    DefineWhat do you want to solve?Literature Review

    First solutionBest solution

    What will you do?Problem Identification/FormulationProblem SolutionPresentation/MarketingApplications

    ExampleLinear Model

  • How to be famousCriterion (AIC, R2)Algorithm (EM)Methodology (Response Surface)Thinking Process & Philosophy (Bayesian) Intelligent ApplicationsNo, No, No.Case Studies/Extension(except Review Paper)

  • :

  • (Atmosphere) (Taste)Culture

  • Dennis Lins Lectures

    04 to 08 July [email protected]

  • Potential TopicsWhats Statistics All AboutBIG StatisticsResponse Surface Methodology

    Multiple Response OptimizationDesign of ExperimentQuality Assurance & Six SigmaStatistical Data Mining & Machine Learning

    Classification & Support Vector MachineFuture Opportunities

    Search Engine & RFID

  • Whats StatisticsAll About?

    Making Sense Out of Numbers

  • +

    =

    () ()

    += )...( 1 kxxfy

  • I love noise ! I love noise !I love noise !

  • E.F. SchumacherWhen the Lord created the world and people to live in itan enterprise which, according to modern science, took a very long timeI could well imagine that He reasoned with Himself as follows:If I make everything predictable, these human beings, whom I have endowed with pretty good brains, will undoubtedly learn to predict everything, and they will thereupon have no motive to do anything at all, because they will recognize that the future is totally determined and cannot be influenced by any human action.On the other hand, if I make everything unpredictable, they will gradually discover that there is no rational basis for any decision whatsoever and, as in the first case, they will thereupon have no motive to do anything at all.Neither scheme would make sense.I must therefore create a mixture of the two. Let some things be predictable and let others be unpredictable. They will then, amongst many other things, have the very important task of finding out which is which.

  • Inference about Population(Assuming Population will never change)

    Sample Population

  • Inference about Population(Assuming Population will change)

    Before After

  • Statistics in China (1960-1990)Highly influenced by RussiaMathematics is the leading disciplineMore Probability than StatisticsMore Theoretical than Applied

    Cultural Revolution (1966-1976)Wang Yuan ()

    There are two ways for (pure) mathematicians to do applied work(a) Treat applied problem as an pure mathematical problem, or(b) Adapt existing methodologies from outside

  • BIG Statistics

    Dennis LinUniversity Distinguished ProfessorThe Pennsylvania State University

    atZhejiang University

    04 July, 2005

  • Knowledge Discovery in Sciences

    Deduction ()PhysicsMathematics

    Induction ()BiologyChemistry

    Statistics is more powerful/useful in empirical study & experience accumulation, namely induction type.

  • Where have all the Data gone?No need for data (Theoretical Development)Survey Sampling and Design of ExperimentComputer SimulationData from Internet

    On-line auctionSearch Engine

  • BIG Statistics

    Business StatisticsIndustrial StatisticsGovernmental/Official Statistics

  • , (True)

    , (Practical)

  • BIG Statistics

    Business StatisticsIndustrial StatisticsGovernmental/Official Statistics

  • Business Statistics

  • Component of Business Research

    ManagementBehaviorQuantitative

  • Business SchoolsAccounting/(Statistics)FinanceMarketingManagement & OrganizationInsurance & Real EstateManagement Science & Information SystemsSupply Chain ManagementBusiness LogisticInternational BusinessBusiness AdministrationEconomics

  • Business Intelligent

    What is the same?What is new?

  • Elements of the exchange process

    A buyerA sellerBuyer and seller can find each other (with some way to authenticate the other party)

    Each has something of value to offer the otherThey can voluntarily complete the exchange

    What is the Same?

  • Whats the Same?

    Do the right thing!Do the thing right!

    Right or Wrong Win or Lose

  • What is New?

    (e-business)

    (Globalization)

  • TechnologyInformation Systeme-BusinessSupply ChainValue NetGlobalization

    What is New?

  • Example of a Supply Chain

    RAW MATERIALSRAW MATERIALS PRODUCTIONPRODUCTION

    STORAGESTORAGEDISTRIBUTIONDISTRIBUTION

  • Information FlowsGoods Flow

    What is new? e-Supply Chain

    Customers

    Manufacturers

    Wholesale Distributors

    Suppliers

    Retailers

    Contract Manufacturers

    ConsortiaExchanges

    LogisticsExchanges

    Logistics Providers

    Virtual Manufacturers

    Virtual Distributors

  • Whats different about the electronic exchange process?Buyers have more control in the exchange process (To some extent, the online medium also helps sellers find buyers)Customers behave differently online The exchange process for digital products is being completely redefined The exchange process for non-digital products is being transformed It redefines the way marketing information is gathered, analyzed, and deployedThe medium alters the nature of competition

  • What Kind of Job?

    AgricultureIndustryServiceWhats next?High-tech and high-touch

  • Business StatisticsisTime Series ???

  • Time Series AnalysisTime Series Analysis (Univariate)

    Linear, ARIMANon-linear

    Time Series Analysis (Multi-variate)ARCH Model

    AutoRegressive Conditional HeteroscedasticityGARCH, GARCH-M, Expo-GARCH etc

    Long-Memory, high-frequency dataItos Lemma and Black-Scholes Differential EquationFinancial Time SeriesValue at Risk

  • The Newsboy ProblemAn international newsstand must decide how many copies, Q, of the Toronto Star to stock.

    The owner can purchase papers wholesale at $0.80 each, and sell them for $1.20.Leftover paper are sold to a recycling at $0.05 each.

    Profit for the day=Min (Q,X) * $1.20

    + Max (0, Q-X) * $0.05Q* $0.80where X is the Demand for the day.

  • The Newsboy ProblemSolve for Q which maximize the Profit

    Suppose a historical demands are availableOptimization problem:

    X is known (or use x-bar)

    Statistical Problem: X follows Normal distribution (say), orDensity Estimation for X

    Eventually most OR problems can be (more or less) converted into Stat problems

  • Newsboy Problem

    The newsboy needs to determine how many newspapers to order for the next day.

    The optimal number of newspapers to order is given by:

    +

    = cost overage cost underage

    cost underageLevelOrder Optimal 1F

  • Implosion & the Inverse NewsboySuppose that the supply of c1 is given by x.

    How many units of product A should we commit to sell?

    Assume that:Orders must be satisfied in a FCFSEach order is for one unit of product A

    Let SA(x) be the quantity of A that can be satisfied from a supply of x units of c1.

    c1

    A( ) )(~:max1

    xHxYnxS An

    iinA

    = =

  • Sales & Operations Planning Manufacturing

    sales marketing accounting finance manufacturing

    Suppliers

    Sales targets explode

    implodeCommitment to Sales

    Configure-To-Order (FTO) System

  • Unfortunately, Life is Not SimpleMultiple configurable productsComponent commonalityMulti-period rolling planning horizonMulti-level Bill-of-Materials

    A B C

    Stochastic BOM Layer

    Deterministic BOM Layer

  • Problem StatementGiven:

    A set of m products and n componentsA set of sales targets over T weeksA set of component supply commitments over T weeks

    Determine:A set of commitments to sales over T weeks

    Subject to:Supply constraints

    Objective:To maximum expected profit over T weeks while minimizing the penalty for deviating from the sales targets

  • a 2

    a 3

    s

    a 1R e c e iv ere q u e s t

    V e r ifyc re d i tc a rd

    A s k fo rr e su b m is s io n

    o f r e q u e s t

    a 5F in d n om a tc h

    a 4T im e

    o u t

    a 6

    C o n ta c th o te l 1

    a 7

    C o n ta c th o te l 2

    a 8C o n ta c th o te l 3

    a 1 5

    E v a lu a ter e p l ie s

    a 9

    a 1 1R e c e iv e r e p ly 2

    a 1 3a 1 4

    a 1 0

    a 1 2T im e o u t h o te l 2

    T im e o u t h o te l 1

    R e c e iv e r e p ly 3

    T im e o u t h o te l 3

    R e c e iv e r e p ly 1

    a 1 6a 1 7

    C h a rg ec re d i t c a rd

    a 1 8

    N o tifyc u s to m e r

    F in d ah o te l e

    a 1 9

    B o o k h o te l

    c 1

    c 2

    c 3

    c 4

    c 5

    c 7

    c 1 1c 1 0

    c 9c 8

    c 6

    c 1 6c 1 4 c 1 5

    c 1 3c 1 2

    How do you know your Flow-Chart is correct?

    A business process example is used to show how to use process graphs to represent and analyze process modelsAn online name-your-price hotel reservation process modelIdentify blocks that are to be modeled as subprocesses

    Two blocks are identified in this model

    a 2

    a 3

    s

    a 1R e c e iv ere q u e s t

    V e r ifyc re d i tc a rd

    A s k fo rr e su b m is s io n

    o f r e q u e s t

    a 5F in d n om a tc h

    a 4T im e

    o u t

    a 6

    C o n ta c th o te l 1

    a 7

    C o n ta c th o te l 2

    a 8C o n ta c th o te l 3

    a 1 5

    E v a lu a ter e p l ie s

    a 9

    a 1 1R e c e iv e r e p ly 2

    a 1 3a 1 4

    a 1 0

    a 1 2T im e o u t h o te l 2

    T im e o u t h o te l 1

    R e c e iv e r e p ly 3

    T im e o u t h o te l 3

    R e c e iv e r e p ly 1

    a 1 6a 1 7

    C h a rg ec re d i t c a rd

    a 1 8

    N o tifyc u s to m e r

    F in d ah o te l e

    a 1 9

    B o o k h o te l

    c 1

    c 2

    c 3

    c 4

    c 5

    c 7

    c 1 1c 1 0

    c 9c 8

    c 6

    c 1 6c 1 4 c 1 5

    c 1 3c 1 2

    Search hotels

    Find a match

  • What is Supply Chain Scheduling?

    Classical Scheduling Problem

    M1

    M2

    J1

    J1

    J2 J3

    J2 J3

    Supply Chain Scheduling Problem

    M1

    M2

    J1

    J1

    J2 J3

    J2 J3

    MachinesM1 and M2are operatedby the samedecision maker.

    MachinesM1 and M2are operatedby differentdecision makers.

    Supplier

    Manufacturer

    Manufacturer

    Distributor

  • What is Supply Chain Scheduling?

    Classical Scheduling Problem

    M1

    M2

    J1

    J1

    J2 J3

    J2 J3

    Supply Chain Scheduling Problem

    J1

    J1

    J2 J3

    J2 J3

    Supplier

    Manufacturer

  • Inventory: Lead Time

    Lead time of order 1

    Effective lead time of order 1

    Lead time of order 2

    Effective lead time of order 2

    Order 1 is placed

    Order 1 is received

    Order 2 is placed

    Order 2 is received

    Time t

  • SCHEME

    Demand Rate Deterministic

    Demand Rate Stochastic

    Lead Time Deterministic

    Model 0 Model 1

    Lead Time Stochastic

    Model 2 Model 3

    Common Assumption: Exponential Distribution

  • What is Research: ExampleTravel Salesman Problem (TSP)Chinese Postman Problem (CPP)Lawn Mowing Problem (LMP)

    Vehicle Routing Problem (VRP)m-TSP

  • Travel Salesman Problem (TSP)Given a set of points and all distances between them, find the tour that visits all points and cover minimum distances.

  • Travel Salesman Problem (TSP)

    Given a set of points and all distances between them, find the tour that visits all points and cover minimum distances.Problem Formulation

    Minimize

    Subject To

  • Chinese Postman Problem (CPP)Lawn Mowing Problem (LMP)

  • Vehicle Routing Problem (VRP)What if there are m salesmen? What if there are m salesmen with logistical realism?

    m vehicles with some capacities (possibly different capacities by vehicle type, time windows etc.

  • { }1,0

    xxQxopt

    st

    Basic Idea

    { }1,0

    ,...1

    ==x

    mibxaj

    ijij

    =j

    jj xcxfopt )(

    Linear to Non-linear!!!

  • Link Analysis

  • What are the issues here?How do you quantify (measure) your observations (people or dog)?How do you characterize your observations?How do you classify (match) them?Others?

  • SVM:Support Vector Machine

    Dimension ReductionDimension Expansion

  • Plane Bisect Closest Points

    dc

  • Dual of Closest Points Method is Support Plane Method

    ( )

    2 2

    ,1

    1 1

    1 12 2

    . . 1 1 . . 1

    0 1,..,

    || || || ||min mini i iw bi

    i i i ii i

    y x w

    s t s t y x w b

    i

    =

    = =

    =

    l

    l

    Solution only depends on support vectors: 0i>

    1

    1 1:

    1 1i i i ii

    i Classw y x y

    i Class

    =

    = = l

  • Linearly Inseparable Case:Supporting Plane Method

    ( )1

    1 22, ,

    min || ||

    1.

    1,..,0

    w b z

    i i

    ii

    i

    i

    w C z

    zy x w bs t

    iz

    =

    +

    +

    =+

    l

    l

    Just add non-negative errorvector z.

  • Dual of Closest Points Method is Support Plane Method

    ( )

    2 2

    ,1 1

    1 1

    1 12 2

    . . 1 1 . . 0

    0 0 1,..,

    || || || ||min mini i i iw bi i

    i i i i ii i

    i i

    y x w C z

    s t s t y x w b z

    D z i

    = =

    +

    = = +

    =

    l l

    l

    Solution only depends on support vectors: 0i >

    1i i i

    iw y x

    =

    =l

  • Marron

  • Marron

  • Marron

  • What is Hot in B-Stat?Supply Chain ManagementFinancial Statisticse-Business (real time business)Marketing analysisData MiningValue NetSecurityAuction (On-Line)Supply Chain Event Management (Sense & Response)More Examples? (RFID/Search Engine)

  • A Specific Example: RFID

  • Statistics (Fundamental)Population vs Sample

    Distribution AssumptionParameter EstimationStatistical Inference

    Observations of Entire PopulationEmpirical Distribution FunctionDensity EstimationPopulation DescriptionForecasting

  • Low-Storage Single-Pass Density Estimation

    UnivariateCertain (say, 20) representative quantiles(Cubic Smoothing) Spline fitting to these quantilesStatistical inference

    MultivariateConvex Hulls PeelingCertain (say, 20) representative convex hullsThin Plate Splines fitting to these quantilesStatistical inference

  • Convex Hull Peeling Depth Contour

  • Astronomy Dataset: Near infra-red vs. blue-green(subset of 11355 for illustration/comparison)

  • Astronomy Dataset: Depth Contour

  • Astronomy Dataset: Thin Plate Splines

  • Secure Communication (PAIN)Privacy

    message is secret

    Authenticationmessage originates/reaches from a particular sender/receiver

    Integritymessage is not altered

    Non-repudiationsender can not deny having sent the message

  • Industrial Statistics

  • Industrial Statistics (Lin)

    Process Capacity Index (PCI)Chao and Lin (2005)

    Control ChartYeh and Lin (2002, 2005)

    ReliabilityHighly reliable products

    Design of ExperimentLins recent work

    Quality Assurance & Six-Sigma Development Data Management & Value NetOthers

  • Whats Next?

    Quality inService Industry(Service Science)

  • Process Capacity IndexCpCpkCpmCpmkCp(u,v)Cpwetc.

  • Process capability

    Over time, a typical process will shift and drift by approximately 1.5

  • As a bio-statistician, you must knowp-valueAs an industrial-statistician, you must knowCp value

    Common values:t, Z, 2, F etc.

  • Basic Idea:Process yield = 2(3*Cy)-1

    Thus, we (Chao and Lin, 2005) have

  • Shewhart Control Charts

    0Subgroup 5 10 15 20 2573.985

    73.995

    74.005

    74.015

    Sam

    ple

    Mea

    n

    X=74.00

    3.0SL=74.01

    -3.0SL=73.99

    0.00

    0.01

    0.02

    Sam

    ple

    StD

    ev

    S=0.009240

    3.0SL=0.01930

    -3.0SL=0.00E+0

    Xbar and S Chart for Piston Ring Example

  • Yeh and Lin (2003)

  • Design of Experiment

    How to collect useful information?

  • AgriculturalExperiments

    IndustrialExperiments

    Number of factors Small LargeNumber of Runs Large SmallReproducibility Large SmallTime taken Long ShortBlocking Nature Not obviousMissing values Often SeldomRandomization Important OKOther

    Designing industrial experiments is very different From designing agricultural experiments

  • What is a

    good design???

  • Design of Experiment (Lin)Multiple Response Problems

    Optimization: Kim and Lin (JRSS-C, 2000)Design: Chang, Lo, Lin & Young (JSPI, 2001)

    Computer ExperimentBeattie and Lin (1998)

    Dispersion EffectMcGrath and Lin (Technometrics, 2002)

    Foldover PlanLi and Lin (Technometrics, 2003)

    Supersaturated DesignsLin (Technometrics, 1993, 1995, 2001) and others

    Uniform DesignsFang, Lin, Winker & Yang (Technometrics, 1999)

  • Supersaturated Designs

    Lin (UTK Technical Report, 1991)Lin (Technometrics, 1993, 1995, 2001) and many others

  • Supersaturated Design

    # of experimental runs is less than # of experimental variables

    # X1 X2 . X24 Resp1 2 . . . 14

    y1 y2 . . . y14

  • Half Fraction of William's (1968) DataFactor

    Run 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 y1 + + + - - - + + + + + - + - - + + - - + - - - + 1332 + - - - - - + + + - - - + + + + - + - - + + - - 623 + + - + + - - - - + - + + + + + + - - - - + + - 454 + + - + - + - - - + + - + - + + - + + + - - - - 525 - - + + + + - + + - - - + - + + + - - + - + + + 566 - - + + + + + - + + + - - + + - + + + + + + - - 477 - - - - + - - + - + - + + + - + + + + + + - - + 888 - + + - - + - + - + - - - - - - - - + - + + + - 1939 - - - - - + + - - - + + - - + - + + - - - - + + 32

    10 + + + + - + + + - - - + - + + - + - + - + - - + 5311 - + - + + - - + + - + - - + - - - + + - - - + + 27612 + - - - + + + - + + + + + - - + - - + - + + + + 14513 + + + + + - + - + - - + - - - - - + - + + - + - 13014 - - + - - - - - - - + + - + - - - - - + - + - - 127

    Lin (1993, Technometrics)

    Supersaturated Design Example

  • How can we study k parameterswith n(

  • Run FactorsNo. I 1 2 3 4 5 6 7 8 9 10 (11)1 + + + - + + + - - - + -2 + + - + + + - - - + - +3 + - + + + - - - + - + +4 + + + + - - - + - + + -5 + + + - - - + - + + - +6 + + - - - + - + + - + +7 + - - - + - + + - + + +8 + - - + - + + - + + + -9 + - + - + + - + + + - -

    10 + + - + + - + + + - - -11 + - + + - + + + - - - +12 + - - - - - - - - - - -

    Supersaturated Design From Hadamard Matrix of Order 12(Using 11 as the branching column)

  • X = H RH

    orthogonal orthogonal

    not orthogonalThus Permute rows of RHto minimize , say.E s( )2

  • UD OD SSD

    Fang, Lin & Ma (2000)

  • SSD: Looking AheadSupersaturated Design

    SSD is much more mature than ever.

    Micro-Array Design and Analysis

    Computer Experiment: Model Building (using SSD)

    Higher Level SSD

    Spotlight Interaction Effects (Lin, 1998, QE)

    Combination Designs: Rotated FFD & SSD

  • Computer Experiment

    Beattie and Lin (1998)

  • Statistics vs. Engineering Models

    Statistical Model

    y=0+ixi+ijxixj+

    += ),(xfy

  • Statistical Simulation Research

    Random Number GeneratorsDeng and Lin (1997, 2001)

    Robustness of transformation(Sensitivity Analysis)

    From Uniform random numbers to other distributions

  • A Typical Engineering Model (page 1 of 3, in Liao and Wang, 1995)

  • Irrelevant Issues

    ReplicatesBlockingRandomization

    Question: How can a computer experiment be run in an efficient manner?

  • Rotated Factorial Designs

    Computer experiments are gaining in popularity

    main research area of the next 10 years

    Rotated factorial designsgood factorial design properties(orthogonality and structure)good Latin hypercube properties(unique and equally-spaced projections)easy to constructcomparable by Bayesian criteriavery suitable for computer experiments

  • Reliability

    More on Highly reliable Products

  • Quality Assurance & Six-Sigma: The Past & The Future

    Later

  • Define

    Improve Analyze

    MeasureControl

    Six Sigma Project Process

    Rigorous, data-driven and customer-focusedmethodology

  • Governmental Statistics

    (Official Statistics)

  • Statistics

    Information about the state

  • Official Statistics

    -- (390 B.C.)

    Index ManagementSampling

  • Index ManagementWhat to measure?How to measure?

    (population): Age, geographical, etc and distribution (Wealth):

    Economics IndexSocial Index

  • What to Measure: IndexAccidental Rate Example

    Number of Car AccidentsDivided by

    ???* Population -- China* Total Number of cars -- Japan* Total mileages driven -- USA

  • 6.68

    3.06

    3.934.30

    3.384.77

    1.2

    5.86

    5.424.57

    6.16.42

    7.17

    3.98 4.22

    -2.18

    -4

    -2

    0

    2

    4

    6

    8

    82 83 84 85 86 87 88 89 90 91Q1 91Q2 91Q3 91Q4 92Q1 92Q2 92Q3 92Q4

    % 90-2.18

  • Government/Official StatisticsIndex ManagementFrom Census to SamplingSampling in modern technology(Web, Palm)

    Error in VariablesMissing valuesData Mining

  • Send $500 toDennis Lin483 Business BuildingSupply Chain &Information SystemsPenn State University

    +1 814 865-0377 (phone)+1 814 863-7076 (fax)[email protected]

    (Customer Satisfaction or your money back!)

  • Potential TopicsWhats Statistics All AboutBIG StatisticsResponse Surface Methodology

    Multiple Response OptimizationDesign of ExperimentQuality Assurance & Six SigmaStatistical Data Mining & Machine Learning

    Classification & Support Vector MachineFuture Opportunities

    Search Engine & RFID

  • Assignment #1Suppose thaty1=0+1x+11x2+1 and y2=0+1x+11x2+2Find x=x* such that both Yis are maximized.

    Suppose thaty1=f1(x,1)+1 and y2=f2(x,2)+2Find x=x* such that both Yis are maximized.What will you do?