
  • References

    [1] I. Abounadi. Stochastic approximation for non-expansive maps: Application to Q-learning algorithms. Unpublished Ph.D. Thesis, MIT, Department of Electrical Engineering and Computer Science, February 1998.

    [2] I. Abounadi, D. Bertsekas, and V. Borkar. Learning algorithms for Markov decision processes with average cost. Technical Report, LIDS-P-2434, MIT, MA, USA, 1998.

    [3] I. Abounadi, D. Bertsekas, and V. Borkar. Stochastic approximations for non-expansive maps: Application to Q-learning algorithms. Technical Report, LIDS-P-2433, MIT, MA, USA, 1998.

    [4] J.S. Albus. Brain, Behavior and Robotics. Byte Books, Peterborough, NH, USA, 1981.

    [5] M.H. Alrefaei and S. Andradóttir. A simulated annealing algorithm with constant temperature for discrete stochastic optimization. Management Science, 45(5):748-764, 1999.

    [6] M.H. Alrefaei and S. Andradóttir. A modification of the stochastic ruler method for discrete stochastic optimization. European Journal of Operational Research (to appear), 2002.

    [7] T. Altiok and S. Stidham. The allocation of interstage buffer capacities in production lines. IIE Transactions, 15(4):292-299, 1984.

    [8] S. Andradóttir. Simulation optimization. In Handbook of Simulation (edited by Jerry Banks), Chapter 9. John Wiley and Sons, New York, NY, USA, 1998.

    [9] A.B. Badiru and D.B. Sieger. Neural network simulation metamodel in economic analysis of risky projects. European Journal of Operational Research, 105:130-142, 1998.

    [10] N. Barish and N. Hauser. Economic design of control decisions. Journal of Industrial Engineering, 14:125-134, 1963.

    [11] A.G. Barto, S.J. Bradtke, and S.P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.

  • SIMULATION-BASED OPTIMIZATION

    [12] A.G. Barto, R.S. Sutton, and C.W. Anderson. Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:835-846, 1983.

    [13] R.E. Bechhofer, T.J. Santner, and D.J. Goldsman. Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons. John Wiley, New York, NY, USA, 1995.

    [14] R. Bellman. The theory of dynamic programming. Bull. Amer. Math. Soc., 60:503-516, 1954.

    [15] R.E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

    [16] R.E. Bellman and S.E. Dreyfus. Applied Dynamic Programming. Princeton University Press, Princeton, NJ, 1962.

    [17] P.P. Belobaba. Application of a probabilistic decision model to airline seat inventory control. Operations Research, 37:183-197, 1989.

    [18] D. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, USA, 1995.

    [19] D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, USA, 1996.

    [20] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, USA, 1995.

    [21] D. Blackwell. Discrete dynamic programming. Ann. Math. Stat., 33:226-235, 1965.

    [22] J. Boesel and B.L. Nelson. Accounting for randomness in heuristic simulation optimization. Preprint, Northwestern University, Department of Industrial Engineering.

    [23] L.B. Booker. Intelligent Behaviour as an Adaptation to the Task Environment. PhD thesis, University of Michigan, Ann Arbor, MI, USA, 1982.

    [24] V. Borkar and P. Varaiya. Adaptive control of Markov chains, I: Finite parameter set. IEEE Transactions on Automatic Control, 24:953-958, 1979.

    [25] V.S. Borkar. Stochastic approximation with two-time scales. Systems and Control Letters, 29:291-294, 1997.

    [26] V.S. Borkar. Asynchronous stochastic approximation. SIAM J. Control Optim., 36(3):840-851, 1998.

    [27] V.S. Borkar and V.R. Konda. The actor-critic algorithm as multi-time scale stochastic approximation. Sadhana (Proc. Indian Academy of Sciences - Eng. Sciences), 27(4), 1997.

    [28] V.S. Borkar and S.P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal of Control and Optimization, 38(2):447-469, 2000.

    [29] V.S. Borkar and K. Soumyanath. A new analog parallel scheme for fixed point computation, part 1: Theory. IEEE Transactions on Circuits and Systems I: Theory and Applications, 44:351-355, 1997.

    [30] J.A. Boyan and A.W. Moore. Generalization in Reinforcement Learning: Safely Approximating the Value Function. Advances in Neural Information Processing Systems, pages 369-376, 1995.

    [31] S.J. Bradtke and M. Duff. Reinforcement learning methods for continuous-time Markov decision problems. In Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA, USA, 1995.

    [32] G. Bronson. C For Engineers and Scientists. West Publishing Company, MN, USA, 1993.

    [33] Y. Carson and A. Maria. Simulation optimization: Methods and applications. Proceedings of the 1997 Winter Simulation Conference, pages 118-126, 1997.

    [34] H. Cohn and M.J. Fielding. Simulated annealing: searching for an optimal temperature schedule. SIAM Journal of Optimization (to appear).

    [35] R. Crites and A. Barto. Improving elevator performance using reinforcement learning. In Neural Information Processing Systems (NIPS), 1996.

    [36] C. Darken, J. Chang, and J. Moody. Learning rate schedules for faster stochastic gradient search. In D.A. White and D.A. Sofge, editors, Neural Networks for Signal Processing 2 - Proceedings of the 1992 IEEE Workshop. IEEE Press, Piscataway, NJ, 1992.

    [37] T.K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck. Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4):560-574, 1999.

    [38] T.K. Das, V. Jain, and A. Gosavi. Economic design of dual-sampling-interval policies for x-bar charts with and without run rules. IIE Transactions, 29:497-506, 1997.

    [39] T.K. Das and S. Sarkar. Optimal preventive maintenance in a production inventory system. IIE Transactions, 31:537-551, 1999.

    [40] S. Davies. Multi-dimensional interpolation and triangulation for reinforcement learning. Advances in Neural Information Processing Systems, 1996.

    [41] L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, USA, 1996.

    [42] S.A. Douglass. Introduction to Mathematical Analysis. Addison-Wesley Publishing Company, Reading, MA, USA, 1996.

    [43] M.J. Fielding. Optimisation by Simulated Annealing. PhD thesis, The University of Melbourne, Department of Mathematics, Australia, 1999.

    [44] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, New York, NY, USA, 1997.

    [45] P.A. Fishwick. Neural network models in simulation: A comparison with traditional modeling approaches. Proceedings of the 1989 Winter Simulation Conference, pages 702-710, 1989.

    [46] B.L. Fox and G.W. Heine. Probabilistic search with overrides. Annals of Applied Probability, 5:1087-1094, 1995.

    [47] M.C. Fu. Optimization via simulation: A review. Annals of Operations Research, 53:199-247, 1994.

    [48] M.C. Fu and J. Hu. Efficient design and sensitivity analysis of control charts using Monte Carlo simulation. Management Science, 45(3):395-413, 1999.

    [49] E.D. Gaughan. Introduction to Analysis, 4th edition. Brooks/Cole Publishing Company, Belmont, CA, USA, 1993.

    [50] S.B. Gelfand and S.K. Mitter. Simulated annealing with noisy or imprecise energy measurements. Journal of Optimization Theory and Applications, 62(1):49-62, 1989.

    [51] F. Glover. Tabu Search: A Tutorial. Interfaces, 20(4):74-94, 1990.

    [52] F. Glover and S. Hanafi. Tabu search and finite convergence. Working Paper, University of Colorado, 2001.

    [53] F. Glover, J.P. Kelly, and M. Laguna. New advances and applications of combining simulation and optimization. Proceedings of the 1996 Winter Simulation Conference, pages 144-152, 1996.

    [54] F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1998.

    [55] D. Goldsman and B.L. Nelson. Ranking, selection, and multiple comparisons in computer simulation. Proceedings of the 1994 Winter Simulation Conference, pages 192-199, 1994.

    [56] D. Goldsman and B.L. Nelson. Comparing Systems via Simulation. In Handbook of Simulation (edited by Jerry Banks), Chapter 8. John Wiley and Sons, New York, NY, USA, 1998.

    [57] D. Goldsman, B.L. Nelson, and B. Schmeiser. Methods for selecting the best system. Proceedings of the 1991 Winter Simulation Conference (B.L. Nelson, W.D. Kelton, and G.M. Clark, eds.), pages 177-186, 1991.

    [58] W.B. Gong, Y.C. Ho, and W. Zhai. Stochastic comparison algorithm for discrete optimization with estimation. Proc. 31st Conf. Decision Control, pages 795-800, 1992.

    [59] A. Gosavi. An algorithm for solving semi-Markov decision problems using reinforcement learning: Convergence analysis and numerical results. Unpublished Ph.D. dissertation, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, USA, 1999.

    [60] A. Gosavi. The effect of noise on artificial intelligence and meta-heuristic techniques. In Proceedings of the Artificial Neural Networks in Engineering Conference (Intelligent Engineering Systems Through Artificial Neural Networks), volume 12, pages 981-988. American Society of Mechanical Engineering Press, 2002.

    [61] A. Gosavi. A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis. Machine Learning (to appear), 2002. (Technical report available with author until paper appears in print).

    [62] A. Gosavi. Asynchronous convergence and boundedness in reinforcement learning. Proceedings of the 2003 Institute of Industrial Engineering Research Conference in Portland, Oregon, 2003.

    [63] A. Gosavi. Reinforcement learning for long-run average cost. European Journal of Operational Research (to appear), 2003. (Technical Report available with author until paper appears in print).

    [64] A. Gosavi, N. Bandla, and T.K. Das. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions (Special issue on Large-Scale Optimization), 34(9):729-742, 2002.

    [65] A. Gosavi, T.K. Das, and S. Sarkar. A simulation-based learning automata framework for solving semi-Markov decision problems. IIE Transactions (to appear), 2003.

    [66] B.S. Gottfried. Programming with C. McGraw Hill, New York, NY, USA, 1991.

    [67] B. Hajek. Cooling schedules for optimal annealing. Mathematics of Operations Research, 13:311-329, 1988.

    [68] C. Harrell, B.K. Ghosh, and R. Bowden. Simulation Using Promodel. McGraw Hill Higher Education, Boston, MA, USA, 2000.

    [69] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, NY, USA, 2001.

    [70] S. Haykin. Neural Networks: A Comprehensive Foundation. McMillan, New York, NY, USA, 1994.

    [71] S. Heragu. Facilities Design. PWS Publishing Company, Boston, MA, USA, 1997.

    [72] S.S. Heragu and S. Rajgopalan. A literature survey of the AGV flowpath design problem. Technical Report No 37-96-406 at the Rensselaer Polytechnic Institute, DSES Department, NY, USA, 1996.

    [73] F.S. Hillier and G.J. Lieberman. Introduction to Operations Research, Seventh Edition. McGraw Hill, New York, 2001.

    [74] G.E. Hinton. Distributed representations. Technical Report, CMU-CS-84-157, Carnegie Mellon University, Pittsburgh, PA, USA, 1984.

    [75] Y.C. Ho and X.R. Cao. Perturbation Analysis of Discrete Event Dynamic Systems. Kluwer, 1991.

    [76] Y. Hochberg and A.C. Tamhane. Multiple Comparison Procedures. Wiley, 1987.

    [77] J.H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, USA, 1975.

    [78] J.H. Holland. Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, pages 593-623. Morgan Kaufmann, San Mateo, CA, USA, 1986.

    [79] T. Homem-de-Mello. Variable-sample methods and simulated annealing for discrete stochastic optimization. Working paper, Department of Industrial, Welding, and Systems Engineering, The Ohio State University, Columbus, OH, USA, 2001.

    [80] R. Hooke and T.A. Jeeves. Direct search solution of numerical and statistical problems. Journal of the ACM, 8:212-229, 1961.

    [81] R. Howard. Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA, 1960.

    [82] S.H. Jacobson and L.W. Schruben. A harmonic analysis approach to simulation sensitivity analysis. IIE Transactions, 31(3):231-243, 1999.

    [83] A. Jalali and M. Ferguson. Computationally efficient adaptive control algorithms for Markov chains. In Proceedings of the 29th IEEE Conference on Decision and Control, pages 1283-1288, 1989.

    [84] S.A. Johnson, J.R. Stedinger, C.A. Shoemaker, Y. Li, and J.A. Tejada-Guibert. Numerical solution of continuous state dynamic programs using linear and spline interpolation. Operations Research, 41(3):484-500, 1993.

    [85] L.P. Kaelbling, M.L. Littman, and A.W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

    [86] P. Kanerva. Sparse Distributed Memory. MIT Press, Cambridge, MA, USA, 1988.

    [87] J.G. Kemeny and J.L. Snell. Finite Markov Chains. van Nostrand-Reinhold, NY, USA, 1960.

    [88] J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. Ann. Math. Stat., 23:462-466, 1952.

    [89] R.A. Kilmer and A.E. Smith. Using artificial neural networks to approximate a discrete event stochastic simulation model. Intelligent Engineering Systems Through Artificial Neural Networks (edited by C.H. Dagli, L.I. Burke, B.R. Fernandez, and J. Ghosh, ASME Press), 3:631-636, 1993.

    [90] R.A. Kilmer, A.E. Smith, and L.J. Shuman. Computing confidence intervals for stochastic simulation using neural network metamodels. Computers and Industrial Engineering (to appear), 1998.

    [91] S-H Kim and B.L. Nelson. A fully sequential procedure for indifference-zone selection in simulation. ACM Transactions on Modeling and Computer Simulation, 11:251-273, 2001.

    [92] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.

    [93] J.P.C. Kleijnen. Sensitivity analysis and optimization in simulation: design of experiments and case studies. In Proceedings of the 1995 Winter Simulation Conference, pages 133-140, 1995.

    [94] A.H. Klopf. Brain function and adaptive systems - a heterostatic theory. Technical Report AFCRL-72-0164, 1972.

    [95] D.E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 3rd edition. Addison-Wesley, Reading, MA, 1998.

    [96] V.R. Konda and V.S. Borkar. Actor-critic type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38(1):94-123, 1999.

    [97] E. Kreyszig. Advanced Engineering Mathematics. John Wiley and Sons, 1998.

    [98] P.R. Kumar. A survey of some results in stochastic adaptive control. SIAM Journal of Control and Optimization, 23:329-380, 1985.

    [99] P.R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, USA, 1986.

    [100] H.J. Kushner and D.S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer Verlag, New York, 1978.

    [101] J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright. Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions. SIAM Journal on Optimization, 9(1):112-147, 1998.

    [102] A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw Hill, Inc., New York, NY, USA, 1999.

    [103] P. L'Ecuyer. Good Parameters for Combined Multiple Recursive Random Number Generators. Operations Research, 47:159-164, 1999.

    [104] K. Littlewood. Forecasting and control of passenger bookings. In Proceedings of the 12th AGIFORS (Airline Group of the International Federation of Operational Research Societies) Symposium, pages 95-117, 1972.

    [105] L. Ljung. Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, 22:551-575, 1977.

    [106] M. Lundy and A. Mees. Convergence of the annealing algorithm. Mathematical Programming, 34:111-124, 1986.

    [107] G.R. Madey, I. Wienroth, and V. Shah. Integration of neurocomputing and system simulation for modeling continuous improvement systems in manufacturing. Journal of Intelligent Manufacturing, 3:193-204, 1992.

    [108] S. Mahadevan. To discount or not to discount: A case study comparing R-learning and Q-learning. In Proceedings of the 11th International Conference on Machine Learning, pages 164-172, New Brunswick, NJ, 1994.

    [109] S. Mahadevan. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1):159-195, 1996.

    [110] S. Mahadevan and G. Theocharous. Optimizing production manufacturing with reinforcement learning. Eleventh International FLAIRS Conference, pages 372-377, 1998.

    [111] Matlab. The Student edition of MATLAB. Prentice Hall, Englewood Cliffs, NJ, USA, 1995.

    [112] R. Mattheij and J. Molenaar. Ordinary Differential Equations in Theory and Practice. John Wiley and Sons, West Sussex, England, 1996.

    [113] J.I. McGill and G.J. van Ryzin. Revenue management: Research overview and prospects. Transportation Science, 33(2):233-256, 1999.

    [114] G. Meghabghab and G. Nasr. Iterative RBF neural networks as metamodels of stochastic simulations. 2nd International Conference on Intelligent Processing and Manufacturing of Materials, Honolulu, Hawaii, USA, 2:729-734, 1999.

    [115] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087-1092, 1953.

    [116] T.M. Mitchell. Machine Learning. McGraw Hill, Boston, MA, USA, 1997.

    [117] D.G. Montgomery, G.C. Runger, and N.A. Hubele. Engineering Statistics, Second Edition. John Wiley and Sons, New York, NY, USA, 2001.

    [118] D.C. Montgomery. Introduction to Statistical Quality Control (Fourth Edition). John Wiley and Sons, New York, NY, USA, 2001.

    [119] R.H. Myers and D.C. Montgomery. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York, NY, USA, 1995.

    [120] K.S. Narendra and M.A.L. Thathachar. Learning Automata: An Introduction. Prentice Hall, Englewood Cliffs, NJ, USA, 1989.

    [121] K.S. Narendra and R.M. Wheeler. An N-player sequential stochastic game with identical payoffs. IEEE Transactions on Systems, Man, and Cybernetics, 13:1154-1158, 1983.

    [122] J.F. Nash. Equilibrium points in n-person games. Proceedings, Nat. Acad. of Science, USA, 36:48-49, 1950.

    [123] J.A. Nelder and R. Mead. A Simplex Method for Function Minimization. Computer Journal, 7:308-313, 1965.

    [124] B.L. Nelson. Designing efficient simulation experiments. Proceedings of the 1992 Winter Simulation Conference (J.J. Swain, D. Goldsman, R.C. Crain and J.R. Wilson, eds.), pages 126-132, 1992.

    [125] M.L. Padgett and T.A. Roppel. Neural networks and simulation: modeling for applications. Simulation, 58:295-305, 1992.

    [126] S.K. Park and K.W. Miller. Random number generators: good ones are hard to find. Communications of the ACM, 31(10):1192-1201, 1988.

    [127] C.D. Paternina and T.K. Das. Intelligent dynamic control policies for serial production lines. IIE Transactions, 33(1):65-77, 2001.

    [128] J. Peng and R.J. Williams. Efficient learning and planning with the DYNA framework. Adaptive Behavior, 1:437-454, 1993.

    [129] D.L. Pepyne, D.P. Looze, C.G. Cassandras, and T.E. Djaferis. Application of Q-learning to elevator dispatching. Unpublished Report, 1996.

    [130] G.C. Pflug. Optimization of Stochastic Models: The Interface between Simulation and Optimization. Kluwer Academic, 1996.

    [131] D.T. Pham and D. Karaboga. Intelligent Optimisation Techniques: Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks. Springer-Verlag, New York, USA, 1998.

    [132] C.R. Philbrick and P.K. Kitanidis. Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Operations Research, 49(3):398-412, 2001.

    [133] H. Pierreval. Training a neural network by simulation for dispatching problems. Proceedings of the Third Rensselaer International Conference on Computer Integrated Manufacturing, pages 332-336, 1992.

    [134] H. Pierreval and R.C. Huntsinger. An investigation on neural network capabilities as simulation metamodels. Proceedings of the 1992 Summer Computer Simulation Conference, pages 413-417, 1992.

    [135] E.L. Plambeck, B.R. Fu, S.M. Robinson, and R. Suri. Sample path optimization of convex stochastic performance functions. Mathematical Programming, 75:137-176, 1996.

    [136] B.T. Poljak and Y.Z. Tsypkin. Pseudogradient adaptation and training algorithms. Automation and Remote Control, 12:83-94, 1973.

    [137] M.A. Pollatschek. Programming Discrete Simulations. Research and Development Books, Lawrence, KS, USA, 1995.

    [138] P. Pontrandolfo, A. Gosavi, O.G. Okogbaa, and T.K. Das. Global supply chain management: A reinforcement learning approach. International Journal of Production Research, 40(6):1299-1317, 2002.

    [139] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.

    [140] M.L. Puterman. Markov Decision Processes. Wiley Interscience, New York, NY, USA, 1994.

    [141] Y. Rinott. On two-stage selection procedures and related probability-inequalities. Communications in Statistics: Theory and Methods, A7:799-811, 1978.

    [142] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22:400-407, 1951.

    [143] L.W. Robinson. Optimal and approximate control policies for airline booking with sequential nonmonotonic fare classes. Operations Research, 43:252-263, 1995.

    [144] S.M. Ross. Stochastic Processes. John Wiley and Sons, New York, NY, 1996.

    [145] S.M. Ross. Introduction to Probability Models. Academic Press, San Diego, CA, USA, 1997.

    [146] R.Y. Rubinstein and A. Shapiro. Sensitivity Analysis and Stochastic Optimization by the Score Function Method. John Wiley and Sons, New York, NY, 1983.

    [147] W. Rudin. Real Analysis. McGraw Hill, N.Y., USA, 1964.

    [148] G. Rudolph. Convergence of evolutionary algorithms in general search spaces. Proceedings of the Third IEEE Conference on Evolutionary Computation, Piscataway, NJ: IEEE Press, pages 50-54, 1996.

    [149] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing: Explorations in the Micro-structure of Cognition. MIT Press, Cambridge, MA, 1986.

    [150] G.A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University, 1994.

    [151] A.L. Samuel. Some studies in machine learning using the game of checkers. In E.A. Feigenbaum and J. Feldman, editors, Computers and Thought. McGraw-Hill, New York, 1959.

    [152] S. Sarkar and S. Chavali. Modeling parameter space behavior of vision systems using Bayesian networks. Computer Vision and Image Understanding, 79:185-223, 2000.

    [153] F.A.V. Schouten and S.G. Vanneste. Maintenance optimization with buffer capacity. European Journal of Operational Research, 82:323-338, 1992.

    [154] L. Schrage. A more portable random number generator. Assoc. Comput. Mach. Trans. Math. Software, 5:132-138, 1979.

    [155] A. Schwartz. A reinforcement learning method for maximizing undiscounted rewards. Proceedings of the Tenth Annual Conference on Machine Learning, pages 298-305, 1993.

    [156] L.I. Sennott. The computation of average optimal policies in denumerable state Markov control processes. Adv. Appl. Prob., 29:114-137, 1997.

    [157] L.I. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems. John Wiley and Sons, New York, NY, USA, 1999.

    [158] S. Sethi and G.L. Thompson. Optimal Control Theory: Applications to Management Science and Economics, Second Edition. Kluwer Academic Publishers, Boston, USA, 2000.

    [159] L.S. Shapley. Stochastic games. Proc. Nat. Acad. Sci., USA, 39:1095-1100, 1953.

    [160] D. Simchi-Levi, P. Kaminsky, and E. Simchi-Levi. Designing and Managing a Supply Chain. McGraw Hill, Boston, MA, USA, 2000.

    [161] S. Singh and D. Bertsekas. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems (1996), pages 974-980, 1997.

    [162] J.C. Spall. Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Transactions on Automatic Control, 37:332-341, 1992.

    [163] R. Sutton. Reinforcement Learning. Machine Learning (Special Issue), 8(3), 1992.

    [164] R. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, USA, 1998.

    [165] R.S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, MA, USA, May 1984.

    [166] R.S. Sutton. Learning to predict by the method of temporal differences. Machine Learning, 3:9-44, 1988.

    [167] P. Tadepalli and D. Ok. Model-based Average Reward Reinforcement Learning Algorithms. Artificial Intelligence, 100:177-224, 1998.

    [168] H.A. Taha. Operations Research: An Introduction. Prentice Hall, NJ, USA, 1997.

    [169] H. Taylor and S. Karlin. An Introduction to Stochastic Modeling. Academic Press, New York, 1984.

    [170] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8(3), 1992.

    [171] T. Tezcan and A. Gosavi. Optimal buffer allocation in production lines using an automata search. Proceedings of the 2001 Institute of Industrial Engineering Research Conference in Dallas, Texas, 2001.

    [172] M.A.L. Thathachar and K.R. Ramakrishnan. A cooperative game of a pair of learning automata. Automatica, 20:797-801, 1984.

    [173] M.A.L. Thathachar and P.S. Sastry. Learning optimal discriminant functions through a cooperative game of automata. IEEE Transactions on Systems, Man, and Cybernetics, 17:73-85, 1987.

    [174] J. Tsitsiklis. Markov chains with rare transitions and simulated annealing. Mathematics of Operations Research, 14:70-90, 1989.

    [175] J. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.

    [176] J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.

    [177] A. Turgeon. Optimal operation of multi-reservoir power systems with stochastic inflows. Water Resources Research, 16(2):275-283, 1980.

    [178] J.M. Twomey and A.E. Smith. Bias and variance of validation models for function approximation neural networks under conditions of sparse data. IEEE Transactions on Systems, Man, and Cybernetics, 28(3):417-430, 1998.

    [179] P. van Laarhoven and E. Aarts. Simulated Annealing: Theory and Applications. Kluwer Academic Publishers, 1987.

    [180] J.A.E.E. van Nunen. A set of successive approximation methods for discounted Markovian decision problems. Z. Operations Research, 20:203-208, 1976.

    [181] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, Princeton, New Jersey, 1944.

    [182] C.J. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, May 1989.

    [183] P.J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge, MA, USA, May 1974.

    [184] P.J. Werbos. Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17:7-20, 1987.

    [185] R.M. Wheeler and K.S. Narendra. Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31(6):373-376, 1986.

    [186] D.J. White. Dynamic programming, Markov chains, and the method of successive approximations. J. Math. Anal. Appl., 6:373-376, 1963.

    [187] B. Widrow and M.E. Hoff. Adaptive Switching Circuits. In Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, Part 4, pages 96-104, 1960.

    [188] W.L. Winston. An Introduction to Mathematical Programming. Duxbury Press, CA, USA, 1995.

    [189] I.H. Witten. An adaptive optimal controller for discrete time Markov environments. Information and Control, 34:286-295, 1977.

    [190] D. Yan and H. Mukai. Stochastic discrete optimization. SIAM Journal of Control and Optimization, 30:594-612, 1992.

  • Index

    accumulation point, 303
    acronyms, 13
    actor-critic algorithms, 252
      convergence, 404
    address, 435
    age replacement, 419
    AGV, 424
    airline revenue management, 412
    approximating sequence, 206
    asynchronism, 387
    automated guided vehicles, 424
    average, 24
    average reward, 280
    backpropagation, 73, 411
    backward recursion, 205
    BASIC, 434
    Bayesian Learning, 126
    behavior, 29
    Bellman equation, 162, 181
      average reward, 162
      discounted reward, 173
      optimality proof, 349
      optimality proof for average reward case, 365
    binary trees, 121, 267
    Bolzano-Weierstrass, 304
    bootstrap, 90
    bounded sequence, 300
    buffer optimization, 420
    C language, 430, 433
    cardinality, 11
    Cauchy sequence, 301
    cdf, 22
    central differences, 98, 327
    central limit theorem, 27
    Chapman-Kolmogorov theorem, 142
    closed form, 49, 58
    co-related data, 43
    complex event, 17
    compound event, 17
    computational operations research, 1
    computer programming, 3, 433
    computer programs
      backpropagation batch mode, 470
      dynamic programming, 441
      MATLAB, 531
      neural networks, 464
      organization, 436
      preventive maintenance, 506
      random number generators, 437
      reinforcement learning, 478
      simultaneous perturbation, 439
    continuous functions, 318
    continuous time Markov process, 146
    continuously differentiable, 319
    contraction mapping, 308, 360
    control charts, 426
    control optimization, 4, 51
    convergence, 3
    convergence of sequences, 298
      with probability 1, 381
    coordinate sequence, 306
    cost function, 48
    cross-over, 119
    cumulative distribution function, 22
    curse of dimensionality, 212
    curse of modeling, 212
    decision-making process, 184
    decreasing sequence, 300
    differential equations, 380
    discounted reward, 171
    distributions, 21
    domain, 291
    DTMDP, 183
    dynamic optimization, 4
    dynamic programming, 4, 52, 161, 170
    dynamic systems, 29

  • 552

    elevatcr scheduling, 427 embedded Markov chain, 183 EMSR, 213, 416 ergodic, 146 Euclidean norm, 10, 324 event, 16 exhaustive enum«ation, 157, 186 expected value, 24 explcration, 230

    feedback, 124,228,279 finite differences, 100, 3rt finite horizon, 203 finite horizon problems, 259 fit, 67 fixed point thecnmt, 312 FOIURAN, 434, 435 fm-ward differences, 99, 327 function approximation, 260, 272

    COJlValli'DCe, 405 difficulties, 262 neural netwods, 264

    function fitting, (;I)

    game,201 games,278 Gauss elimination, 165 Gauss-Siedel algorithm, 178 genetic algorithm, 117 global optimum, 82, 320 global Vlriables, 436 gradient descent, 70, 79

    COJlValll'nce, 324

    H-Leaming average reward, 255 discounted reward, 254

    heuristic, 213, 416 hyper-plane, 64

    identity matrix, 10 immediate reward, 154 increasing sequence, 300 incremental, 72 independent, 42 induction, 294 infinity nonn, 290 inventory control, 410 inverse function method, 36 irreducible, 146

    jackknife, 90 jmnps, 136

    kanban,421 kernel methods, 266 Kim-Nelson method, 109

    SIMULATION-BASED OPTIMIZATION

    LAST, 123 layer

    hidden, 74 input, 74 output, 74

    learning, 228 learning automata

    control optimization, 277 parametric optimization, 123

    learning rate, 280 limiting probabilities, 143 linesr programming, 48, 201 Lipschitz condition, 324, 380 local optinmm, 81, 96,320 loss function, 48

    mapping, 292 Marlrov chain, 139 Markov decision problems, 148,277

    convergence, 344 reinforcement learning, 211

    Marlrov process, 136 mathematical progtauuning, 4 MATLAB,430 max norm, 10, 290 MCAT,277 MOPs, 148,271

    convergence, 344 rei~tlearning,211

    mean,24 rnetnmyless, 137 meta-heuristics, 110 metmlOdel, 58 model-based, 69, 93, 409 model-building algorithms, 253, 426 model-free, 93 modified policy iteration, 197 monotonicity, 346 nmltiple comparisons, 107 nmtalion, 119 nmtually exclusive, 17

    n-step ttansition probabilities, 140 n-tuple, 11, 289 natural process, 184 nearest neighbots, 265 neighbcr, 111 neightxmood, 303 Nelder-Mead, 4, 94 neural networks, 69, 264

    backpropagation, 73 codes,464 gradient descent, 326 linesr, 69 non-linear, 73

    neuro-dynamic progt'IIDllllin, 211, 271 Neuro-RSM, (;I)

  • INDEX

    neuron,69,269 nodes, 74 noise due to simulation, 338 non-derivative methods, 104 non-expansive mappings, 380 non-linear programming, 48, 70, 94 non-terminating, 43 normalization, 279 nonned vector spaces, 291 norms, 10,290 notation, 9

    matrix, 10 product, 9 sequence, 11 sets, 11 sum,9 vector, 10

    objective function, 48, 155 off-line, 229 on-line, 229 ordinary differential equations, 380 overbook, 412

    parametric optimization, 4, 47,48 continuous,94 discrete, 106

    partial derivatives, 319 PON,126 performance metric, 155 phase, 113 plane, 63 pmf,21,23 pointers, 435 policy iteration, 195

    average reward MOPs, 163 convergence proof for average reward

    case,372 convergence proof for discounted case,

    357 discounted reward MOPs, 173 SMOPs,189

    preventive maintenance, 416 probability, 16 probability density function, 23 probability mass function, 21

    Q-factor, 217 boundedness, 394 definition, 218 Policy iteration, 235 Value iteration, 219

    Q-Leaming convergence, 392 model-building, 257 steps in algorithm, 225 worked-out example, 231

    Q-P-Leaming average reward MOPs, 244 convergence, 400

    553

    discounted reward MOPs, 237 incomplete evaluation of average reward,

    402 semi-Markov decision problems, 250

    quality control, 426 queue, 134

    R-Leaming, 241 random process, 133 random system, 30 random variable, 15 random variables

    continuous, 22 discrete, 21

    range, 291 ranking and selection, 107 regression, 265, 269

    linear, 60, 63, 69 non-linear, 66 piecewise, 65

    regular Markov chain, 142, 334 Reinforcement Learning, 53

    asynchronous convergence, 383 average reward MOPs, 238 convergence, 379 discounted reward MOPs, 224 finite convergence, 397 introduction, 211 SMOPs,245 synchronous convergence, 382

    Relative Q-Learning, 239 convergence, 397 model-building, 258

    relative value iteration, 168, 179 Relaxed-SMAIIT, 243, 249 renewal reward theorem, 187 renewal theory, 419 replication, 42 response, 279 response surface method, 1, 57, 58 revenue management, 412 Reward-Inaction Scheme, 280 Rinott method, 108 Robbins-Monro algorithm, 217, 220, 221 RSM, 57,58

    scalar, 9 seed, 34, 42, 241 Semi-Markov decision problems

    average reward OP, 186 convergence, 379 definition, 182 discounted reward OP, 194 learning automata, 277 reinforcement learning, 248

  • 554

    sequence, 11, 297 sets,11 sigmoid, 76 simulated annealing

    algorithm, 111 convecgence, 333

    simulation, 32 simulation packages, 433 simultaneous perturbation, 4, 101, 328 SMAIU, 242, 248 SMOPs

    average reward OP, 186 convecgence, 379 definition, 182 discoonted reward OP, 194 learning automata, 277 reinforcement learning, 248

    span seminonn, 180 spill-over effect, 263 standard deviation, 25 state, 29, 51, 133 state aggregation, 260 static optimization, 4 stationary point, 319 step size, 95, 102, 125, 223 stochastic game, 201 stochastic optimization, 2, 47 stochastic process, 133 stochastic system, 30 straight line, 60 strong law of large numbers, 27 sup-norm, 290 supply chain, 423 supply chain management, 423 symmetric neighborhoods, 334 system, 1, 3-5, 51, 133

    definition, 29

    SIMULATION-BASED OPTIMIZATION

    tabu search, 119 mutations, 119 tabu list, 119

    Taylor's thecrem, 320, 329 technology, 3 temperature, 113 tenninating,43 thresholding, 76 TPM,l39, 152,214 transfer line, 420 transformation, 11, 292 transition probability matrix, 139 transition reward matrix, 154 transition time matrix, 183 transpose, 10 trial-and-emll', 228 TRM,l54 TTM,l83

    uniformization, 183, 193, 196

    validation, 89 valuei~tion, 178,195

    Average reward MOPs, 165 convergence for average reward case, 379 convergence proof for discounted case,

    363 discounted reward MOPs, 175 SMDPs,l91

    vector,9,288,289 vector spaces, 288

    Widrow-Hoff algorithm, (I}

    yield management, 412