1. Bellman, R. (1957), A Markovian Decision Process, Journal of Mathematics and Mechanics, pp. 679–684.
2. Howard, R.A. (1960), Dynamic Programming and Markov Processes, MIT Press, Cambridge.
3. Watkins, C.J.C.H. and Dayan, P. (1992), Q-learning, Machine Learning, Vol. 8, No. 3, pp. 279–292.
4. Bertsekas, D.P. (2012), Dynamic Programming and Optimal Control: Approximate Dynamic Programming, Vol. II, 4th edition, Athena Scientific.
5. Bertsekas, D.P. and Tsitsiklis, J.N. (1996), Neuro-Dynamic Programming, Athena Scientific.
6. Artasanchez, A. (2018), 9 Reasons why your machine learning project will fail, KDnuggets, http://www.kdnuggets.com.
7. Rachel, M. (2018), Why Microsoft's teen chatbot, Tay, said lots of awful things online, MIT Technology Review.
8. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016), Mastering the game of Go with deep neural networks and tree search, Nature, Vol. 529, pp. 484–489.
9. François-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G. and Pineau, J. (2018), An Introduction to Deep Reinforcement Learning, Foundations and Trends® in Machine Learning, Vol. 11, No. 3–4, pp. 219–354.
10. Mnih, V., Kavukcuoglu, K., Silver, D. et al. (2015), Human-level control through deep reinforcement learning, Nature, Vol. 518, pp. 529–533.
11. Hasselt, H., Guez, A. and Silver, D. (2016), Deep Reinforcement Learning with Double Q-Learning, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), pp. 2094–2100.
12. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M. and Freitas, N. (2015), Dueling Network Architectures for Deep Reinforcement Learning, arXiv preprint arXiv:1511.06581.
13. Baird, L. (1995), Residual algorithms: Reinforcement learning with function approximation, Machine Learning: Proceedings of the Twelfth International Conference, pp. 30–37.
14. Krizhevsky, A., Sutskever, I. and Hinton, G. (2012), ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, Vol. 25, pp. 1106–1114.
15. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with Deep Reinforcement Learning, arXiv preprint arXiv:1312.5602.
16. Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., Maria, A.D., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S., Legg, S., Mnih, V., Kavukcuoglu, K. and Silver, D. (2015), Massively parallel methods for deep reinforcement learning, Deep Learning Workshop, ICML.
17. Riedmiller, M. (2005), Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method, Proceedings of the 16th European Conference on Machine Learning, pp. 317–328, Springer.
18. Sallans, B. and Hinton, G.E. (2004), Reinforcement learning with factored states and actions, Journal of Machine Learning Research, Vol. 5, pp. 1063–1088.
19. Watkins, C.J.C.H. and Dayan, P. (1992), Q-learning, Machine Learning, Vol. 8, pp. 279–292.
20. Lange, S. and Riedmiller, M. (2010), Deep auto-encoder neural networks in reinforcement learning, Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE.
21. Mnih, V. (2013), Machine Learning for Aerial Image Labeling, PhD thesis, University of Toronto.
22. Norris, J.R. (1998), Markov Chains, Cambridge University Press.
23. Hasselt, H. (2010), Double Q-learning, Advances in Neural Information Processing Systems, Vol. 23, pp. 2613–2621.
24. Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002), Finite-time analysis of the multiarmed bandit problem, Machine Learning, Vol. 47, pp. 235–256.
25. Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996), Reinforcement learning: A survey, Journal of Artificial Intelligence Research, Vol. 4, pp. 237–285.
26. Sutton, R.S. (1988), Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, pp. 9–44.
27. Pollack, J.B. and Blair, A.D. (1996), Why did TD-Gammon work?, Advances in Neural Information Processing Systems, Vol. 9, pp. 10–16.
28. Tsitsiklis, J.N. and Roy, B.V. (1997), An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, Vol. 42, pp. 674–690.
29. Lazaric, A., Markov Decision Processes and Dynamic Programming, lecture notes, http://researchers.lille.inria.fr/~lazaric/Webpage/MVA-RL_Course14_files/notes-lecture-02.pdf.
30. Schaefer, S. (2002), Mathematical Recreations, http://www.mathrec.org/old/2002jan/solutions.html.