[1] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
[2] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, pp. 350–354, 2019.
[3] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, and S. Levine, “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation,” arXiv:1806.10293, 2018.
[4] S. Kumar, “Balancing a CartPole System with Reinforcement Learning - A Tutorial,” arXiv:2006.04938, 2020.
[5] Shang-Hung Wu, National Tsing Hua University, official website: http://www.cs.nthu.edu.tw/~shwu/
[6] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling Network Architectures for Deep Reinforcement Learning,” arXiv:1511.06581, 2016.
[7] Hung-yi Lee, National Taiwan University, official website: https://speech.ee.ntu.edu.tw/~hylee/
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with Deep Reinforcement Learning,” arXiv:1312.5602, 2013.
[9] H. van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-learning,” arXiv:1509.06461, 2015.
[10] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic Policy Gradient Algorithms,” in Proceedings of the 31st International Conference on Machine Learning (ICML), 2014. http://proceedings.mlr.press/v32/silver14.pdf
[11] PyTorch official website: https://pytorch.org/
[12] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980, 2014.
[13] OpenAI Gym official website: http://www.gymlibrary.ml/
[14] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv:1707.06347, 2017.
[15] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–533, 2015.
[16] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Harley, T. P. Lillicrap, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning,” arXiv:1602.01783, 2016.
[17] Audi A8 AI system introduction, press release: https://audimediacenter-a.akamaihd.net/system/production/uploaded_files/9722/file/5a27b64d8fd9d654be67df0a70c84c0bb4f7f161/en_press_release_Audi_AI.pdf?1499418927&disposition=attachment
[18] Honda, “Honda receives type designation for Level 3 automated driving,” press release: https://hondanews.eu/eu/en/cars/media/pressreleases/318975/honda-receives-type-designation-for-level-3-automated-driving
[19] M. E. Moran, “Evolution of robotic arms,” https://www.researchgate.net/publication/225151458, 2007.