1. Bellman, R. (1957), A Markovian Decision Process, Journal of Mathematics and Mechanics, pp. 679–684.
2. Howard, R.A. (1960), Dynamic Programming and Markov Processes, MIT Press, Cambridge.
3. Watkins, C.J.C.H. and Dayan, P. (1992), Q-learning, Machine Learning, Vol. 8, No. 3, pp. 279–292.
4. Bertsekas, D.P. (2012), Dynamic Programming and Optimal Control: Approximate Dynamic Programming, Vol. II, 4th edition, Athena Scientific.
5. Bertsekas, D.P. and Tsitsiklis, J.N. (1996), Neuro-Dynamic Programming, Athena Scientific.
6. Artasanchez, A. (2018), 9 Reasons why your machine learning project will fail, KDnuggets, http://www.kdnuggets.com.
7. Rachel, M. (2018), Why Microsoft's teen chatbot, Tay, said lots of awful things online, MIT Technology Review.
8. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016), Mastering the game of Go with deep neural networks and tree search, Nature, Vol. 529, pp. 484–489.
9. François-Lavet, V., Henderson, P., Islam, R., Bellemare, M.G. and Pineau, J. (2018), An Introduction to Deep Reinforcement Learning, Foundations and Trends® in Machine Learning, Vol. 11, No. 3–4, pp. 219–354.
10. Mnih, V., Kavukcuoglu, K., Silver, D. et al. (2015), Human-level control through deep reinforcement learning, Nature, Vol. 518, pp. 529–533.
11. Hasselt, H., Guez, A. and Silver, D. (2016), Deep Reinforcement Learning with Double Q-Learning, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), pp. 2094–2100.
12. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M. and Freitas, N. (2015), Dueling Network Architectures for Deep Reinforcement Learning, arXiv preprint arXiv:1511.06581.
13. Baird, L. (1995), Residual algorithms: Reinforcement learning with function approximation, Machine Learning: Proceedings of the Twelfth International Conference, pp. 30–37.
14. Krizhevsky, A., Sutskever, I. and Hinton, G. (2012), ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, Vol. 25, pp. 1106–1114.
15. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with Deep Reinforcement Learning, arXiv preprint arXiv:1312.5602.
16. Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., Maria, A.D., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S., Legg, S., Mnih, V., Kavukcuoglu, K. and Silver, D. (2015), Massively parallel methods for deep reinforcement learning, Deep Learning Workshop, ICML.
17. Riedmiller, M. (2005), Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method, Proceedings of the 16th European Conference on Machine Learning, pp. 317–328, Springer.
18. Sallans, B. and Hinton, G.E. (2004), Reinforcement learning with factored states and actions, Journal of Machine Learning Research, Vol. 5, pp. 1063–1088.
19. Watkins, C.J.C.H. and Dayan, P. (1992), Q-learning, Machine Learning, Vol. 8, pp. 279–292.
20. Lange, S. and Riedmiller, M. (2010), Deep auto-encoder neural networks in reinforcement learning, Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE.
21. Mnih, V. (2013), Machine Learning for Aerial Image Labeling, PhD thesis, University of Toronto.
22. Norris, J.R. (1998), Markov Chains, Cambridge University Press.
23. Hasselt, H. (2010), Double Q-learning, Advances in Neural Information Processing Systems, Vol. 23, pp. 2613–2621.
24. Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002), Finite-time analysis of the multiarmed bandit problem, Machine Learning, Vol. 47, pp. 235–256.
25. Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996), Reinforcement learning: A survey, Journal of Artificial Intelligence Research, Vol. 4, pp. 237–285.
26. Sutton, R.S. (1988), Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, pp. 9–44.
27. Pollack, J.B. and Blair, A.D. (1996), Why did TD-Gammon work?, Advances in Neural Information Processing Systems, Vol. 9, pp. 10–16.
28. Tsitsiklis, J.N. and Roy, B.V. (1997), An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, Vol. 42, pp. 674–690.
29. Lazaric, A., Markov Decision Processes and Dynamic Programming, lecture notes, http://researchers.lille.inria.fr/~lazaric/Webpage/MVA-RL_Course14_files/notes-lecture-02.pdf.
30. Schaefer, S. (2002), Mathematical Recreations, http://www.mathrec.org/old/2002jan/solutions.html.