National Digital Library of Theses and Dissertations in Taiwan


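Graduate student: Kun-Cheng Sun (孫昆正)
Thesis title: A Study of Developing the Imbalanced Sentiment Classification (發展不平衡語意分類之研究)
Advisor: Long-Sheng Chen (陳隆昇)
Degree: Master's
Institution: Chaoyang University of Technology
Department: Department of Information Management, Master's Program
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2012
Graduation academic year: 101 (ROC calendar)
Language: Chinese
Number of pages: 124
Keywords (Chinese): 決策樹、支持向量機、隱含語意索引、特徵選取、田口方法、不平衡語意分類
Keywords (English): Decision Tree, Support Vector Machines, Latent Semantic Index, Feature Selection, Taguchi Method, Imbalanced Semantic Classification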

Usage statistics:
Cited by: 3
Views: 449
Downloads: 13
Bookmarked: 0

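Table of Contents
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 3
1.3 Research Objectives 6
1.4 Research Framework 6
1.5 Research Procedure 8
Chapter 2 Literature Review 10
2.1 The Imbalanced Sentiment Classification Problem 10
2.1.1 Sentiment Classification 10
2.1.2 Imbalanced Sentiment Classification 11
2.1.3 Summary 18
2.2 Dimensionality Reduction Methods 19
2.2.1 Feature Selection 19
2.2.2 Feature Extraction 27
2.3 Taguchi Method 29
2.4 Machine Learning Methods 36
2.4.1 Support Vector Machines 36
2.4.2 Decision Trees 38
Chapter 3 Methodology 41
3.1 Identifying Key Factors with the Taguchi Method 41
3.2 Balancing Class Features 45
3.3 Two-Stage Method Combining BCF and LSI 51
Chapter 4 Experimental Results and Analysis 54
4.1 Experimental Results of the Taguchi Method 54
4.1.1 Data Sources and Tools 54
4.1.2 Data Preprocessing 55
4.1.3 Evaluation Metrics 57
4.1.4 Experimental Results and Analysis 59
4.2 Experimental Results of BCF+LSI 64
4.2.1 Data Sources and Tools 64
4.2.2 Data Preprocessing 65
4.2.3 Experimental Results 67
Chapter 5 Conclusions and Future Research Directions 114
5.1 Conclusions 114
5.2 Future Research Directions 116
References 118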
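List of Figures
Figure 1-1 The imbalanced sentiment classification problem 3
Figure 1-2 Research framework 7
Figure 1-3 Research procedure 9
Figure 2-1 Statistics on studies addressing the imbalance problem 15
Figure 2-2 Distribution of feature types 23
Figure 2-3 Computational principle of the Relief method 25
Figure 2-4 Singular value decomposition 29
Figure 2-5 Product/process parameter diagram 31
Figure 2-6 Notation for orthogonal arrays 33
Figure 2-7 Example of a factor effect plot 35
Figure 2-8 Illustration of the SVM hyperplane 37
Figure 3-1 Taguchi method procedure 44
Figure 3-2 BCF flowchart 47
Figure 3-3 Procedure for combining BCF with LSI 53
Figure 4-1 Factor effect plot 63
Figure 4-2 Five-star rating system of the Review Centre review website 64
Figure 4-3 Experimental results of traditional imbalanced-data handling methods (SVM) 70
Figure 4-4 Experimental results of traditional imbalanced-data handling methods (DT) 70
Figure 4-5 Experimental results of traditional imbalanced-data handling methods with adjusted cost (SVM) 72
Figure 4-6 Experimental results of traditional imbalanced-data handling methods with adjusted cost (DT) 72
Figure 4-7 Experimental results on mobile phone reviews (SVM) 75
Figure 4-8 Experimental results on mobile phone reviews with adjusted cost (SVM) 78
Figure 4-9 Experimental results on mobile phone reviews (DT) 81
Figure 4-10 Experimental results on mobile phone reviews with adjusted cost (DT) 84
Figure 4-11 Experimental results on iPhone reviews (SVM) 87
Figure 4-12 Experimental results on iPhone reviews with adjusted cost (SVM) 90
Figure 4-13 Experimental results on iPhone reviews (DT) 93
Figure 4-14 Experimental results on iPhone reviews with adjusted cost (DT) 96
Figure 4-15 Experimental results on camera reviews (SVM) 99
Figure 4-16 Experimental results on camera reviews with adjusted cost (SVM) 102
Figure 4-17 Experimental results on camera reviews (DT) 105
Figure 4-18 Experimental results on camera reviews with adjusted cost (DT) 108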

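List of Tables
Table 2-1 Related studies on handling the imbalance problem 13
Table 2-2 Related studies applying feature selection methods to imbalanced text classification 20
Table 2-3 Term-category relationship table 22
Table 2-4 L8(2^7) orthogonal array 33
Table 3-1 Experimental factors and levels 42
Table 3-2 Example calculation of the Sign index 49
Table 3-3 Example calculation of Sign-IG 50
Table 3-4 Example term-document matrix 51
Table 4-1 Data sources for the Taguchi method 55
Table 4-2 Raw experimental data for the Taguchi method 55
Table 4-3 Preprocessing results for the Taguchi method 56
Table 4-4 Confusion matrix 58
Table 4-5 Experimental results of the Taguchi method 60
Table 4-6 Response values and S/N ratios of the Taguchi experiments (GM index) 62
Table 4-7 ANOVA table for the S/N ratios 62
Table 4-8 Factor contributions and ANOVA table 62
Table 4-9 Experimental data 65
Table 4-10 Feature sets of the data sets 66
Table 4-11 Number of features after balancing 67
Table 4-12 Experimental results of traditional imbalanced-data handling methods 69
Table 4-13 Experimental results of traditional imbalanced-data handling methods with adjusted cost 71
Table 4-14 Experimental results on mobile phone reviews (SVM) 73
Table 4-15 Experimental results on mobile phone reviews with adjusted cost (SVM) 76
Table 4-16 Experimental results on mobile phone reviews (DT) 79
Table 4-17 Experimental results on mobile phone reviews with adjusted cost (DT) 82
Table 4-18 Experimental results on iPhone reviews (SVM) 85
Table 4-19 Experimental results on iPhone reviews with adjusted cost (SVM) 88
Table 4-20 Experimental results on iPhone reviews (DT) 91
Table 4-21 Experimental results on iPhone reviews with adjusted cost (DT) 94
Table 4-22 Experimental results on camera reviews (SVM) 97
Table 4-23 Experimental results on camera reviews with adjusted cost (SVM) 100
Table 4-24 Experimental results on camera reviews (DT) 103
Table 4-25 Experimental results on camera reviews with adjusted cost (DT) 106
Table 4-26 Best and second-best methods by classification performance 109
Table 4-27 Training time of traditional methods 110
Table 4-28 Training time on mobile phone reviews (SVM) 110
Table 4-29 Training time on mobile phone reviews (DT) 111
Table 4-30 Training time on iPhone reviews (SVM) 111
Table 4-31 Training time on iPhone reviews (DT) 112
Table 4-32 Training time on camera reviews (SVM) 112
Table 4-33 Training time on camera reviews (DT) 113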

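Chinese-language references
[1]曾韋榮 (2006), Combining latent semantic indexing and information granulation in data mining (in Chinese), Master's thesis, Graduate Institute of Commerce Automation and Management, National Taipei University of Technology, Taipei.
[2]張麗新、王家廞、趙雁南、楊澤紅 (2004), “Combined feature selection based on Relief” (in Chinese), Journal of Fudan University (Natural Science), vol. 43, no. 5, pp. 893-898.
[3]廖闊、付建勝、楊萬麟 (2010), “An improved ReliefF algorithm for radar range profile target recognition” (in Chinese), Journal of Electronic Measurement and Instrument, vol. 24, no. 9, pp. 831-836.
English-language references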
[1]Abbasi, A., and Chen, H. (2005), “Applying authorship analysis to extremist-group web forum messages,” IEEE Intelligent Systems, vol. 20, no. 5, pp. 67–75.
[2]Abbasi, A., Chen, H., and Salem, A. (2008), “Sentiment analysis in multiple languages: feature selection for opinion classification in web forums,” ACM Transactions on Information Systems, vol. 26, no. 3, pp. 12:1-12:34.
[3]Arun Kumar, M., and Gopal, M. (2010), “A comparison study on multiple binary-class SVM methods for unilabel text categorization,” Pattern Recognition Letters, vol. 31, no. 11, pp. 1437-1444.
[4]Chang, C. C. and Lin, C. J. (2001), “LIBSVM: A Library for Support Vector Machines,” Software, available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5]Chaovalit, P., and Zhou, L. (2005), “Movie review mining: a comparison between supervised and unsupervised classification approaches,” In Proceedings of the 38th Hawaii International Conference on System Sciences, pp.1-9.
[6]Chen, E., Lin, Y., Xiong, H., Luo, Q., and Ma, H. (2010), “Exploiting probabilistic topic models to improve text categorization under class imbalance,” Information Processing and Management, vol. 47, no. 2, pp. 202-214.
[7]Chen, L. S., Liu, C. H., and Chiu, H. J. (2011), “A neural network based approach for sentiment classification in the blogosphere,” Journal of Informetrics, vol. 5, no. 2, pp. 313-322.
[8]Chien, W. T., and Tsai, C. S. (2003), “The investigation on the prediction of tool wear and the determination of optimum cutting conditions in machining 17-4PH stainless steel,” Journal of Materials Processing Technology, vol. 140, no. 1-3, pp. 340-345.
[9]Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990), “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407.
[10]Das Mohapatra, P.K., Maity, C., Rao, R.S., Pati, B.R., and Mondal, K.C. (2009), “Tannase production by Bacillus licheniformis KBR6: optimization of submerged culture conditions by Taguchi DOE methodology,” Food Research International, vol. 42, no. 4, pp. 430-435.
[11]Fernández, A., del Jesus, M. J., and Herrera, F. (2009), “On the influence of an adaptive inference system in fuzzy rule based classification systems for imbalanced data sets,” Expert Systems with Applications, vol. 36, pp. 9805-9812.
[12]García, V., Sánchez, J. S., and Mollineda, R. A. (2011), “On the effectiveness of preprocessing methods when dealing with different levels of class imbalance,” Knowledge-Based Systems, vol. 25, no. 1, pp. 13-21.
[13]Gunn, S. R. (1998), “Support vector machines for classification and regression,” Technical Report, University of Southampton, UK.
[14]Gomez, J. C., and Moens, M. F. (2012), “PCA document reconstruction for email classification,” Computational Statistics and Data Analysis, vol. 56, no. 3, pp. 741-751.
[15]Huang, Y., McCullagh, P. J., and Black, N. D. (2009), “An optimization of ReliefF for classification in large datasets,” Data & Knowledge Engineering, vol. 68, no. 11, pp. 1348-1356.
[16]Hong, C. W. (2011), “Using the Taguchi method for effective market segmentation,” Expert Systems with Applications, doi:10.1016/j.eswa.2011.11.040.
[17]Kira, K., and Rendell, L. A. (1992), “The feature selection problem: traditional methods and a new algorithm,” In Proceedings of 9th National Conference on Artificial Intelligence, pp. 129-134.
[18]Kononenko, I. (1994), “Estimating attributes: analysis and extensions of Relief,” In Proceedings of the European Conference on Machine Learning, pp. 171-182.
[19]Kontostathis, M., and Pottenger, W. M. (2006), “A framework for understanding latent semantic indexing (LSI) performance,” Information Processing and Management, vol. 42, no. 1, pp. 56-73.
[20]Keshtkar, F., and Inkpen, D. (2009), “Using sentiment orientation features for mood classification in blogs,” IEEE International Conference on Natural Language Processing and Knowledge Engineering.
[21]Li, B., Xu, S., and Zhang, J. (2007), “Enhancing clustering blog documents by utilizing author/reader comments,” In Proceedings of the 45th Annual Southeast Regional Conference, pp. 94-99.
[22]Li, S., Zhou, G., Wang, Z., Lee, S. Y. M., and Wang, R. (2011), “Imbalanced sentiment classification,” Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2469-2472.
[23]Liu, B., Hu, M., and Cheng, J. (2005), “Opinion observer: analyzing and comparing opinions on the web,” In Proceedings of the 14th International Conference on World Wide Web, pp. 342-351.
[24]Liu, Y., Loh, H. T., and Sun, A. (2009), “Imbalanced text classification: a term weighting approach,” Expert Systems with Applications, vol. 37, no. 1, pp. 690-701.
[25]Liu, Y., Yu, X., Huang, J. X., and An, A. (2010), “Combining integrated sampling with SVM ensembles for learning from imbalanced datasets,” Information Processing and Management, vol. 47, no. 4, pp. 617-631.
[26]Meng, J., Lin, H., and Yu, Y. (2011), “A two-stage feature selection method for text categorization,” Computers and Mathematics with Applications, vol. 62, no. 7, pp. 2793-2800.
[27]Mallick, K., and Bhattacharyya, S. (2012), “Uncorrelated local maximum margin criterion: an efficient dimensionality reduction method for text classification,” Procedia Technology, vol. 4, pp. 370-374.
[28]Ogura, H., Amano, H., and Kondo, M. (2010), “Distinctive characteristics of a metric using deviations from Poisson for feature selection,” Expert Systems with Applications, vol. 37, no. 3, pp. 2273–2281.
[29]Ogura, H., Amano, H., and Kondo, M. (2011), “Comparison of metrics for feature selection in imbalanced text classification,” Expert Systems with Applications, vol. 38, no. 5, pp. 4978–4989.
[30]O’Keefe, T., and Koprinska, I. (2009), “Feature selection and weighting methods in sentiment analysis,” In proceedings of the 14th Australasian Document Computing Symposium.
[31]Quinlan, J. R. (1993), “C4.5: programs for machine learning,” Morgan Kaufmann, San Mateo, CA.
[32]Sebastiani, F. (2002), “Machine learning in automated text categorization,” ACM Computing Surveys, vol. 34, no. 1, pp. 1–47.
[33]Simeon, M., and Hilderman, R. (2008), “Categorical proportional difference: A feature selection method for text categorization,” In Proceedings of the 17th Australasian Data Mining Conference.
[34]Stamatatos, E. (2008), “Author identification: using text sampling to handle the class imbalance problem,” Information Processing and Management, vol. 44, no. 2, pp. 790-799.
[35]Sun, A., Lim, E. P., and Liu, Y. (2009), “On strategies for imbalanced text classification using SVM: a comparative study,” Decision Support Systems, vol. 48, no. 1, pp. 191-201.
[36]Tan, S., and Zhang, J. (2008), “An empirical study of sentiment analysis for Chinese documents,” Expert Systems with Applications, vol. 34, no. 4, pp. 2622-2629.
[37]Tang, H., Tan, S., and Cheng, X. (2009), “A survey on sentiment detection of reviews,” Expert Systems with Applications, vol. 36, no. 7, pp. 10760-10773.
[38]Tong, L. I., Chang, Y. C., and Lin, S. H. (2011), “Determining the optimal re-sampling strategy for a classification model with imbalanced data using design of experiments and response surface methodologies,” Expert Systems with Applications, vol. 38, no. 4, pp. 4222-4227.
[39]Uğuz, H. (2011), “A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm,” Knowledge-Based Systems, vol. 24, no. 7, pp. 1024-1032.
[40]Uysal, A., and Gunal, S. (2012), “A novel probabilistic feature selection method for text classification,” Knowledge-Based Systems, doi: 10.1016/j.knosys.2012.06.005.
[41]van Halteren, H. (2004), “Linguistic profiling for author recognition and verification,” Proceedings of the 42nd annual meeting of the association for computational linguistics, pp. 199–206.
[42]Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
[43]Weiss, G., and Provost, F. (2003), “Learning when training data are costly: the effect of class distribution on tree induction,” Journal of Artificial Intelligence Research, vol. 19, no. 1, pp. 315-354.
[44]Whitelaw, C., Garg, N., and Argamon, S. (2005), “Using appraisal groups for sentiment analysis,” In proceedings of the ACM 14th Conference on Information and Knowledge Management, pp. 625-631.
[45]Wu, C. H., Chuang, Z. J., and Lin, Y. C. (2006), “Emotion recognition from text using semantic labels and separable mixture models,” ACM Transactions on Asian Language Information Processing, vol. 5, no. 2, pp. 165-182.
[46]Ye, Q., Zhang, Z., and Law, R. (2009), “Sentiment classification of online reviews to travel destinations by supervised machine learning approaches,” Expert Systems with Applications, vol. 36, no. 3, pp. 6527-6535.
[47]Yusoff, N., Ramasamy, M., and Yusup, S. (2011), “Taguchi’s parametric design approach for the selection of optimization variables in a refrigerated gas plant,” Chemical Engineering Research and Design, vol. 89, no. 6, pp. 665-675.
[48]Yang, J., Liu, Y., Zhu, X., Liu, Z., and Zhang, X. (2012), “A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization,” Information Processing and Management, vol. 48, no. 4, pp. 741-754.
[49]Zhang, J. Z., Chen, J. C., and Kirby, E. D. (2007), “Surface roughness optimization in an end-milling operation using the Taguchi design method,” Journal of Materials Processing Technology, vol. 184, no. 1-3, pp. 233-239.
[50]Zheng, Z., Wu, X., and Srihari, R. (2004), “Feature selection for text categorization on imbalanced data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 80-89.
[51]Zhang, W., Yoshida, T., and Tang, X. (2011), “A comparative study of TF*IDF, LSI and multi-words for text classification,” Expert Systems with Applications, vol. 38, no. 3, pp. 2758-2765.
