National Digital Library of Theses and Dissertations in Taiwan

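Author: 陳秀涵
Author (English): Hsiu Han, Chen
Thesis title: 概念性類別標題詞自動擷取的評估
Thesis title (English): Evaluation of Generic Title Generation for Clustered Documents
Advisor: 曾元顯
Advisor (English): Yuen Hsien, Tseng
Degree: Master's
Institution: Fu Jen Catholic University (輔仁大學)
Department: Department of Library and Information Science (圖書資訊學系)
Discipline category: Communication
Discipline: Library, Information and Archival Studies
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation:
Language: Chinese
Number of pages:
Keywords (Chinese): 文件歸類 (document clustering), 文件聚類 (document clustering)
Keywords (English): Document Clustering, Automatic Labeling, Hypernym Search, WordNet, Correlation Coefficient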

Table of Contents

Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Expected Contributions
Chapter 2 Literature Review
Section 1 Automatic Document Clustering Methods
Section 2 Title Term Selection Methods
Section 3 WordNet
Section 4 InfoMap
Section 5 Academia Sinica Bilingual Ontological Wordnet (Sinica BOW)
Section 6 Reuters Document Collection
Section 7 FJU-CTC Document Collection
Section 8 U.S. Patent Documents
Chapter 3 Research Design
Section 1 Experimental Design
Section 2 Experimental Tools
Section 3 Test Document Collections
Section 4 Evaluation Methods
Section 5 Research Limitations
Chapter 4 Results and Evaluation
Section 1 Patent Document Collection
Section 2 Reuters Document Collection
Section 3 FJU-CTC Document Collection
Chapter 5 Conclusions and Suggestions for Future Research
Section 1 Conclusions
Section 2 Future Research
References

List of Figures

Figure 1: Hierarchical agglomerative clustering
Figure 2: Illustration of the WordNet user query interface
Figure 3: Illustration of WordNet query results
Figure 4: InfoMap related-terms query interface
Figure 5: InfoMap word-network query interface
Figure 6: InfoMap word-network query results
Figure 7: Sinica BOW word-network query interface
Figure 8: Sinica BOW word-network query results
Figure 9: Automatic document clustering process
Figure 10: Experimental flowchart
Figure 11: Visualization of the NSC patent document clustering results

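List of Tables

Table 1: Statistics for the ten largest and ten smallest categories in Reuters-21578 (Yang version)
Table 2: Statistics for the ten largest and ten smallest categories in FJU-CTC
Table 3: Evaluation results of the three term-selection methods on the three document collections
Table 4: Evaluation results of generic title terms for the patent documents
Table 5: Category feature terms selected by MI
Table 6: Evaluation results of generic title terms for the Reuters document collection
Table 7: Evaluation results of title terms and generic title terms for the FJU-CTC document collection
Table 8: Final clustering results for the NSC patent documents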
