National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Li, Chih-Ping (李治平)

Thesis title: A Language Neutral Text Classification Method Using Locality Sensitive Hashing (應用區域敏感雜湊對文獻進行分類之研究)

Advisor: Chen, Tsung-Teng (陳宗天)

Oral defense committee: Chen, Tsung-Teng (陳宗天); Tseng, Yuen-Hsien (曾元顯); Tsaih, Rua-Huan (蔡瑞煌)

Oral defense date: 2016-07-04

Degree: Master's

Institution: National Taipei University (國立臺北大學)

Department: Graduate Institute of Information Management (資訊管理研究所)

Discipline: Computer Science (電算機學門)

Field: General Computer Science (電算機一般學類)

Thesis type: Academic thesis

Year of publication: 2016

Academic year of graduation: 104

Language: Chinese

Keywords: Locality Sensitive Hashing, Document Classification, Vector Space Model (區域敏感雜湊、空間向量模型、文獻分類)


English References
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175-185.
Aslam, J. A., & Pavlu, V. (2007). Query hardness estimation using Jensen-Shannon divergence among multiple scoring functions. Paper presented at the European Conference on Information Retrieval.
Baoli, L., Qin, L., & Shiwen, Y. (2004). An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP), 3(4), 215-226.
Blekanov, I., & Korelin, V. (2015). Hierarchical clustering of large text datasets using Locality-Sensitive Hashing. Proceedings of the International Workshop on Applications in Information Technology (IWAIT-2015).
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389-2404.
Broder, A. Z. (1997, June). On the resemblance and containment of documents. In Compression and Complexity of Sequences 1997. Proceedings (pp. 21-29). IEEE.
Broder, A. Z., Charikar, M., Frieze, A. M., & Mitzenmacher, M. (1998, May). Min-wise independent permutations. In Proceedings of the thirtieth annual ACM symposium on Theory of computing (pp. 327-336). ACM.
Buhler, J. (2001). Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5), 419-428.
Carletta, J. (1996). Assessing agreement on classification tasks: the kappa statistic. Computational linguistics, 22(2), 249-254.
Cha, S.-H. (2007). Comprehensive survey on distance/similarity measures between probability density functions. City, 1(2), 1.
Chen, H., Chung, Y. M., Ramsey, M., & Yang, C. C. (1998). An intelligent personal spider (agent) for dynamic Internet/Intranet searching. Decision Support Systems, 23(1), 41-58.
Chowdhury, G. (2010). Introduction to modern information retrieval. Facet Publishing.
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21-27.
Dutta, D., Guha, R., Jurs, P. C., & Chen, T. (2006). Scalable partitioning and exploration of chemical spaces using geometric hashing. Journal of chemical information and modeling, 46(1), 321-333.
Feldman, R., Fresko, M., Kinar, Y., Lindell, Y., Liphstat, O., Rajman, M., Schler, Y., & Zamir, O. (1998, September). Text mining at the term level. In European Symposium on Principles of Data Mining and Knowledge Discovery (pp. 65-73). Springer Berlin Heidelberg.
Gwet, K. (2002). Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment Series, 2, 1-9.
Han, E. H. S., Karypis, G., & Kumar, V. (2001, April). Text categorization using weight adjusted k-nearest neighbor classification. In Pacific-asia conference on knowledge discovery and data mining (pp. 53-65). Springer Berlin Heidelberg.
Haveliwala, T., Gionis, A., & Indyk, P. (2000). Scalable Techniques for Clustering the Web (Extended Abstract). In Third International Workshop on the Web and Databases (WebDB 2000), May 18-19, 2000, Dallas, Texas.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8), 2554-2558.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
Hull, D. A. (1996). Stemming algorithms: A case study for detailed evaluation. JASIS, 47(1), 70-84.
Indyk, P., & Motwani, R. (1998, May). Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing (pp. 604-613). ACM.
Jaccard, P. (1901). Distribution de la Flore Alpine: dans le Bassin des dranses et dans quelques régions voisines: Rouge. Sciences Naturelles, 1901, 241-272.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.
Levandowsky, M., & Winter, D. (1971). Distance between sets. Nature, 234(5323), 34-35.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Paper presented at the Proceedings of the fifth Berkeley symposium on mathematical statistics and probability.
Oprişa, C., Checicheş, M., & Năndrean, A. (2014). Locality-sensitive hashing optimizations for fast malware clustering. Paper presented at the Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on.
Park, D. C., El-Sharkawi, M., Marks, R., Atlas, L., & Damborg, M. (1991). Electric load forecasting using an artificial neural network. Power Systems, IEEE Transactions on, 6(2), 442-449.
Powers, D. (2007). Evaluation: From Precision, Recall and F Factor to ROC, Informedness, Markedness & Correlation. Sch. Informatics Eng. Flinders.
Ravichandran, D., Pantel, P., & Hovy, E. (2005, June). Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 622-629). Association for Computational Linguistics.
Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York, NY, USA: McGraw-Hill, Inc.
Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
Santos, J. M., & Embrechts, M. (2009). On the use of the adjusted rand index as a metric for evaluating supervised classification. Paper presented at the International Conference on Artificial Neural Networks.
Slaney, M., & Casey, M. (2008). Locality-sensitive hashing for finding nearest neighbors [lecture notes]. IEEE Signal Processing Magazine, 25(2), 128-131.
Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1), 11-21.
Stehman, S. V. (1997). Selecting and interpreting measures of thematic classification accuracy. Remote sensing of Environment, 62(1), 77-89.
Sullivan, D. (2001). Document warehousing and text mining: techniques for improving business operations, marketing, and sales. John Wiley & Sons, Inc.
Wu, G., Boydell, O., & Cunningham, P. (2014). High-throughput, Web-scale data stream clustering. In Proceedings of the 4th Web Search Click Data workshop (WSCD 2014).
Yang, Y., & Pedersen, J. O. (1997, July). A comparative study on feature selection in text categorization. In ICML (Vol. 97, pp. 412-420).
Zhang, J., Song, R., Yu, W.-X., Xia, S.-P., & Hu, W.-D. (2005). Construction of hierarchical classifiers based on the confusion matrix and fisher's principle. Ruan Jian Xue Bao(Journal of Software), 16(9), 1560-1567.
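Chinese References
江珅薇 (2007). Keyword extraction from collections of related academic papers: automatic naming of academic fields. Graduate Institute of Information Management, National Taipei University.
曾有德 (2008). A study on building automated document clustering and content similarity comparison based on Web 2.0 concepts. Graduate Institute of Information Management, National Kaohsiung First University of Science and Technology.
黃馨儀 (2015). A study on improving the intellectual structure construction methodology. Graduate Institute of Information Management, National Taipei University.
鄭宇傑 (2015). A comparative study of generating text labels with kernel methods and the LDA topic model. Graduate Institute of Information Management, National Taipei University.
謝祥榆 (2016). A study on classifying Chinese documents using locality-sensitive hashing. Graduate Institute of Information Management, National Taipei University.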
