Research on Search Result Clustering Based on Keyword Extraction

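Qin Peng (秦鹏), Zhang Huaping (张华平). Research on Retrieval Result Clustering Based on Keyword Extraction. The 5th China Conference on Information Retrieval (CCIR 2009), Shanghai, November 2009.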


Abstract: Information retrieval results are often large, heterogeneous, and poorly organized, so clustering search
results to make them easier to browse is a common need. Traditional text clustering techniques are inadequate for
this task: they cannot generate clusters with readable, meaningful labels, and they are too slow to cluster search
results online. This paper proposes a search result clustering algorithm based on keyword extraction. A multi-feature
model, adapted to the characteristics of search results, scores candidate keywords by combining term frequency (TF),
part of speech (POS), and mutual information. The selected keywords then serve as the base features for hierarchical
clustering. In experiments, the method reaches a P@10 of 80% and a user satisfaction of 85%. From these results the
authors conclude that the keyword-extraction-based algorithm outperforms all currently known systems.
Keywords: keyword extraction; search result clustering; information retrieval
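
The abstract names the ingredients (TF, POS, mutual information, hierarchical clustering) but not the formula that fuses them. The following is only a minimal sketch of how such a pipeline could look, not the authors' method. The fusion rule (TF times a POS weight times one plus the averaged positive pointwise mutual information), the POS weights, the Jaccard similarity, the average-link merging, and every function name and threshold are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical part-of-speech weights (assumption): nouns make the best labels.
POS_WEIGHT = {"n": 1.0, "v": 0.5}

def score_keywords(docs, alpha=1.0):
    """Assumed fusion rule: TF * POS weight * (1 + alpha * averaged positive PMI).

    docs is a list of snippets; each snippet is a list of (word, pos) pairs.
    """
    n_docs = len(docs)
    tf, df, co, pos_of = Counter(), Counter(), Counter(), {}
    for doc in docs:
        for word, pos in doc:
            tf[word] += 1
            pos_of.setdefault(word, pos)
        vocab = sorted({word for word, _ in doc})
        df.update(vocab)                    # number of snippets containing each word
        co.update(combinations(vocab, 2))   # number of snippets containing each pair
    assoc = Counter()                       # summed positive PMI per word
    for (a, b), n_ab in co.items():
        pmi = math.log(n_ab * n_docs / (df[a] * df[b]))
        if pmi > 0:
            assoc[a] += pmi
            assoc[b] += pmi
    others = max(len(tf) - 1, 1)            # average over the other vocabulary words
    return {w: f * POS_WEIGHT.get(pos_of[w], 0.0) * (1 + alpha * assoc[w] / others)
            for w, f in tf.items()}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, keywords, threshold=0.5):
    """Average-link agglomerative clustering over keyword sets (assumed variant)."""
    feats = [frozenset(w for w, _ in doc if w in keywords) for doc in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sims = [jaccard(feats[i], feats[j])
                    for i in clusters[x] for j in clusters[y]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (x, y)
        if best < threshold:                # stop when no pair is similar enough
            break
        x, y = pair                         # x < y, so deleting y keeps x valid
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters

# Toy snippets for the ambiguous query "apple" (invented data).
docs = [
    [("apple", "n"), ("fruit", "n"), ("eat", "v")],
    [("apple", "n"), ("fruit", "n"), ("juice", "n")],
    [("apple", "n"), ("iphone", "n"), ("buy", "v")],
    [("apple", "n"), ("iphone", "n"), ("phone", "n")],
]
scores = score_keywords(docs)
keywords = set(sorted(scores, key=scores.get, reverse=True)[:3])
clusters = cluster(docs, keywords)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy run the query term "apple" occurs in every snippet, so its mutual information with any other word is zero and it does not separate the results; "fruit" and "iphone" do, and they would also be natural candidates for cluster labels.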


SearchClustering.pdf (190 KB)

About the Author: nlpir
