Hua-Ping Zhang, Qian Mo, He-Yang Huang. Structured POI data Extraction from Internet News. In Proceedings of the 4th International Universal Communication Symposium (IUCS 2010), Beijing, China, October 2010, pp. 115-120 (invited talk).
Abstract: POI (Point of Interest) data are a key resource for GPS applications, but manual POI collection is expensive and time-consuming. This paper presents a novel approach that automatically extracts structured POI data from Internet news articles. The procedure has five steps:

- filter out noisy news documents using POI linguistic features;
- perform lexical analysis on the remaining texts with ICTCLAS2010;
- identify time expressions and the full names of POI locations and organizations;
- extract the relationships between entities;
- produce structured data for a given POI event from an extraction model.

The POI extraction model is computed from term frequency and word distance alone, without any syntactic analysis, scenario templates, or relation induction. Consistency and validity checks are then applied to refine the results. In an open test on 1,000 news articles, the approach achieved a precision of 97.30% and a recall of 75.48%. The approach has been applied in industrial POI collection. The results show that POI-oriented event extraction is effective.
Keywords: information extraction; extraction model; relation extraction; POI; ICTCLAS2010
Paper: POI Extraction.pdf (146 KB)
Slides: Structured POI data Extraction from Internet News.ppt (1.43 MB)
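The abstract states that the extraction model uses only term frequency and word distance, with no parsing or templates. The sketch below shows one way such a model could rank candidate POI names in an already-segmented news text. It is a minimal illustration, not the paper's method, and several parts are assumptions:

- **Scoring formula.** The score `tf(c) / (1 + d_min(c))` is hypothetical. The abstract does not give the exact formula.
- **Names and inputs.** The helper names `score_candidates` and `best_poi` are hypothetical, as are the trigger-word set.
- **Input format.** The tokens are assumed to come from a segmenter such as ICTCLAS2010, with each full POI name already merged into one token.

```python
def score_candidates(tokens, candidates, triggers):
    """Score candidate POI names by term frequency and distance to event triggers.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(c) = tf(c) / (1 + d_min(c))
    where tf(c) is how often candidate c occurs in tokens and d_min(c) is the
    smallest token distance between any occurrence of c and any trigger word.
    """
    trigger_pos = [i for i, t in enumerate(tokens) if t in triggers]
    scores = {}
    for cand in candidates:
        positions = [i for i, t in enumerate(tokens) if t == cand]
        if not positions or not trigger_pos:
            continue  # candidate absent or no event trigger: no evidence
        tf = len(positions)
        d_min = min(abs(p - q) for p in positions for q in trigger_pos)
        scores[cand] = tf / (1 + d_min)
    return scores


def best_poi(tokens, candidates, triggers):
    """Return the highest-scoring candidate, or None if nothing qualifies."""
    scores = score_candidates(tokens, candidates, triggers)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    # Hypothetical pre-segmented sentence; full POI names are single tokens.
    tokens = ["Yesterday", "Starbucks", "opened", "a", "store", "in",
              "Zhongguancun_Plaza", ";", "Beijing", "shoppers", "welcomed", "it"]
    candidates = {"Zhongguancun_Plaza", "Beijing"}
    triggers = {"opened"}  # hypothetical event-trigger vocabulary
    print(score_candidates(tokens, candidates, triggers))
    print(best_poi(tokens, candidates, triggers))
```

In this example, "Zhongguancun_Plaza" is four tokens from the trigger "opened" and "Beijing" is six, so the plaza scores higher. Multiplying frequency by an inverse-distance factor rewards entities that are both repeated and close to the event word. A real system would likely tune the weighting and combine it with the consistency and validity checks mentioned in the abstract.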