Learning information extraction patterns
The rapid growth of online text calls for systems that can extract relevant information. Many information extraction (IE) systems have been developed using the knowledge engineering approach, which is often time-consuming and laborious and yields systems with little portability. A more promising direction is to apply machine learning techniques to IE. A complete IE system, IEPlus, has been developed to explore various design issues. IEPlus defines fine-grained semantic units and proposes a strategy for semantic resolution. It also implements an enhancement to rule evaluation based on case frame matching, along with a rule firing strategy that gives priority to the most specific rule, measured by the number of terms matched. Experiments on the Rental Ads domain demonstrated the effectiveness of IEPlus. Owing to its object-oriented design, IEPlus is highly flexible and can be used to explore various issues in IE system design.
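The rule firing strategy can be illustrated with a minimal sketch. The abstract does not specify how IEPlus represents rules, so the following assumptions are made purely for illustration: a rule is a hypothetical `Rule` object holding a tuple of terms, a rule is applicable only when all of its terms occur in the text, and ties between equally specific rules are broken by list order. The names `Rule`, `matched_terms`, and `fire` are hypothetical and are not taken from IEPlus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule representation (assumption): a name, the terms the
    # rule must match, and the template slot it fills when fired.
    name: str
    terms: tuple[str, ...]
    slot: str


def matched_terms(rule: Rule, tokens: set[str]) -> int:
    """Count how many of the rule's terms appear in the token set."""
    return sum(1 for t in rule.terms if t in tokens)


def fire(rules: list[Rule], text: str) -> Rule | None:
    """Return the most specific applicable rule, or None if none applies.

    Assumption for this sketch: a rule is applicable only if all of its
    terms occur in the text; specificity is the number of terms matched.
    """
    tokens = set(text.lower().split())
    applicable = [r for r in rules if matched_terms(r, tokens) == len(r.terms)]
    if not applicable:
        return None
    # max() returns the first maximal element, so ties keep list order.
    return max(applicable, key=lambda r: matched_terms(r, tokens))


# Hypothetical rules for a rental ad: the two-term rule is more specific.
rules = [
    Rule("bedrooms", ("bedroom",), "bedrooms"),
    Rule("bedrooms_apt", ("bedroom", "apartment"), "bedrooms"),
]

if __name__ == "__main__":
    chosen = fire(rules, "Sunny 2 bedroom apartment near campus")
    print(chosen.name if chosen else None)  # prints "bedrooms_apt"
```

Here both rules match the ad, but the two-term rule wins because it matches more terms; for an ad such as "2 bedroom house" only the single-term rule applies.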