Information Extraction - Facilitating Smart Data Management

A wealth of information is hidden within unstructured text in company documents, e-mails, newspaper articles, web pages, and similar sources. This information is best exploited in a structured or relational form, which is better suited to searching, integration with relational databases, and text mining.

Accordia’s Information Extraction System produces a structured representation of the information buried in unstructured text: both free-text documents written in natural language and semi-structured pages.

There are three major components of our Information Extraction System:

  • The Named Entity Recognizer (NER), which finds and classifies:
    (1) the names of people, organizations, and geographic locations
    (2) date and time expressions, percentages, and money amounts
  • The Co-Reference Resolution (CoRe) module, which discovers
    identity relations between entities within and across documents.
  • The Relation Extraction (RE) module, which finds relations between
    recognized entities.
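
To make the idea of a "structured representation" concrete, here is a minimal sketch of what the output of an entity recognizer could look like. It is not Accordia's implementation: the `Entity` record, the `toy_ner` function, and the two regular-expression patterns are hypothetical, and hand-written patterns stand in for the machine-learning models the system actually uses. Only percentages and simple dollar amounts are covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One recognized entity: its surface text, class label, and character span."""
    text: str
    label: str
    start: int
    end: int


# Hypothetical toy patterns for illustration only; a real recognizer
# would rely on trained models rather than hand-written rules.
PATTERNS = {
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
}


def toy_ner(text):
    """Return the entities found in `text`, ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


if __name__ == "__main__":
    for entity in toy_ner("Revenue rose 12.5% to $3,400,000 last year."):
        print(entity)
```

Each record is a row that could be loaded directly into a relational table, which is the kind of structure that searching, database integration, and text mining depend on.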

Each of our components is based on state-of-the-art machine-learning algorithms, which improve both the accuracy and the speed of extracting relevant information.

The components can also be adapted to a client’s specific needs, for example to recognize additional classes of entities or relations between them.