
Macquarie University ResearchOnline


Please use this identifier to cite or link to this item: http://hdl.handle.net/1959.14/174202

Title
Datasets for generic relation extraction
Related
Natural language engineering, Vol. 18, Issue 1, (2012), p.21-59
DOI
10.1017/S1351324911000106
Publisher
Cambridge University Press
Date
2012
FoR/RFCD Code(s)
080100 Artificial Intelligence and Image Processing; 200400 Linguistics; 170200 Cognitive Sciences
Author/Creator
Hachey, B; Grover, C; Tobin, R
Description
A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g. PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database or RDF triplestore that can be more effectively used for querying and automated reasoning. A number of resources have been developed for training and evaluating automatic systems for relation extraction in different domains. However, comparative evaluation is impeded by the fact that these corpora use different markup formats and notions of what constitutes a relation. We describe the preparation of corpora for comparative evaluation of relation extraction across domains based on the publicly available ACE 2004, ACE 2005 and BioInfer data sets. We present a common document type using token standoff and including detailed linguistic markup, while maintaining all information in the original annotation. The subsequent reannotation process normalises the two data sets so that they comply with a notion of relation that is intuitive, simple and informed by the semantic web. For the ACE data, we describe an automatic process that converts many relations involving nested, nominal entity mentions to relations involving non-nested, named or pronominal entity mentions. For example, the first entity is mapped from 'one' to 'Amidu Berry' in the membership relation described in 'Amidu Berry, one half of PBS'. Moreover, we describe a comparably reannotated version of the BioInfer corpus that flattens nested relations, maps part-whole to part-part relations and maps n-ary to binary relations. Finally, we summarise experiments that compare approaches to generic relation extraction, a knowledge discovery task that uses minimally supervised techniques to achieve maximally portable extractors. These experiments illustrate the utility of the corpora.
Description
39 page(s)
Resource Type
journal article
Organisation
Macquarie University. Dept. of Computing

Identifier
http://hdl.handle.net/1959.14/174202
Identifier
ISSN:1351-3249
Identifier
mq-rm-2010004178
Language
eng
Reviewed
Reviewed