Extracting human protein information from MEDLINE using a full-sentence parser

Busa-Fekete, Róbert and Kocsor, András: Extracting human protein information from MEDLINE using a full-sentence parser. Acta cybernetica, (18) 3. pp. 391-402. (2008)

Abstract

Today, a fair number of systems are available for processing biological data. Developing effective systems is important, since they can support both the research and the everyday work of biologists. Biological databases are both large and numerous, so data processing technologies are required for the fast and effective management of the contents stored in databases like MEDLINE. One possible solution for content management is to apply natural language processing methods. With our approach we aim to learn more about the interactions of human genes using full-sentence parsing. Given a sentence, the syntactic parser assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a constituent representation of the sentence (showing noun phrases, verb phrases, and so on). Here we show experimentally that the biological interactions of genes can be predicted using the syntactic information of each abstract. Hence, it is worth developing an information extraction (IE) system that can retrieve information about gene interactions using only the syntactic information contained in these texts. Our IE system can handle certain types of gene interactions with the help of machine learning (ML) methodologies (Hidden Markov Models, Artificial Neural Networks, Decision Trees, Support Vector Machines). The experiments and practical usage show that our system can provide a useful, intuitive guide for biological researchers in their investigations and in the design of their experiments.

Item Type: Article
Event Title: 5th Conference for PhD Students in Computer Science, 2006, Szeged
Journal or Publication Title: Acta cybernetica
Date: 2008
Volume: 18
Number: 3
Page Range: pp. 391-402
ISSN: 0324-721X
Language: English
Uncontrolled Keywords: Natural science, Computer science
Additional Information: Bibliography: pp. 401-402; includes abstract
Date Deposited: 2016. Oct. 15. 12:25
Last Modified: 2018. Jun. 05. 14:31
URI: http://acta.bibl.u-szeged.hu/id/eprint/12826
