
The Korean BioChip Society (incorporated association)

Leader in the era of BT+IT+NT convergence: The Korean BioChip Society

Journal Archive

Article Details

Info. Vol.4 - No.4 (2010.12.20)
Title Document clustering of MEDLINE abstracts based on non-negative matrix factorization using local confidence assessment
Authors Byeong-Chul Kang1, Zee-Won Sur2, Chulhwan Park3 & Man-gi Cho4
Institutions 1Insilicogen Inc., #909 Chomdan Venture Valley, Suwon, Gosaek-dong, Gweonseon-gu, Gyeonggi-do 441-813, Korea
2Bayer AG, Langenfeld, Germany 40764
3Department of Chemical Engineering, Kwangwoon University, Kwangwoon-gil 26, Nowon-gu, Seoul 139-701, Korea
4Department of Food and Biotechnology, Dongseo University, San 69-1 Jurye-2-dong, Sasang-gu, Busan 617-716, Korea
Correspondence and requests for materials should be addressed to C. Park and M.-G. Cho (chpark@kw.ac.kr, mgcho@gdsu.dongseo.ac.kr)
Abstract A document search in PubMed is one of the most exhaustive ways of finding information on any biological or biomedical topic. However, a keyword search that is not specific enough returns far more documents than a user can read one by one. In this work, we therefore present MedClus, a new document clustering tool for bioinformaticians that makes a PubMed keyword search result more concise by grouping the retrieved documents into clusters. MedClus contains two modules. The first is a pre-clustering module that creates the data matrix. This matrix contains term-document frequencies computed with the TF*IDF method, together with optional weights. These weights are assigned by comparing the term list with the MeSH terms contained in the related MEDLINE abstracts. The second is a clustering module based on a non-negative matrix factorization (NMF) algorithm, which finds an approximate factorization of the data matrix. The application was tested in several experiments that evaluated its performance and reliability. Based on these results, a list of recommended ranges for crucial parameters, such as the number of clusters, was compiled to guide users in applying MedClus. Finally, some results were analyzed by scientists from the fields of medicine and biology, who evaluated the relevance of the terms and whether the terms were related to one another. MedClus is thus able to restructure the result list of a PubMed keyword search. It does so by extracting terms before clustering and finding latent semantics during clustering. It also optionally applies weights to terms that appear as MeSH terms in at least one of the MEDLINE abstracts. It therefore helps users refine a PubMed search result via term-based clustering, saving time and effort.
At this development stage, the software is suitable for experienced users such as bioinformaticians, database administrators, and developers. In addition, the web service for the Semantic Toxicogenomics Knowledgebase, available at http://stkb2.labkm.net, has applied this technology to provide comprehensive and accurate relations between chemical and toxicological contexts.
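The two modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenization, the IDF variant log(N/df), a fixed multiplicative boost (`mesh_weight`) for terms that match MeSH terms, and the standard multiplicative update rules for NMF under the Frobenius norm. All function and parameter names are hypothetical, and the local confidence assessment named in the title is not covered.

```python
import math
import random


def tfidf_matrix(docs, vocab, mesh_terms=frozenset(), mesh_weight=2.0):
    """Build a terms x documents TF*IDF matrix (list of rows).

    Assumption: terms found in mesh_terms are scaled by mesh_weight;
    the abstract does not specify the actual weighting scheme.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(docs)
    matrix = []
    for term in vocab:
        df = sum(1 for tokens in tokenized if term in tokens)
        idf = math.log(n_docs / df) if df else 0.0
        boost = mesh_weight if term in mesh_terms else 1.0
        matrix.append([tokens.count(term) * idf * boost for tokens in tokenized])
    return matrix


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Approximate v (m x n, non-negative) as w (m x k) times h (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h, the quantity the updates reduce."""
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def assign_clusters(h):
    """Assign each document (column of h) to its largest factor."""
    return [max(range(len(h)), key=lambda c: h[c][j]) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Toy stand-ins for MEDLINE abstracts (invented for illustration).
    docs = [
        "gene expression gene microarray",
        "microarray gene expression profiling",
        "toxicity liver toxicity dose",
        "dose liver toxicity response",
    ]
    vocab = ["gene", "expression", "microarray", "toxicity", "liver", "dose"]
    v = tfidf_matrix(docs, vocab, mesh_terms={"toxicity"})
    w, h = nmf(v, k=2)
    print("cluster per document:", assign_clusters(h))
    print("reconstruction error:", frobenius_error(v, w, h))
```

In this sketch the rows of `w` relate terms to clusters and the columns of `h` relate documents to clusters, so the largest entries in a column of `w` suggest the terms that characterize a cluster. The number of clusters `k` is the parameter for which the abstract says recommended ranges were compiled.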
Keyword Text-mining, Literature clustering, Non-negative matrix factorization, Local confidence assessment, Bioinformatics
PDF File
# For issues published from 2010 onward, the full text is available on Springer's BioChip Journal page.

The Korean BioChip Society (incorporated association)
Tel: 070-7767-9855
Email: biochip@biochip.or.kr
Tel: 070-7767-9867
Email: biochip2@biochip.or.kr
Address: Room 804, New Building, Korea Science and Technology Center, 22 Teheran-ro 7-gil, Gangnam-gu, Seoul 06130, Korea     Representative: Sang Jun Sim, Registration No.: 206-82-65403
Fax: 02-921-9856, Website: http://www.biochips.or.kr Copyright © The Korean BioChip Society. All Rights Reserved.