International Conference on Asian Language Processing 2008

Chiang Mai, Thailand, November 12-14, 2008


Invited Speakers

Dr Jian Su
Research Scientist, Leader, Information Extraction and Text Mining Group
Institute for Infocomm Research, Singapore

Dr Jian Su received her B.Sc degree in Electronics from Sichuan University, China in 1990, and her M.Sc and PhD degrees in Electrical & Electronic Engineering from South China University of Technology in 1993 and 1996 respectively. She was a Research Assistant at City University of Hong Kong from 1994 to 1995, and an intern at Centre de Recherche en Informatique de Nancy, France in 1995. She joined the Institute for Infocomm Research (I2R), formerly known as the Institute of System Sciences, where she established herself in the areas of Information Extraction, Coreference Resolution and (Bio)Text Mining. Dr Su has published extensively in natural language processing (NLP) and bioinformatics conferences and journals, including 12 papers at ACL Annual Meetings and one journal article in Computational Linguistics in recent years. Dr Su is active in professional service for the computational linguistics community. She has served as Editor / Member of Editorial Board for two international journals. She is program chair of ACL-IJCNLP 2009, was publication chair of ACL 2007 and IJCNLP 2005 and program chair of LBM 2007, organized the LREC 2008 workshop "Building and Evaluating Resources for Biomedical Text Mining", and has been a PC member of numerous NLP conferences including ACL, IJCNLP, COLING and EMNLP.

Dr Su has also led her team to top performances in various information extraction / text mining benchmarks such as BioCreAtIve. She was responsible for the effort to establish the largest co-reference annotation corpus, comprising 2000 Medline abstracts and 24 biomedical full papers from the GENIA collection. She has been the Principal Investigator in multiple technology deployments, including BioMedical Information Management, Homeland Security Intelligence Gathering, Legal / Standard Enforcement, and Business Intelligence Gathering.

Title: Advance Information Extraction Technology for Information Management


Dealing with the overwhelming amount of unstructured information present within an organization and in its external sources, such as the Web, is a crucial challenge for organizations today. This talk introduces the work at the Institute for Infocomm Research (I2R) on advancing information extraction technology for the effective exploitation of such unstructured text data.

So far, this work has addressed accuracy, adaptability and scalability issues in Named Entity Recognition, Co-reference Resolution and Relationship Extraction. In doing so, various kinds of linguistic knowledge and phenomena have been explored on annotated corpora, raw text and the web. Supervised, semi-supervised and unsupervised machine learning approaches have been used to arrive at effective solutions to these tasks. Furthermore, various information management applications have been built for competitive intelligence gathering, security information gathering, biomedical information management, and standard enforcement.

Assoc. Prof. Asanee Kawtrakul
Deputy Executive Director
NECTEC, Thailand 

Prof. Kawtrakul is an Associate Professor in Language and Knowledge Engineering Technologies and Deputy Executive Director of NECTEC. She obtained a B.Eng and M.Eng in Electrical Engineering (honors) from Kasetsart University in Thailand and a D.Eng in Information Engineering from Nagoya University, Japan. Her current research focuses primarily on language technologies to support knowledge acquisition, management and utilization. She has led various large-scale research projects, including I-KNOW: Information and Knowledge Extraction from Unstructured Thai Document. She also leads the Natural Language Processing and Intelligent Information System Technology Laboratory (NAIST) at Kasetsart University. She has initiated various collaborative efforts with FAO, UN agencies and other international institutions such as NII, the National Institute of Informatics of Japan (under the BIOCASTER Project), and IRIT, the Institut de Recherche en Informatique de Toulouse, France. She has published more than 90 papers and books. She has also chaired and organized various international NLP conferences and training courses, including the International Symposium on Natural Language Processing (SNLP) and the Language, Artificial Intelligence and Computer Science for Natural Language Processing applications (LAICS-NLP) summer schools.

Title: A Unified Knowledge Engineering with Language Engineering for Effective Knowledge Management:
CyberBrain as a Case Study


The accumulation and management of knowledge on certain topics is crucial for building an Intelligence Society. Knowledge sources are divided into two categories: Tacit Knowledge and Explicit Knowledge. Tacit Knowledge, which people carry in their minds, such as the lessons learned from solving past problems and valuable information from previous experiences, is invaluable for knowledge sharing. With the development of the Internet and the World Wide Web, an enormous amount of explicit knowledge, including best practices or experience in focus areas, can be found and shared through writing research reports, visiting blogs, and even participating in Wikipedia. However, this valuable knowledge is scattered over many different sources, including human minds, and comes in many different formats. Moreover, the desired information or knowledge is difficult to access from scattered sources, since search engines return ranked retrieval lists that offer little or no information on the semantic relationships among the scattered pieces. Even when such information is found, it is often redundant or excessive in volume, since there is no content filtering and no correct answer is indicated. Accordingly, as we move beyond simple information retrieval and simple database queries, automatic content aggregation, question answering, and knowledge visualization become more important.

This talk introduces a framework called CyberBrain that unifies Knowledge Engineering and Language Engineering for effective knowledge management. CyberBrain is a dynamic structure interconnecting organizations and communities. It behaves as a natural ecosystem for collecting and processing knowledge, including extracting and aggregating it from both people's minds and unstructured documents on the Internet. By exploiting the semantic links between problems, the methods for solving them and the people who solve them, knowledge services can be provided as a “one-stop service”.

This challenging platform needs both complex natural language processing, including deep semantic relation interpretation, and collaborative intelligence, that is, the participation of the right stakeholders to create the community knowledge pool, to annotate problem-solving solutions scattered on the web, and to verify those extracted by the question-answering system. Moreover, task-oriented ontologies or semantic-based knowledge aggregation and organization are needed to shorten the time it takes to consume the knowledge.

Prof Hui Wang
Department of Chinese Studies, National University of Singapore

Dr Wang Hui is an Assistant Professor at the Department of Chinese Studies, National University of Singapore (NUS). She received a PhD in Chinese Linguistics from Peking University, China in 2002. Her research focuses on lexical semantics and corpus linguistics. She is in charge of Chinese language projects funded from a variety of sources, including the National University of Singapore, the Ministry of Education of Singapore, the Institute for Infocomm Research of Singapore and the Chinese National Fund Program of Social Sciences. Dr Wang has published more than 20 journal articles and 3 monographs. She is the author of A Syntagmatic Study on Chinese Noun Senses (Beijing: Peking University Press, 2004), The Structure of Chinese Language: Characters, Words and Sentences (co-author) (New Jersey, Singapore: Global Publishing Co., 2004), and The Grammatical Knowledge Base of Contemporary Chinese -- A Complete Specification (co-author) (Beijing: Tsinghua University Press, 1998; 2nd edition, 2003). She has served on the program committees of many conferences in the fields of natural language processing (NLP) and Chinese linguistics. She was the program chair of the 9th Chinese Lexical Semantics Workshop (CLSW-2008), which was held on July 14-17, 2008 in Singapore. In addition, Dr Wang has served as Executive Editor / Member of Editorial Board for two international journals: Journal of Chinese Language and Computing (2004-2007) and Studies on Global Chinese (2006-present).

Title: Extraction of Word Collocations in Singapore Mandarin Chinese


Collocation, i.e. a sequence of certain words which habitually co-occur, plays an essential part in human language. For example, in English you say strong wind but heavy rain. It would not be normal to say *heavy wind or *strong rain. For students, choosing the right collocation makes their speech and writing sound much more natural and more like that of a native speaker. For linguists, collocation is often used to distinguish word senses. For computational purposes, collocations are useful in various Natural Language Processing (NLP) applications. As in English, collocation runs through the whole of the Chinese language. No piece of natural spoken or written Chinese is totally free of collocation. However, no matter how convinced learners and experts are in principle of the importance of collocation, it is difficult for them to put these principles into practice without the benefit of a large-scale dictionary of collocations. This presentation introduces an approach that combines dictionaries, rules and statistics to automatically extract collocations from a large corpus. The specific aim is to build a large collocation knowledge base for the 10,000 most frequently used words in Singapore Mandarin.
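
As a rough illustration of the statistical part of such an approach, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI). This is an assumption made for illustration only: the abstract does not say which statistic is used, and the function name pmi_collocations, the min_count threshold and the toy English corpus (built from the strong wind / heavy rain example above) are all hypothetical. A real system for Singapore Mandarin would also need word segmentation plus the dictionary and rule components, none of which are shown here.

```python
import math
from collections import Counter


def pmi_collocations(sentences, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    `sentences` is a list of token lists. Pairs seen fewer than
    `min_count` times are skipped, since PMI is unreliable for rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        # PMI compares the observed pair probability with what
        # independence of the two words would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus, already tokenized.
toy_corpus = [
    ["strong", "wind", "and", "heavy", "rain"],
    ["heavy", "rain", "and", "strong", "wind"],
    ["the", "rain", "was", "heavy"],
    ["a", "strong", "wind", "blew"],
]

ranked = pmi_collocations(toy_corpus)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On this toy data only the pairs that recur (strong wind, heavy rain) pass the frequency threshold. In practice the threshold and the choice of association measure would have to be tuned on the actual corpus.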


Copyright © 2007-2008 Chinese and Oriental Languages Information Processing Society