Computers as Intellectual Peers in Scientific Research

One of the most exciting futurist notions is a machine that can think like a human. Although we are not presently able to have true discourse with a computer, we know that many pieces must come together for this to happen.

Consider a scenario in which two people, Person A and Person B, wish to engage in a discussion on a particular topic. What is required of Person A to have a meaningful conversation with Person B? Without question, Person A must have an adequate understanding of the language that will be spoken by Person B; otherwise Person A may have difficulty parsing words and sentence structure. Moreover, Person A will need to be able to explain his/her own thoughts using a language that Person B will be able to understand. This preliminary understanding is not enough for a meaningful conversation, however – Person A needs to be able to comprehend the meaning of Person B’s words as well.

Here is an interesting challenge: given this scenario, tell me what is required for Person A to extract the meaning of what Person B is communicating. Once you have identified the specific components, how can you model them and define relationships between them in such a way that a computer can use your design to understand what Person B says to it?

This is clearly not a trivial task. For decades, researchers in a number of fields related to Linguistics and Artificial Intelligence have been working on problems that can contribute to formal models of communication and thought processes. In my time as a graduate student so far, I have enjoyed developing a deep interest in these areas. For the past year, I have been surveying the areas of Knowledge Representation, Question Answering, and Cognitive Computing with the intention of directing my research towards work in these areas. I’m driven by the idea that a computer can be something of an intellectual peer that assists in scientific research, and I believe that work at the intersection of these fields will lead to advancement towards this goal.

Interning at GE Global Research (GRC) has been an invaluable experience for me. Under the guidance of excellent mentors in the Knowledge Discovery Lab, I’ve broadened my understanding to cover concepts from Natural Language Processing (NLP), Information Extraction (IE), and the incorporation of user feedback to modify the behavior or results of a program. This summer, I have been working on part of a knowledge extraction project in the healthcare domain. Today, radiologists have access to more patient data than ever before. So much, in fact, that it can be difficult for doctors to identify the subset of the data that is relevant to making an effective diagnosis. The project vision involves using knowledge extraction techniques to bring that relevant subset forward so that radiologists can make an informed diagnosis. My internship work has focused on using linguistic analysis to discover relationships between mentions of body parts and imaging modalities in patient records.
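To give a feel for what pairing body parts with imaging modalities means, here is a deliberately simple sketch. It is not the project's method: it swaps the linguistic analysis described above for plain sentence-level co-occurrence, and the word lists, the `cooccurring_pairs` function, and the sample record are all made up for illustration.

```python
import re
from typing import List, Tuple

# Tiny illustrative vocabularies (assumptions, not a real medical lexicon).
BODY_PARTS = {"chest", "knee", "abdomen"}
MODALITIES = {"ct", "mri", "x-ray"}


def cooccurring_pairs(record: str) -> List[Tuple[str, str]]:
    """Return (modality, body part) pairs that appear in the same sentence."""
    pairs = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", record):
        tokens = re.findall(r"[a-z][a-z-]*", sentence.lower())
        parts = [t for t in tokens if t in BODY_PARTS]
        mods = [t for t in tokens if t in MODALITIES]
        pairs.extend((m, p) for m in mods for p in parts)
    return pairs


record = "CT of the chest was unremarkable. An MRI of the left knee showed a tear."
pairs = cooccurring_pairs(record)
```

Co-occurrence like this over-generates when a sentence mentions several body parts or modalities, which is one reason a real system would lean on syntactic structure rather than proximity alone.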

To get acquainted with the work, I spent my first weeks learning about the project, its major goals, and the work that had already been done. I also read and discussed recommended papers related to the research area, two of which I included in my Ph.D. candidacy survey. Learning about different aspects of NLP and approaches to building Information Extraction systems significantly influenced my perspective on how a computer might converse with a person.

In the next phase, I learned how to work with open source natural language processing tools, in particular ClearNLP. After discovering that the out-of-the-box ClearNLP tool takes a very long time to load its models before it can process a single document, we created a web service that loads the models once at startup and supports batch processing of documents.
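The load-once idea can be sketched in a few lines. This is only an illustration under stated assumptions: `SlowModel` is a hypothetical stand-in for an expensive model (it is not ClearNLP's API, and ClearNLP itself is a Java library), and `AnnotationService` leaves out the HTTP layer entirely so that the pattern stays visible.

```python
import time
from typing import List


class SlowModel:
    """Hypothetical stand-in for an expensive NLP model (not ClearNLP)."""

    load_count = 0  # how many times the expensive load has run

    def __init__(self) -> None:
        SlowModel.load_count += 1
        time.sleep(0.01)  # simulate a slow model load

    def annotate(self, text: str) -> List[str]:
        # Placeholder "analysis": whitespace tokenization only.
        return text.split()


class AnnotationService:
    """Loads the model once at startup, then serves batches of documents."""

    def __init__(self) -> None:
        self._model = SlowModel()  # cost is paid once, at service startup

    def process_batch(self, documents: List[str]) -> List[List[str]]:
        # Every document in every batch reuses the already-loaded model.
        return [self._model.annotate(doc) for doc in documents]


service = AnnotationService()
results = service.process_batch(["CT of the chest", "MRI of the left knee"])
more = service.process_batch(["X-ray of the right hand"])
```

However many batches are submitted, the load cost is incurred a single time, which is the whole point of keeping the process alive as a service instead of launching the tool once per document.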

The bulk of the work has involved designing, implementing, and refining components of a prototype system that takes a set of patient records and returns sections of text that are relevant to the information need. Additionally, the approach incorporates feedback to adjust to the information needs of the radiologist. It’s been very cool seeing this system come together, and by next week there will even be a nice interface that we’ll use to demonstrate the work.
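One simple way to picture retrieval that adapts to feedback is a ranker whose term weights shift when a user marks a section as relevant or not. The sketch below is an assumption-laden toy, not the prototype's actual design: `FeedbackRanker`, its additive weight update, and the sample sections are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class FeedbackRanker:
    """Ranks text sections by weighted term overlap; feedback shifts the weights."""

    def __init__(self, query_terms: List[str]) -> None:
        self.weights: Dict[str, float] = defaultdict(float)
        for term in query_terms:
            self.weights[term.lower()] = 1.0

    def score(self, section: str) -> float:
        # Sum the weights of the distinct terms present in the section.
        return sum(self.weights.get(tok, 0.0) for tok in set(section.lower().split()))

    def rank(self, sections: List[str]) -> List[str]:
        return sorted(sections, key=self.score, reverse=True)

    def feedback(self, section: str, relevant: bool, step: float = 0.5) -> None:
        # Nudge every term in the judged section up or down by a fixed step.
        delta = step if relevant else -step
        for tok in set(section.lower().split()):
            self.weights[tok] += delta


sections = [
    "chest ct shows nodule",
    "family history of diabetes",
    "prior nodule follow-up recommended",
]
ranker = FeedbackRanker(["chest", "ct"])
before = ranker.rank(sections)
ranker.feedback(sections[0], relevant=True)  # the user marks the first section as useful
after = ranker.rank(sections)
```

After the feedback, the section that shares the term "nodule" with the approved one moves ahead of the unrelated section. A real system would at least filter stop words and normalize for section length, both of which this toy ignores.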

I’ve been excited to work here every day. I’ve been in the company of extremely bright, motivated, and fun scientists in a lab environment that is supportive, friendly, and encouraging. I’ve seen interesting seminars by a variety of labs at the research center, and I’ll even be giving my own presentation on my last day. My internship experience at GRC has been truly excellent, and I’m looking forward to using everything that I’ve learned as my academic work and research progress.

1 Comment

  1. james Jernigan

    Use of library science might be helpful here. Computers can search large databases much faster than we can.