Paula Chocrón: "We face great challenges such as scaling the tool to other languages"
Paula Chocrón is the NLP & Data Science Team Lead at IOMED Medical Solutions. With a degree in Computer Science from the University of Buenos Aires and a PhD in artificial intelligence from the IIIA-CSIC, Paula has been with our team for two years. During this time, she has witnessed a process of revolution and growth in her department. In 2022, the team faces two great challenges: adapting the tool to other languages and automating its processes.
What benefits does Natural Language Processing (NLP) have for clinical research?
Natural language processing aims to extract information from unstructured data. The reports that doctors write about the patients they see are in natural language, as free text. NLP methods extract data from this kind of text and can analyze large amounts of it in a very short time, automating a process that would otherwise be long, slow and laborious and would require a lot of human work. This lets us extract data and ask questions about it.
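As a rough illustration of what "extracting data from text" means, here is a minimal sketch in Python. It uses plain regular expressions rather than the NLP methods described in the interview. The sample note, the `extract_fields` function and the field names are all assumptions made up for this example, not part of IOMED's tool.

```python
import re

# Hypothetical free-text note; real clinical reports are far messier.
NOTE = (
    "Patient is a 54-year-old woman. Blood pressure 130/85 mmHg. "
    "Prescribed ibuprofen 400 mg."
)

def extract_fields(text):
    """Pull a few structured fields out of free text with regular expressions."""
    fields = {}

    # Age written as "54-year-old".
    age = re.search(r"(\d{1,3})-year-old", text)
    if age:
        fields["age"] = int(age.group(1))

    # Blood pressure written as "130/85".
    bp = re.search(r"blood pressure\s+(\d{2,3})/(\d{2,3})", text, re.IGNORECASE)
    if bp:
        fields["systolic"] = int(bp.group(1))
        fields["diastolic"] = int(bp.group(2))

    # Medication name followed by a dose in milligrams.
    doses = re.findall(r"([A-Za-z]+)\s+(\d+)\s*mg\b", text)
    fields["medications"] = [(name.lower(), int(mg)) for name, mg in doses]
    return fields

print(extract_fields(NOTE))
```

Hand-written patterns like these break as soon as the wording changes, which is one reason statistical NLP methods are used for real reports.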
How do you expect NLP to evolve in the short term?
One of the most interesting developments in NLP today is everything related to language models and "transfer learning". The basic idea is to train very large models that know about language in general, and then use them for more specific tasks, taking advantage of that knowledge. It is like teaching a model how Italian works and then asking it to classify text in Italian. The interesting thing is that, since the model already knows Italian, it needs far fewer examples to learn the specific task. It was a revolutionary development in NLP, and there is still much to explore in this direction.
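The "fewer examples" point can be sketched with a toy in Python. The `PRETRAINED` table below is a hand-written stand-in for a large language model: it is entirely hypothetical, as are the function names and labels. The sketch keeps those vectors frozen and learns only one centroid per label from a single example each, yet it can still classify texts whose words never appeared in the task examples.

```python
# Hypothetical "pretrained" word vectors, standing in for general language
# knowledge learned elsewhere. A real language model is vastly larger.
PRETRAINED = {
    "rash": (0.9, 0.1), "skin": (0.8, 0.2), "eczema": (0.9, 0.0), "itchy": (0.7, 0.1),
    "heart": (0.1, 0.9), "chest": (0.2, 0.8), "cardiac": (0.0, 0.9), "pulse": (0.1, 0.7),
}

def embed(text):
    """Average the frozen pretrained vectors of the known words in a text."""
    vectors = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def fit_centroids(examples):
    """Learn one centroid per label from a handful of labeled texts."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(embed(text))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in grouped.items()
    }

def predict(text, centroids):
    """Return the label whose centroid is closest to the text embedding."""
    vec = embed(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )

# Only one labeled example per class is needed for the specific task.
centroids = fit_centroids([("itchy rash", "dermatology"), ("chest pulse", "cardiology")])
print(predict("eczema skin", centroids))
print(predict("cardiac heart", centroids))
```

None of the words in the two test phrases appear in the task examples. The classification works only because the reused vectors already place related words close together.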
How has the NLP & Data Science department evolved since you came to IOMED?
We have grown a lot. When I joined there were two of us, Álvaro Abella and me. Little by little, colleagues with different profiles have joined, from more computing-oriented ones to others closer to linguistics. This has allowed us to grow the tool and its functionality, so that we can extract more and more interesting information from the texts that we analyze. Now we are facing big challenges such as scaling the tool to other languages.
What are the main challenges facing the NLP & Data Science department in 2022?
We have two big challenges: new languages and the automation of our tool. Our goal is to have more and more features, to detect more things. Until a few months ago this was a manual process: we would say "we need to identify whether a text is about dermatology or not" and we would build that part of the tool. We are now working on automating these processes. If the tool's features can be built automatically, we can grow a lot by adding parts to the tool in a simple, automatic way. The other great challenge is transferring the tool to other languages, and we want that to be as automatic as possible too. We do not want to build it all over again for each language. To do this, the core of the tool will be shared across languages, and we will use transfer learning techniques to make the most of the data we have. For example, we are going to train models in Italian starting from the ones we already have in Spanish.
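To make the idea of automatically built features concrete, here is a minimal sketch in Python. It assumes that each feature can be described by a few labeled example texts. The `TASKS` dictionary, the `build_classifier` helper and the tiny naive Bayes model are illustrative assumptions, not a description of how IOMED's tool works.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_classifier(positive_texts, negative_texts):
    """Train a tiny naive Bayes classifier and return a predict function."""
    counts = {True: Counter(), False: Counter()}
    for label, texts in ((True, positive_texts), (False, negative_texts)):
        for text in texts:
            counts[label].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    n_docs = {True: len(positive_texts), False: len(negative_texts)}

    def log_score(tokens, label):
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        score = math.log(n_docs[label] / (n_docs[True] + n_docs[False]))
        for tok in tokens:
            if tok in vocab:
                score += math.log(
                    (counts[label][tok] + 1) / (totals[label] + len(vocab))
                )
        return score

    def predict(text):
        tokens = tokenize(text)
        return log_score(tokens, True) > log_score(tokens, False)

    return predict

# Hypothetical task specifications: feature name -> (positive, negative) examples.
TASKS = {
    "dermatology": (
        ["itchy rash on the left arm", "skin lesion with redness", "eczema and dry skin"],
        ["chest pain on exertion", "fracture of the right wrist", "persistent cough and fever"],
    ),
}

# Every feature is built by the same code path; adding one means adding data.
classifiers = {name: build_classifier(pos, neg) for name, (pos, neg) in TASKS.items()}
print(classifiers["dermatology"]("red rash on the skin"))
```

In this sketch, adding a new detector only requires a new entry in `TASKS`, with no new code. That mirrors the shift from building each part by hand to generating it automatically.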
What do you like most about working at IOMED?
At IOMED there are constant challenges. There are challenges every day, even in the smallest things, from designing code to creating an entire tool. We built our NLP tool in a very innovative way, betting on automation and on the possibility of the tool building itself. In addition, working with the aim of improving clinical studies is very interesting.
What kind of professional profile are you looking for to work in your team?
We are looking for professionals with a technical background in Computer Science or Data Science and experience in NLP or machine learning in general. These are profiles that work with code (e.g., programming) or with data (e.g., data analysis and reading corpora).