Our Natural Language Processing system structures the entities found in human free-text inputs and contextually standardizes the relevant information.
Natural Language Processing.
Our Natural Language Processing system represents a revolutionary advancement in the structuring and standardization of clinical data. By understanding free-text inputs from clinical notes, our system can contextualize and comprehend the meaning behind medical information in human-written text. It is not just keyword extraction; it is a deep understanding of the information. Our double quality assurance process ensures high-quality data for building a data-driven healthcare ecosystem.
Our added value: NLP systems and QA processes.
Our technology uses advanced techniques such as Named Entity Recognition (NER) to identify and categorize entities like medical conditions, medications, procedures, and measurements embedded within a diverse array of medical texts.
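To illustrate what NER over clinical text produces, here is a minimal rule-based sketch in Python. The entity dictionaries and example note are invented for illustration; a production system like ours uses trained statistical models rather than hand-written patterns.

```python
import re

# Illustrative entity patterns -- a real NER system learns these
# from annotated data instead of using fixed lists.
ENTITY_PATTERNS = {
    "CONDITION": r"\b(hidradenitis suppurativa|psoriasis|urticaria)\b",
    "MEDICATION": r"\b(adalimumab|methotrexate|ibuprofen)\b",
    "MEASUREMENT": r"\b\d+(?:\.\d+)?\s?(?:mg|ml|mmHg)\b",
}

def extract_entities(text: str):
    """Return (entity_text, label, start, end) tuples found in the note."""
    entities = []
    for label, pattern in ENTITY_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append((match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda e: e[2])

note = "Patient with psoriasis, started adalimumab 40 mg every two weeks."
for ent in extract_entities(note):
    print(ent)
```

The character offsets returned with each entity are what allow downstream steps (normalization, relation extraction) to point back to the exact span in the original note.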
Our technology understands the specific context surrounding medical entities. This proficiency is instrumental in providing accurate interpretations and a comprehensive understanding of an entity's implications within clinical notes or reports.
Our cutting-edge technology is capable of seamlessly assigning OMOP (Observational Medical Outcomes Partnership) concept IDs to previously identified entities.
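Conceptually, this normalization step attaches a vocabulary identifier to each recognized entity. The sketch below uses a simple lookup table with placeholder concept IDs; real mappings come from the OMOP standardized vocabularies and a terminology service, not a hard-coded dictionary.

```python
# Placeholder concept IDs for illustration only -- production mappings
# are resolved against the OMOP standardized vocabularies.
CONCEPT_MAP = {
    "psoriasis": 140168,
    "hidradenitis suppurativa": 4031042,
    "adalimumab": 1119119,
}

def assign_concept_ids(entities):
    """Attach an OMOP concept_id to each (text, label) entity, or None."""
    annotated = []
    for text, label in entities:
        annotated.append({
            "text": text,
            "label": label,
            "concept_id": CONCEPT_MAP.get(text.lower()),  # None if unmapped
        })
    return annotated

print(assign_concept_ids([("Psoriasis", "CONDITION"), ("aspirin", "MEDICATION")]))
```

Entities that cannot be mapped keep `concept_id = None`, so unmapped terms remain visible for later review rather than being silently dropped.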
Our technology recognizes the intricate relationships that interconnect various medical entities within a given text. This analysis unveils the nuanced interplay between medical concepts, contributing to the comprehension of medical narratives.
Our meticulous annotation process, conducted by medical professionals, ensures that any erroneously classified entities are corrected. This approach ensures the precision of our system's outputs and maintains a high standard of accuracy in medical data.
QA: Identifying False Negatives.
Our unique, peer-reviewed and published approach involves identifying false negatives by comparing vectorized clinical notes with previously discarded records. This ensures we identify any crucial information missed by the earlier model, enhancing our capacity to detect overlooked relevant data and bolstering the sensitivity of our clinical datasets.
From free-text to standardized data.
By standardizing all hospital data, we ensure consistent and coherent understanding of information across various applications and systems, providing hospitals with a unique and comprehensive data repository.
The DERMACLEAR study: Verification results of a natural language processing system in dermatology.
Results from the DERMACLEAR study will strengthen the real-world evidence of clinical practice, providing a large amount of information on patients with the studied diseases. The NLP system used is precise in identifying patients diagnosed with HS, PsO, CU, and/or AD, as well as other medical variables from EHRs, confirming that it is a valid system for use in the DERMACLEAR study.
An open source corpus and automatic tool for section identification in Spanish health records.
This work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the implemented evaluation scripts, and the section identification language model are open-sourced in the hope that this contribution will foster the development of more and better systems.
NATI (NATural language in ThyroId cancer).
A total of 5137 medical records of patients diagnosed with thyroid cancer between 2015 and 2022 were included. The median follow-up (interquartile range) was 29.7 months (8.8-55.8). The mean age at the time of diagnosis was 55 years (SD 18), and 67% were women. The stage could be classified in a subgroup of 520 patients, of which 60% (n=313) had advanced stages. Metastasis was observed in 2177 patients (42%) during the follow-up, mainly in lymph nodes (44%). It was also identified that the majority of patients (71%; n=3629) had some comorbidity.
Extending the OMOP CDM to store the output of natural language processing pipelines.
Although the OMOP CDM provides a NOTE_NLP table to store the outputs of NLP algorithms, queries against this table can become clumsy and slow. We therefore designed an extension of the OMOP CDM with our own NLP schema, able to store the output of NLP solutions while integrating with the vocabulary normalization process of the OMOP CDM.
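To make the idea concrete, the snippet below builds a simplified annotation table in SQLite that links NLP output to OMOP concept IDs. The column names and the concept ID are illustrative only; the actual extension schema is the one described in the paper, not this sketch.

```python
import sqlite3

# Minimal, illustrative annotation table: one row per entity found in a
# note, already normalized to an OMOP concept and carrying its
# contextual-modifier flags alongside it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nlp_annotation (
        annotation_id   INTEGER PRIMARY KEY,
        note_id         INTEGER NOT NULL,
        lexical_variant TEXT,        -- text span as written in the note
        concept_id      INTEGER,     -- normalized OMOP concept
        negated         INTEGER,     -- contextual modifiers stored inline
        temporality     TEXT
    )
""")
conn.execute(
    "INSERT INTO nlp_annotation VALUES (1, 42, 'psoriasis', 140168, 0, 'present')"
)
row = conn.execute(
    "SELECT concept_id FROM nlp_annotation WHERE note_id = 42"
).fetchone()
print(row)
```

Keeping the normalized concept and its modifiers in dedicated columns is what lets downstream queries filter directly (e.g. "non-negated, present-tense mentions") without re-parsing free-text fields.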
A Framework for False Negative Detection in NER/NEL.
Finding the false negatives of a NER/NEL system is fundamental to improving it, and is usually done by manual annotation of texts. However, in an environment with a huge volume of unannotated texts (e.g. a hospital) and a low frequency of positives (e.g. mentions of a particular disease in clinical notes), the task becomes very inefficient.
Efficient automated mapping of internal source codes to OMOP CDM concepts.
Our automated concept mapping system provides an efficient way of mapping source codes to OMOP concepts. By utilizing text-based vector representations and knowledge transfer, our system can find equivalent mappings from other hospitals, thereby reducing the time and effort required for manual mapping.
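As a rough sketch of nearest-neighbor concept mapping, the code below compares a local code's description against descriptions already mapped elsewhere and reuses the closest match. The known mappings, the similarity measure (toy bag-of-words cosine), and the threshold are all illustrative assumptions, not the production system.

```python
import math
from collections import Counter

def vec(text: str) -> Counter:
    """Toy bag-of-words vector; production systems use richer embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical descriptions already mapped at another hospital
# (concept IDs shown for illustration only).
KNOWN_MAPPINGS = {
    "type 2 diabetes mellitus": 201826,
    "essential hypertension": 320128,
}

def map_source_code(description: str, min_similarity: float = 0.4):
    """Map a local code description to the closest known OMOP concept."""
    v = vec(description)
    best, score = None, 0.0
    for known, concept_id in KNOWN_MAPPINGS.items():
        s = cosine(v, vec(known))
        if s > score:
            best, score = concept_id, s
    return best if score >= min_similarity else None
```

Descriptions below the similarity threshold return `None` and fall back to manual review, so knowledge transfer speeds up the easy cases without forcing dubious automatic mappings.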
ContextMEL: Classifying Contextual Modifiers in Clinical Text.
Taking advantage of electronic health records in clinical research requires the development of natural language processing tools to extract data from unstructured text in different languages. A key task is the detection of contextual modifiers, such as understanding whether a concept is negated or if it belongs to the past. We present ContextMEL, a method to build classifiers for contextual modifiers that is independent of the specific task and the language, allowing for a fast model development cycle.
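The kind of output such classifiers produce can be illustrated with a minimal NegEx-style rule sketch. The trigger lists here are invented and language-specific, which is exactly the limitation ContextMEL avoids by training classifiers instead of relying on fixed rules.

```python
# Illustrative trigger phrases; a rule approach like this must be
# rebuilt per language, whereas ContextMEL learns from data.
NEGATION_TRIGGERS = ("no ", "denies ", "without ", "negative for ")
HISTORY_TRIGGERS = ("history of ", "previous ", "past ")

def classify_modifiers(sentence: str, concept: str) -> dict:
    """Return negation/temporality flags for a concept mention."""
    # Look only at the text preceding the concept mention.
    prefix = sentence.lower().split(concept.lower())[0]
    return {
        "negated": any(t in prefix for t in NEGATION_TRIGGERS),
        "historical": any(t in prefix for t in HISTORY_TRIGGERS),
    }

print(classify_modifiers("Patient denies fever or chills.", "fever"))
```

Distinguishing "denies fever" from "fever" is essential before counting a mention as a positive finding, which is why modifier detection sits between entity recognition and data standardization.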