Our Automated Concept Mapping System automatically standardizes data into the OMOP Common Data Model by leveraging vector representations of internal source code descriptions.
Automated Concept mapping.
Our technology uses Artificial Intelligence to automate the mapping of all of a hospital's internal source codes to the OMOP CDM. Code assignment is automated through a vector search-based approach. Our system creates vector representations of the hospital's internal code descriptions that capture the relevant syntactic and semantic text features, so that similar records have correspondingly similar representations. Assigning standard codes therefore becomes a matter of performing similarity searches within the vector space: the OMOP code associated with the most similar vector is assigned to the record we aim to map.
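The similarity search described above can be illustrated with a minimal sketch. It is not the production system: character trigram counts stand in for the learned vector representations, and the reference descriptions, the concept IDs (1001 to 1003) and the function names are hypothetical placeholders chosen only for illustration.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Toy vectorizer: counts of character n-grams (a stand-in for a learned embedding)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical reference set: standard description -> placeholder concept ID.
REFERENCE = {
    "Type 2 diabetes mellitus": 1001,
    "Essential hypertension": 1002,
    "Atopic dermatitis": 1003,
}

def map_description(description, reference=REFERENCE):
    """Return (concept_id, matched_description, score) for the most similar reference entry."""
    query = char_ngrams(description)
    scored = [(cosine(query, char_ngrams(ref)), ref) for ref in reference]
    score, best = max(scored)
    return reference[best], best, score

concept_id, matched, score = map_description("diabetes mellitus type II")
print(concept_id, matched, round(score, 2))
```

In a real deployment the trigram counts would be replaced by the vectors of an NLP model and the linear scan by a vector index; the assignment rule (take the code of the nearest vector) stays the same.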
Our added value: unique concept mapping model.
Our technology preprocesses internal hospital descriptions and then vectorizes them with a Natural Language Processing model.
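As a rough illustration of what a preprocessing step before vectorization might look like, the sketch below normalizes a free-text description. The specific steps (accent stripping, lowercasing, collapsing punctuation) and the function name `preprocess` are assumptions made for this example; the actual pipeline may differ.

```python
import re
import unicodedata

def preprocess(description: str) -> str:
    """Normalize a source description before it is vectorized (assumed steps)."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", description)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace every run of non-alphanumeric characters with one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()

print(preprocess("  Radiografía de TÓRAX (PA/LAT) "))
```

Normalizing first means that descriptions differing only in case, accents or punctuation end up with the same text, and therefore the same vector.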
Clinical Validation and
Our clinical specialists, who are trained medical annotators, validate the model results using a dedicated internal interface built for this purpose, ensuring the accuracy and reliability of the outcomes obtained with our technology.
Vector search techniques yield the standardized codes, which allows the encoding process to be automated.
After the QA process, the confirmed mappings are uploaded into the OMOP CDM. This integration ensures that data is harmonized and standardized, promoting interoperability and facilitating comprehensive healthcare analysis and research for healthcare organizations.
The DERMACLEAR study: Verification results of a natural language processing system in dermatology.
Results from the DERMACLEAR study will add to the real-world evidence on clinical practice by providing a large amount of information on patients with the studied diseases. The NLP system used is precise in identifying patients diagnosed with HS, PsO, CU and/or AD, as well as other medical variables, from EHRs, which shows that it is a valid system for use in the DERMACLEAR study.
An open source corpus and automatic tool for section identification in Spanish health records.
This work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the implemented evaluation scripts, and the section identification Language Model are open-sourced in the hope that this contribution will foster the building of more and better systems.
NATI (NATural language in ThyroId cancer).
A total of 5137 medical records of patients diagnosed with thyroid cancer between 2015 and 2022 were included. The median follow-up (interquartile range) was 29.7 months (8.8-55.8). The mean age at the time of diagnosis was 55 years (SD 18), and 67% were women. The stage could be classified in a subgroup of 520 patients, of which 60% (n=313) had advanced stages. Metastasis was observed in 2177 patients (42%) during the follow-up, mainly in lymph nodes (44%). It was also identified that the majority of patients (71%; n=3629) had some comorbidity.
Extending the OMOP CDM to store the output of natural language processing pipelines.
Although the OMOP CDM provides a NOTE_NLP table to store the outputs of NLP algorithms, queries to this table can become clumsy and slow. We therefore designed an extension of the OMOP CDM with our own NLP schema, which stores the results generated during NLP annotation while integrating with the vocabulary normalization process of the OMOP CDM.
A Framework for False Negative Detection in NER/NEL.
Finding the false negatives of a NER/NEL system is fundamental to improving it, and is usually done by manually annotating texts. However, in an environment with a huge volume of unannotated texts (e.g., a hospital) and a low frequency of positives (e.g., mentions of a particular disease in the clinical notes), the task becomes very inefficient.
Efficient automated mapping of internal source codes to OMOP CDM concepts.
Our automated concept mapping system provides an efficient way of mapping source codes to OMOP concepts. By utilizing text-based vector representations and knowledge transfer, our system can find equivalent mappings from other hospitals, thereby reducing the time and effort required for manual mapping.
ContextMEL: Classifying Contextual Modifiers in Clinical Text.
Taking advantage of electronic health records in clinical research requires the development of natural language processing tools to extract data from unstructured text in different languages. A key task is the detection of contextual modifiers, such as understanding whether a concept is negated or if it belongs to the past. We present ContextMEL, a method to build classifiers for contextual modifiers that is independent of the specific task and the language, allowing for a fast model development cycle.