We've created several guides and standard operating procedures (SOPs) for use by the team and our internal and external collaborators. They include required steps, our recommended best practices, and helpful bits of information. Click on the sections in the left-hand sidebar to learn more.
These guides are updated as and when necessary, so we recommend both keeping an eye on this page and reaching out to us if you need any further information.
The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a data model designed for standardising and analysing healthcare data from different sources for observational research. It enables the structured examination of diverse observational databases by converting their data into a unified format (data model) and a consistent representation (terminologies, vocabularies, coding schemes). Researchers can then conduct systematic analyses using a set of standardised analytic routines designed for the common format.
Data standardisation plays a crucial role in establishing a unified format of data, which allows for collaborative research, enables large-scale analysis, and promotes the sharing of advanced tools. This matters because healthcare data varies significantly across organisations. As data is collected for various purposes, it can be stored in diverse formats, using different systems and information models. Despite the increasing use of standard terminologies in healthcare, identical concepts can still be represented differently at different organisations.
The Observational Health Data Sciences and Informatics (OHDSI) program developed the OMOP CDM. As part of the CDM, the OMOP Standardised Vocabularies serve two main purposes: they act as a common repository for all vocabularies used in the community, and they provide standardisation and mapping for use in research. The Standardised Vocabularies are freely available to the community and are mandatory for use as a reference table in OMOP CDM instances. To build the Standardised Vocabularies, all vocabularies are consolidated into a common format, which simplifies researchers’ work by removing the need to understand multiple formats and conventions. The OHDSI Vocabulary Team manages and updates the vocabularies regularly using the Pallas system. Researchers can access the Standardised Vocabularies from Athena, where they can select and download the vocabularies needed for their OMOP CDM. OHDSI prefers to adopt existing vocabularies rather than build new ones, because vocabularies are complex to build and well-utilised ones already exist in the community. Concepts, which represent the semantic notion of each clinical event in the OMOP CDM, are stored in the CONCEPT table and form the foundation of the data records.
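To make the role of the CONCEPT table more concrete, the following is a minimal sketch of looking up a source code in it. It uses an in-memory SQLite database as a stand-in for a real OMOP CDM instance and only a subset of the table's columns. The rows and the helper name find_concept are illustrative assumptions, so please check real identifiers in Athena before relying on them.

```python
import sqlite3

# Toy, in-memory stand-in for an OMOP CDM instance; a real instance would hold
# the full CONCEPT table downloaded from Athena (with more columns than shown).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE concept (
        concept_id       INTEGER PRIMARY KEY,
        concept_name     TEXT NOT NULL,
        domain_id        TEXT NOT NULL,
        vocabulary_id    TEXT NOT NULL,
        standard_concept TEXT,
        concept_code     TEXT NOT NULL
    )
    """
)

# Illustrative rows only: verify identifiers in Athena before using them.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
    [
        (201826, "Type 2 diabetes mellitus", "Condition", "SNOMED", "S", "44054006"),
        (8507, "MALE", "Gender", "Gender", "S", "M"),
    ],
)


def find_concept(vocabulary_id: str, concept_code: str):
    """Return (concept_id, concept_name) for a source code, or None if absent."""
    return conn.execute(
        "SELECT concept_id, concept_name FROM concept "
        "WHERE vocabulary_id = ? AND concept_code = ?",
        (vocabulary_id, concept_code),
    ).fetchone()


print(find_concept("SNOMED", "44054006"))
```

Because every vocabulary is held in the same table, the same lookup works regardless of which vocabulary the source code comes from.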
An Extract, Transform, and Load (ETL) approach is used to convert source data into the OMOP CDM.
In the first stage of the process, data experts and CDM experts work together to design the ETL. This requires in-depth knowledge of the source data, and prior experience of implementing ETLs helps to improve efficiency. Software tools such as WhiteRabbit and Rabbit-in-a-Hat can be used during the ETL design stage.
WhiteRabbit creates a scan of the source data and creates a report which includes:
This report gives you a good idea of the general structure of the source data and can help with mapping of the data later on. The Rabbit-in-a-Hat software shows both the source data and the CDM. This is useful during the data mapping process as it shows how tables interlink.
For the next stage of the process, people with medical and clinical coding knowledge help create the code mappings. Standard vocabularies such as the OHDSI vocabularies, RxNorm, ICD, and SNOMED are often used. Data mapping is a big task and can be complex, depending on the format and quality of the original source data. To make the process easier, it is best to focus on the most frequently used codes and either (1) exclude or (2) group the codes which come up less often, given their relatively limited individual significance.
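One way to decide which codes to prioritise is to rank them by frequency and map them individually until a chosen share of records is covered. The sketch below illustrates this idea only. The function name, the 95% coverage threshold, and the sample codes are all assumptions made for illustration, not a prescribed method.

```python
from collections import Counter


def split_codes_by_frequency(codes, coverage=0.95):
    """Split codes into those to map individually and a low-frequency remainder.

    Codes are taken in descending order of frequency until they account for
    at least `coverage` of all records; the rest are returned separately so
    they can be grouped or excluded.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    to_map, remainder, covered = [], [], 0
    for code, n in counts.most_common():
        if covered < coverage * total:
            to_map.append(code)
            covered += n
        else:
            remainder.append(code)
    return to_map, remainder


# Hypothetical source codes, invented for illustration.
source_codes = ["E11"] * 60 + ["I10"] * 30 + ["J45"] * 6 + ["Z99"] * 3 + ["X01"]
to_map, remainder = split_codes_by_frequency(source_codes, coverage=0.95)
print(to_map)      # codes to map individually
print(remainder)   # codes to group or exclude
```

The threshold is a judgement call for the mapping team; a higher value means more mapping effort but fewer records left in grouped or excluded codes.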
After receiving the design of the ETL with the complete code mappings, a technical person can use this information to implement the ETL in a programming language such as Java.
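As a rough illustration of what the transform step can look like, here is a minimal sketch in Python (rather than Java) that maps rows of a hypothetical source patient table onto a subset of the OMOP CDM PERSON table. The source column names (patient_id, sex, date_of_birth) and the helper to_person are assumptions; a real PERSON table has further required columns, such as race and ethnicity concepts.

```python
from dataclasses import dataclass

# Standard OMOP gender concepts; 0 is the conventional "no matching concept" value.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


@dataclass
class Person:
    """A subset of the OMOP CDM PERSON table columns."""

    person_id: int
    gender_concept_id: int
    year_of_birth: int
    gender_source_value: str


def to_person(source_row: dict) -> Person:
    """Map one row of a hypothetical source patient table onto PERSON."""
    sex = source_row["sex"].strip().upper()
    return Person(
        person_id=int(source_row["patient_id"]),
        gender_concept_id=GENDER_CONCEPTS.get(sex, 0),
        year_of_birth=int(source_row["date_of_birth"][:4]),  # expects YYYY-MM-DD
        gender_source_value=source_row["sex"],  # keep the original value for traceability
    )


# Hypothetical source rows, invented for illustration.
rows = [
    {"patient_id": "1", "sex": "f", "date_of_birth": "1980-05-17"},
    {"patient_id": "2", "sex": "unknown", "date_of_birth": "1975-11-02"},
]
people = [to_person(r) for r in rows]
print(people[0])
```

Keeping the original value in gender_source_value alongside the mapped concept makes it possible to audit the mapping later.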
Finally, quality control is essential, especially in a medical context, so everyone is involved in ensuring good data quality control throughout the design process. Tools such as Achilles are useful for spotting abnormal data inputs and data distributions, and hence for alerting the designers to data quality issues.
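The kind of check involved can be illustrated with a very small plausibility test. This sketch is not how Achilles works and does not use its API; the function name and the default bounds are assumptions chosen for illustration.

```python
from datetime import date


def implausible_birth_years(years, earliest=1900, latest=None):
    """Return the years that fall outside the plausible range [earliest, latest]."""
    latest = date.today().year if latest is None else latest
    return [y for y in years if not earliest <= y <= latest]


# Hypothetical values, invented for illustration.
print(implausible_birth_years([1980, 1875, 2090, 2001], latest=2024))
```

Flagged values are a prompt for investigation rather than proof of an error, since the appropriate bounds depend on the source data.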
OMOP has the potential to revolutionise decision making and healthcare research within the NHS. Firstly, it would facilitate the standardisation of healthcare data by providing a unified data model that allows all healthcare data sources within the NHS to be accumulated and mapped onto a common format. This would enable researchers to analyse and compare data from multiple sources more easily and on a larger scale, yielding insights that inform clinical decision making, drive evidence-based policies, and allow for a more collaborative style of research within the NHS. Secondly, the standardisation of data would increase data quality and data sharing capabilities, laying the foundations for a robust healthcare system.
In addition, OMOP could be used within the NHS to create large scale patient databases with real-time treatment patterns and outcomes. OMOP’s data analytics capabilities also enable the identification of patient subgroups based on shared characteristics, such as genetic predispositions or response patterns to treatments. This stratification can help identify patient populations who are more likely to benefit from specific therapies, allowing for more personalised and effective healthcare approaches.
OMOP provides a standardised, open-source data model that allows for the integration of diverse healthcare data sources. This standardisation ensures that data from various systems, institutions, and countries can be easily mapped and harmonised, promoting interoperability. Other common data models often lack this level of standardisation, leading to challenges when attempting to combine data from different sources. In addition, whilst some common data models support large-scale analytics, they may not be specifically tailored for observational research. This limitation can restrict researchers from conducting population-level studies efficiently and may require additional effort to achieve comparable results.
CogStack is an application framework that allows us to extract information from unstructured data sources, e.g. electronic health/patient records (EHR/EPR). At Guy’s and St Thomas’ NHS Foundation Trust (GSTT), it contains a catalogue of hospital documents from several of our clinical data sources and can be used to identify patient cohorts, as well as to search for clinical information for other purposes, such as AI software evaluation.
⚠️ Please note that CogStack is an index of documents and not an index of patients, so much of the data’s worth will be focused in the free text of each document and will consequently require further analysis.
To gain access to CogStack, your request will need to be issued via your line manager to the CogStack team. This request should indicate which level of access you require, i.e. reader and/or admin. If you are granted reader access, you will be able to search CogStack but not export any data. If you need to export data but do not have admin access, you will need to speak with a colleague who does, and they will be able to help you export your data.
patient_TrustNumber. The syntax in general is: keyword : "search term" AND keyword : "second search term" OR keyword : "third search term" AND NOT keyword : "exclusion term". You can use as many or as few keywords as you wish, and you can find the keywords by either inspecting a document or by scrolling through the options that drop down when you click on the search bar.
body_analysed means the free text area of the document, i.e. your search terms are most likely to appear there.
gstt_clinical_documents_letters, gstt_clinical_epr, and/or gstt_clinical_epr_results for most purposes.
document_AncillaryReferenceID
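If you build searches repeatedly, it can help to assemble the query string programmatically. The sketch below follows the general syntax described above (keyword : "search term" joined with AND and AND NOT). The helper names and the search terms are hypothetical, and the resulting string is simply pasted into the search bar; no CogStack API is assumed.

```python
def term(keyword: str, search_term: str) -> str:
    """Format one `keyword : "search term"` clause."""
    return f'{keyword} : "{search_term}"'


def build_query(include, exclude=()):
    """AND together the included clauses, then append AND NOT for each exclusion."""
    if not include:
        raise ValueError("at least one included clause is required")
    query = " AND ".join(term(k, v) for k, v in include)
    for k, v in exclude:
        query += " AND NOT " + term(k, v)
    return query


# Hypothetical search terms, invented for illustration.
query = build_query(
    include=[("body_analysed", "pneumothorax"), ("body_analysed", "chest drain")],
    exclude=[("body_analysed", "no pneumothorax")],
)
print(query)
```

This sketch does not handle OR clauses or search terms that themselves contain double quotes; both would need extra care.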
If you would like to contribute to the development of this website, we suggest you:
Check out the gh-pages branch first by running git checkout -b gh-pages.
Create your own branch from gh-pages (git checkout -b bug/BugFix or git checkout -b feature/AmazingFeature).
Commit your changes (git commit -m 'Fixes a SmallBug' or git commit -m 'Adds some AmazingFeature').
Push your branch (git push origin bug/BugFix or git push origin feature/AmazingFeature).
Open a pull request against the gh-pages branch.
Your branch will then be merged into the gh-pages branch and the changes will be automatically updated in the live website.
⚠️ If, at any point, something unexpected happens or you require further support, please reach out to us.
You can view your changes during development with a locally-hosted version of the website on your machine by following the Installation instructions in the GitHub repository’s README here.
The National Data Opt Out (NDOO) ensures that patients have a right to withdraw their consent to any of their data being used for research or service development, i.e. secondary/indirect care purposes. If a patient has a recorded opt-out status, their data must be excluded.
Within Guy’s and St Thomas’ NHS Foundation Trust (GSTT), the most straightforward way to check opt-out statuses at present is via the NDOO Check Service web application. This can only be accessed by staff with valid GSTT email addresses whilst on the Trust VPN.
GSTT staff can either manually input one or more NHS numbers on the website itself, or they can submit a CSV file containing up to 100,000 NHS numbers per request. If they are unable to identify the NHS numbers of one or more patients within their cohort, the general recommendation from the Information Governance (IG) team is to exclude these patients.
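For cohorts larger than the per-request limit, the NHS numbers need to be split across several CSV files. The sketch below shows one way to do this. The file layout (one NHS number per row, no header), the file names, and the function name are assumptions, so please check the format the NDOO Check Service expects before submitting.

```python
import csv
import tempfile
from pathlib import Path

MAX_PER_REQUEST = 100_000  # limit stated above for a single CSV submission


def write_batches(nhs_numbers, out_dir, batch_size=MAX_PER_REQUEST):
    """Write de-duplicated NHS numbers to numbered CSV files of at most batch_size rows."""
    unique = list(dict.fromkeys(nhs_numbers))  # drop repeats, keep first-seen order
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(0, len(unique), batch_size):
        path = out_dir / f"ndoo_batch_{i // batch_size + 1:03d}.csv"
        with path.open("w", newline="") as handle:
            # Assumed layout: one NHS number per row, no header.
            csv.writer(handle).writerows([n] for n in unique[i : i + batch_size])
        paths.append(path)
    return paths


# Placeholder values for illustration, not real NHS numbers.
numbers = [f"{n:010d}" for n in range(1, 6)]
with tempfile.TemporaryDirectory() as tmp:
    files = write_batches(numbers, tmp, batch_size=2)
    print([p.name for p in files])
```

The small batch_size in the usage example is only there to show the splitting; in practice the default limit would apply.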
Based on their submissions, GSTT staff will then receive a response file (to the email address provided in the original submission) containing the NHS numbers of patients who have not opted out and whose data may be included.
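Once the response file has been received, the cohort can be restricted to the NHS numbers it contains. The following is a minimal sketch of that step. The record structure (dictionaries with an nhs_number key) and the function name are assumptions made for illustration.

```python
def filter_cohort(cohort, allowed_nhs_numbers):
    """Keep only cohort records whose NHS number appears in the response file.

    Records with no NHS number are dropped, in line with the IG recommendation
    to exclude patients whose NHS number cannot be identified.
    """
    allowed = set(allowed_nhs_numbers)
    return [record for record in cohort if record.get("nhs_number") in allowed]


# Placeholder values for illustration, not real NHS numbers.
cohort = [
    {"nhs_number": "0000000001", "study_id": "A"},
    {"nhs_number": "0000000002", "study_id": "B"},
    {"nhs_number": None, "study_id": "C"},
]
included = filter_cohort(cohort, allowed_nhs_numbers=["0000000001"])
print([record["study_id"] for record in included])
```

The function deliberately returns only the included records and does not produce a list of excluded patients.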
⚠️ Users accessing the NDOO Check Service should not maintain a record of the patients who have opted out! The response file should only be used for the purpose of complying with the NDOO policy and adhering to the data retention policy as outlined in the project’s Data Protection Impact Assessment (DPIA).
We recommend users reach out to the IG team for further advice if they believe there are:
For more details on the NDOO Check Service or ongoing work to directly query NHS Digital’s NDOO database via their API, please reach out to Haleema or Dika.
⚠️ Before proceeding with the following steps, please ensure:
All artificial intelligence (AI) projects must be registered with the Trust Quality Improvement and Patient Safety Team as a service evaluation/clinical audit on the Trust database here.
The proposal will be sent to the Specialty Quality Improvement (QI) & Audit lead for approval and then to the Directorate QI & Audit lead. Once approved, you will be notified by email.
This guide describes how to book a room for CSC meetings, 10x workshops, etc., specifically for the following two main GSTT locations:
⚠️ If you do not have access to either of the above locations, please email the Security Access Authorisation team with your line manager copied to request this be granted.
To book a room, you will need to provide the required:
For Tabard House, the preferred room is the Library on the ground floor* and booking requests should be sent via email to the Medical Physics department’s Assistant Service Manager.
For York Road, you can either book:
* You will need an access code to enter this floor in Tabard House. If this has not already been provided, please email the CSC Team.
If you have invited colleagues who do not usually have access to either location, you will need to collect them from the reception.