Guides | CSC

We've created several guides and standard operating procedures (SOPs), which include required steps, our recommended best practices, and helpful bits of information, for use by the team, and our internal and external collaborators. Click on the sections in the left-hand sidebar to learn more.

These guides are updated as and when necessary, so we recommend both keeping an eye on this page and reaching out to us if you need any further information.

OMOP

What is it?

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a data model designed for standardising and analysing healthcare data from different sources for observational research. The OMOP CDM enables the structured examination of diverse observational databases. This methodology involves converting data from these databases into a unified format (data model) and consistent representation (terminologies, vocabularies, coding schemes). Subsequently, researchers can conduct systematic analysis using a set of standardised analytic routines specifically designed for the common format.

Importance of Data Standardisation

Data standardisation plays a crucial role in establishing a unified format of data, which allows for collaborative research, enables large scale analysis, and promotes the sharing of advanced tools. The importance of this arises due to significant variations in healthcare data across different organisations. As data is collected for various purposes, it can be stored in diverse formats, using different systems and information models. Despite the increasing use of standard terminologies in healthcare, identical concepts can be represented differently at different organisations.

OMOP and the OHDSI standardised vocabularies

The Observational Health Data Sciences and Informatics (OHDSI) program developed the OMOP CDM. As part of the CDM, the OMOP Standardised Vocabularies serve two main purposes: they act as a common repository for all vocabularies used in the community, and they provide standardisation and mapping for use in research. The Standardised Vocabularies are freely available to the community and are mandatory for use as a reference table in OMOP CDM instances. To build the Standardised Vocabularies, all vocabularies are consolidated into a common format, simplifying the researchers’ work by eliminating the need to understand multiple formats and conventions. The OHDSI Vocabulary Team manages and updates the vocabularies regularly using the Pallas system. Researchers can access the Standardised Vocabularies from Athena, where they can select and download the vocabularies needed for their OMOP CDM. OHDSI prefers adopting existing vocabularies instead of building new ones due to their complexity and the existence of well-utilised vocabularies in the community. Concepts, representing the semantic notion of each clinical event in the OMOP CDM, are stored in the CONCEPT table, forming the foundation of the data records.
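
As a rough illustration of how the CONCEPT table underpins the data records, the sketch below loads two illustrative rows into an in-memory SQLite database and follows a 'Maps to' relationship from a source code to its standard concept. The column names follow the OMOP CDM CONCEPT and CONCEPT_RELATIONSHIP tables (only a subset is shown). The concept IDs are hypothetical placeholders rather than real Athena identifiers, and `find_standard_concept` is a helper introduced for this example only.

```python
import sqlite3

# In-memory database with cut-down CONCEPT and CONCEPT_RELATIONSHIP tables.
# Only a subset of the CDM columns is included; the concept IDs are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT,
        concept_code TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Standard concept ('S') and a non-standard source concept (illustrative rows).
        (1001, "Type 2 diabetes mellitus", "Condition", "SNOMED", "S", "44054006"),
        (2001, "Type 2 diabetes mellitus without complications", "Condition", "ICD10", None, "E11.9"),
    ],
)
conn.execute("INSERT INTO concept_relationship VALUES (2001, 1001, 'Maps to')")


def find_standard_concept(vocabulary_id, concept_code):
    """Return (concept_id, concept_name) of the standard concept a source code maps to, or None."""
    return conn.execute(
        """
        SELECT target.concept_id, target.concept_name
        FROM concept AS source
        JOIN concept_relationship AS rel
          ON rel.concept_id_1 = source.concept_id
         AND rel.relationship_id = 'Maps to'
        JOIN concept AS target
          ON target.concept_id = rel.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.vocabulary_id = ? AND source.concept_code = ?
        """,
        (vocabulary_id, concept_code),
    ).fetchone()


print(find_standard_concept("ICD10", "E11.9"))
```

In a real OMOP CDM instance these tables would be populated from the vocabulary files downloaded from Athena rather than typed in by hand.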

Converting into OMOP data model

An Extract, Transform, and Load (ETL) approach is used to convert source data into the OMOP CDM.

Designing the ETL

In the first stage of the process, both data experts and CDM experts work together to design the ETL. It helps to have prior experience in the implementation of ETLs to improve efficiency during this process, which requires an in-depth knowledge of the source data. Software such as WhiteRabbit and Rabbit-in-a-Hat can be used during the ETL design stage.

WhiteRabbit scans the source data and creates a report which includes:

a list of tables in the source database
a list of fields per table
a list of distinct values found in a field
the frequency at which a value occurs

This report gives you a good idea of the general structure of the source data and can help with mapping of the data later on. The Rabbit-in-a-Hat software shows both the source data and the CDM. This is useful during the data mapping process as it shows how tables interlink.
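
To make the contents of such a scan report concrete, here is a minimal sketch that produces the same four kinds of information (tables, fields, distinct values, and frequencies) for a small SQLite database. It is not WhiteRabbit itself and does not reproduce its report format. The `admissions` table and the `scan_database` function are hypothetical names used for illustration.

```python
import sqlite3
from collections import Counter


def scan_database(conn):
    """Return {table: {field: Counter(value -> frequency)}} for every table."""
    report = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        fields = [column[0] for column in cursor.description]
        counters = {field: Counter() for field in fields}
        # Count how often each distinct value occurs in each field.
        for row in cursor:
            for field, value in zip(fields, row):
                counters[field][value] += 1
        report[table] = counters
    return report


# Hypothetical source table standing in for a real source database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions (patient_id INTEGER, sex TEXT, diagnosis_code TEXT)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [(1, "F", "E11.9"), (2, "M", "I10"), (3, "F", "I10")],
)

report = scan_database(conn)
for table, fields in report.items():
    print(table)
    for field, counts in fields.items():
        print(f"  {field}: {counts.most_common()}")
```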

Creating the Code Mappings

For the next stage of the process, people with medical and clinical coding knowledge help create the code mappings. Standard vocabularies such as the OHDSI vocabularies, RxNorm, ICD, and SNOMED are often used. Data mapping is a big task and can be complex depending on the format and quality of the original source data. To make the process easier, it is best to focus on the most frequently used codes and either (1) exclude or (2) group the codes that come up less often, given their relatively limited individual significance.
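
One way to decide which codes to prioritise is to rank them by frequency and stop once a chosen share of the records is covered. The sketch below assumes a 90% coverage target and made-up code counts; `codes_for_coverage` is a hypothetical helper, and what to do with the remaining codes (exclude or group) is still a clinical decision.

```python
from collections import Counter


def codes_for_coverage(code_counts, target_coverage):
    """Split codes into those needed to reach `target_coverage` of records and the rest."""
    total = sum(code_counts.values())
    ranked = code_counts.most_common()
    prioritised, covered = [], 0
    for code, count in ranked:
        if covered / total >= target_coverage:
            break
        prioritised.append(code)
        covered += count
    # Everything after the prioritised codes is a candidate for exclusion or grouping.
    remainder = [code for code, _ in ranked[len(prioritised):]]
    return prioritised, remainder


# Illustrative frequencies only; real counts would come from the source data scan.
source_codes = ["I10"] * 50 + ["E11.9"] * 30 + ["J45"] * 15 + ["Z99.9"] * 3 + ["Q87.1"] * 2
prioritised, remainder = codes_for_coverage(Counter(source_codes), 0.9)
print(prioritised)  # ['I10', 'E11.9', 'J45']
print(remainder)    # ['Z99.9', 'Q87.1']
```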

Implementing the ETL Design

After receiving the design of the ETL with the complete code mappings, a technical person can use this information to implement the ETL via a coding language such as Java.
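
The following is a minimal sketch of the extract, transform, and load steps. It is written in Python rather than Java for brevity, and it uses a hypothetical source extract and hypothetical code mappings. The target column names are a subset of the OMOP CDM CONDITION_OCCURRENCE table, and unmapped source codes are given the concept ID 0, following the OMOP convention. A production ETL would be considerably more involved.

```python
import csv
import io
import sqlite3

# Hypothetical source extract (would normally be read from the source system).
SOURCE_CSV = """patient_id,diagnosis_code,diagnosis_date
1,E11.9,2023-01-15
2,I10,2023-02-03
3,UNKNOWN,2023-02-10
"""

# Hypothetical code mappings: source code -> standard concept_id (made-up IDs).
CODE_TO_CONCEPT_ID = {"E11.9": 1001, "I10": 1002}


def extract(text):
    """Read the source rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Yield CONDITION_OCCURRENCE tuples, using concept_id 0 for unmapped codes."""
    for index, row in enumerate(rows, start=1):
        yield (
            index,
            int(row["patient_id"]),
            CODE_TO_CONCEPT_ID.get(row["diagnosis_code"], 0),
            row["diagnosis_date"],
            row["diagnosis_code"],  # keep the original code as the source value
        )


def load(conn, records):
    """Create a cut-down CONDITION_OCCURRENCE table and insert the records."""
    conn.execute(
        """
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT,
            condition_source_value TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?, ?)", records
    )


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT person_id, condition_concept_id, condition_source_value "
    "FROM condition_occurrence ORDER BY condition_occurrence_id"
).fetchall()
print(loaded)
```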

Quality Control

Finally, quality control is essential, especially in a medical context, so everyone is involved in ensuring good data quality throughout the design process. Tools such as Achilles are useful for spotting abnormal data inputs and data distributions, and hence for alerting the designers to data quality issues.
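
The kind of check involved can be sketched in a few lines. The example below is not Achilles and does not use its rules. It simply flags implausible birth years, dates in the future, and a high share of unmapped records. The record layout, the 1900 cut-off, and the 20% threshold are all assumptions chosen for illustration.

```python
from datetime import date


def check_records(records, today, max_unmapped_share=0.2):
    """Return a list of human-readable data quality issues."""
    issues = []
    for record in records:
        # Abnormal input: birth year outside a plausible range (assumed 1900..today).
        if not 1900 <= record["year_of_birth"] <= today.year:
            issues.append(
                f"person {record['person_id']}: implausible year_of_birth "
                f"{record['year_of_birth']}"
            )
        # Abnormal input: an event recorded in the future.
        if record["condition_start_date"] > today:
            issues.append(
                f"person {record['person_id']}: condition_start_date in the future"
            )
    # Abnormal distribution: too many records left unmapped (concept_id 0).
    unmapped = sum(1 for record in records if record["condition_concept_id"] == 0)
    if records and unmapped / len(records) > max_unmapped_share:
        issues.append(f"{unmapped} of {len(records)} records are unmapped")
    return issues


# Made-up records for illustration.
records = [
    {"person_id": 1, "year_of_birth": 1980, "condition_start_date": date(2023, 5, 1), "condition_concept_id": 1001},
    {"person_id": 2, "year_of_birth": 1780, "condition_start_date": date(2023, 6, 1), "condition_concept_id": 1002},
    {"person_id": 3, "year_of_birth": 1990, "condition_start_date": date(2030, 1, 1), "condition_concept_id": 0},
]
issues = check_records(records, today=date(2024, 1, 1))
for issue in issues:
    print(issue)
```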

The potential of OMOP within the NHS

OMOP has the potential to revolutionise decision making and healthcare research within the NHS. It would facilitate the standardisation of healthcare data, providing a unified data model that allows all healthcare data sources within the NHS to be accumulated and mapped onto a common format. This would enable researchers to analyse and compare data from multiple sources more easily and on a larger scale, yielding valuable insights to inform clinical decision making and drive evidence-based policies, as well as allowing for a more collaborative style of research within the NHS. Furthermore, the standardisation of data would increase data quality and data sharing capabilities, laying the foundations for a robust healthcare system.

In addition, OMOP could be used within the NHS to create large scale patient databases with real-time treatment patterns and outcomes. OMOP’s data analytics capabilities also enable the identification of patient subgroups based on shared characteristics, such as genetic predispositions or response patterns to treatments. This stratification can help identify patient populations who are more likely to benefit from specific therapies, allowing for more personalised and effective healthcare approaches.

The benefits of OMOP over other CDMs

OMOP provides a standardised, open-source data model that allows for the integration of diverse healthcare data sources. This standardisation ensures that data from various systems, institutions, and countries can be easily mapped and harmonised, promoting interoperability. Other common data models often lack this level of standardisation, leading to challenges when attempting to combine data from different sources. Moreover, whilst some common data models support large-scale analytics, they may not be specifically tailored for observational research. This limitation can restrict researchers from conducting population-level studies efficiently and may require additional efforts to achieve comparable results.

Adoption and Implementation of OMOP

Larger healthcare companies often have significant investments in their existing proprietary systems, data formats, and medical devices. Adopting OMOP or any other standardised data model would require significant resources and efforts for data transformation, staff training, and system updates. They may resist these changes due to the perceived costs and efforts involved.
Big healthcare organisations might be concerned about losing control over their data if they adopt a standardised data model like OMOP. Proprietary data formats may provide them with more control over how data is structured, stored, and accessed, whereas a standardised model could limit their autonomy over data management.
Big healthcare organisations often operate on complex and diverse IT systems, and switching to a new data model like OMOP might create compatibility and integration challenges. They may fear disruptions to their existing workflows and operations during the transition.

CogStack

CogStack is an application framework that allows us to extract information from unstructured data sources, e.g. electronic health/patient records (EHR/EPR). At Guy’s and St Thomas’ NHS Foundation Trust (GSTT), it contains a catalogue of hospital documents from several of our clinical data sources and can be used to identify patient cohorts, as well as search for clinical information for other purposes, such as AI software evaluation.

⚠️ Please note that CogStack is an index of documents and not an index of patients, so much of the data’s value lies in the free text of each document, which will consequently require further analysis.

To gain access to CogStack, your request will need to be issued via your line manager to the CogStack team. This request should indicate which level of access you require, i.e. reader and/or admin. If you are granted reader access, you will be able to search CogStack but not export any data. If you need to export data but do not have admin access, you will need to speak with a colleague who does, and they will be able to help you export your data.

Log into the GSTT network and open the Chrome web browser (it does not work with Internet Explorer).
Go to https://cogstack.gstt.nhs.uk:5601 and select Log in with GSTT authentication. You may be prompted to input your GSTT username and password.
An Access Agreement notice will appear warning you that you are about to see sensitive information. Press the Acknowledge and continue button.
You will be presented with your home dashboard. Click on the three lines (the menu) on the left-hand side of the screen (just under the Elastic logo and under the Analytics tab) and select Discover.
You will then be presented with a search bar (at the top), the results panel (centre right) and the index panel (centre left).
CogStack is built upon Elasticsearch and follows its query logic. Most of the fields within each document are tagged with an ID that you can use to search through the documents, e.g. patient_TrustNumber. The general syntax is: keyword : "search term" AND keyword : "second search term" OR keyword : "third search term" AND NOT keyword : "exclusion term". You can use as many or as few keywords as you wish, and you can find the keywords by either inspecting a document or by scrolling through the options that drop down when you click on the search bar.
- Please note that body_analysed means the free text area of the document, i.e. your search terms are most likely to appear there.
These documents are collated into catalogues represented by indexes, and although the default index is a good place to start, we recommend accessing gstt_clinical_documents_letters, gstt_clinical_epr, and/or gstt_clinical_epr_results for most purposes.
- The first will contain the letters sent to a patient’s primary care clinician (usually their GP or whoever referred them for treatment), which will contain details such as diagnosis and reports on treatment completion.
- The second contains observations, orders and results, such as blood test orders and results, pathology results, and radiology reports.
- The third will enable you to find accession numbers for the images stored under the keyword document_AncillaryReferenceID.
When you are happy with your query, press Enter or click on the Update button at the end of the search bar.
The Results panel will populate with documents that match your search terms. If your search is very broad, this may take a few seconds. An error box will appear on the bottom right-hand corner if there are errors with the search term itself.
- To investigate your results, click on the arrow on the left-hand side of each search result to expand it.
If you have admin access, you will be able to save your query results and export them for further processing. To do so:
- First, save your query by clicking on the saving menu on the left-hand side of the search bar, click on ‘Save current query’ and use a sensible name that will be easy to recognise later.
- Once your query is saved, click on Share on the top right-hand corner, then select CSV Report. This will output your results as a CSV file.
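
For longer searches, it can help to assemble the query string programmatically before pasting it into the search bar. The sketch below builds a string in the keyword : "search term" syntax described in the steps above, combining inclusion clauses with AND and exclusion clauses with AND NOT. `build_query` and `format_clause` are hypothetical helpers, the Trust number is a dummy value, and escaping embedded quotes with a backslash is an assumption about how the search bar treats them.

```python
def format_clause(keyword, phrase):
    """Format one `keyword : "search term"` clause, escaping backslashes and quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{keyword} : "{escaped}"'


def build_query(include, exclude=()):
    """Join inclusion clauses with AND and append exclusion clauses with AND NOT."""
    if not include:
        raise ValueError("at least one inclusion clause is required")
    query = " AND ".join(format_clause(keyword, phrase) for keyword, phrase in include)
    for keyword, phrase in exclude:
        query += f" AND NOT {format_clause(keyword, phrase)}"
    return query


query = build_query(
    include=[
        ("patient_TrustNumber", "1234567X"),  # dummy identifier
        ("body_analysed", "lung nodule"),
    ],
    exclude=[("body_analysed", "no nodule")],
)
print(query)
# patient_TrustNumber : "1234567X" AND body_analysed : "lung nodule" AND NOT body_analysed : "no nodule"
```

Queries that mix AND and OR are deliberately left out of this sketch, since their grouping is easier to check by hand in the search bar.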

How to contribute

Website Development

If you would like to contribute to the development of this website, we suggest you:

Create a new Issue if there isn’t already one that covers the bug you want to fix and/or the feature(s) you want to add.
Go to the GitHub repository here and fork or clone the repository (see the GitHub documentation here for more information on how to do so).
Check out the main branch, i.e. gh-pages, first by running git checkout gh-pages.
Create your Bugfix or Feature Branch off of gh-pages (git checkout -b bug/BugFix or git checkout -b feature/AmazingFeature).
Commit your changes (git commit -m 'Fixes a SmallBug' or git commit -m 'Adds some AmazingFeature').
- For commit messages, we recommend this commit message style (see here for simplified summary).
Push to the remote (git push origin bug/BugFix or git push origin feature/AmazingFeature).
Open a Pull Request (PR) and specify that you want to merge your branch into the gh-pages branch.
A CSC team member will review your PR and, if approved, merge it into the gh-pages branch and the changes will be automatically updated in the live website.

⚠️ If, at any point, you find something unexpected happens or require further support, please reach out to us.

Reviewing your changes during development

You can view your changes during development with a locally-hosted version of the website on your machine by following the Installation instructions in the GitHub repository’s README here.

National Data Opt-Out (NDOO)

The National Data Opt-Out (NDOO) ensures that patients have a right to withdraw their consent to any of their data being used for research or service development, i.e. secondary/indirect care purposes. If they have a recorded opt-out status, their data must therefore be excluded.

Within Guy’s and St Thomas’ NHS Foundation Trust (GSTT), the most straightforward way to check opt-out statuses is currently via the NDOO Check Service web application. This can only be accessed by staff with valid GSTT email addresses and whilst on the Trust VPN.

GSTT staff can either manually input one or more NHS numbers on the website itself, or they can submit a CSV file containing up to 100,000 NHS numbers per request. If they are unable to identify the NHS numbers of one or more patients within their cohort, the general recommendation from the Information Governance (IG) team is to exclude these patients.

Based on their submissions, GSTT staff will then receive a response file (to the email address provided in the original submission) containing the NHS numbers of patients who have not opted out and whose data may be included.
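
The submission and filtering steps can be sketched as follows. The example splits a cohort's NHS numbers into batches of at most 100,000 and then keeps only the patients listed in the response. The CSV layout (one NHS number per row, no header), the `nhs_number` field name, and the three helper functions are assumptions made for illustration. The NHS numbers are dummy values, and the actual file formats expected and returned by the NDOO Check Service should be confirmed before use.

```python
import csv
import io

MAX_PER_REQUEST = 100_000  # per-request limit stated for the NDOO Check Service


def batch_nhs_numbers(nhs_numbers, batch_size=MAX_PER_REQUEST):
    """Split de-duplicated NHS numbers into lists of at most `batch_size`."""
    unique = list(dict.fromkeys(nhs_numbers))  # removes duplicates, keeps order
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


def to_csv(batch):
    """Serialise one batch as CSV text (assumed layout: one number per row)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows([number] for number in batch)
    return buffer.getvalue()


def filter_cohort(cohort, not_opted_out):
    """Keep only patients whose NHS number appears in the response.

    Patients without an NHS number are excluded, in line with the general
    recommendation from the IG team.
    """
    allowed = set(not_opted_out)
    return [patient for patient in cohort if patient.get("nhs_number") in allowed]


# Dummy cohort; the small batch size is only there to show the splitting.
cohort = [
    {"patient": "A", "nhs_number": "9000000001"},
    {"patient": "B", "nhs_number": "9000000002"},
    {"patient": "C", "nhs_number": None},
    {"patient": "D", "nhs_number": "9000000003"},
]
numbers = [p["nhs_number"] for p in cohort if p["nhs_number"]]
batches = batch_nhs_numbers(numbers, batch_size=2)
print(to_csv(batches[0]))

# Pretend the response file listed these numbers as not opted out.
included = filter_cohort(cohort, ["9000000001", "9000000003"])
print([p["patient"] for p in included])  # ['A', 'D']
```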

⚠️ Users of the NDOO Check Service should not maintain a record of the patients who have opted out! The response file should only be used for the purpose of complying with the NDOO policy and adhering to the data retention policy as outlined in the project’s Data Protection Impact Assessment (DPIA).

We recommend users reach out to the IG team for their further advice if they believe there are:

Project-specific patient opt-in processes that should take precedence over the NDOO policy.
Patients who should still be included in their project even though their NHS numbers, and therefore their NDOO opt-out statuses, could not be identified.

For more details on the NDOO Check Service or ongoing work to directly query NHS Digital’s NDOO database via their API, please reach out to Haleema or Dika.

Quality Improvement Projects (QIPS)

⚠️ Before proceeding with the following steps, please ensure:

Your project falls under “Service Development” and not “Research”, e.g. via this decision tool.
The Clinical Lead is happy to be added as a collaborator on the QIPS system.

All artificial intelligence (AI) projects must be registered with the Trust Quality Improvement and Patient Safety Team as a service evaluation/clinical audit on the Trust database here.

Log in with your Trust credentials.
Select the Create New Proposal tab.
Select Service Evaluation as the Proposal Type.
Enter Project Title as agreed with the radiologist(s).
Add the Clinical Lead as the Proposer.
Add your telephone number.
Select No in response to the question Is proposal trust wide?
Select Medical Physics as the Lead Speciality.
Enter your name as the Responsible Person.
Select High risk service, Of local concern, Identified as a problem, and/or Quality improvement, and any other selections that may apply as Reasons for carrying out this project.
Briefly explain what you are trying to achieve with this project in response to the Objective section.
Click on Save and then press Next.
Enter Radiologists as the Stakeholder.
- Select all activities they will be involved with and press the green plus button.
If the project does not involve patients or carers, select No, otherwise select Yes.
Describe who will be included, e.g. ‘all chest CT’, and who will be excluded (if any) as the Population.
- As part of this Population or Sample, select all applicable fields and choose to gather data from 1/1 of last year to 31/1 of last year.
- Select data collection strategy (e.g. retrospective) and the data sources (e.g. patient records).
Click on Save and then press Next.
Click on Add measures. A pop-up will appear.
Set the standard or acceptable level of compliance (frequently ‘100%’ but write whatever is appropriate for your project).
Describe the evidence of quality of care or service, i.e. how we will know whether the AI is working. For example, “lung nodule in chest CT is correctly reported in the report”.
Select No as Exception.
Describe how data will be acquired, e.g. “studies to be exported from PACS and processed offline”, as Data collection instructions.
Click on Add measures.
Fill in the target dates, e.g. for data collection (1 month from start), findings to be reviewed by 3 months from the data collection date, and report to be submitted 3 months from the findings date.
Click on Save and then press Next.
Select all applicable boxes (this is likely to be all of them).
Click on Next.
Agree with the declaration by ticking the box, click on Next and then press Submit.
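
The example target dates in the steps above can be worked out with a little date arithmetic. The sketch below follows that example schedule (data collection 1 month from the start, findings reviewed 3 months after that, report submitted 3 months after the findings date). `add_months` and `target_dates` are hypothetical helpers, and clamping to the last day of a shorter month is an assumption about how month offsets should be handled.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Shift `start` by `months`, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def target_dates(project_start):
    """Return the three target dates using the example offsets from the guide."""
    data_collection = add_months(project_start, 1)
    findings_review = add_months(data_collection, 3)
    report_submission = add_months(findings_review, 3)
    return {
        "data_collection": data_collection,
        "findings_review": findings_review,
        "report_submission": report_submission,
    }


dates = target_dates(date(2024, 1, 31))
for name, value in dates.items():
    print(f"{name}: {value.strftime('%d/%m/%Y')}")
# data_collection: 29/02/2024
# findings_review: 29/05/2024
# report_submission: 29/08/2024
```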

The proposal will be sent to the Specialty Quality Improvement (QI) & Audit lead for approval and then to the Directorate QI & Audit lead. Once approved, you will be notified by email.

Room Booking

This guide describes how to book a room for CSC meetings, 10x workshops, etc., specifically for the following two main GSTT locations:

Tabard House (near Guy’s Hospital)
Education Centre (also referred to as York Road) (near St Thomas’ Hospital)

⚠️ If you do not have access to either of the above locations, please email the Security Access Authorisation team with your line manager copied to request this be granted.

To book a room, you will need to provide the required:

Date(s)
Time slot(s)
Seating capacity
Technical setup e.g., monitor(s), if applicable

For Tabard House, the preferred room is the Library on the ground floor* and booking requests should be sent via email to the Medical Physics department’s Assistant Service Manager.

For York Road, you can either book:

The Glass Meeting Pod on the 10th Floor via a request emailed to the Executive Assistant (EA) to the Deputy Chief Executive’s Office.
Other rooms in the building, as well as other sites such as Minerva House and Great Dover Street, via the MICAD room booking website, which must be accessed via the Trust VPN i.e., Citrix, if not on-site.

* You will need an access code to enter this floor in Tabard House. If this has not already been provided, please email the CSC Team.

Visitors

If you have invited colleagues who do not usually have access to either location, you will need to collect them from the reception.

XNAT

For the full list of SOPs available for XNAT at Guy’s and St Thomas’ NHS Foundation Trust (GSTT), please refer to our dedicated XNAT GitHub repository here.

For new or to-be-updated XNAT SOP requests, please refer to our Contributing guidelines here.