Core Programme 1: Provisioning Datasets for Research

Prof Andrew Morris, Dr Marion Bain

Aims

  1. To create a "research portal" for EPRs already held by NHS Scotland, the Scottish Health Information Service for Research (SHIS-R).
  2. To develop and evaluate innovative technical approaches that allow large, federated, "third party" research datasets to be linked with one another and with SHIS-R.
  3. To develop and evaluate systems that work across institutional boundaries, with adequate data manipulation and statistical functionality, and that provide rapid, secure access to the type of data that clinical scientists require.

The challenge for this work package is the development of an infrastructure that satisfies a set of constraints that are critical for success, namely: inter-organisational sharing of data and information; economic and political agreements for the sharing of data; the maintenance of confidentiality; widely accepted approaches to governance and ethics; quality-controlled research environments; good security and audit; and generalisability and technical feasibility.

Methods

We propose to evaluate a hybrid platform that will pilot the integration of data in different ways, allowing us to address the issue of data sharing across institutional boundaries.

This pragmatic approach recognises the key role of ISD and NHS Scotland, and their plans for the rollout of a national data warehouse, alongside the research community's efforts at federating externally-controlled research datasets. The latter draws on expertise and software developed collaboratively by the Health Informatics Centre (Dundee), which includes expertise in anonymisation; the National eScience Centre (Glasgow); and the Robertson Centre for Biostatistics (Glasgow), to address the practical aspects of data sharing in the healthcare and non-healthcare domains.

Creating the Scottish Health Information Service for Research (SHIS-R)

(Dr Marion Bain, Prof Andrew Morris)

Access to accurate information from EPRs is vital for the NHS to be able to deliver services and to meet its objectives. The NHS in Scotland has a clearly defined strategy for the development of a Scottish Health Information Service (SHIS), based within ISD.

Central to this is the development of the NHS Scotland Data Warehouse containing the key data marts from all 14 Health Boards in Scotland that are relevant to research: the Scottish Morbidity Record (SMR01) including all general and acute in-patient and day cases (~750,000 p.a.); SMR06 cancer registration data; General Register Office for Scotland mortality records; primary care prescribing records; and some fields from the CHI register.

We will build upon this strategy to create SHIS-R, which will provide: a portal for knowledge about the availability of appropriate health-related data; metadata on their suitability for specific research purposes; and the procedures required to gain access to and use such data. Critically, it will also provide a single interface for linkage with external 'third party' research datasets. This will provide a sustainable resource of all nationally collected routine datasets held by NHS Scotland, making them available to researchers throughout the UK.

Evaluation of innovative developments for the storage, interrogation, integration and management of federated research datasets

(Dr Mark McGilchrist, Prof Richard Sinnott, Prof Ian Ford, Prof Frank Sullivan)

Building upon SHIS-R, we will evaluate how best to facilitate interconnectivity of ISD-held EPR datasets with existing non-government 'third party' research data resources, including genomic, epidemiological and non-health datasets. There are many ways to link data, each offering a different model of access control. We will evaluate the different methods against identical criteria to assess their strengths and weaknesses.
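To make concrete what "different models of access control" can mean, the sketch below shows one commonly used linkage pattern: each data source replaces its direct identifier with a keyed hash before release, so that records can be joined without the linking party seeing the identifier itself. It is offered only as an illustration under stated assumptions, not as one of the methods to be evaluated here. The field names (patient_id, admissions, genotyped), the shared key and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret, assumed to be held only by the data sources
# (never by the party that performs the join).
KEY = b"example-key-agreed-between-sources"


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a direct identifier."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_linkage(records, id_field, secret_key):
    """Drop the direct identifier and attach a pseudonymous link key."""
    prepared = []
    for record in records:
        payload = {k: v for k, v in record.items() if k != id_field}
        payload["link_key"] = pseudonymise(record[id_field], secret_key)
        prepared.append(payload)
    return prepared


def link(left, right):
    """Join two prepared datasets on the link key (inner join)."""
    index = {row["link_key"]: row for row in right}
    linked = []
    for row in left:
        match = index.get(row["link_key"])
        if match is not None:
            linked.append({**row, **match})
    return linked


# Made-up records standing in for a routine dataset and a research cohort.
nhs_records = [
    {"patient_id": "0000000001", "admissions": 2},
    {"patient_id": "0000000002", "admissions": 0},
]
cohort_records = [
    {"patient_id": "0000000001", "genotyped": True},
    {"patient_id": "0000000003", "genotyped": False},
]

linked = link(
    prepare_for_linkage(nhs_records, "patient_id", KEY),
    prepare_for_linkage(cohort_records, "patient_id", KEY),
)
```

In this model the linking party learns which records match but not whom they belong to, while control of the key stays with the source institutions. The trade-off is that exact-match hashing does not tolerate identifier errors, which is one reason alternative methods merit evaluation against the same criteria.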

Deliverables

Our case study will use different methods to link the SHIS-R with Generation Scotland - a nationwide genetics study of up to 50,000 family members in Scotland - assessing performance across nine criteria:

  1. Governance: How and whether all stakeholders exercise control over a linkage.
  2. Institutional control: The control source institutions exercise over a linkage.
  3. Confidentiality: What information each entity belonging to the infrastructure discovers during linkage.
  4. Source data description: How metadata informs linkage.
  5. Quality control: How each method enables control of linkage and data quality.
  6. Access: How external researchers gain access to the infrastructure.
  7. Security: How the infrastructure can be undermined by external threats.
  8. Generalisation: The scalability of the technical approach, buy-in from data sources, governance and ethics structures.
  9. Technology: The availability of tools and trained personnel for a given method, and speed and cost of provisioning data through each mechanism.