SHIP comprises an innovative mix of studies conducting health research using electronic patient records (EPRs) and major longitudinal cohort databases. Longitudinal datasets will be created by integrating EPRs with non-medical routinely collected data. For example, we are exploring the value of EPRs in clinical trials, conducting epidemiological studies of diabetes through the linkage of a new national diabetes register with other routinely collected data, and exploring the feasibility of building inter-generational datasets for those captured in large-scale Scottish genetic studies.

SHIP has a programme of training seminars and workshops. We have established the 'Exploiting Existing Data for Health Research' conference as a biennial event designed to attract researchers from around the world to present state-of-the-art research in the field. We have also instigated the first training programme to be conducted at six-monthly intervals, covering a wide range of issues including methods of record linkage, the analysis of routinely collected longitudinal data, and issues of confidentiality, ethics and governance. Both the conference and the training will be open to researchers from Scotland and the rest of the world, and it is hoped that they will become the pre-eminent meetings in the field of EPR research.

We have planned an extensive series of public engagement activities focusing on the use of EPRs. To date, while public engagement has been explored in relation to the collection of genetic data, in particular by members of our team, little serious attention has been given to the use of routinely collected data for health research.

Background: Scotland's unique resources
Scotland has some of the best health service data in the world. A simple and far-sighted decision in the 1970s means that every person registered with a general practitioner (GP) in Scotland is allocated a unique identifying number from a centrally maintained register called the Community Health Index (CHI). The CHI number is the unique patient identifier in all primary health care activities, and is now used in hospital-based clinical information systems, where compliance has reached 93%. It is the key to linking health data for research purposes. The CHI register contains data on address, postcode, GP, date and region of registration and, where relevant, date of death, allowing the demographic profile of Scotland, deaths and patient migration to be easily analysed. In addition, there is a commitment that all clinical communications contain core identification data, including the CHI number, so that clinical data can be accessed whenever and wherever required, and that these data are coded according to the international coding systems READ 3 and ICD-10, with a commitment to move to SNOMED CT.
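
As an illustration of why a shared identifier makes linkage straightforward, the following is a minimal sketch of deterministic linkage on the CHI number. The function name, the field names and the placeholder identifiers are assumptions made for this example only; they are not real CHI numbers and do not describe any NHS Scotland system.

```python
def link_by_chi(register, episodes):
    """Join clinical episodes to register entries that share a CHI number.

    Both arguments are lists of dicts with a "chi" key (a hypothetical
    layout chosen for this sketch). Returns (linked, unmatched).
    """
    # Index the register once so each episode is matched by direct lookup.
    by_chi = {person["chi"]: person for person in register}
    linked, unmatched = [], []
    for episode in episodes:
        person = by_chi.get(episode.get("chi"))
        if person is None:
            # No identifier, or an identifier absent from the register.
            unmatched.append(episode)
        else:
            linked.append({**person, **episode})
    return linked, unmatched


# Synthetic placeholder data; "CHI-A" and "CHI-B" are not real identifiers.
register = [
    {"chi": "CHI-A", "postcode": "AB1 2CD"},
    {"chi": "CHI-B", "postcode": "EF3 4GH"},
]
episodes = [
    {"chi": "CHI-A", "diagnosis": "E11"},
    {"chi": None, "diagnosis": "I10"},
]
linked, unmatched = link_by_chi(register, episodes)
```

Episodes that carry no identifier fall into the unmatched list, which is the situation in which a different matching approach is needed.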

Where the CHI number is unavailable (e.g. in historical data), probability matching is used. The record linkage unit within the Information Services Division (ISD) of NHS National Services Scotland has an international reputation for using probability matching to link research databases to routine admissions and death data.
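
To make the idea of probability matching concrete, the sketch below scores candidate record pairs using agreement weights in the style of the Fellegi-Sunter model, which is one common formulation. This document does not specify the method used by ISD, so the model choice, the field names, the m- and u-probabilities and the threshold are all illustrative assumptions.

```python
import math

# Hypothetical per-field probabilities, invented for this sketch:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.02),
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum log2 likelihood ratios over the compared fields."""
    total = 0.0
    for name, (m, u) in fields.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return total


def best_match(record, candidates, threshold=10.0):
    """Return the highest-weight candidate, or None if below the threshold."""
    scored = [(match_weight(record, c), c) for c in candidates]
    if not scored:
        return None
    weight, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if weight >= threshold else None


# Synthetic records for illustration only.
query = {"surname": "SMITH", "date_of_birth": "1950-01-01", "postcode": "AB1 2CD"}
moved = {"surname": "SMITH", "date_of_birth": "1950-01-01", "postcode": "ZZ9 9ZZ"}
other = {"surname": "JONES", "date_of_birth": "1962-07-15", "postcode": "AB1 2CD"}
match = best_match(query, [moved, other])
```

In this sketch a changed postcode lowers the weight without ruling out a match, whereas agreement on postcode alone is not enough to link two records. In practice the probabilities and threshold would be estimated from the data being linked rather than fixed by hand.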

Compared with the rest of the UK, data quality is high, the centralisation of data in NHS Scotland is efficient, and the comprehensive computerisation of routine clinical data, alongside the mandated use of the CHI number for all health episodes, means that access to data for research is becoming easier and less expensive. Considerable progress has also been made in linking health data to other data sources through the Scottish Longitudinal Study, which is one of the world's largest administrative datasets and includes a wealth of health, demographic and socio-economic data. This is particularly valuable, as more research is essential to help understand Scotland's poor health and mortality experience compared with elsewhere in the UK and Western Europe.

Building on these strengths, we believe that a step-change in the quality, quantity and governance of research using EPRs can now be achieved with a more joined-up, Scotland-wide strategy. Our achievements to date have resulted from ad hoc linkages, with little co-ordination and no identified research arm within ISD. The SHIP programme will provide a platform for Scottish record linkage that will offer lessons for EPR research throughout the UK and abroad. This is particularly timely in light of the emergence of national clinical datasets and large genomic studies such as UK Biobank and Generation Scotland.

Overall deliverables of the project
To provide access to an exciting new national research facility, firmly embedded within and supported by NHS Scotland, providing the basis for numerous future studies using EPRs.

To develop and evaluate innovative approaches to the storage and interrogation of external third-party datasets, including genomic information, and to their linkage to EPRs.

To provide training and workshops, as well as a biennial international conference, the only inter-disciplinary conference of its type, which brings together world leaders in this field.

To undertake public engagement activities, building on considerable experience in the field of the public's attitudes to genetic studies, to define a transparent and publicly acceptable approach to the governance of EPR research.

To produce novel research using EPRs and major longitudinal cohort databases, specifically in the areas of clinical trials, pharmacovigilance, diabetes epidemiology, and research resulting from the linkage of EPRs to socioeconomic and environmental data.