
Funding opportunity: Operationalising scaled production and sharing of synthetic data

Apply for funding to evaluate the use of low-fidelity synthetic versions of datasets held securely within:

  • UK Data Service
  • Office for National Statistics (ONS) Secure Research Service
  • other trusted research environments (TREs)

You must be based at a UK research organisation eligible for ESRC funding.

The successful team will produce a public report addressing the objectives of the funding opportunity.

The full economic cost of the grant can be up to £375,000. ESRC and ADR UK will fund 80% of the full economic cost (up to £300,000).

We expect projects to last 18 to 24 months.

Who can apply

Proposals are welcome from individual researchers or small teams from eligible research organisations:

  • UK higher education institutions
  • research council institutes
  • UK Research and Innovation-approved independent research organisations
  • eligible public sector research establishments

Check if you are eligible to apply for research and innovation funding.

Read ESRC’s research funding guide.

We will be looking for:

  • demonstrable experience of qualitative or mixed-methods evaluations
  • a willingness to engage with data owners, academic and non-academic researchers, and TREs across the ADR UK partnership and beyond

Experience in generating or using synthetic data is not essential; however, we will be looking for expertise in evaluating its utility to researchers and to TREs using real-world examples.

Letters of support from relevant TREs are strongly encouraged. The successful applicant will need to work closely with them to evaluate the costs and benefits of different approaches for producing and sharing synthetic data.

What we're looking for

Specific objectives

The successful team will need to:

  1. Identify, for inclusion in the evaluation, a collection of low-fidelity synthetic versions of secure datasets that are currently available for researchers to access. Datasets for the analysis should include, but are not limited to, synthetic versions of:
    • Annual Survey of Hours and Earnings
    • Grading and Admissions Data for England
    • Ministry of Justice Data First datasets
    • National Pupil Database and Longitudinal Education Outcomes, when these become available
    • Hospital Episode Statistics

Proposals can also include the creation of new synthetic data where you can justify the need for it in evaluating system-wide operationalisation more generally. Costs for generating any new synthetic datasets should be included in the overall budget of up to £375,000.

  2. Evaluate the full range of costs to data owners and TREs of creating synthetic data, including initial and ongoing costs (for example, updates).
  3. Evaluate different models for sharing synthetic data, including the implications for data owners or data providers of resourcing that sharing. This could include (but is not limited to):
    • data production
    • ingest and curation procedures
    • metadata sharing
    • discoverability through the use of existing data catalogues
  4. Evaluate the efficiencies for data owners and TREs when synthetic data are available, including but not limited to:
    • impact on TRE resources, for example, time spent responding to researchers’ requests for information about the data
    • impact on secure environment usage, such as load and run times
    • uptake of different synthetic datasets by researchers, and the influence this has on demand for the real data
  5. Evaluate the impact of low-fidelity synthetic data on researchers’ experience of carrying out research using secure administrative or social survey data, including but not limited to:
    • utility of the synthetic data in helping users understand the data and scope research questions before applying for access to the real data
    • impact on the quality of applications to access data, for example, the success rate of project applications submitted through the UK Statistics Authority Research Accreditation Service, project approval times, and any other impacts on the project accreditation process
    • utility of the synthetic data for developing and testing code outside the secure environment, either while waiting for access to the real data or after access has been granted
  6. Make recommendations for further scaled production and sharing of low-fidelity synthetic data that are acceptable to data owners and to the public, including identifying opportunities for automation to increase efficiency. Although the focus of the project should be on low-fidelity synthetic data, the evaluation should also reflect on how the operationalisation of high-fidelity synthetic data might fare and what additional considerations might be needed. Additional research is not expected here; instead, we expect an informed response based on the exploration undertaken for these objectives.

Public involvement and engagement

ADR UK is committed to meaningful public involvement and engagement, and will be co-leading a public consultation on public attitudes to synthetic data in parallel to this funding opportunity. The successful candidate or team will be expected to proactively engage with and contribute to the activities of this programme, including, where appropriate:

  • attendance at Project Advisory Group meetings
  • attendance at public dialogue workshops, where possible
  • sharing project findings to inform the public consultation, and vice versa
  • ensuring this project’s outputs complement the public consultation

The outcomes of the public consultation are expected to inform any recommendations for the scaled production and sharing of synthetic data in the final publishable report (see funding opportunity deliverables).

Funding opportunity deliverables

The expected deliverables of the funded proposal include:

  • an interim update report. The draft study report may be subject to review and revision processes prior to acceptance and publication
  • a final publishable report setting out responses to each of the objectives for the funding opportunity, including overall recommendations for the scaled production and sustainable dissemination of low-fidelity synthetic versions of secure administrative and social survey data. The timely publication of this report in an open-access repository will be a condition of ESRC’s award. A short confidential annex for ESRC can be provided if necessary
  • at least 1 public-facing blog, written in accessible language
  • academic paper or papers

The successful team is expected to work with the ADR UK Strategic Hub (including its communication and engagement team and its programme management office) and the ESRC data and infrastructure team. The team will communicate the work to the public and relevant stakeholders and facilitate meaningful engagement with relevant communities.

Applicant webinar

ADR UK will host a webinar for applicants on 3 March 2023, 1:00pm to 2:30pm UK time, which will include:

  • more information on:
    • the background to this funding opportunity
    • specific objectives
    • the application process
  • an opportunity for questions to be asked

Register for a place at the applicant webinar on Eventbrite.

The event will also be recorded for applicants who are unable to attend the session.

How to apply

You must apply using the Joint Electronic Submission (Je-S) system. Detailed information on how to apply is provided in the Je-S guidance for applicants for this funding opportunity (see ‘Supporting documents’ in the ‘Additional info’ section).

You can find advice on completing your application in:

We recommend you start your application early.

Your host organisation will also be able to provide advice and guidance.

Submitting your application

Before starting an application, you will need to log in or create an account in Je-S.

All investigators involved in the project need to be registered on Je-S.

Any investigators who do not have a Je-S account must register for one at least 7 working days before the opportunity deadline.

When applying:

  1. Select ‘documents’, then ‘new document’.
  2. Select ‘call search’.
  3. To find the opportunity, search for: Operationalising scaled production and sharing of synthetic data.

This will populate:

  • council: ESRC
  • document type: Standard proposal
  • scheme: Research grant
  • call/type/mode: Operationalising scaled production and sharing of synthetic data 2023

Once you have completed your application, make sure you ‘submit document’.

You can save completed details in Je-S at any time and return to continue your application later.

Deadline

ESRC must receive your application by 9 May 2023 at 4:00pm UK time.

You will not be able to apply after this time. Please leave enough time for your proposal to pass through your organisation’s Je-S submission route before this date.

You should ensure you are aware of and follow any internal institutional deadlines that may be in place.

Attachments

In addition to a completed proposal, your application must include the following attachments:

  • case for support (maximum 6 pages)
  • justification of resources (maximum 2 pages)
  • applicants’ CVs (maximum 2 pages each)

Optional attachments:

  • letters of support from key partners or stakeholders
  • Gantt-style timeline (PDF format, 1 page, size A3 or A4)
  • other annexes (maximum 6 pages in total)

Attachments should be uploaded in PDF (rather than Microsoft Word) format, to reduce document corruption issues. With the exception of letters of support, attachments should be in font size 11 with 2cm margins (recommended font type is Arial or Garamond).

Case for support

This is the body of your research proposal. It must not exceed 6 sides and must include details on:

  • the overall approach and planned methods to conduct the evaluation
  • evaluation expertise in the team
  • previous relevant work
  • plans for engaging with relevant stakeholders, including data owners, TREs, and academic and non-academic researchers that use them
  • commitment to actively engage with the activities of the ADR UK-led public consultation on public attitudes to synthetic data
  • understanding of the synthetic data landscape, and knowledge of some of the barriers and opportunities it presents for supporting research using secure data
  • knowledge exchange and impact plans

The case for support should be a self-contained description of the proposed work with relevant background and references, and should not depend on additional information such as the inclusion of external links. The expert panel are advised to base their assessment on the information contained within the application and are under no obligation to access such links (so they should not be used to provide critical information).

CV

CVs should include:

  • contact details
  • qualifications (including class and subject)
  • academic and professional posts held since graduation
  • a list of the most relevant and recent publications
  • a record of research funded by ESRC and other bodies

Each CV should not exceed 2 sides of A4.

Justification of resources

This is a 2-side A4 statement justifying the resources required to undertake the research project. Where you do not provide an explanation for an item that requires justification, it will be cut from any grant made.

Proposals that include co-investigators from the UK business sector or from third sector organisations that engage in economic activity must ensure that the involvement of these organisations complies with state aid legislation.

Proposals that include co-investigators from third sector organisations that are deemed not to engage in economic activity must provide evidence of this status in the justification of resources.

Please refer to the Je-S help text for further guidance.

How we will assess your application

Assessment criteria

Proposals will be assessed using the following criteria, which are supported by prompt questions for assessors:

Scientific excellence

Assessor questions:

  • is the proposed scheme of work of the highest scientific quality?
  • does the team have expertise in qualitative or mixed-methods evaluations, or both?
  • does the proposal give confidence that it is ethical and acceptable to the public, and is clearly in the public interest?

Delivery confidence

Assessor questions:

  • will the methods and resources proposed deliver the funding opportunity objectives?
  • is there appropriate expertise in the delivery of key functions and are the skills and experience of the applicants well-demonstrated and appropriate?
  • is the delivery timetable feasible?
  • are risks to delivery and mitigation plans clearly identified?

Collaboration and engagement

Assessor questions:

  • does the proposal contain a convincing plan for engaging with relevant stakeholders, including data owners, TREs, and academic and non-academic researchers that use them?
  • does the proposal include a commitment to meaningfully engage with the ADR UK-led public consultation on public attitudes to synthetic data?
  • does the proposal include activities that support effective knowledge exchange between research, policy and funder communities?

Value for money

Assessor questions:

  • has the proposal been clearly and adequately costed?
  • has the amount of resource requested been justified?

Assessment procedure

All proposals submitted to this funding opportunity will be subject to standard eligibility checks. Following these checks, eligible proposals will be independently reviewed by an expert review panel, who will then meet to discuss and agree a recommendation to ESRC on funding.

UK Research and Innovation (UKRI) supports the San Francisco Declaration on Research Assessment (DORA) and recognises the relationship between research assessment and research integrity.

We also follow the UKRI principles of assessment and decision making when assessing your proposals.

Contact details

Get help with developing your proposal

For help and advice on costings and writing your proposal, please contact your research office in the first instance, allowing sufficient time for your organisation’s submission process.

Ask about this funding opportunity

ADR UK team

Email: hub@adruk.org

Include ‘Operationalising synthetic data funding call’ in the subject line

Get help with applying through Je-S

Email

jeshelp@je-s.ukri.org

Telephone

01793 444164

Opening times

Je-S helpdesk opening times

Additional info

Background

Researchers increasingly rely on accessing sources of sensitive data to undertake their analyses. Since the mid-2000s, infrastructure has been developed that enables researchers to access and use such data through what are now known as TREs.

Many TREs operate under the principles of the Five Safes framework, which sets out the following 5 areas to consider when designing a data access solution:

  • safe people
  • safe projects
  • safe data
  • safe settings
  • safe outputs

Under this framework, data services that operate TREs typically establish processes for accrediting researchers and projects. These processes can:

  • be lengthy and cumbersome
  • rely on often imprecise information provided by the researcher about the data they are applying to access (since they are not able to see the data they want to use until after they have had their project accredited)

As a consequence, research projects can be held up for many months while a researcher waits for access to data, and when the researcher finally gets access, the data may not meet the researcher’s expectations.

Low-fidelity synthetic data presents a potential solution to this problem because it allows a researcher earlier access to a version of the data that resembles the real data but does not include any information about real individuals. This gives researchers the opportunity to understand the data and plan their research in advance of going through the lengthy process of applying to use the real data.

This has the potential to enable a researcher to:

  • submit a higher quality application that is more likely to be approved
  • generate code to analyse the data, based on their understanding of the structure of the synthetic data, while waiting for approvals to access the real data. This can significantly reduce the time between initially applying to access data and completing analyses, since researchers are able to carry out these activities in parallel

‘Low-fidelity’ synthetic datasets are typically created by randomly generating values within each variable that roughly follow the distribution of the real data within the variable, but do not preserve any of the relationships between them (univariate synthetic data).

Consequently, low-fidelity synthetic data is much less likely to inadvertently reproduce information about a real individual than high-fidelity synthetic data, which mimics the real data much more closely. It can nevertheless be very useful to researchers in understanding the structure of the data and developing code.
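To illustrate the univariate approach described above, the following minimal Python sketch samples each variable independently, weighted by how often each value occurs in the real data. It is an illustration under stated assumptions, not the method used by any TRE or by the tools mentioned in this funding opportunity: the function name `make_low_fidelity`, the record layout and the example variables are all hypothetical, and the sketch applies no disclosure control (for example, suppressing rare values), which a real production process would need to consider.

```python
import random
from collections import Counter


def make_low_fidelity(records, n_rows, seed=None):
    """Hypothetical helper: generate univariate (low-fidelity) synthetic rows.

    Each column is sampled independently, weighted by how often each value
    occurs in that column of the real records. Marginal distributions are
    roughly preserved; relationships between columns are not.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    columns = list(records[0])
    synthetic_columns = {}
    for col in columns:
        counts = Counter(r[col] for r in records)
        # Draw every value for this column in one call, weighted by observed frequency.
        synthetic_columns[col] = rng.choices(
            list(counts), weights=list(counts.values()), k=n_rows
        )
    # Assemble rows by position. Because each column was drawn independently,
    # the values within a synthetic row bear no relationship to each other.
    # Note: no disclosure control (e.g. suppressing rare values) is applied here.
    return [{col: synthetic_columns[col][i] for col in columns} for i in range(n_rows)]


# Illustrative, made-up "real" data in which sector and pay band are perfectly associated.
real = [
    {"region": "North", "sector": "Retail", "pay_band": "Low"},
    {"region": "North", "sector": "Finance", "pay_band": "High"},
    {"region": "South", "sector": "Retail", "pay_band": "Low"},
    {"region": "South", "sector": "Finance", "pay_band": "High"},
]

fake = make_low_fidelity(real, n_rows=1000, seed=1)
print(fake[:3])
```

Because each column is drawn independently, combinations that never occur in the made-up real data (here, a Retail record in the High pay band) can appear in the synthetic rows, while the variable names and the set of values each variable takes are preserved. This is what makes such data useful for understanding the structure of a dataset and writing code, but not for estimating relationships between variables.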

A number of different methods for producing synthetic data now exist, including an easy-to-use tool for generating low-fidelity synthetic data (PDF, 1.4MB), and some progress has been made in making synthetic versions of secure data available. Examples include synthetic versions of justice system datasets from the ADR UK-funded Data First programme at the Ministry of Justice, and synthetic versions of education datasets from the linked Grading and Admissions Data for England programme.

Yet we are far from seeing synthetic data operationalised to the point where TREs can achieve economies of scale. There is also a lack of evidence to support data owners and data services in deciding how best to implement the governance around this. Real-world case studies of the costs and benefits of more systematic approaches to creating and sharing low-fidelity synthetic data are therefore needed.

Supporting documents

Je-S guidance for applicants (PDF, 299KB)

Equality impact assessment (PDF, 209KB)
