The Centre for Victorian Data Linkage (CVDL) is an expert data linkage and integration unit located within the Department of Health and Human Services (DHHS). The CVDL was established in 2009, has undertaken more than 1000 data linkage projects and has been accredited as an Integrating Authority by the Commonwealth Government.
The Registry of Births, Deaths and Marriages, VicRoads and the Victorian Electoral Commission (VEC) are providing data to the CVDL to develop a trial population linkage spine (the Spine), with de-identified data, to support Victoria’s response to COVID-19.
Development of the Spine will improve the accuracy, speed and population coverage of linked data, which is critical to Victoria’s response to COVID-19.
Victoria’s electoral enrolment information has been made available under the ‘public interest’ test following a stringent process under Section 34 of the Electoral Act. This means that sharing the information benefits the public more than not sharing it. This is a once-off arrangement and excludes silent elector details.
Only de-identified data is used for health monitoring following strict privacy and data protection rules. This data will not be used to trace individual COVID-19 status or contacts of identified persons.
The electoral enrolment data is housed in a protected resource within the CVDL that is designed to prevent access to identities of individuals. There are security measures in place to ensure that the data can only be used in a de-identified and aggregated way.
This secure environment meets the Office of the Victorian Information Commissioner’s Victorian Protective Data Security Standards to protect the public sector information that is used for Spine development.
The department has a privacy statement and guidelines for managing privacy incidents. Complaints and privacy concerns should be directed to the appropriate areas in DH and, where appropriate, to the Office of the Victorian Information Commissioner.
Any queries about the nature of this project should be addressed to CVDL at email@example.com.
Frequently Asked Questions
No. The data used for COVID-19 health monitoring and response will be de-identified. That means any information that can identify you, such as your name, address and date of birth, will be removed from the integrated dataset.
You will not get letters or phone calls or be traced in any way because of this work. In fact, there are penalties as severe as imprisonment for anyone who tries to use the data to re-identify any individual.
No. The Department of Health and Human Services (DHHS) had to submit an application and seek approval under section 34 of the Electoral Act (the Act).
The Act allows the VEC to make electoral roll information (excluding silent electors) available to organisations if it is in the public interest.
The process also ensured that governance, privacy and data protection requirements were met. That includes making sure that the information is protected from improper access or release.
Access to the de-identified integrated data resource is restricted to authorised personnel performing COVID-19 health monitoring and response activities.
The electoral enrolment data is housed in a protected resource within the Centre for Victorian Data Linkage that is designed to prevent access to identities of individuals. There are security measures in place to ensure that the data can only be used in a de-identified and aggregated way.
Access to that data is restricted. Any form of misuse is an offence which can attract significant penalties including imprisonment.
The VEC is providing electoral details such as name, address, date of birth and gender.
CVDL uses these variables to link records that belong to the same individual across multiple datasets to create cross-service client records. After the data is integrated it is de-identified and will only be used, following strict privacy and data protection rules, for COVID-19 surveillance and response.
The data will be managed and controlled by the CVDL, which is part of the Department of Health and Human Services (DHHS).
Data will be safe and secure. The CVDL’s secure environment meets the Office of the Victorian Information Commissioner’s Victorian Protective Data Security Standards (VPDSS) to protect the public sector information that is used for Spine development. DHHS works with sensitive information on a daily basis and the CVDL’s security systems exceed Commonwealth security standards.
The VEC is providing data to the CVDL to develop a trial population linkage spine to support Victoria’s response to COVID-19.
The CVDL was established in 2009 to provide specialist data linkage and integration services and has since undertaken more than 1000 linkage and integration projects. For more information about these projects, visit the CVDL project webpage.
The CVDL will use the trial population linkage spine to link COVID-19 notifications, emergency department attendances, hospital admissions and deaths so that DHHS can monitor the progress and impact of COVID-19. The data is de-identified and there are penalties as severe as imprisonment for anyone who tries to use the data to re-identify any individual.
- Public interest is a measure that looks at whether sharing electoral roll information benefits the public more than not sharing it. The information is being shared because it will help deliver DHHS’s COVID-19 pandemic surveillance and response.
- Applications are decided by the Victorian Electoral Commission after consulting with the Victorian Information Commissioner.
No. Only silent voters are exempt from having their information on the electoral roll.
Exclusion of groups or individuals can create a distorted or incorrect view of the population. By having the most comprehensive view possible of Victoria’s population, DHHS will be better able to deliver its COVID-19 pandemic monitoring and response activities.
Data linkage is a technique for creating links within and between data sources for information that is thought to relate to the same person, place, family or event.
While the data linkage process initially uses identifying descriptors (like name and address) to create links, the linked information will only be used in a de-identified form following strict privacy and data protection rules.
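The two stages described above (linking on identifying descriptors, then using the result only in de-identified form) can be illustrated with a minimal Python sketch. It is not the CVDL's method or software: the field names (`name`, `dob`, `event`), the exact-match rule and the sample records are all hypothetical, and real linkage typically has to cope with misspellings and missing values.

```python
from collections import defaultdict

def link_key(record):
    """Build a simple match key from identifying fields (assumed field names)."""
    name = " ".join(record["name"].lower().split())  # normalise case and spacing
    return (name, record["dob"])

def link_records(datasets):
    """Group records from several datasets that share the same match key."""
    groups = defaultdict(list)
    for source, records in datasets.items():
        for record in records:
            groups[link_key(record)].append((source, record))
    return groups

def deidentify(groups):
    """Keep only an arbitrary link ID and non-identifying content fields."""
    output = []
    for link_id, (_, members) in enumerate(sorted(groups.items()), start=1):
        for source, record in members:
            output.append({"link_id": link_id, "source": source, "event": record["event"]})
    return output

# Hypothetical records, invented purely for illustration.
datasets = {
    "emergency": [
        {"name": "Alex  Citizen", "dob": "1980-02-01", "event": "ED attendance"},
    ],
    "admissions": [
        {"name": "alex citizen", "dob": "1980-02-01", "event": "hospital admission"},
        {"name": "Sam Example", "dob": "1975-07-30", "event": "hospital admission"},
    ],
}

linked = deidentify(link_records(datasets))
for row in linked:
    print(row)
```

In this sketch the two records for the same person share one `link_id`, and the name and date of birth do not appear in the output at all.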
Data linkage enables DHHS to:
- integrate information and evidence held within a dataset or in disparate datasets
- preserve important facts while protecting identity
- identify new and important questions
- identify patterns, trends, pathways and profiles
- better use administrative data as a resource – for research, policy development, service and program planning.
De-identified data does not have any personal information and cannot be used to identify individuals.
There are a number of ways data is de-identified and kept confidential, for example:
- instead of a person’s name appearing in a dataset, a random combination of numbers or letters may appear instead
- five-year age groups are provided rather than exact birth dates
- instead of a person’s address appearing, the dataset might only show their general town, suburb or postcode.
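As a rough illustration of the three techniques listed above, the following Python sketch replaces a name with a random code, converts an age to a five-year group and keeps only the postcode from an address. The record layout and function names are assumptions made for this example; it is not a description of how the CVDL de-identifies data.

```python
import secrets

def age_band(age, width=5):
    """Return the five-year age group containing `age`, for example '35-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def deidentify(record, pseudonyms):
    """Return a copy of `record` with identifying fields replaced or generalised.

    `pseudonyms` maps each name to a random code so that the same person
    receives the same code every time within one dataset.
    """
    name = record["name"]
    if name not in pseudonyms:
        pseudonyms[name] = secrets.token_hex(8)  # random 16-character code
    return {
        "person_code": pseudonyms[name],
        "age_group": age_band(record["age"]),
        "postcode": record["postcode"],  # street address is dropped entirely
    }

# Hypothetical record, invented purely for illustration.
pseudonyms = {}
record = {"name": "Alex Citizen", "age": 37, "address": "1 Sample Street", "postcode": "3000"}
safe = deidentify(record, pseudonyms)
print(safe["age_group"], safe["postcode"])
```

The mapping from names to codes is itself identifying, so in practice it would need to be kept separately from the de-identified data or discarded.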
Aggregated data is used to describe data that has been sorted and then summarised. This may be by percentage, proportion, trend etc.
For example, aggregated data can tell us that 50% of the population prefers vanilla ice cream to chocolate ice cream. The same result can also be described in other ways, for example:
- equal portions of the population prefer vanilla and chocolate, where there were two flavours to choose from
- the researchers were not able to discern whether a greater portion of the population preferred vanilla or chocolate ice cream.
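The ice cream example can be reproduced with a few lines of Python. This is only a sketch with made-up responses: individual answers go in, and only summary percentages come out.

```python
from collections import Counter

def aggregate_percentages(responses):
    """Summarise individual responses as the percentage giving each answer."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: round(100 * count / total, 1) for answer, count in counts.items()}

# Hypothetical survey responses, invented purely for illustration.
responses = ["vanilla", "chocolate", "vanilla", "chocolate"]
summary = aggregate_percentages(responses)
print(summary)
```

The summary says nothing about which person gave which answer, which is the sense in which aggregated data describes a group rather than individuals.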
COVID-19 Linkage Project update
The aim of this project is to increase the accuracy, speed and population coverage of DHHS’s COVID-19 pandemic monitoring and response.
Phase One of the COVID-19 Linkage Project involves daily linkage of datasets currently held by DHHS. This project will build on the work previously undertaken which involved linkage of the same datasets of interest. The datasets include:
- Public Health Event Surveillance System (reports of notifiable infectious diseases such as COVID-19)
- Victorian Admitted Episodes Dataset (hospital inpatient episodes)
- Victorian Emergency Minimum Dataset (emergency department attendances)
- Victorian Death Index (death registrations from the Registry of Births, Deaths and Marriages)
- Better Patient data (patient identifying information from public health services)
Timeline: Phase One is underway with a minimum viable product completed in mid-April.
CVDL is developing a trial population linkage spine (the Spine), which will form the ongoing basis for the COVID-19 data linkage. The aim of Phase Two is to establish a stable dataset for enduring linkage and population coverage of DHHS’s COVID-19 pandemic surveillance and response. The Spine will include the following datasets:
- Victorian Electoral Commission (VEC) electoral enrolment information
- VicRoads drivers licence data
- Births from the Registry of Births, Deaths and Marriages (BDM)
The CVDL already holds Births data under a Memorandum of Understanding with BDM. The VEC and VicRoads datasets have been provided to CVDL as a one-off provision to support development of the Spine for COVID-19 pandemic surveillance and response.
Rationale for the development of the Spine
The population spine is foundational to DHHS's data linkage capability because it establishes a stable dataset for enduring linkage and improves 'population' coverage.
Stable dataset for enduring linkage
Enduring linkage refers to the concept that if linkage is run in year 1, an individual will be assigned the same Linkage ID when linkage is run again in year 2 with new data. Enduring linkage facilitates longitudinal research because consistent linkage IDs enable a more accurate understanding of the pathways and outcomes of de-identified cohorts over time.
The CVDL’s capacity to ensure enduring linkage has previously been limited because the datasets available for linkage have been primarily health and human service delivery datasets, which often have quality issues and limited population coverage. Without a population linkage spine, the CVDL has used a clustering linkage methodology. With each linkage map update, all 30-plus datasets are linked to each other using specialist linkage software to generate a linkage ID for each individual. This linkage ID changes with each update, currently every six months.
This methodology contrasts with the linkage spine methodology, where a small number of high-quality datasets with strong population coverage form the central basis for the linkage, with other datasets connecting to the linkage spine for linkage purposes. The CVDL has not previously been able to apply the linkage spine methodology because it lacked access to key high-quality datasets with high population coverage, other than births data.
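The difference between the two methodologies can be sketched in Python. This is a toy model built on assumptions, not the CVDL's software: `cluster_ids` stands in for a clustering run that regenerates every ID from whatever records are present, and the hypothetical `LinkageSpine` class stands in for a spine that remembers the ID it has already issued to each person.

```python
def spine_key(record):
    """Simple match key on assumed fields; real linkage is far more tolerant."""
    return (" ".join(record["name"].lower().split()), record["dob"])

def cluster_ids(records):
    """Clustering-style run: IDs are regenerated from scratch on every update."""
    keys = sorted({spine_key(r) for r in records})
    return {key: f"C{i:06d}" for i, key in enumerate(keys, start=1)}

class LinkageSpine:
    """Spine-style linkage: each person keeps the first ID they were given."""

    def __init__(self):
        self._ids = {}

    def link_id(self, record):
        key = spine_key(record)
        if key not in self._ids:
            self._ids[key] = f"P{len(self._ids) + 1:06d}"
        return self._ids[key]

# Hypothetical people, invented purely for illustration.
sam = {"name": "Sam Example", "dob": "1975-07-30"}
alex = {"name": "Alex Citizen", "dob": "1980-02-01"}
year1, year2 = [sam], [alex, sam]

# Clustering: Sam's ID depends on who else is in the run, so it changes.
cluster_year1 = cluster_ids(year1)[spine_key(sam)]
cluster_year2 = cluster_ids(year2)[spine_key(sam)]

# Spine: Sam's ID is stored once and reused in later runs.
spine = LinkageSpine()
spine_year1 = [spine.link_id(r) for r in year1][0]
spine_year2 = {r["name"]: spine.link_id(r) for r in year2}["Sam Example"]

print(cluster_year1, cluster_year2)  # IDs differ between runs
print(spine_year1, spine_year2)      # IDs are the same in both runs
```

In the sketch, the stable spine ID is what lets a year 2 record be connected to the same de-identified individual seen in year 1.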
The consistency provided by a linkage spine is particularly important for the COVID-19 linkage as it will be undertaken very frequently and requires a very high level of accuracy for population health monitoring. The COVID-19 linkage is a bespoke program developed by the CVDL and Health Protection Branch, outside the CVDL’s usual linkage technology and software. The CVDL is currently upgrading its technical environment which will support more routine application of the population spine methodology for broader public benefits.
Traditional health and human services datasets are limited to collecting information about 'individuals' who have touched the system. That means that while we collect information about our clients, we do not collect information about our 'potential' clients in the family or locality. This is the 'denominator' factor. A population spine allows more complete coverage of the Victorian population and is extremely valuable for COVID-19 geographic risk analysis. The datasets in the Spine will not provide complete population coverage, as not everyone living in Victoria will be captured in births, electoral enrolment and drivers licence data (for example, children born overseas or interstate). However, the population coverage will be much more complete than previously, and this will make it possible to address a broader range of critical research questions.
For example, while we are treating a patient in a hospital, we need to understand the possible risk in the surrounding geographic area and to plan a locality-based response. Without the population spine, DHHS depends on census-based population figures to plan intensive care unit and bed capacity. It is much more efficient to plan for projected requirements based on knowledge of local high-risk groups and a patient's relationship networks.
Reviewed 18 November 2021