Title: | UK Epidemiological Data Management |
---|---|
Description: | Contains utilities and functions for the cleaning, processing and management of patient level public health data for surveillance and analysis held by the UK Health Security Agency, UKHSA. |
Authors: | Alex Bhattacharya [aut, cre], Alice Graham [aut] , Harshana Liyanage [aut] |
Maintainer: | Alex Bhattacharya <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.5 |
Built: | 2024-10-30 09:15:51 UTC |
Source: | https://github.com/alexbhatt/epidm |
A continuous inpatient (CIP) spell is a continuous period of care within the NHS, which does allow specific types of transfers to take place. It can therefore be made up of one or more provider spells. A CIP spell starts when a decision has been made to admit the patient, and a consultant has taken responsibility for their care. The spell ends when the patient dies or is discharged from hospital. This follows the NHS Digital Provider Spells Methodology: http://content.digital.nhs.uk/media/11859/Provider-Spells-Methodology/pdf/Spells_Methodology.pdf
cip_spells( x, group_vars, spell_start_date, admission_method, admission_source, spell_end_date, discharge_destination, patient_classification, .forceCopy = FALSE )
cip_spells( x, group_vars, spell_start_date, admission_method, admission_source, spell_end_date, discharge_destination, patient_classification, .forceCopy = FALSE )
x |
a data frame; will be converted to a data.table |
group_vars |
a vector containing any variables to be used for record grouping, minimum is a patient identifier |
spell_start_date |
Inpatient provider spell or episode admission date |
admission_method |
CDS admission method code |
admission_source |
CDS admission source code |
spell_end_date |
Inpatient provider spell or episode discharge date |
discharge_destination |
CDS discharge destination code |
patient_classification |
CDS patient classification code |
.forceCopy |
default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
the original data.frame as a data.table with the following new fields:
cip_indx
an id field for the CIP spell
cip_spell_start
the start date for the CIP spell
cip_spell_end
the end date for the CIP spell
cip_test <- data.frame( id = c('465','465','465','465','8418','8418','8418', '8418','8418','8418','8418','8418','26443', '26443','26443','33299','33299','33299','33299', '33299','33299','33299','33299','33299','33299', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','78915','78915','78915'), provider = c('X1T','X1T','X1T','X1T','KHA','KHA','KHA', 'KHA','KHA','KHA','KHA','KHA','BX2','BX2', 'BX2','PXH','PXH','PXH','PXH','PXH','PXH', 'PXH','PXH','PXH','PXH','9HA','9HA','9HA', '9HA','9HA','9HA','9HA','9HA','9HA','9HA', '9HA','9HA','9HA','9HA','9HA','9HA','YYT', 'YYT','YYT','YYT','YYT','YYT','YYT','YYT', 'YYT','YYT','YYT','ABX','ABX','ABX'), spell_start = as.Date(c( '2020-03-07','2020-03-07','2020-03-25','2020-04-03','2020-01-25', '2020-01-26','2020-07-14','2020-08-02','2020-08-12','2020-08-19', '2020-08-19','2020-11-19','2019-11-12','2020-04-17','2020-04-23', '2020-07-03','2020-01-17','2020-02-07','2020-03-20','2020-04-27', '2020-06-21','2020-07-02','2020-10-17','2020-11-27','2021-01-02', '2019-12-31','2020-01-02','2020-01-14','2020-01-16','2020-02-07', '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25', '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13', '2020-03-14','2020-02-04','2020-02-07','2020-02-11','2020-02-14', '2020-02-18','2020-02-21','2020-02-25','2020-02-28','2020-03-09', '2020-03-11','2020-03-12','2020-04-16','2020-04-24','2020-05-13')), spell_end = as.Date(c( '2020-03-07','2020-03-25','2020-04-02','2020-04-27','2020-01-25', '2020-01-27','2020-07-17','2020-08-07','2020-08-14','2020-08-19', '2020-08-22','2020-12-16','2020-04-17','2020-04-23','2020-05-20', '2020-07-24','2020-01-28','2020-02-07','2020-03-23','2020-04-29', '2020-06-21','2020-07-03','2020-11-27','2021-01-02','2021-01-10', '2019-12-31','2020-01-11','2020-01-14','2020-02-04','2020-02-07', '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25', '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13', '2020-03-30','2020-02-07','2020-02-11','2020-02-14','2020-02-18', '2020-02-21','2020-02-25','2020-02-28','2020-03-09','2020-03-11', '2020-03-12','2020-03-13','2020-04-24','2020-05-13','2020-06-11')), adm_meth = c('21','81','21','81','21','21','11','21','21','21','21', '21','21','81','21','81','21','21','21','21','21','21', '21','13','13','12','22','12','2D','13','13','13','13', '13','13','13','13','13','13','13','21','81','81','81', '81','81','13','81','81','13','13','13','21','11','81'), adm_src = c('19','51','19','51','19','51','19','51','19','19','19', '51','19','51','19','51','19','19','19','19','19','19', '19','51','19','19','19','19','19','19','19','19','19', '19','19','19','51','51','51','51','19','51','51','51', '51','51','51','51','51','51','51','51','19','51','51'), dis_meth = c('1','1','1','1','1','1','1','1','1','1','1','4','1','1', '4','1','1','1','1','1','1','1','8','1','4','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','2'), dis_dest = c('51','51','51','54','51','19','19','19','19','51','19', '79','51','51','79','65','19','19','19','19','19','29', '98','51','79','19','19','19','51','19','19','19','51', '51','51','19','19','51','51','19','51','51','51','51', '51','51','51','51','51','51','51','51','29','54','19'), patclass = c('1','1','1','1','1','1','1','1','1','1','1','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','2','1','2', '1','2','2','2','2','2','2','2','2','2','2','2','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','1') ) cip_spells(x=cip_test, group_vars = c('id','provider'), patient_classification = 'patclass', spell_start_date = 'spell_start', admission_method = 'adm_meth', admission_source = 'adm_src', spell_end_date = 'spell_end', discharge_destination = 'dis_dest' )[]
cip_test <- data.frame( id = c('465','465','465','465','8418','8418','8418', '8418','8418','8418','8418','8418','26443', '26443','26443','33299','33299','33299','33299', '33299','33299','33299','33299','33299','33299', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','52635','52635','52635', '52635','52635','52635','78915','78915','78915'), provider = c('X1T','X1T','X1T','X1T','KHA','KHA','KHA', 'KHA','KHA','KHA','KHA','KHA','BX2','BX2', 'BX2','PXH','PXH','PXH','PXH','PXH','PXH', 'PXH','PXH','PXH','PXH','9HA','9HA','9HA', '9HA','9HA','9HA','9HA','9HA','9HA','9HA', '9HA','9HA','9HA','9HA','9HA','9HA','YYT', 'YYT','YYT','YYT','YYT','YYT','YYT','YYT', 'YYT','YYT','YYT','ABX','ABX','ABX'), spell_start = as.Date(c( '2020-03-07','2020-03-07','2020-03-25','2020-04-03','2020-01-25', '2020-01-26','2020-07-14','2020-08-02','2020-08-12','2020-08-19', '2020-08-19','2020-11-19','2019-11-12','2020-04-17','2020-04-23', '2020-07-03','2020-01-17','2020-02-07','2020-03-20','2020-04-27', '2020-06-21','2020-07-02','2020-10-17','2020-11-27','2021-01-02', '2019-12-31','2020-01-02','2020-01-14','2020-01-16','2020-02-07', '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25', '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13', '2020-03-14','2020-02-04','2020-02-07','2020-02-11','2020-02-14', '2020-02-18','2020-02-21','2020-02-25','2020-02-28','2020-03-09', '2020-03-11','2020-03-12','2020-04-16','2020-04-24','2020-05-13')), spell_end = as.Date(c( '2020-03-07','2020-03-25','2020-04-02','2020-04-27','2020-01-25', '2020-01-27','2020-07-17','2020-08-07','2020-08-14','2020-08-19', '2020-08-22','2020-12-16','2020-04-17','2020-04-23','2020-05-20', '2020-07-24','2020-01-28','2020-02-07','2020-03-23','2020-04-29', '2020-06-21','2020-07-03','2020-11-27','2021-01-02','2021-01-10', '2019-12-31','2020-01-11','2020-01-14','2020-02-04','2020-02-07', '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25', '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13', '2020-03-30','2020-02-07','2020-02-11','2020-02-14','2020-02-18', '2020-02-21','2020-02-25','2020-02-28','2020-03-09','2020-03-11', '2020-03-12','2020-03-13','2020-04-24','2020-05-13','2020-06-11')), adm_meth = c('21','81','21','81','21','21','11','21','21','21','21', '21','21','81','21','81','21','21','21','21','21','21', '21','13','13','12','22','12','2D','13','13','13','13', '13','13','13','13','13','13','13','21','81','81','81', '81','81','13','81','81','13','13','13','21','11','81'), adm_src = c('19','51','19','51','19','51','19','51','19','19','19', '51','19','51','19','51','19','19','19','19','19','19', '19','51','19','19','19','19','19','19','19','19','19', '19','19','19','51','51','51','51','19','51','51','51', '51','51','51','51','51','51','51','51','19','51','51'), dis_meth = c('1','1','1','1','1','1','1','1','1','1','1','4','1','1', '4','1','1','1','1','1','1','1','8','1','4','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','2'), dis_dest = c('51','51','51','54','51','19','19','19','19','51','19', '79','51','51','79','65','19','19','19','19','19','29', '98','51','79','19','19','19','51','19','19','19','51', '51','51','19','19','51','51','19','51','51','51','51', '51','51','51','51','51','51','51','51','29','54','19'), patclass = c('1','1','1','1','1','1','1','1','1','1','1','1','1','1', '1','1','1','1','1','1','1','1','1','1','1','2','1','2', '1','2','2','2','2','2','2','2','2','2','2','2','1','1', '1','1','1','1','1','1','1','1','1','1','1','1','1') ) cip_spells(x=cip_test, group_vars = c('id','provider'), patient_classification = 'patclass', spell_start_date = 'spell_start', admission_method = 'adm_meth', admission_source = 'adm_src', spell_end_date = 'spell_end', discharge_destination = 'dis_dest' )[]
A convenience function to allow you to pull data from NHS, ONS and ODR assets
csv_from_zip(x)
csv_from_zip(x)
x |
a zip file from the web |
a zip file for ingestion into your chosen readr
## Not run: read.csv(csv_from_zip("https://files.digital.nhs.uk/assets/ods/current/succarc.zip")) ## End(Not run)
## Not run: read.csv(csv_from_zip("https://files.digital.nhs.uk/assets/ods/current/succarc.zip")) ## End(Not run)
A reference table of bacterial gram stain results by genus to allow faster filtering of bacterial results. This dataset has been maintained manually against the PHE SGSS database. If there are organisms missing, please raise and issue or push request on the epidm GitHub
genus_gram_stain
genus_gram_stain
A data frame with four columns
The bacterial genus
A character string to indicate POSITIVE or NEGATIVE type
A 0/1 flag to indicate if the genus is gram positive
A 0/1 flag to indicate if the genus is gram negative
In order to group A&E discharge destination from SNOWMED into human readable groups, a lookup table has been created. These work with Emergency Care Dataset (ECDS) data with the destination_code field to show where a patient goes after discharge from A&E.
group_ecds_discharge_destination
group_ecds_discharge_destination
the ECDS destination_code
the destination grouping as a human readable string
In order to group hospital inpatient admissions into human readable groups, a lookup table has been created. These work with Hospital Episode Statistics (HES) and Secondary Use Services (SUS) data with the admission_method fields.
group_inpatient_admission_method
group_inpatient_admission_method
the admission_method code
the admission_method grouping as a human readable string
In order to group hospital inpatient discharge destination into human readable groups, a lookup table has been created. These work with Hospital Episode Statistics (HES) and Secondary Use Services (SUS) data with the discharge_destination fields.
group_inpatient_discharge_destination
group_inpatient_discharge_destination
the discharge_destination code
the discharge_destination grouping as a human readable string
Group across multiple observations of overlapping time intervals, with defined start and end dates, or events within a static/fixed or rolling window of time. These are commonly used with inpatient HES/SUS data to group spells with defined start and end dates, or to group positive specimen tests, based on specimen dates together into infection episodes.
group_time( x, date_start, date_end, window, window_type = c("rolling", "static"), group_vars, indx_varname = "indx", min_varname = "date_min", max_varname = "date_max", .forceCopy = FALSE )
group_time( x, date_start, date_end, window, window_type = c("rolling", "static"), group_vars, indx_varname = "indx", min_varname = "date_min", max_varname = "date_max", .forceCopy = FALSE )
x |
data frame, this will be converted to a data.table |
date_start |
column containing the start dates for the grouping, provided quoted |
date_end |
column containing the end dates for the interval, quoted |
window |
an integer representing a time window in days which will be applied to the start date for grouping events |
window_type |
character, to determine if a 'rolling' or 'static' grouping method should be used when grouping events |
group_vars |
in a vector, the all columns used to group records, quoted |
indx_varname |
a character string to set variable name for the index column which provides a grouping key; default is indx |
min_varname |
a character string to set variable name for the time period minimum |
max_varname |
a character string set variable name for the time period maximum |
.forceCopy |
default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
the original data.frame as a data.table with the following new fields:
indx
; renamed using indx_varname
an id field for the new
aggregated events/intervals; note that where the date_start
is NA, an
indx
value will also be NA
min_date
; renamed using min_varname
the start date for the aggregated events/intervals
max_date
; renamed using max_varname
the end date for the aggregated events/intervals
episode_test <- structure( list( pat_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), species = c(rep("E. coli",7),rep("K. pneumonia",7)), spec_type = c(rep("Blood",7),rep("Blood",4),rep("Sputum",3)), sp_date = structure(c(18262, 18263, 18281, 18282, 18262, 18263, 18281, 18265, 18270, 18281, 18283, 18259, 18260, 18281), class = "Date") ), row.names = c(NA, -14L), class = "data.frame") group_time(x=episode_test, date_start='sp_date', window=14, window_type = 'static', indx_varname = 'static_indx', group_vars=c('pat_id','species','spec_type'))[] spell_test <- data.frame( id = c(rep(99,6),rep(88,4),rep(3,3)), provider = c("YXZ",rep("ZXY",5),rep("XYZ",4),rep("YZX",3)), spell_start = as.Date( c( "2020-03-01", "2020-07-07", "2020-02-08", "2020-04-28", "2020-03-15", "2020-07-01", "2020-01-01", "2020-01-12", "2019-12-25", "2020-03-28", "2020-01-01", rep(NA,2) ) ), spell_end = as.Date( c( "2020-03-10", "2020-07-26", "2020-05-22", "2020-04-30", "2020-05-20", "2020-07-08", "2020-01-23", "2020-03-30", "2020-01-02", "2020-04-20", "2020-01-01", rep(NA,2) ) ) ) group_time(x = spell_test, date_start = 'spell_start', date_end = 'spell_end', group_vars = c('id','provider'), indx_varname = 'spell_id', min_varname = 'spell_min_date', max_varname = 'spell_max_date')[]
episode_test <- structure( list( pat_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), species = c(rep("E. coli",7),rep("K. pneumonia",7)), spec_type = c(rep("Blood",7),rep("Blood",4),rep("Sputum",3)), sp_date = structure(c(18262, 18263, 18281, 18282, 18262, 18263, 18281, 18265, 18270, 18281, 18283, 18259, 18260, 18281), class = "Date") ), row.names = c(NA, -14L), class = "data.frame") group_time(x=episode_test, date_start='sp_date', window=14, window_type = 'static', indx_varname = 'static_indx', group_vars=c('pat_id','species','spec_type'))[] spell_test <- data.frame( id = c(rep(99,6),rep(88,4),rep(3,3)), provider = c("YXZ",rep("ZXY",5),rep("XYZ",4),rep("YZX",3)), spell_start = as.Date( c( "2020-03-01", "2020-07-07", "2020-02-08", "2020-04-28", "2020-03-15", "2020-07-01", "2020-01-01", "2020-01-12", "2019-12-25", "2020-03-28", "2020-01-01", rep(NA,2) ) ), spell_end = as.Date( c( "2020-03-10", "2020-07-26", "2020-05-22", "2020-04-30", "2020-05-20", "2020-07-08", "2020-01-23", "2020-03-30", "2020-01-02", "2020-04-20", "2020-01-01", rep(NA,2) ) ) ) group_time(x = spell_test, date_start = 'spell_start', date_end = 'spell_end', group_vars = c('id','provider'), indx_varname = 'spell_id', min_varname = 'spell_min_date', max_varname = 'spell_max_date')[]
This function helps to determine when a patient has been in hospital across spell aggregation. When retaining the final record the following criteria is used:
Current admissions take priority
When conflicting on the same day, inpatient admissions take priority over A&E emergency care data
Where a patient has a linked A&E admission to a hospital inpatient stay, the A&E admission date is used
Where a patient has a positive test between two hospital stays the most recent completed hospital stay prior to the test is retained except if the time between these events is greater than 14 days, then the first admission following the test is retained
hospital_in_out_dates( data, person_id = "id", hospital = list(org_code = "organisation_code_of_provider", event_date = "ev_date", ae_arrive = "arrival_date", ae_depart = "departure_date", ae_discharge = "ecds_discharge", in_spell_start = "spell_start_date", in_spell_end = "spell_end_date", in_discharge = "discharge_destination") )
hospital_in_out_dates( data, person_id = "id", hospital = list(org_code = "organisation_code_of_provider", event_date = "ev_date", ae_arrive = "arrival_date", ae_depart = "departure_date", ae_discharge = "ecds_discharge", in_spell_start = "spell_start_date", in_spell_end = "spell_end_date", in_discharge = "discharge_destination") )
data |
the linked asset holding A&E and Inpatient data |
person_id |
the column containing the unique patient ID |
hospital |
a list containing the following items
|
new date columns on the data.table for hospital_in
and hospital_out
and hospital_event_rank
epidm::lookup_recode()
epidm::group_time()
epidm::cip_spells()
## Not run: hospital_in_out_dates(link, person_id = 'id', hospital = list( org_code = 'organisation_code_of_provider', event_date = 'ev_date', ae_arrive = 'arrival_date', ae_depart = 'departure_date', ae_discharge = 'ecds_discharge', in_spell_start = 'spell_start_date', in_spell_end = 'spell_end_date', in_discharge = 'discharge_destination' ))[] ## End(Not run)
## Not run: hospital_in_out_dates(link, person_id = 'id', hospital = list( org_code = 'organisation_code_of_provider', event_date = 'ev_date', ae_arrive = 'arrival_date', ae_depart = 'departure_date', ae_discharge = 'ecds_discharge', in_spell_start = 'spell_start_date', in_spell_end = 'spell_end_date', in_discharge = 'discharge_destination' ))[] ## End(Not run)
When HES/SUS ICD/OPCS codes are provided in wide format you may want to clean them up into long for easier analysis. This function helps by reshaping long as a separate table. Ensuring they're separate allows you to retain source data, and aggregate appropriately later.
inpatient_codes( x, field_strings, patient_id_vars, type = c("icd9", "icd10", "opcs"), .forceCopy = FALSE )
inpatient_codes( x, field_strings, patient_id_vars, type = c("icd9", "icd10", "opcs"), .forceCopy = FALSE )
x |
a data.frame or data.table containing inpatient data |
field_strings |
a vector or string containing the regex for the the columns |
patient_id_vars |
a vector containing colnames used to identify a patient episode or spell |
type |
a string to denote if the codes are diagnostic or procedural |
.forceCopy |
default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
a separate table with codes and id in long form
inpatient_test <- data.frame( id = c(1053L,5487L,8180L,528L,1085L,344L,2021L,2040L, 6504L,10867L,12411L,7917L,2950L,2812L,7757L,12227L,2675L, 8548L,536L,11830L,12708L,10421L,5503L,2494L,14001L), spell_id = c("dwPDw","iSpUq","qpgk5","8vrJ1","BAur9","l6LZk", "KJllb","tgZID","fJkh8","Y9IPv","DAlUZ", "9Ooc4","hUxGn","wtMG9","dw3dO","cz3fI", "gdxZK","npplb","tynBh","Uu0Sd","gV1Ac", "vOpA1","ttlcD","Fqo29","ivTmN"), primary_diagnosis_code = c("K602","U071-","I501","U071 ","J22X","J189", "J189","I951","N130","U071","K510 D",NA, "G409-","C780","N185","J955","K573","U071", "I330","L309","M513","U071","A419","U071", "N185-"), secondary_diagnosis_code_1 = c("K641","J128-","I489","J128 ","Q348","F059", "R296","R296","N131","J128","M0750A",NA, "R401-","C782","Z491","C321","D125","J128", "B952","J459","M4780","B972","N390","J128", "Z491-"), secondary_diagnosis_code_2 = c("E039","B972-","I10X","L031 ","Z115","I509", "F051","I251","K862","B972","K590-",NA, "E876-","C798","N085","Z938","I209","B972", "I214","Z880","M8588","R296","B962","B972", NA), secondary_diagnosis_code_3 = c("I422","J9691","E119","I489 ","D509","I489", "D509","I252","T391","J440","R21X-",NA, "R945-","E119","M310","I480","I252","J9690", "E111",NA,"Z115","R410","J181","Z518",NA), secondary_diagnosis_code_4 = c(NA,"I10X-","E669","E109 ","K219","Z921","I251", "I259","R458","B972","F200-",NA,"E039-", "I10X",NA,"I500","F171","I489","E162",NA, "I480","M2551","L892","E86X",NA), secondary_diagnosis_code_5 = c(NA,"E119-","J449","F03X ",NA,"Z518","I252", "I209","C61X","A419","R761-",NA,"E119-", "K219",NA,"Z115","F329","N179","N179",NA, "H353","Z638","L033","R54X",NA), secondary_diagnosis_code_6 = c(NA,NA,"Z966","I10X ",NA,"N179","N183","Z115", "K627","N390",NA,NA,"J459-","M4780",NA, "Z900",NA,"I10X","R34X",NA,"I951","I10X", "D510","F059",NA), secondary_diagnosis_code_7 = c(NA,NA,"Z854","I679 ",NA,"N183","Z951","M190", "R634","L031",NA,NA,"I10X-","M512",NA, "Z921",NA,"E119","I959",NA,"H903","I678", "K639","F03X",NA), secondary_diagnosis_code_8 = c(NA,NA,"Z864","J459 ",NA,"E115","E119","N183", "E111","E871",NA,NA,"R51X-","H409",NA, "Z870",NA,NA,"J90X",NA,"M199","J459", "N133","F29X",NA), secondary_diagnosis_code_9 = c(NA,NA,"Z921","R296 ",NA,"L97X","I10X","M4806", "E114","S099",NA,NA,"Q070-","H544",NA, NA,NA,NA,"I501",NA,"K811","F03X","J90X", "N189",NA), secondary_diagnosis_code_10 = c(NA,NA,NA,"Z921 ",NA,"L089","Z921","N40X", "G590","R296",NA,NA,"E668-","Z858",NA,NA,NA, NA,"I489",NA,"K219","G20X","N202", "F719",NA), secondary_diagnosis_code_11 = c(NA,NA,NA,"Z515 ",NA,"R02X","Z507","Z864", "E162","I489",NA,NA,"G473-","Z923",NA,NA,NA, NA,"I447",NA,"J459","E119","L031", "Z960",NA), secondary_diagnosis_code_12 = c(NA,NA,NA,"Z501 ",NA,"B370","K579","Z955", "E46X","Z921",NA,NA,"R600-","Z926",NA,NA,NA, NA,"E86X",NA,"I10X",NA,"J981","Z922", NA), secondary_diagnosis_code_13 = c(NA,NA,NA,"Z507 ",NA,"E039","M109",NA,"I259", "K709",NA,NA,"M1999","Z895",NA,NA,NA,NA, "R33X",NA,"J40X",NA,"E119",NA,NA), secondary_diagnosis_code_14 = c(NA,NA,NA,NA,NA,NA,"J459",NA,"N131","Z864",NA, NA,"R468-","Z902",NA,NA,NA,NA,"R296", NA,NA,NA,"I739",NA,NA), secondary_diagnosis_code_15 = c(NA,NA,NA,NA,NA,NA,"Z880",NA,"K862","Z501",NA, NA,"Z115-","Z971",NA,NA,NA,NA,"R468", NA,NA,NA,"N183",NA,NA), secondary_diagnosis_code_16 = c(NA,NA,NA,NA,NA,NA,"Z867",NA,"T391","Z505",NA, NA,"Z501-","Z878",NA,NA,NA,NA,"R31X", NA,NA,NA,"I489",NA,NA), secondary_diagnosis_code_17 = c(NA,NA,NA,NA,NA,NA,"Z864",NA,"R458","Z518",NA, NA,"Z507-","Z958",NA,NA,NA,NA,"Z115", NA,NA,NA,"M549",NA,NA), secondary_diagnosis_code_18 = c(NA,NA,NA,NA,NA,NA,"F03X",NA,"C61X",NA,NA,NA, NA,"Z867",NA,NA,NA,NA,"I252",NA,NA, NA,"I252",NA,NA), secondary_diagnosis_code_19 = c(NA,NA,NA,NA,NA,NA,NA,NA,"K627",NA,NA,NA,NA, "Z864",NA,NA,NA,NA,"I259",NA,NA,NA, "I259",NA,NA), secondary_diagnosis_code_20 = c(NA,NA,NA,NA,NA,NA,NA,NA,"R634",NA,NA,NA,NA, "Z880",NA,NA,NA,NA,"I10X",NA,NA,NA, "E669",NA,NA), secondary_diagnosis_code_21 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E111",NA,NA,NA,NA, "Z800",NA,NA,NA,NA,"I352",NA,NA,NA, "Z867",NA,NA), secondary_diagnosis_code_22 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E114",NA,NA,NA,NA, "Z801",NA,NA,NA,NA,"R15X",NA,NA,NA, "Z896",NA,NA), secondary_diagnosis_code_23 = c(NA,NA,NA,NA,NA,NA,NA,NA,"G590",NA,NA,NA,NA, NA,NA,NA,NA,NA,"R32X",NA,NA,NA, "Z960",NA,NA), secondary_diagnosis_code_24 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E162",NA,NA,NA,NA, NA,NA,NA,NA,NA,"R418",NA,NA,NA, "Z874",NA,NA), primary_procedure_code = c("H289",NA,"K634",NA,"X292",NA,NA,NA,NA,NA, "H251",NA,"U051","L913","X403",NA,"H231", "U071","M473","X384",NA,NA,NA,NA,"X403"), primary_procedure_date = c("20170730",NA,"20201202",NA,"20170914",NA,NA,NA, NA,NA,"20210105",NA,"20170724", "20210111","20171114",NA,"20170622","20210104", "20171013","20170313",NA,NA,NA,NA, "20171107"), secondary_procedure_code_1 = c("H626",NA,"Y534",NA,"U297",NA,NA,NA,NA,NA, "Z286",NA,"Y981","Y031",NA,NA,"Z286", "Y981",NA,NA,NA,NA,NA,NA,NA), secondary_procedure_date_1 = c("20170730",NA,"20201202",NA,"20170928",NA,NA,NA, NA,NA,"20210105",NA,"20170724", "20210111",NA,NA,"20170622","20210104",NA,NA,NA, NA,NA,NA,NA), secondary_procedure_code_2 = c("H444",NA,"Z941",NA,NA,NA,NA,NA,NA,NA,NA,NA, "U212",NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA,NA), secondary_procedure_date_2 = c("20170730",NA,"20201202",NA,NA,NA,NA,NA,NA,NA, NA,NA,"20170729",NA,NA,NA,NA,NA,NA, NA,NA,NA,NA,NA,NA), secondary_procedure_code_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y973", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y982", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Z926", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"O161", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA) ) inpatient_codes(x=inpatient_test, field_strings='diagnosis', patient_id_vars = c('id','spell_id'), type = 'icd10') inpatient_codes(x=inpatient_test, field_strings=c('procedure_code','procedure_date'), patient_id_vars = c('id','spell_id'), type = 'opcs')
inpatient_test <- data.frame( id = c(1053L,5487L,8180L,528L,1085L,344L,2021L,2040L, 6504L,10867L,12411L,7917L,2950L,2812L,7757L,12227L,2675L, 8548L,536L,11830L,12708L,10421L,5503L,2494L,14001L), spell_id = c("dwPDw","iSpUq","qpgk5","8vrJ1","BAur9","l6LZk", "KJllb","tgZID","fJkh8","Y9IPv","DAlUZ", "9Ooc4","hUxGn","wtMG9","dw3dO","cz3fI", "gdxZK","npplb","tynBh","Uu0Sd","gV1Ac", "vOpA1","ttlcD","Fqo29","ivTmN"), primary_diagnosis_code = c("K602","U071-","I501","U071 ","J22X","J189", "J189","I951","N130","U071","K510 D",NA, "G409-","C780","N185","J955","K573","U071", "I330","L309","M513","U071","A419","U071", "N185-"), secondary_diagnosis_code_1 = c("K641","J128-","I489","J128 ","Q348","F059", "R296","R296","N131","J128","M0750A",NA, "R401-","C782","Z491","C321","D125","J128", "B952","J459","M4780","B972","N390","J128", "Z491-"), secondary_diagnosis_code_2 = c("E039","B972-","I10X","L031 ","Z115","I509", "F051","I251","K862","B972","K590-",NA, "E876-","C798","N085","Z938","I209","B972", "I214","Z880","M8588","R296","B962","B972", NA), secondary_diagnosis_code_3 = c("I422","J9691","E119","I489 ","D509","I489", "D509","I252","T391","J440","R21X-",NA, "R945-","E119","M310","I480","I252","J9690", "E111",NA,"Z115","R410","J181","Z518",NA), secondary_diagnosis_code_4 = c(NA,"I10X-","E669","E109 ","K219","Z921","I251", "I259","R458","B972","F200-",NA,"E039-", "I10X",NA,"I500","F171","I489","E162",NA, "I480","M2551","L892","E86X",NA), secondary_diagnosis_code_5 = c(NA,"E119-","J449","F03X ",NA,"Z518","I252", "I209","C61X","A419","R761-",NA,"E119-", "K219",NA,"Z115","F329","N179","N179",NA, "H353","Z638","L033","R54X",NA), secondary_diagnosis_code_6 = c(NA,NA,"Z966","I10X ",NA,"N179","N183","Z115", "K627","N390",NA,NA,"J459-","M4780",NA, "Z900",NA,"I10X","R34X",NA,"I951","I10X", "D510","F059",NA), secondary_diagnosis_code_7 = c(NA,NA,"Z854","I679 ",NA,"N183","Z951","M190", "R634","L031",NA,NA,"I10X-","M512",NA, "Z921",NA,"E119","I959",NA,"H903","I678", "K639","F03X",NA), secondary_diagnosis_code_8 = c(NA,NA,"Z864","J459 ",NA,"E115","E119","N183", "E111","E871",NA,NA,"R51X-","H409",NA, "Z870",NA,NA,"J90X",NA,"M199","J459", "N133","F29X",NA), secondary_diagnosis_code_9 = c(NA,NA,"Z921","R296 ",NA,"L97X","I10X","M4806", "E114","S099",NA,NA,"Q070-","H544",NA, NA,NA,NA,"I501",NA,"K811","F03X","J90X", "N189",NA), secondary_diagnosis_code_10 = c(NA,NA,NA,"Z921 ",NA,"L089","Z921","N40X", "G590","R296",NA,NA,"E668-","Z858",NA,NA,NA, NA,"I489",NA,"K219","G20X","N202", "F719",NA), secondary_diagnosis_code_11 = c(NA,NA,NA,"Z515 ",NA,"R02X","Z507","Z864", "E162","I489",NA,NA,"G473-","Z923",NA,NA,NA, NA,"I447",NA,"J459","E119","L031", "Z960",NA), secondary_diagnosis_code_12 = c(NA,NA,NA,"Z501 ",NA,"B370","K579","Z955", "E46X","Z921",NA,NA,"R600-","Z926",NA,NA,NA, NA,"E86X",NA,"I10X",NA,"J981","Z922", NA), secondary_diagnosis_code_13 = c(NA,NA,NA,"Z507 ",NA,"E039","M109",NA,"I259", "K709",NA,NA,"M1999","Z895",NA,NA,NA,NA, "R33X",NA,"J40X",NA,"E119",NA,NA), secondary_diagnosis_code_14 = c(NA,NA,NA,NA,NA,NA,"J459",NA,"N131","Z864",NA, NA,"R468-","Z902",NA,NA,NA,NA,"R296", NA,NA,NA,"I739",NA,NA), secondary_diagnosis_code_15 = c(NA,NA,NA,NA,NA,NA,"Z880",NA,"K862","Z501",NA, NA,"Z115-","Z971",NA,NA,NA,NA,"R468", NA,NA,NA,"N183",NA,NA), secondary_diagnosis_code_16 = c(NA,NA,NA,NA,NA,NA,"Z867",NA,"T391","Z505",NA, NA,"Z501-","Z878",NA,NA,NA,NA,"R31X", NA,NA,NA,"I489",NA,NA), secondary_diagnosis_code_17 = c(NA,NA,NA,NA,NA,NA,"Z864",NA,"R458","Z518",NA, NA,"Z507-","Z958",NA,NA,NA,NA,"Z115", NA,NA,NA,"M549",NA,NA), secondary_diagnosis_code_18 = c(NA,NA,NA,NA,NA,NA,"F03X",NA,"C61X",NA,NA,NA, NA,"Z867",NA,NA,NA,NA,"I252",NA,NA, NA,"I252",NA,NA), secondary_diagnosis_code_19 = c(NA,NA,NA,NA,NA,NA,NA,NA,"K627",NA,NA,NA,NA, "Z864",NA,NA,NA,NA,"I259",NA,NA,NA, "I259",NA,NA), secondary_diagnosis_code_20 = c(NA,NA,NA,NA,NA,NA,NA,NA,"R634",NA,NA,NA,NA, "Z880",NA,NA,NA,NA,"I10X",NA,NA,NA, "E669",NA,NA), secondary_diagnosis_code_21 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E111",NA,NA,NA,NA, "Z800",NA,NA,NA,NA,"I352",NA,NA,NA, "Z867",NA,NA), secondary_diagnosis_code_22 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E114",NA,NA,NA,NA, "Z801",NA,NA,NA,NA,"R15X",NA,NA,NA, "Z896",NA,NA), secondary_diagnosis_code_23 = c(NA,NA,NA,NA,NA,NA,NA,NA,"G590",NA,NA,NA,NA, NA,NA,NA,NA,NA,"R32X",NA,NA,NA, "Z960",NA,NA), secondary_diagnosis_code_24 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E162",NA,NA,NA,NA, NA,NA,NA,NA,NA,"R418",NA,NA,NA, "Z874",NA,NA), primary_procedure_code = c("H289",NA,"K634",NA,"X292",NA,NA,NA,NA,NA, "H251",NA,"U051","L913","X403",NA,"H231", "U071","M473","X384",NA,NA,NA,NA,"X403"), primary_procedure_date = c("20170730",NA,"20201202",NA,"20170914",NA,NA,NA, NA,NA,"20210105",NA,"20170724", "20210111","20171114",NA,"20170622","20210104", "20171013","20170313",NA,NA,NA,NA, "20171107"), secondary_procedure_code_1 = c("H626",NA,"Y534",NA,"U297",NA,NA,NA,NA,NA, "Z286",NA,"Y981","Y031",NA,NA,"Z286", "Y981",NA,NA,NA,NA,NA,NA,NA), secondary_procedure_date_1 = c("20170730",NA,"20201202",NA,"20170928",NA,NA,NA, NA,NA,"20210105",NA,"20170724", "20210111",NA,NA,"20170622","20210104",NA,NA,NA, NA,NA,NA,NA), secondary_procedure_code_2 = c("H444",NA,"Z941",NA,NA,NA,NA,NA,NA,NA,NA,NA, "U212",NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA,NA), secondary_procedure_date_2 = c("20170730",NA,"20201202",NA,NA,NA,NA,NA,NA,NA, NA,NA,"20170729",NA,NA,NA,NA,NA,NA, NA,NA,NA,NA,NA,NA), secondary_procedure_code_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y973", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y982", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Z926", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA), secondary_procedure_code_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"O161", NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA), secondary_procedure_date_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA, NA,NA) ) inpatient_codes(x=inpatient_test, field_strings='diagnosis', patient_id_vars = c('id','spell_id'), type = 'icd10') inpatient_codes(x=inpatient_test, field_strings=c('procedure_code','procedure_date'), patient_id_vars = c('id','spell_id'), type = 'opcs')
A dataset containing synthetic lab data for testing epidemiological data transformation functions.
data(lab_data)
data(lab_data)
A data frame with the following columns:
NHS number
Patient identifier such as hospital number
Date of birth of the patients.
Gender of the patients (Factor with levels: "Female", "Male").
Patient surname
Patient forename
Organism species name (Factor with levels: "KLEBSIELLA PNEUMONIAE").
Date of specimen collection.
Type of specimen: BLOOD or URINE.
Laboratory codes (Factor with unique levels).
Name of the local authority.
Code of the local authority.
Postcode
data(lab_data) head(lab_data)
data(lab_data) head(lab_data)
A function to call an epidm lookup table and recode where we are aware of a new value.
Built in are the organism re-classifications and specimen_type groupings and a manual mode.
lookup_recode( src, type = c("species", "specimen", "inpatient_admission_method", "inpatient_discharge_destination", "ecds_destination_code", "manual"), .import = NULL )
lookup_recode( src, type = c("species", "specimen", "inpatient_admission_method", "inpatient_discharge_destination", "ecds_destination_code", "manual"), .import = NULL )
src |
a character, vector or column containing the value(s) to be referenced |
type |
a character value to denote the lookup table used |
.import |
a list in the order list(new,old) containing the values for another lookup table existing in the environment |
a list object of the recoded field
df <- data.frame( spec = c( sample(grep(")", respeciate_organism$previous_organism_name, value=TRUE, invert = TRUE), 9), "ESCHERICHIA COLI","SARS-COV-2","CANDIDA AUREUS"), type = sample(specimen_type_grouping$specimen_type,12), date = sample(seq.Date(from = Sys.Date()-365, to = Sys.Date(), by = "day"),12) ) df <- df[order(df$date),] # show the data before the changes df # check the lookup tables # observe the changes head(respeciate_organism[1:2]) df$species <- lookup_recode(df$spec,'species') df[,c('spec','species','date')] head(specimen_type_grouping) df$grp <- lookup_recode(df$type,'specimen') df[,c('species','type','grp','date')] # for a tidyverse use # df %>% mutate(spec=lookup_recode(spec,'species)) # manual input of your own lookup # .import=list(new,old) lookup_recode( "ALCALIGENES DENITRIFICANS", type = 'manual', .import=list(respeciate_organism$organism_species_name, respeciate_organism$previous_organism_name) )
df <- data.frame( spec = c( sample(grep(")", respeciate_organism$previous_organism_name, value=TRUE, invert = TRUE), 9), "ESCHERICHIA COLI","SARS-COV-2","CANDIDA AUREUS"), type = sample(specimen_type_grouping$specimen_type,12), date = sample(seq.Date(from = Sys.Date()-365, to = Sys.Date(), by = "day"),12) ) df <- df[order(df$date),] # show the data before the changes df # check the lookup tables # observe the changes head(respeciate_organism[1:2]) df$species <- lookup_recode(df$spec,'species') df[,c('spec','species','date')] head(specimen_type_grouping) df$grp <- lookup_recode(df$type,'specimen') df[,c('species','type','grp','date')] # for a tidyverse use # df %>% mutate(spec=lookup_recode(spec,'species)) # manual input of your own lookup # .import=list(new,old) lookup_recode( "ALCALIGENES DENITRIFICANS", type = 'manual', .import=list(respeciate_organism$organism_species_name, respeciate_organism$previous_organism_name) )
Correcting for missing end dates on HES/SUS episodes
proxy_episode_dates( x, group_vars, spell_start_date, spell_end_date, discharge_destination, .dropTmp = TRUE, .forceCopy = FALSE )
proxy_episode_dates( x, group_vars, spell_start_date, spell_end_date, discharge_destination, .dropTmp = TRUE, .forceCopy = FALSE )
x |
a data frame; will be converted to a data.table |
group_vars |
a vector containing any variables to be used for record grouping, minimum is a patient identifier |
spell_start_date |
Inpatient provider spell or episode admission date |
spell_end_date |
Inpatient provider spell or episode discharge date |
discharge_destination |
CDS discharge destination code |
.dropTmp |
default TRUE; a logical to drop all tmp values used |
.forceCopy |
default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
a data.table with cleaned start and end dates, and an indicator proxy_missing where the value has changed
proxy_test <- data.frame( id = c( rep(3051, 4), rep(7835,3), rep(9891,3), rep(1236,3) ), provider = c( rep("QKJ", 4), rep("JSD",3), rep("YJG",3), rep("LJG",3) ), spell_start = as.Date(c( "2020-07-03", "2020-07-14", "2020-07-23", "2020-08-05", "2020-11-01", "2020-11-13", "2020-12-01", "2020-03-28", "2020-04-06", "2020-04-09", "2020-10-06", "2020-11-05", "2020-12-25" )), spell_end = as.Date(c( "2020-07-11", "2020-07-22", "2020-07-30", "2020-07-30", "2020-11-11", NA, "2020-12-03", "2020-03-28", NA, "2020-04-09", "2020-10-06", "2020-11-05", NA )), disdest = c( 19, 19, 51, 19, 19, 19, 19, 51, 98, 19, 19, 19, 98 ) ) proxy_episode_dates( x=proxy_test, group_vars = c('id','provider'), spell_start_date = 'spell_start', spell_end_date = 'spell_end', discharge_destination = 'disdest' )[]
proxy_test <- data.frame( id = c( rep(3051, 4), rep(7835,3), rep(9891,3), rep(1236,3) ), provider = c( rep("QKJ", 4), rep("JSD",3), rep("YJG",3), rep("LJG",3) ), spell_start = as.Date(c( "2020-07-03", "2020-07-14", "2020-07-23", "2020-08-05", "2020-11-01", "2020-11-13", "2020-12-01", "2020-03-28", "2020-04-06", "2020-04-09", "2020-10-06", "2020-11-05", "2020-12-25" )), spell_end = as.Date(c( "2020-07-11", "2020-07-22", "2020-07-30", "2020-07-30", "2020-11-11", NA, "2020-12-03", "2020-03-28", NA, "2020-04-09", "2020-10-06", "2020-11-05", NA )), disdest = c( 19, 19, 51, 19, 19, 19, 19, 51, 98, 19, 19, 19, 98 ) ) proxy_episode_dates( x=proxy_test, group_vars = c('id','provider'), spell_start_date = 'spell_start', spell_end_date = 'spell_end', discharge_destination = 'disdest' )[]
Some samples within SGSS are submitted by laboratories as "GENUS SP" or "GENUS UNNAMED". However, they may also have a fully identified sample taken from the same site within a recent time period. This function captures species_col from another sample within X-days of an unspeciated isolate.
respeciate_generic( x, group_vars, species_col, date_col, window = c(0:Inf), .forceCopy = FALSE )
respeciate_generic( x, group_vars, species_col, date_col, window = c(0:Inf), .forceCopy = FALSE )
x |
a data.frame or data.table object |
group_vars |
the minimum grouping set of variables for like samples in a character vector; suggest c('patient_id','specimen_type','genus') |
species_col |
a character containing the column with the organism species_col name |
date_col |
a character containing the column with the specimen/sample date_col |
window |
an integer representing the number of days for which you will allow a sample to be respeciated |
.forceCopy |
default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
a data.table with a recharacterised species_col
column
df <- data.frame( ptid = c(round(runif(25,1,5))), spec = sample(c("KLEBSIELLA SP", "KLEBSIELLA UNNAMED", "KLEBSIELLA PNEUMONIAE", "KLEBEIELLA OXYTOCA"), 25,replace = TRUE), type = "BLOOD", specdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),25,replace = TRUE) ) respeciate_generic(x=df, group_vars=c('ptid','type'), species_col='spec', date_col='specdate', window = 14)[]
df <- data.frame( ptid = c(round(runif(25,1,5))), spec = sample(c("KLEBSIELLA SP", "KLEBSIELLA UNNAMED", "KLEBSIELLA PNEUMONIAE", "KLEBEIELLA OXYTOCA"), 25,replace = TRUE), type = "BLOOD", specdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),25,replace = TRUE) ) respeciate_generic(x=df, group_vars=c('ptid','type'), species_col='spec', date_col='specdate', window = 14)[]
Occasionally, research shows that two organisms, previously thought to be different are in fact one and the same. The reverse is also true. This is a manually updated list. If there are organisms missing, or new respeciates to be added, please raise and issue or push request on the epidm GitHub
respeciate_organism
respeciate_organism
What the organism used to be known as, in the form GENUS SPECIES
What the organism is known as now, in the form GENUS SPECIES
The genus of the recoded organism
A 0/1 flag to indicate if the genus has changed
A 0/1 flag to indicate if all species under that genus should change
In order to help clean up an analysis based on a group of specimen types, a lookup table has been created to help group sampling sites. This is a manually updated list. If there are organisms missing, or new respeciates to be added, please raise and issue or push request on the epidm GitHub
specimen_type_grouping
specimen_type_grouping
The primary specimen type with detail
A simple grouping of like specimen sites
A utility function to read in a SQL query from a character object, clipboard or text file and remove all comments for use with database query packages
sql_clean(sql)
sql_clean(sql)
sql |
a SQL file or text string |
a cleaned SQL query without comments as a character string
testSQL <- c( "/********* INTRO HEADER COMMENTS", "*********/", " SELECT ", " [VAR 1] -- with comments", ",[VAR 2]",",[VAR 3]", "FROM DATASET ","-- output here") sql_clean(testSQL)
testSQL <- c( "/********* INTRO HEADER COMMENTS", "*********/", " SELECT ", " [VAR 1] -- with comments", ",[VAR 2]",",[VAR 3]", "FROM DATASET ","-- output here") sql_clean(testSQL)
An function to help setup connections to SQL databases acting as a wrapper for the odbc and DBI packages. Used by other sql_* tools within epidm. This uses the credential manager within the system and assumes you are using a trusted connection.
sql_connect(server, database)
sql_connect(server, database)
server |
a string containing the server connection;
note that servers may require the use of double backslash |
database |
a string containing the database name within the data store |
a SQL connection object
sql_clean sql_read sql_write
## Not run: sql <- list( dsn = list(ser = 'covid.ukhsa.gov.uk', dbn = 'infections') ) sgss_con = sql_connect(server = sql$dsn$ser, database = sql$dsn$dbn) ## End(Not run)
## Not run: sql <- list( dsn = list(ser = 'covid.ukhsa.gov.uk', dbn = 'infections') ) sgss_con = sql_connect(server = sql$dsn$ser, database = sql$dsn$dbn) ## End(Not run)
Read a table object to a SQL database. Acts a wrapper for odbc and DBI packages.
sql_read(server, database, sql)
sql_read(server, database, sql)
server |
a string containing the server connection |
database |
a string containing the database name within the data store |
sql |
a string containing a SQL query or to a .sql/.txt SQL query |
a table from a SQL database
sql_clean sql_connect
Write a table object to a SQL database. Acts a wrapper for odbc and DBI packages with additional checks to ensure upload completes.
sql_write(x, server, database, tablename)
sql_write(x, server, database, tablename)
x |
a data.frame/data.table/tibble object |
server |
a string containing the server connection |
database |
a string containing the database name within the data store |
tablename |
a string containing the chosen SQL database table name |
writes a data.frame/data.table/tibble to a SQL database
Groups patient records from multiple isolates with a single integer patientID by grouping patient identifiers.
Grouping is based on the following stages:
matching nhs number and date of birth
Hospital number & Date of Birth
NHS number & Hospital Number
NHS number & Name
Hospital number & Name
Sex & Date of Birth & Surname
Sex & Date of Birth & Fuzzy Name
Sex & Year and Month of Birth & Fuzzy Name
Postcode & Name
Name Swaps (when first and last name are the wrong way around)
Identifiers are copied over where they are missing or invalid to the grouped records.
uk_patient_id( data, id = list(nhs_number = "nhs_number", hospital_number = "patient_hospital_number", date_of_birth = "date_of_birth", sex_mfu = "sex", forename = "forename", surname = "surname", postcode = "postcode"), .useStages = c(1:11), .sortOrder, .keepValidNHS = FALSE, .forceCopy = FALSE )
uk_patient_id( data, id = list(nhs_number = "nhs_number", hospital_number = "patient_hospital_number", date_of_birth = "date_of_birth", sex_mfu = "sex", forename = "forename", surname = "surname", postcode = "postcode"), .useStages = c(1:11), .sortOrder, .keepValidNHS = FALSE, .forceCopy = FALSE )
data |
a data.frame or data.table containing the patient data |
id |
a named list to provide the column names with identifiers, quoted
|
.useStages |
optional, default 1:11; set to 1 if you wish patient ID to be assigned cases with the same DOB and NHS number, set to 2 if you wish patient ID to be assigned to cases with the same hospital number (HOS) and DOB, set to 3 if you wish patient ID to be assigned cases with the same NHS and HOS number, set to 4 if you wish patient ID to be assigned cases with the same NHS number and surname, set to 5 if you wish patient ID to be assigned cases with the same hospital number and surname, set to 6 if you wish patient ID to be assigned cases with the same DOB and surname, set to 7 if you wish patient ID to be assigned cases with the same sex and full name, set to 8 if you wish patient ID to be assigned cases with the same sex, DOB and fuzzy name, set to 9 if you wish patient ID to be assigned cases with the same DOB and fuzzy name, set to 10 if you wish patient ID to be assigned cases with the same name and postcode, set to 11 if you wish patient ID to be assigned cases with the same first name or second name in changing order and date of birth. |
.sortOrder |
optional; a column as a character to allow a sorting order on the id generation |
.keepValidNHS |
optional, default FALSE; set TRUE if you wish to retain the column with the NHS checksum result stored as a BOOLEAN |
.forceCopy |
optional, default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference |
A dataframe with one new variable:
id
a unique patient id
valid_nhs
if retained using argument .keepValidNHS=TRUE
, a
BOOLEAN containing the result of the NHS checksum validation
uk_patient_id( data = head(epidm::lab_data), id = list( nhs_number = 'nhs_number', hospital_number = 'local_patient_identifier', date_of_birth = 'patient_birth_date', sex_mfu = 'sex', forename = 'forename', surname = 'surname' postcode = 'postcode' ), .sortOrder = 'specimen_date', .forceCopy = TRUE )[]
uk_patient_id( data = head(epidm::lab_data), id = list( nhs_number = 'nhs_number', hospital_number = 'local_patient_identifier', date_of_birth = 'patient_birth_date', sex_mfu = 'sex', forename = 'forename', surname = 'surname' postcode = 'postcode' ), .sortOrder = 'specimen_date', .forceCopy = TRUE )[]
Check if NHS numbers are valid based on the checksum algorithm
This uses the first 9 digits, multiplied by 10 down to 2 eg digit 1x10, d2x9
The sum of the products of the first 9 digits are divided by 11
The remainder is checked against the 10th digit
Where the remainder is 11, it is replaced with 0
valid_nhs(nhs_number)
valid_nhs(nhs_number)
nhs_number |
a vector |
a vector, 1 if NHS number is valid, 0 if not valid
test <- floor(runif(1000,1000000000,9999999999)) valid_nhs(test) valid_nhs(9434765919)
test <- floor(runif(1000,1000000000,9999999999)) valid_nhs(test) valid_nhs(9434765919)