Synthetic Data Generation for Linkage Methods Development



Documentation for package ‘sdglinkage’ version 0.1.0

Help Pages

acquire_error_flag Add a column of error flags given two data frames.
address_uk UK addresses.
add_dependent_error Add two dependent error flags to a data frame.
add_random_error Add random error flags to a data frame.
add_variable Add a synthetic but realistic variable to a dataset following some rules.
adult Adult dataset.
bn_flag_inference Bayesian inference for error prediction.
check_swap_char Check if two strings are the same after swapping the position of two letters.
compare_cart Compare the synthetic data generated by CART with the real data.
compare_sdg Compare the performance of generators.
compare_two_df Compare two data frames.
damage_gold_standard Generate a linkage file by damaging the gold standard file.
diff_two_strings Find all letters in 'string1' which are not in 'string2'. Adapted from the function vsetdiff in package vecsets.
do_ocr_replacement Replace a string with its OCR error.
do_pho_replacement Replace a string with its phonetic error.
do_typo_replacement Replace a string with its typo error.
extract_address Extract addresses.
firstname_uk Baby birth first names in England and Wales.
firstname_uk_variant First name variants in the UK.
firstname_us First names in the US census.
gen_address Generate an address.
gen_bn_elicit Generate synthetic data using BN parameter learning with an elicited structure.
gen_bn_learn Generate synthetic data using BN learning.
gen_cart Generate synthetic data using CART.
gen_dob Generate a record of date of birth.
gen_firstname Randomly generate a firstname.
gen_lastname Randomly generate a lastname.
gen_nhsid Generate a random nhsid.
get_address Get an address.
get_transformation_del Delete a character randomly.
get_transformation_insert Insert a character/digit/space/symbol randomly.
get_transformation_name_variant Randomly assign a name to its variant.
get_transformation_ocr Introduce an OCR error into a string.
get_transformation_pho Introduce a phonetic error into a string.
get_transformation_trans_char Randomly transpose two neighbouring characters.
get_transformation_trans_date Transpose the position of day and month.
get_transformation_typo Introduce a typographic error into a string.
lastname_uk Last names in the UK.
lastname_uk_variant Last name variants in the UK.
lastname_us Last names in the US census.
ocr_rules Look up table of Optical Character Recognition (OCR) errors.
pho_rules Look up table of phonetic errors.
plot_bn Plot the BN structure.
plot_compared_sdg Plot the distribution of a variable from the synthetic data compared with the real data.
replace_firstname Replace the firstnames with values from another database.
replace_lastname Replace the lastnames with values from another database.
replace_nhsid Replace nhsid with another random nhsid.
slavo_germanic Detect if a string has a Slavo-Germanic transformation.
split_data Split the data into a training_set and a testing_set.