block.ids.from.blocking |
Returns the block ids associated with a blocking method. |
block_setup_v2 |
Function that divides all records into bins using locality-sensitive hashing, specifically TLSH (based on a community detection technique) |
compare_buckets |
Function that creates a similarity graph and divides it into communities (or blocks) for entity resolution |
confusion.from.blocking |
Perform evaluations (recall) for blocking. |
eval.blocksetup |
Function to evaluate the blocking step |
extract_pairs_from_band |
Function that extracts pairs of records from a band in the signature matrix M (imports bit64) |
hash_signature |
Function to take a signature matrix M composed of b bands and r rows and return a bucket for each band for each record |
minhash_v2 |
Function to create a matrix of minhashed signatures |
my_hash |
Function that applies a hash function to each column of the band from the signature matrix (imports bit64) |
primest |
Function to generate all primes larger than an integer n1 (lower limit) and less than another integer n2 (upper limit) |
reduction.ratio |
Returns the reduction ratio associated with a blocking method |
reduction.ratio.from.blocking |
Returns the reduction ratio associated with a blocking method |
rhash_funcs |
Function to generate a vector of random hash functions (or optionally one vector-valued function) |
shingled_record_to_index_vec |
Function to tell which index each shingle in a record corresponds to |
shingles |
Function to shingle a string into its k-shingles (tokens or grams) |
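
Two of the ideas listed above, shingling a string and the reduction ratio of a blocking, can be illustrated with a minimal sketch. This is not the package's implementation: the names `char_shingles` and `reduction_ratio_sketch` are hypothetical, the shingles are assumed to be overlapping character k-grams, and the reduction ratio is assumed to follow the usual definition of one minus the fraction of all record pairs that still need comparing after blocking.

```python
from collections import Counter


def char_shingles(text, k):
    """Return the overlapping character k-grams of `text` (hypothetical helper)."""
    # A string shorter than k yields no shingles.
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def reduction_ratio_sketch(block_labels):
    """Assumed definition: 1 - (within-block pairs) / (all pairs).

    `block_labels[i]` is the block assigned to record i.
    """
    n = len(block_labels)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        # Fewer than two records: nothing to compare, so nothing to reduce.
        return 0.0
    # Only records that share a block are compared after blocking.
    within_pairs = sum(c * (c - 1) // 2 for c in Counter(block_labels).values())
    return 1 - within_pairs / total_pairs


if __name__ == "__main__":
    print(char_shingles("alice", 2))
    print(reduction_ratio_sketch(["a", "a", "b", "b"]))
```

Under these assumptions, four records split evenly into two blocks leave 2 of the 6 possible pairs to compare, so two thirds of the comparisons are avoided.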