Transitive Locality-Sensitive Hashing (LSH) for Record Linkage



Documentation for package ‘tlsh’ version 0.1.0

Help Pages

block.ids.from.blocking Returns the block ids associated with a blocking method.
block_setup_v2 Function that divides all records into bins using locality-sensitive hashing (TLSH, which is based on a community detection technique)
compare_buckets Function that creates a similarity graph and divides it into communities (or blocks) for entity resolution
confusion.from.blocking Perform evaluations (recall) for blocking.
eval.blocksetup Function to evaluate the blocking step
extract_pairs_from_band Function that extracts pairs of records from a band in the signature matrix M
hash_signature Function to take a signature matrix M composed of b bands and r rows and return a bucket for each band for each record
minhash_v2 Function to create a matrix of minhashed signatures
my_hash Function that applies a hash function to each column of a band from the signature matrix
primest Function to generate all primes greater than an integer n1 (lower limit) and less than an integer n2 (upper limit)
reduction.ratio Returns the reduction ratio associated with a blocking method
reduction.ratio.from.blocking Returns the reduction ratio associated with a blocking method
rhash_funcs Function to generate a vector of random hash functions (or optionally one vector-valued function)
shingled_record_to_index_vec Function that tells which index each shingle in a record corresponds to
shingles Function to shingle (token or gram) a string into its k components
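
Illustrative sketch

The entries above name the stages of a minhash-based blocking pipeline: shingling, minhash signatures, banding into buckets, pair extraction, and the reduction ratio. The following is a minimal sketch of those stages in Python, not the package's implementation. Every function name, parameter, and the choice of BLAKE2b as the hash family are assumptions made for illustration; the transitive (community detection) step is omitted, and the reduction ratio uses the standard definition, 1 minus the fraction of all record pairs that remain as candidates.

```python
import hashlib
from itertools import combinations


def k_shingles(record, k):
    """Return the set of k-character shingles of a string (hypothetical helper)."""
    return {record[i:i + k] for i in range(len(record) - k + 1)}


def seeded_hash(seed, token):
    """Deterministic seeded hash of a token; BLAKE2b is an illustrative choice."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, n_hashes):
    """One minimum per seeded hash function; tokens must be non-empty."""
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(n_hashes)]


def candidate_pairs(records, k=2, bands=10, rows=2):
    """Bucket each record once per band; records sharing a bucket become pairs."""
    buckets = {}
    for idx, rec in enumerate(records):
        sig = minhash_signature(k_shingles(rec, k), bands * rows)
        for b in range(bands):
            # The bucket key is the band number plus that band's slice of the signature.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(idx)
    pairs = set()
    for members in buckets.values():
        # Indices are appended in increasing order, so each pair is (i, j) with i < j.
        pairs.update(combinations(members, 2))
    return pairs


def reduction_ratio(n_records, n_candidate_pairs):
    """1 - (candidate pairs) / (all possible pairs), the standard definition."""
    total = n_records * (n_records - 1) // 2
    return 1 - n_candidate_pairs / total


if __name__ == "__main__":
    names = ["john smith", "john smith", "jon smith", "maria garcia"]
    pairs = candidate_pairs(names)
    print(sorted(pairs))
    print(reduction_ratio(len(names), len(pairs)))
```

Identical records always agree on every band and are therefore always paired; how often near-duplicates collide depends on the assumed number of bands and rows per band.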