ruby - Best matching between two arrays using fuzzy string matching


I need a way to find the best matching between two arrays.

Array a contains product names, and array b refers to the same products, but the names may differ slightly.

    a = [
      "f542521376-34-reg",
      "af7u",
      "af106u",
      "f521521376-30r"
    ]

    b = [
      "f54252137634r",
      "af7u",
      "af106u",
      "f52152137630r"
    ]

Best matching:

    "f542521376-34-reg" - "f54252137634r"
    "af7u"              - "af7u"
    "af106u"            - "af106u"
    "f521521376-30r"    - "f52152137630r"

Or:

    a[0] - b[0]
    a[1] - b[1]
    a[2] - b[2]
    a[3] - b[3]

(The first and last elements differ between the lists.)

I can use a fuzzy string matching algorithm to get a numerical value for string similarity (0.0-1.0), but that alone won't give me the best possible matching of the list elements. I've not found an algorithm for this, and I don't want to brute force it.
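To make the gap between "a similarity score" and "a matching of the whole list" concrete, here is a minimal sketch in plain Ruby. It is not from the original post: `levenshtein`, `similarity`, and `greedy_match` are hypothetical helper names, the score is a normalized Levenshtein distance (any 0.0-1.0 similarity could be swapped in), and the names in each array are assumed to be unique. The assignment step is a greedy heuristic: it scores every pair and keeps taking the best remaining one. That is not guaranteed to find the globally best matching; an optimal assignment algorithm such as the Hungarian method would be needed for that guarantee.

```ruby
# Levenshtein edit distance between two strings (row-by-row dynamic programming).
def levenshtein(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0, where 1.0 means the strings are identical.
def similarity(s, t)
  longest = [s.length, t.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(s, t).to_f / longest
end

# Greedy one-to-one assignment (assumes unique names in each array):
# score every pair, sort by score, and take each pair whose two
# elements are both still unmatched.
def greedy_match(a, b)
  pairs = a.product(b).map { |x, y| [similarity(x, y), x, y] }
  pairs.sort_by! { |score, _x, _y| -score }

  used_b = {}
  pairs.each_with_object({}) do |(_score, x, y), result|
    next if result.key?(x) || used_b[y]

    used_b[y] = true
    result[x] = y
  end
end

a = ["f542521376-34-reg", "af7u", "af106u", "f521521376-30r"]
b = ["f54252137634r", "af7u", "af106u", "f52152137630r"]

greedy_match(a, b).each { |x, y| puts "#{x} - #{y}" }
```

Scoring every pair costs one similarity computation per element of `a.product(b)`, which is fine for a few hundred names but grows quickly for large catalogues.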

The actual application is that I have middle-man Ruby code that translates information between two third-party systems, and the data quality is all over the place. I need to match the elements to create a look-up table. There is no telling what the formatting and mutations of the product "names" might be.

I had a similar problem that I used the gem fuzzy_match to solve. This proposal assumes that the relationship between a and b is not necessarily one-to-one.

    require 'fuzzy_match'

    fz = FuzzyMatch.new(a)

    map = {}
    map[nil] = []               # elements in b with no match in a
    a.each { |r| map[r] = [] }  # in case more than one element in b matches

    b.each do |name|
      map[fz.find(name)] << name
    end

This gives "map":

    {"f542521376-34-reg"=>["f54252137634r"],
     "af7u"=>["af7u"],
     "af106u"=>["af106u"],
     "f521521376-30r"=>["f52152137630r"]}
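Since the goal in the question is a look-up table, the map above can be inverted so that each name from b points at its name in a. The sketch below is an illustration only, not part of the original answer: `build_lookup` is a hypothetical helper, and it assumes the map has the shape shown above (names from a as keys, arrays of names from b as values, and an optional `nil` key for unmatched names). It also reports which names in a were claimed by more than one name in b, since those are the cases worth checking by hand.

```ruby
# Invert an a-name => [b-names] map into a b-name => a-name look-up table.
# Returns three values:
#   lookup    - b-name => a-name
#   unmatched - names from b stored under the nil key (no match in a)
#   ambiguous - a-names that were matched by more than one b-name
def build_lookup(map)
  lookup    = {}
  ambiguous = {}

  map.each do |a_name, b_names|
    next if a_name.nil?

    ambiguous[a_name] = b_names if b_names.size > 1
    b_names.each { |b_name| lookup[b_name] = a_name }
  end

  [lookup, map.fetch(nil, []), ambiguous]
end

map = {
  "f542521376-34-reg" => ["f54252137634r"],
  "af7u"              => ["af7u"],
  "af106u"            => ["af106u"],
  "f521521376-30r"    => ["f52152137630r"]
}

lookup, unmatched, ambiguous = build_lookup(map)
puts lookup["f54252137634r"]
puts "unmatched: #{unmatched.inspect}, ambiguous: #{ambiguous.inspect}"
```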

If the match is not good enough, there are several parameters to fuzzy_match that can be used to improve the matching result.

