ruby - Best matching between two arrays using fuzzy string matching
I need a way to find the best matching between two arrays. Array a contains product names, and array b refers to the same products, but the names may differ slightly.
    a = [ "f542521376-34-reg", "af7u", "af106u", "f521521376-30r" ]
    b = [ "f54252137634r", "af7u", "af106u", "f52152137630r" ]
Best matching:
    "f542521376-34-reg" - "f54252137634r"
    "af7u" - "af7u"
    "af106u" - "af106u"
    "f521521376-30r" - "f52152137630r"
or:
    a[0] - b[0]
    a[1] - b[1]
    a[2] - b[2]
    a[3] - b[3]
(The first and last elements vary between the two lists.)
I can use a fuzzy string matching algorithm to get a numerical value for string similarity (0.0-1.0). That alone won't give me the best possible matching of the list elements. I've not found an algorithm for this, and I don't want to brute-force it.
The actual application is that I have middle-man Ruby code that translates information between two third-party systems, and the data quality is all over the place. I need to match the elements to create a look-up table. There is no telling what the formatting and mutations of the product "names" might be.
I had a similar problem that I used the gem fuzzy_match to solve. This proposal assumes that the relationship between a and b is not one-to-one.
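    require 'fuzzy_match'

    fz = FuzzyMatch.new(a)       # a is the list of candidate names to match against
    map = {}
    map[nil] = []                # elements in b with no match in a
    a.each { |r| map[r] = [] }   # in case more than one element in b matches
    b.each do |name|
      map[fz.find(name)] << name # find returns the best match from a, or nil
    end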
This gives the following "map":
    {"f542521376-34-reg"=>["f54252137634r"],
     "af7u"=>["af7u"],
     "af106u"=>["af106u"],
     "f521521376-30r"=>["f52152137630r"]}
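Since the question asks for a look-up table, the map can be turned around so that each name from b points at its name in a. The following is only a sketch: it starts from a copy of the output above (with the empty nil bucket added back in), and the names lookup and unmatched are made up for illustration.

```ruby
# The hash produced by the matching step (copied from the output shown,
# plus the nil bucket for names in b that matched nothing).
map = {
  "f542521376-34-reg" => ["f54252137634r"],
  "af7u"              => ["af7u"],
  "af106u"            => ["af106u"],
  "f521521376-30r"    => ["f52152137630r"],
  nil                 => []
}

# Invert it: every name from b becomes a key pointing at the name from a.
# Names stored under the nil key had no match and are kept out of the table.
lookup = {}
map.each do |a_name, b_names|
  next if a_name.nil?
  b_names.each { |b_name| lookup[b_name] = a_name }
end

# Names from b that still need manual attention.
unmatched = map.fetch(nil, [])
```

A lookup such as lookup["f54252137634r"] then returns the corresponding name from a, and anything left in unmatched can be reviewed by hand.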
If the match is not good enough, there are several parameters in fuzzy_match that can be used to improve the matching result.
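If a strict one-to-one pairing is needed instead, one possible approach is to score every pair and accept pairs greedily from best to worst. The sketch below does not use fuzzy_match; it assumes a plain Levenshtein-based similarity, and the helper names (levenshtein, similarity, greedy_match) and the 0.5 threshold are hypothetical choices for illustration. A greedy pass is not guaranteed to find the globally best assignment; an assignment-problem solver such as the Hungarian algorithm would be the option for that.

```ruby
# Levenshtein edit distance between two strings (dynamic programming,
# keeping only the previous row of the table).
def levenshtein(s, t)
  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0, where 1.0 means the strings are identical.
def similarity(s, t)
  longest = [s.length, t.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(s, t).to_f / longest
end

# Greedy one-to-one pairing: score every (a, b) pair, then accept pairs from
# best to worst, skipping any element that has already been paired.
# Pairs scoring below the (assumed) threshold are never accepted.
def greedy_match(a, b, threshold: 0.5)
  scored = a.product(b).map { |x, y| [similarity(x, y), x, y] }
  used_a = {}
  used_b = {}
  pairs = {}
  scored.sort_by { |score, _, _| -score }.each do |score, x, y|
    break if score < threshold
    next if used_a[x] || used_b[y]
    pairs[x] = y
    used_a[x] = used_b[y] = true
  end
  pairs
end

a = ["f542521376-34-reg", "af7u", "af106u", "f521521376-30r"]
b = ["f54252137634r", "af7u", "af106u", "f52152137630r"]
pairs = greedy_match(a, b)
```

For the sample data this pairs each element of a with the element of b at the same index. It compares every pair, so the cost grows with the product of the two list sizes, which is usually acceptable for building a look-up table once.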