Ruby's HTTP Client improperly parsing certain URLs


It is quite possible that there is already an answer to the following question; if so, I was unable to recognize it.

Here's the thing: I am making a Ruby program that sweeps through a dictionary's list of entries. I need it because I want to sweep each entry in search of specific words, but that's beside the point. The problem is that the program has trouble downloading data from encoded links, which has never occurred before.

By encoded, I mean the encoding that replaces non-ASCII characters etc., so that a link like this: http://www.dict.cc/deutsch-englisch/a+[auch+a]+[buchstabe].html looks like this: http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html
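
To make the encoding concrete, here is a minimal sketch that reverses the percent-encoding of the link above. The helper `percent_decode` is a hypothetical name introduced only for this illustration; it handles ASCII escapes only and deliberately leaves `+` alone, since `+` is a literal character in a URL path.

```ruby
# Hypothetical helper (not part of any library): decode %XX escapes
# whose byte value is in the ASCII range (%00..%7F). '+' is left as-is.
def percent_decode(str)
  str.gsub(/%([0-7]\h)/) { $1.hex.chr }
end

encoded = 'http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html'
decoded = percent_decode(encoded)

puts decoded
```

Here `%5b` and `%5d` are simply the escaped forms of `[` and `]`.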

The funny thing is that while the above does not work, some of the links do work, for instance: /deutsch-englisch/a+an+b+anpassen.html

I have tested random links and they work, and the regex matches what it is supposed to match.

Here's the function I am using:

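    require 'uri'
    require 'httpclient'

    # Fetch the body of `link` via HTTP GET, optionally through a proxy.
    def getdataoverhttpget(link, proxy = nil)
        link = URI.unescape(link)           # added
        http = HTTPClient.new(:agent_name => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/25.0')
        http.proxy = proxy if proxy
        r = http.get(link)
        # Any status other than 200 is treated as a failure.
        raise r.status.to_s if r.status != 200
        return r.body
    end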

This worked fine until now. It has been suggested to me that the URLs might be escaped by the HTTP client, so I added the unescape call. What I got in return was an empty string instead of the information, which the program reports as missing data (= a failed match using the regex). However, using URI.escape makes no change, so that might be the case. I have no idea what else I can try.

Also, all strings in the program are in UTF-8 (no BOM).
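
A quick way to rule the string encoding in or out is to inspect the link itself. The following is only a sketch, using the example link from this question; it checks that the escaped link is plain ASCII and does not begin with a byte order mark, in which case the UTF-8 setting should not affect it.

```ruby
link = 'http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html'

# An escaped URL should contain ASCII characters only.
ascii_only = link.ascii_only?

# A BOM at the start of the string would show up as U+FEFF.
has_bom = link.start_with?("\uFEFF")

puts "encoding:   #{link.encoding}"
puts "ASCII only: #{ascii_only}"
puts "has BOM:    #{has_bom}"
```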

Try calling URI.parse, like so:

    http = HTTPClient.new(:agent_name => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/25.0')
    http.proxy = proxy if proxy
    # Pass a parsed URI object instead of a raw string.
    r = http.get(URI.parse(link))
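
As a small illustration of what `URI.parse` does with such a link, the sketch below parses the escaped URL from the question using only the standard library and prints its parts. It does not exercise HTTPClient itself and does not establish why the suggestion helps; it only shows that the escapes in the path survive parsing unchanged, so no extra escape or unescape step is needed before the request.

```ruby
require 'uri'

link = 'http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html'

# Parse the already-escaped link into a URI object.
uri = URI.parse(link)

puts uri.scheme
puts uri.host
puts uri.path
```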
