Ruby's HTTP Client improperly parsing certain URLs

- June 15, 2015

It is quite possible that there is already an answer to the following question; if so, I was unable to recognize it.

Here's the thing: I am making a Ruby program that sweeps a dictionary's list of entries. I need it because I want to sweep each entry in search of specific words, but that's beside the point. The problem is that the program has trouble downloading data from encoded links, which has never occurred before.

By encoded, I mean the encoding that replaces non-ASCII characters etc., so that a link like this: http://www.dict.cc/deutsch-englisch/a+[auch+a]+[buchstabe].html looks like this: http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html
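To make the escaping concrete, here is a minimal sketch of how `%5b` and `%5d` map to square brackets. The helper names `percent_decode` and `percent_encode_brackets` are made up for illustration and are not part of my program or of any library; the sketch only handles single-byte escapes and only re-escapes brackets.

```ruby
# Hypothetical helpers, for illustration only.

# Replace every %XX escape with the single byte it stands for.
def percent_decode(text)
  text.gsub(/%([0-9a-fA-F]{2})/) { Regexp.last_match(1).hex.chr }
end

# Escape only square brackets; '+' and all other characters stay untouched.
def percent_encode_brackets(text)
  text.gsub(/[\[\]]/) { |char| format('%%%02x', char.ord) }
end

ENCODED_PATH = '/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html'
DECODED_PATH = percent_decode(ENCODED_PATH)

puts DECODED_PATH                           # the path with literal brackets
puts percent_encode_brackets(DECODED_PATH)  # back to the %5b / %5d form
```

A path without any escapes, such as /deutsch-englisch/a+an+b+anpassen.html, passes through both helpers unchanged, which fits the observation that only the bracketed entries are affected.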

The funny thing is that while the above does not work, some of the links do work, for instance: /deutsch-englisch/a+an+b+anpassen.html

I have tested random links and they work, and the regex matches what it is supposed to match.

Here's the function I am using:

    require 'uri'
    require 'httpclient'

    def getdataoverhttpget(link, proxy = nil)
      link = URI.unescape(link)           # added
      http = HTTPClient.new(:agent_name => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/25.0')
      http.proxy = proxy if proxy
      r = http.get(link)
      raise r.status.to_s if r.status != 200
      return r.body
    end

This worked fine until now. It has been suggested to me that the URLs might be escaped by the HTTP client, so I added the unescape call. All I got in return was an empty string instead of the information my program generates for missing data (= failed to match using the regex). However, using URI.escape makes no change, so that might be the case. Still, I have no idea what else I can try.

Also, all strings in the program are in UTF-8 (no BOM).

Try calling URI.parse, like so:

    http = HTTPClient.new(:agent_name => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/25.0')
    http.proxy = proxy if proxy
    r = http.get(URI.parse(link))
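As a rough check of what URI.parse does with the problematic link, the sketch below parses the still-encoded URL using only Ruby's standard `uri` library and prints its components. It makes no HTTP request, and the helper name `to_request_uri` is an assumption for illustration, not something from the httpclient gem.

```ruby
require 'uri'

# Hypothetical helper: turn a link string into a URI object before a request.
def to_request_uri(link)
  URI.parse(link)
end

LINK = 'http://www.dict.cc/deutsch-englisch/a+%5bauch+a%5d+%5bbuchstabe%5d.html'
REQUEST_URI = to_request_uri(LINK)

puts REQUEST_URI.class  # the URI subclass chosen from the scheme
puts REQUEST_URI.host   # host component
puts REQUEST_URI.path   # path component, escapes left as written
```

The point of the design is that the link stays percent-encoded: the string is handed to the parser as-is rather than being unescaped first, so the client receives a URI object instead of a raw string.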
