HTTrack faulty when encountering Japanese-encoded URLs

Posted 2019-07-25 21:25

I usually have no problems with HTTrack, but this time I found that it fails to grab pages whose URLs contain non-ASCII characters, such as this Japanese URL:

domain.com/リーク情報の真偽のほ/

(which the browser reads as: domain.com/E3%83%A0%E7%A3%A8%E3%81%8D%E3%82%82%E5%A4%A7%E4%BA%8B%EF%BC%81%E3%82%B9%E3%83%9E%E3%83%9B%E3%83%95%E3%82%A9%E3%83%BC%E3%83%A0%E3%81%A7%E3%81%AE%E6%9C%80%E9%81%A9%E3%81%AA-2/)
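For context on what the browser is doing with that path: non-ASCII characters in a URL are sent as the percent-encoded bytes of their UTF-8 form, so each of these Japanese characters turns into three %XX escapes. A minimal sketch using Python's standard urllib.parse (an illustration of the encoding only, not part of HTTrack and not a fix for it):

```python
from urllib.parse import quote, unquote

# Japanese path segment taken from the example URL above.
segment = "リーク情報の真偽のほ"

# quote() percent-encodes each character's UTF-8 bytes by default,
# so every one of these Japanese characters becomes three %XX escapes.
encoded = quote(segment)
print(encoded)

# unquote() reverses the process, decoding the %XX escapes as UTF-8.
decoded = unquote(encoded)
print(decoded == segment)  # True
```

A tool that stores pages under local file names has to turn these escapes back into characters (or keep them escaped) consistently; if it does not, names can end up garbled.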

HTTrack manages to grab about half of the folders, but the HTML files inside them are all 0 KB. The other half have completely garbled names and are empty as well.

Then I tried the DOS/ISO file-naming options, but they change the structure too much (and make all file and folder names upper-case).

Is there any way to make HTTrack work properly with these URLs?
