How to exclude links with query-string parameters using wget

Posted 2019-09-09 08:20 by 闹够了就滚

Question:

I want to download all accessible HTML files under www.site.com/en/. However, the site has many linked URLs with query-string parameters (e.g. pages 1, 2, 3, ... for each product category). I do NOT want wget to download these links. I'm using

-R "*\?*"

But it's not perfect because it only removes the file after downloading it.

Is there some way, for example, to filter the links that wget follows with a regex?

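Answer 1:

It is possible to avoid those files with a regex: use --reject-regex '(.*)\?(.*)'. Unlike -R, this option is matched against the complete URL, so matching links are not fetched at all. It exists only in newer releases of wget (1.15 and later have it), so I would recommend checking your version first with wget --version.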
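As a rough sketch of how this fits together, the snippet below first previews which URLs the pattern would reject, then wraps a full recursive download in a function. The sample URLs, the http:// scheme, and the function name mirror_en are assumptions made for illustration; they are not from the question. The preview uses grep -E because wget's default regex type is POSIX, so the same pattern should behave the same way in both tools.

```shell
#!/bin/sh
# Pattern from the answer: matches any URL that contains a literal "?".
pattern='(.*)\?(.*)'

# Hypothetical sample URLs, used only to preview the filter offline.
urls='http://www.site.com/en/index.html
http://www.site.com/en/shoes.html?page=2
http://www.site.com/en/about.html'

# grep -v keeps the lines that do NOT match, i.e. the URLs wget would still follow.
kept=$(printf '%s\n' "$urls" | grep -Ev "$pattern")
printf '%s\n' "$kept"

# Hypothetical wrapper for the actual download (defined here, not called).
# --recursive follows links, --no-parent stays below /en/,
# --reject-regex skips every URL that matches the pattern.
mirror_en() {
    wget --recursive --no-parent --reject-regex "$pattern" "http://www.site.com/en/"
}
```

Running the script prints only the two URLs without a query string. Calling mirror_en afterwards would start the real download, so it is left to the reader to invoke once the preview looks right.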