I'm writing a bash script that extracts all the links from a Google results page. I fetch the Google page with the w3m
utility and this script:
#!/bin/bash
# Performs a Google search using the word given as the first argument
word=$1
touch .google
# Quote the variable so the test also works when it is empty or contains spaces
if [ -z "$word" ]
then
    echo "Search word missing!"
    echo "Aborting..."
    exit 1
fi
a="www.google.com/search?q="
search=$a$word
w3m -no-cookie "$search" > .google
sleep 1
Next, I need to get all the websites from this page. My idea was to take every string that starts with www.
and ends with /
echo `grep -wo "www[^/]*" .google`> .temp
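The problem is that I miss many links that don't start with www, and at the same time I pick up everything that follows whenever a site doesn't end with /.
Is there a better way to get the URLs from the response?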
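Link extraction is a hard problem. However, the lynx program has a convenient -dump option that lets you skip most (or all) of the HTML parsing.
Pay particular attention to the References section at the bottom. You can take the output from that line onward and strip off the leading list numbers: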
$ lynx -dump 'http://www.seomoz.org/'
#[1]RSS 2.0 [2]publisher
[3]SEOmoz
* [4]Log in
* [5]Sign up
* [6]Help
+ [7]Help Resources
+ [8]Support Forums
+ [9]Request a Feature
+ [10]Contact Us
* [11]Features
* [12]Pricing & Plans
* [13]Community
+ [14]SEO Blog
+ [15]YOUmoz User Blog
+ [16]Top Users
+ [17]Events
+ [18]Recommended Companies
* [19]Resources
+ [20]Learn SEO
+ [21]SEO Tools
+ [22]PRO Q&A Forum
+ [23]Mozscape API
* [24]Blog
+ [25]SEO Blog
+ [26]YOUmoz User Blog
* [27]About
+ [28]Our TAGFEE Mission
+ [29]Meet the Mozzers
+ [30]Contact Us
+ [31]Join Our Team
+ [32]Press & Awards
+ [33]Events
Search SEOmoz
____________________ Search
SEO & Social Monitoring
Made Simple.
SEOmoz PRO combines SEO management, social media monitoring, actionable
recommendations, and so much more in one easy-to-use platform. Try it
free for 30 days.
[34]Try it for Free!
[35]Take a tour of SEOmoz PRO
or see [36]plans & pricing
* Campaign Overview
* Social Dashboard
* Crawl Diagnostics
* Dashboard
* Google Analytics
* Link Analysis
Loved By...
* Zillow
* Disney
* Overstock
* Best Buy
* Yelp
* Sun Microsystems
Roger Mozbot
Be My Buddy...
* [37]RSS
* [38]Twitter
* [39]Facebook
* [40]Google+
Effectively Manage Your SEO and Monitor Your Social Media
[41]Link Analysis
Analyze links and track key performance metrics in an efficient
all-in-one dashboard.
[42]Identify SEO Issues
Identify critical SEO issues and get actionable recommendations.
[43]Monitor Changes
Automatically monitor changes to your rankings and take control of your
organic traffic.
Avinash Kaushik
"SEOmoz tools provide best of class data. Their tools are a
must-have for marketers looking to optimize their organic search
results."
Avinash Kaushik,
Author, Web Analytics 1.0: An Hour A Day
Patrick Altoft
"SEOmoz has enabled us to scale our link-building process quickly
without compromising on quality."
Patrick Altoft,
CEO, Branded3
Latest from the SEOmoz Blog
__________________________________________________________________
[44]jennita
[45]Winners of #MozCation 2012
Posted by [46]jennita on 08/04/2012
Whoa. Ever have one of those times where your expectations are
completely blown out of the water? Well that's what happened during
this year's nomination for a MozCation. Wait, wait, wait, before I get
too far ahead of myself, I...
[47]Read Full Entry
13
2
[48]13 Comments
__________________________________________________________________
Latest from the Community YouMoz Blog
__________________________________________________________________
[49]larry.kim
[50]Does SEO Even Work for Small Businesses?
Posted by [51]larry.kim on 08/03/2012
Clicks on paid search listings beat out organic listings by nearly a
2:1 margin for keywords with high commercial intent in the US. Is SEO
still a viable marketing tactic for the average small business owner?
[52]Read Full Entry
17
3
[53]28 Comments
__________________________________________________________________
Voted Best SEO Tool 2010!
[54]Try it for Free!
Looking for SEO consulting?
SEOmoz doesn't provide consulting, but our friends at [55]Distilled
still do. Rock on!
Copyright © 1996-2012 SEOmoz. All Rights Reserved.
Product and Tools
* [56]SEOmoz PRO
* [57]Pricing and Plans
* [58]Open Site Explorer
* [59]SEO Toolbar
* [60]Mozscape API
* [61]More SEO Tools
Company
* [62]About
* [63]SEO Blog
* [64]YOUmoz Blog
* [65]Affiliate Program
* [66]Terms & Privacy Policy
* [67]PRO Perks
Popular Content
* [68]Link Building
* [69]Reputation Management
* [70]Analytics
* [71]Social Media
* [72]Content & Blogging
* [73]See All Categories
Stay in Touch
*
+ [74]RSS
+ [75]Twitter
+ [76]Facebook
+ [77]LinkedIn
*
SEOmoz
119 Pine St. Suite 400
Seattle, WA 98101
206.632.3171
* [78]Contact Us
* [79]Sitemap
References
1. http://feeds.feedburner.com/seomoz
2. https://plus.google.com/112544075040456048636
3. http://www.seomoz.org/
4. https://www.seomoz.org/users/login
5. https://www.seomoz.org/users/register
6. http://www.seomoz.org/
7. http://www.seomoz.org/help
8. http://www.seomoz.org/q
9. http://seomoz.zendesk.com/forums/293194-seomoz-PRO-feature-requests
10. http://www.seomoz.org/about/contact
11. http://www.seomoz.org/features
12. http://www.seomoz.org/plans
13. http://www.seomoz.org/community
14. http://www.seomoz.org/blog
15. http://www.seomoz.org/ugc
16. http://www.seomoz.org/users
17. http://www.seomoz.org/about/events
18. http://www.seomoz.org/article/recommended
19. http://www.seomoz.org/resources
20. http://www.seomoz.org/learn-seo
21. http://www.seomoz.org/tools
22. http://www.seomoz.org/q
23. http://www.seomoz.org/api
24. http://www.seomoz.org/blog
25. http://www.seomoz.org/blog
26. http://www.seomoz.org/ugc
27. http://www.seomoz.org/about
28. http://www.seomoz.org/about/mission
29. http://www.seomoz.org/about/team
30. http://www.seomoz.org/about/contact
31. http://www.seomoz.org/about/jobs
32. http://www.seomoz.org/about/press
33. http://www.seomoz.org/about/seo-events
34. http://www.seomoz.org/cart/freetrial?pg=home
35. http://www.seomoz.org/features
36. http://www.seomoz.org/plans
37. http://feeds.feedburner.com/seomoz
38. http://twitter.com/seomoz
39. http://www.facebook.com/SEOmoz
40. https://plus.google.com/112544075040456048636?prsrc=3
41. http://www.seomoz.org/features
42. http://www.seomoz.org/features
43. http://www.seomoz.org/features
44. http://www.seomoz.org/users/profile/81197
45. http://www.seomoz.org/blog/winners-mozcation-2012
46. http://www.seomoz.org/users/profile/81197
47. http://www.seomoz.org/blog/winners-mozcation-2012
48. http://www.seomoz.org/blog/winners-mozcation-2012#comments
49. http://www.seomoz.org/users/profile/402613
50. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses
51. http://www.seomoz.org/users/profile/402613
52. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses
53. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses#comments
54. http://www.seomoz.org/cart/freetrial?pg=features
55. http://www.seomoz.org/dp/distilled
56. http://www.seomoz.org/features
57. http://www.seomoz.org/plans
58. http://www.opensiteexplorer.org/
59. http://www.seomoz.org/seo-toolbar
60. http://www.seomoz.org/api
61. http://www.seomoz.org/tools
62. http://www.seomoz.org/about
63. http://www.seomoz.org/blog
64. http://www.seomoz.org/ugc
65. http://www.seomoz.org/dp/seomoz-pro-affiliate-program
66. http://www.seomoz.org/terms-and-privacy
67. http://www.seomoz.org/pro-perks
68. http://www.seomoz.org/blog/category/4
69. http://www.seomoz.org/blog/category/19
70. http://www.seomoz.org/blog/category/8
71. http://www.seomoz.org/blog/category/18
72. http://www.seomoz.org/blog/category/1
73. http://www.seomoz.org/blog
74. http://feeds.feedburner.com/seomoz
75. http://twitter.com/seomoz
76. http://www.facebook.com/SEOmoz
77. http://www.linkedin.com/groups?about=&gid=2976409&trk=anet_ug_grppro
78. http://www.seomoz.org/about/contact
79. http://www.seomoz.org/sitemap
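
Taking everything after the References line and dropping the numbering could be sketched roughly as follows. This is only a minimal sketch: extract_refs is a hypothetical helper name (not part of lynx), and it assumes the dump contains a line consisting solely of the word References followed by entries of the form "N. URL", as in the output above.

```shell
#!/bin/bash
# Sketch: print the URLs listed under "References" in `lynx -dump` output.
# extract_refs is a hypothetical helper; it reads the dump on stdin.
extract_refs() {
    awk '
        # A line whose only field is "References" starts the link list.
        NF == 1 && $1 == "References" { in_refs = 1; next }
        # After that, keep lines shaped like "12. http://..." and print
        # only the second field, which drops the leading number.
        in_refs && NF >= 2 && $1 ~ /^[0-9]+\.$/ { print $2 }
    '
}

# Example usage (needs lynx and network access, so it is left commented out):
#   lynx -dump 'http://www.seomoz.org/' | extract_refs | sort -u
```

Piping the result through sort -u removes the duplicates that show up when a page links to the same URL several times. lynx also has a -listonly option that, combined with -dump, prints only the link list, which may make the filtering simpler. A page whose body happens to contain a line reading just "References" would confuse this sketch.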
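You may need to grep for <a href="
and take the value up to the next quote character, then filter out all the JavaScript stuff. This solution may not be foolproof either, though.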
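
A rough sketch of that grep approach, under several assumptions: the input is the raw HTML (not rendered text), href values are double-quoted, and each anchor tag sits on a single line. extract_hrefs is a hypothetical helper name used only for illustration.

```shell
#!/bin/bash
# Sketch: pull double-quoted href values out of <a ...> tags on stdin.
# extract_hrefs is a hypothetical helper, not an existing tool.
extract_hrefs() {
    # -o prints only the matched text, -i ignores case (<A HREF=...>).
    # The pattern runs from "<a " to the closing quote of the href value.
    grep -io '<a [^>]*href="[^"]*"' |
        # Remove everything up to and including href=" and the final quote.
        sed -e 's/.*[Hh][Rr][Ee][Ff]="//' -e 's/"$//' |
        # Drop javascript: pseudo-links.
        grep -vi '^javascript:'
}

# Example usage (assumes curl is installed; needs network access):
#   curl -s 'http://www.seomoz.org/' | extract_hrefs
```

This will still miss single-quoted or unquoted attributes and tags split across lines, and it returns relative links such as /about unchanged, so it is no more foolproof than the description suggests.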