Getting encoded text while scraping the data from

Posted 2020-07-29 00:39

Question:

Code Part:

[<div class="hidden_elem"><code id="u_0_8"><!-- <div class="_4-u2 _5z71 _18ib _4-u8"><div class="_4-u3 _5z73"><div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">560 \u091c\u093e \u0930\u0939\u0947 \u0939\u0948\u0902&nbsp;\xb7&nbsp;3.1 \u0939\u091c\u093c\u093e\u0930 \u0915\u0940 \u0930\u0941\u091a\u093f \u0939\u0948</a><div class="_5z7d">\u0907\u0938 \u0908\u0935\u0947\u0902\u091f \u0915\u094b \u0905\u092a\u0928\u0947 \u092e\u093f\u0924\u094d\u0930\u094b\u0902 \u0938\u0947 \u0938\u093e\u091d\u093e \u0915\u0930\u0947\u0902</div></div><a class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" role="button" href="#" ajaxify="#" rel="dialog" data-testid="event_invite_button"><i class="_3-8_ _3-8_ img sp_WYmAGAVQNZh sx_82e44d"></i>\u0906\u092e\u0902\u0924\u094d\u0930\u093f\u0924 \u0915\u0930\u0947\u0902</a></div></div></div> --></code></div>, <div class="hidden_elem"><code id="u_0_i"><!-- <div class="_5vl5 _3a9j"><ul class="uiList _4kg _4ks"><li class="_3slj"><div class="_36hm"><table class="uiGrid _51mz" cellspacing="0" cellpadding="0"><tbody><tr class="_51mx"><td class="_51m- _phw"><div class="_6a" aria-hidden="true"><div class="_6a _6b" style="height:18px"></div><div class="_6a _6b"><i class="_ohg img sp_ESbkBsVlxUv sx_c2b8bd"><u>clock</u></i></div></div></td><td class="_51m- _4930 _phw _51mw"><div class="_xkh _phw"><div class="_6a"><div class="_6a _6b" style="height:18px"></div><div class="_6a _6b"><div class="_publicProdFeedInfo__timeRowTitle _5xhk" content="2017-07-28T21:30:00-07:00 to 2017-07-29T05:00:00-07:00"><span><span 
itemprop="startDate">29 \u091c\u0941\u0932\u093e\u0908</span></span> <span title="09:30 &#x905;&#x92a;&#x930;&#x93e;&#x939;&#x94d;&#x928; &#x906;&#x92a;&#x915;&#x947; &#x938;&#x92e;&#x92f; &#x92e;&#x947;&#x902;">10:00 \u092a\u0942\u0930\u094d\u0935\u093e\u0939\u094d\u0928</span> - <span title="05:00 &#x92a;&#x942;&#x930;&#x94d;&#x935;&#x93e;&#x939;&#x94d;&#x928; &#x906;&#x92a;&#x915;&#x947; &#x938;&#x92e;&#x92f; &#x92e;&#x947;&#x902;">05:30 \u0905\u092a\u0930\u093e\u0939\u094d\u0928 UTC+05:30</span></div><div class="_5xhp fsm fwn fcg"></div></div></div></div></td></tr></tbody></table></div></li><li class="_3xd0 _3slj"><div class="_36hm _5cmn" id="u_0_9"><table class="uiGrid _51mz" cellspacing="0" cellpadding="0"><tbody><tr class="_51mx"><td class="_51m- _phw"><div class="_6a" aria-hidden="true"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><i class="_ohg img sp_ESbkBsVlxUv sx_f4bee6"><u>pin</u></i></div></div></td><td class="_51m- _51mw"><div class="clearfix _4930"><div class="_xkg _phw rfloat _ohf"><div><div id="u_0_a"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a href="#" role="button">\u092e\u0948\u092a \u0926\u093f\u0916\u093e\u090f\u0901</a></div></div></div><div class="hidden_elem" id="u_0_b"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a href="#" role="button">\u092e\u0948\u092a \u091b\u093f\u092a\u093e\u090f\u0901</a></div></div></div></div></div><div class="_xkh _phw _42ef"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a class="_5xhk" href="https://www.facebook.com/iitd.delhi/" id="u_0_d" data-testid="event-permalink-location">IIT Delhi</a><div class="_5xhp fsm fwn fcg">Hauz Khaz, New Delhi, India 110016</div></div></div></div></div></td></tr></tbody></table></div><div class="_4-u2 hidden_elem _5xhn _4-u8" id="u_0_c"><div class="clearfix _ikh"><div class="_4bl7"><div class="_23mo"><div class="fbPlaceFlyoutWrap 
_5xho" id="u_0_e"><div class="fbPlaceFlyout" style="width:240px;"><div class="fbPlaceFlyoutShell" style="width:46px;bottom:-31px;"><div class="fbPlaceFlyoutBox uiBoxWhite" style="width: 46px"><div><div class="_52i5"><a href="https://www.facebook.com/iitd.delhi/"><img class="_s0 img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p40x40/255575_512250575469178_612128240_n.jpg?oh=dc9acf8d4452db344aaba7fde25efa84&amp;oe=59AD9DC7" alt="" itemprop="image" aria-label="IIT Delhi" role="img" style="width:40px;height:40px" /></a></div></div><div class="fbPlaceFlyoutMapArrow"><i class="img sp_ESbkBsVlxUv sx_104d97"></i></div><div class="fbPlaceFlyoutMapArrow"><i class="img sp_ESbkBsVlxUv sx_104d97"></i></div></div></div></div><a href="#" rel="dialog" ajaxify="/places/map/?id=211928345501404" role="button"><div><div class="_4j7v _2vs2"><img class="_a3f img" alt="" aria-label="&#x928;&#x915;&#x94d;&#x936;&#x93e; &#x905;&#x91f;&#x948;&#x91a;&#x92e;&#x947;&#x902;&#x91f;" src="https://external.fdel6-1.fna.fbcdn.net/static_map.php?region=IN&amp;v=29&amp;osm_provider=2&amp;size=240x132&amp;center=28.545188216208%2C77.193069476906&amp;zoom=15&amp;markers=28.54518822%2C77.19306948&amp;language=hi_IN" width="240" height="132" /><span id="u_0_g"></span></div></div></a></div></div></div><div class="_4bl9 _2qsg"><div><span class="_c24">\u0915\u0949\u0932\u0947\u091c \u0914\u0930 \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940</span><div><div class="_4iae"><div><div class="_6a _5xoz _5xo-"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz _4ial"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div></div><div class="_559j" style="clip: rect(0px, 63px, 16px, 0px)"><div class="_6a _5xoz _5xo-"><i class="img sp_ESbkBsVlxUv 
sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz _4ial"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div></div></div></div><hr class="_23mm" /><div><span class="_c24">011 2659 6316</span></div><div><span class="_c24"></span></div><div class="ptm"><a class="_42ft _4jy0 _4jy3 _517h _51sy" role="button" href="http://l.facebook.com/l.php?u=http%3A%2F%2Fshare.here.com%2Fr%2Fmylocation%2Fe-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9%3Flink%3Dunknown%26fb_locale%3Dhi_IN%26ref%3Dfacebook&amp;h=ATP2RoDOmV19cipyFvxN_S_G4uI7FP1aDGQXs8I8palbouMF9Ut2wIJBE-D0XSb9O2x9_YcBTP1eLGOs-qvz3hHjCMi-5oGqGiE1TJerNdX-KKhRgc6j392SdLAY&amp;s=1" id="u_0_f" target="_blank" rel="nofollow" onmouseover="LinkshimAsyncLink.swap(this, &quot;http:\\\\/\\\\/share.here.com\\\\/r\\\\/mylocation\\\\/e-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9?link=unknown&amp;fb_locale=hi_IN&amp;ref=facebook&quot;);" onclick="LinkshimAsyncLink.swap(this, 
&quot;http:\\\\/\\\\/l.facebook.com\\\\/l.php?u=http\\\\u00253A\\\\u00252F\\\\u00252Fshare.here.com\\\\u00252Fr\\\\u00252Fmylocation\\\\u00252Fe-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9\\\\u00253Flink\\\\u00253Dunknown\\\\u002526fb_locale\\\\u00253Dhi_IN\\\\u002526ref\\\\u00253Dfacebook&amp;h=ATP2RoDOmV19cipyFvxN_S_G4uI7FP1aDGQXs8I8palbouMF9Ut2wIJBE-D0XSb9O2x9_YcBTP1eLGOs-qvz3hHjCMi-5oGqGiE1TJerNdX-KKhRgc6j392SdLAY&amp;s=1&quot;);">\u0926\u093f\u0936\u093e\u090f\u0901 \u092a\u094d\u0930\u093e\u092a\u094d\u0924 \u0915\u0930\u0947\u0902</a></div></div></div></div></div></li></ul><div id="event_navigation" class="_4dn9"><div id="u_0_h"></div></div></div> --></code></div>, <div class="hidden_elem"><code id="u_0_m"><!-- <div class="_4z-v"><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">\u0935\u093f\u0935\u0930\u0923<span class="_c1c"></span></span><div class="_3s3-"></div></div><div class="_2qgs"><span class="_4n-j _fbReactionComponent__eventDetailsContentTags fsl" data-testid="event-permalink-details">Indian Youth Forum is proud to announce the first-ever Startup Festival 2017 which will bring together the brightest startups of the country all in one place. And these startups are looking to hire you!<br /> For the first time ever, these bright and young startups, will open their ships to technical and non-technical talent, on an adventurous voyage filled with learning to become the next big company. 
The event is open to working professionals and talented freshers looking for a challenging and enriching role.<br /> <br /> For Any Kind of Association Queries Mail us at -<br /> mystory&#064;indiayf.in or Inbox us .</span></div><div class="_1r51"><ul class="uiList uiCollapsedList uiCollapsedListHidden _509- _4ki" id="u_0_j"><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22StartUp%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B181836542181749%5D%7D"><span class="_47od">StartUp</span></a></li><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22Job+hunting%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B111193155571103%5D%7D"><span class="_47od">Job hunting</span></a></li><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22Startup.com%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B109416335743992%5D%7D"><span 
class="_47od">Startup.com</span></a></li></ul></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">Indian Youth Forum \u0915\u0947 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902<span class="_c1c"></span></span><div class="_3s3-"></div></div><div><div><div class="_37p5"><div class="clearfix"><img class="_37p7 _8o _8r lfloat _ohe img" height="100" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-0/c5.0.100.100/p100x100/16708216_1083815345075324_1809238266151282211_n.jpg?oh=cdc9096728fec80a0147133a6b1599d6&amp;oe=59E5EFDB" alt="" /><div class="_8u _42ef"><div class="_37p8"><div class="_50f4"><span class="fwb"><a class="profileLink" href="https://www.facebook.com/IyfIndianyouthforum/">Indian Youth Forum</a></span></div><div class="_37p9 _50f3">News &amp; Media Website</div><div class="_37pa _50f3">We find and tell stories of people doing good to inspire global action. Because we&#039;re convinced each of us has the power to make the world better .</div></div></div></div></div></div></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">\u0938\u094d\u0925\u093e\u0928 \u0915\u0947 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902<span class="_c1c"></span></span><div class="_3s3-"></div></div><div class="_37p6"><div><div><div><div class="_4sdm _6lh _dcs"><div class="_5hv6"><div class="_6lp"><div class="_6ln fsxxl fwb"><a href="https://www.facebook.com/iitd.delhi/" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;">IIT Delhi</a></div><div class="_6lo ellipsis fsm fwn fcg">\u0915\u0949\u0932\u0947\u091c \u0914\u0930 \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940</div></div></div><div class="uiScaledImageContainer _6li _6l-" style="width:100%"><img class="scaledImageFitWidth img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-0/p320x320/1660351_782270428467190_610794429_n.jpg?oh=4b4957698cf37eaa2621307fc3c61b8f&amp;oe=59E14DBB" 
style="top:-60px;" alt="&#039;Picture credit: Arshad Nasser (2013JDS6003) M.Des- Industrial Design&#039;" width="480" height="320" /></div><a class="_8xh" href="https://www.facebook.com/iitd.delhi/" style="width:100%" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;"></a><a class="_3aml" href="https://www.facebook.com/iitd.delhi/" style="width:100%"></a><div class="clearfix _5kun"><a class="_6ll lfloat _ohe" href="https://www.facebook.com/iitd.delhi/" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;"><div class="_6lm _4m78"><div class="uiScaledImageContainer profilePic" style="width: 96px; height: 96px"><img class="scaledImageFitWidth img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p100x100/255575_512250575469178_612128240_n.jpg?oh=e2bf449617f68eac2b8cd02d7c35a513&amp;oe=59A0C926" alt="IIT Delhi" width="96" height="96" /></div></div></a><div class="_6lk _42ef"><div><div class="_8yb"><div>2,82,390 \u092a\u0938\u0902\u0926</div><div>2,019 \u0932\u094b\u0917 \u0907\u0938 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902 \u092c\u093e\u0924 \u0915\u0930 \u0930\u0939\u0947 \u0939\u0948\u0902</div></div></div></div></div></div></div></div></div></div><div class="_4z-w"><a class="_4b4x" href="https://www.facebook.com/iitd.delhi/" id="u_0_k">\u092a\u0947\u091c \u092a\u0930 \u091c\u093e\u090f\u0901</a></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4x0f"><div class="_4x0g"><div class="_4x0d _4x0e"><div class="_41dr _4x0c"><span><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/c4.15.32.32/p40x40/15747342_1195628017184471_1949447432837553984_n.jpg?oh=54f25e123a74d63f279279ee62318a79&amp;oe=59B5B106" alt="" aria-label="Jha Ayush" role="img" /></span></div><div class="_41dr _4x0c"><a href="https://www.facebook.com/IyfIndianyouthforum/"><img class="_s0 _41ds _54ru img" 
src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15541314_1041942845929241_1722198877754933119_n.jpg?oh=973e318ede53168d58f6e7be835583c0&amp;oe=59A926CC" alt="" aria-label="Indian Youth Forum" role="img" /></a></div><div class="_41dr _4x0c"><a href="https://www.facebook.com/kumeshyadav"><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15337627_10153988267585286_2118657580809154297_n.jpg?oh=182fa980f18ed2d94c6717f8de3af7ad&amp;oe=599BC3CD" alt="" aria-label="Kumesh Yadav" role="img" /></a></div><div class="_41dr _4x0c"><span><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15965812_10158191872490352_4833263074795798396_n.jpg?oh=ce18a15878fc5814539a57aed4c0446b&amp;oe=59A47E1F" alt="" aria-label="Kanika Gupta" role="img" /></span></div></div></div><div class="_4x0h">\u091a\u0930\u094d\u091a\u093e \u092e\u0947\u0902 12 \u092a\u094b\u0938\u094d\u091f.</div></div><div class="_4z-w"><a class="_4b4x" href="/events/1407771472571452/?active_tab=discussion" id="u_0_l">\u091a\u0930\u094d\u091a\u093e \u0926\u0947\u0916\u0947\u0902</a></div></div></div> --></code></div>]

Above is the part of the code from which I need to scrape the text in the div with class '_publicProdFeedInfo__timeRowTitle _5xhk'. When I scrape it, I get encoded text like this:

<div class="_publicProdFeedInfo__timeRowTitle _5xhk" content="2017-07-28T21:30:00-07:00 to 2017-07-29T05:00:00-07:00"><span><span itemprop="startDate">29 जुलाई</span></span> <span title="09:30 अपराह्न आपके समय में">10:00 पूर्वाह्न</span> - <span title="05:00 पूर्वाह्न आपके समय में">05:30 अपराह्न UTC+05:30</span></div>

The text itself is present in the page source at this URL: https://www.facebook.com/events/1407771472571452/

Can you please tell me how I can resolve this?

Here is the Python code that I am using:

import urllib2
from bs4 import BeautifulSoup

facebook = "https://www.facebook.com/events/1407771472571452/"
page = urllib2.urlopen(facebook)
soup = BeautifulSoup(page, 'lxml')
# The event details sit inside HTML comments within <code> elements.
data = soup.findAll("div", {"class": "hidden_elem"})
for item in data:
    commentedHTML = item.find('code').contents[0]
    # Parse the comment text as HTML in its own right.
    more_soup = BeautifulSoup(commentedHTML, 'lxml')
    wanted_text = more_soup.findAll('div', {'class': '_publicProdFeedInfo__timeRowTitle _5xhk'})
    if wanted_text:
        gotdata2 = wanted_text[0]
        print gotdata2

Answer 1:

After reading the response, decode it from UTF-8:

page = urllib2.urlopen(facebook)
soup = BeautifulSoup(page.read().decode('utf-8', 'ignore'), 'lxml')

NOTE: 'ignore' is passed so that decoding does not fail on invalid UTF-8 bytes; any such bytes are dropped during decoding.
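To illustrate what the 'ignore' error handler does, here is a minimal Python 3 sketch. The sample bytes are made up for illustration and are not taken from the actual page; the point is only that strict decoding fails on an invalid byte while 'ignore' drops it.

```python
# Python 3 sketch; the sample bytes are invented for illustration.
raw = "29 जुलाई".encode("utf-8") + b"\xff"  # 0xff never occurs in valid UTF-8

try:
    raw.decode("utf-8")  # strict mode (the default) raises on the bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

text = raw.decode("utf-8", "ignore")  # the invalid byte is silently dropped
print(strict_failed, text)
```

Note that 'ignore' discards data without warning, so it may hide a wrong-encoding problem rather than fix it.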



Answer 2:

Identify the div element, then the code element within it. The comment is available as the string of this code element and can be passed to BeautifulSoup for parsing. Once you have a second soup made from the contents of the comment, you can process it as you would any other.

>>> import bs4
>>> import requests
>>> page = requests.get('https://www.facebook.com/events/1407771472571452/').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> div = soup.find('div', attrs={'class':"hidden_elem"})
>>> code = div.find('code')
>>> soup_2 = bs4.BeautifulSoup(code.string, 'lxml')
>>> soup_2.findAll('a')
[<a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">601 Going · 3.3K Interested</a>, <a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a>]
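The same two-pass idea (pull the comment out of the page, then parse the comment as HTML) can be tried offline. The sketch below is not the BeautifulSoup approach used above: it swaps in the standard library's html.parser so it runs without network access or third-party packages. The class names CommentCollector and TextCollector are hypothetical helpers, and the sample markup is a shortened stand-in for the real page.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """First pass: collect the text of every HTML comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TextCollector(HTMLParser):
    """Second pass: collect text inside elements carrying a target class.

    Assumes no unclosed void tags (such as a bare <br>) inside the target.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Shortened, made-up stand-in for the page: markup hidden in a comment.
page = ('<div class="hidden_elem"><code id="u_0_i"><!-- '
        '<div class="_5xhk"><span>29 \u091c\u0941\u0932\u093e\u0908</span> '
        '<span>10:00</span></div> --></code></div>')

outer = CommentCollector()
outer.feed(page)
outer.close()

inner = TextCollector("_5xhk")
inner.feed(outer.comments[0])  # parse the comment body as HTML
inner.close()

text = "".join(inner.chunks)
print(text)
```

The \u escapes in the sample string are ordinary Python string escapes for Devanagari characters; once parsed, the text prints as readable Hindi.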

Edit: If I do what is suggested in the comment, this is what appears.

>>> divs_2 = soup_2.findAll('div')
>>> for item in divs_2:
...     item.contents
...     
[<div class="_4-u3 _5z73"><div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div><a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a></div></div>]
[<div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div><a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a></div>]
[<div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div>, <a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a>]
[<a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a>, <div class="_5z7d">Share this event with your friends</div>]
['Share this event with your friends']

For me, the simpler approach might be to request the page in English, which avoids having to translate strings that come back in another language. I have no experience with this, but you might investigate what options requests or urllib2 offer for making such a request.



Answer 3:

Well, after many tries I finally fixed it by specifying the language in the request header:

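import requests
from bs4 import BeautifulSoup
url = "https://www.facebook.com/events/1407771472571452/"
# Ask the server for the English version of the page.
headers = {"Accept-Language": "en-US,en;q=0.5"}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')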
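The same header can presumably be sent from the urllib-based code in the question as well. Here is a minimal sketch in Python 3, where urllib2 became urllib.request. It only builds the request object and does not send it, and whether the server honors the header is an assumption, not something this snippet verifies.

```python
import urllib.request

url = "https://www.facebook.com/events/1407771472571452/"

# Build the request with an explicit language preference; nothing is sent yet.
request = urllib.request.Request(
    url, headers={"Accept-Language": "en-US,en;q=0.5"}
)

# Request stores header names via str.capitalize(), hence "Accept-language".
language = request.get_header("Accept-language")
print(language)

# To actually fetch the page (requires network access):
# page = urllib.request.urlopen(request).read().decode("utf-8", "ignore")
```

The resulting string can then be passed to BeautifulSoup in the same way as page.text above.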