How to exclude certain titles with find()?

I have a function that gets all the titles from my website, but I don't want to get the titles of some products. Is this the right way? I don't want titles from products that contain the words "OLP NL", "Arcserve", "LicSAPk" or "symantec".

import requests
from bs4 import BeautifulSoup


def get_title(u):
    html = requests.get(u)
    bsObj = BeautifulSoup(html.content, 'xml')
    title = str(bsObj.title).replace('<title>', '').replace('</title>', '')
    if (title.find('Arcserve') or title.find('OLP NL') or title.find('LicSAPk') or title.find(
            'Symantec') is not -1):
        return 'null'
    else:
        return title

if title != 'null':
    ws1['B1'] = title
    meta_desc = get_metaDesc(u)
    ws1['C1'] = meta_desc
    meta_keyWrds = get_metaKeyWrds(u)
    ws1['D1'] = meta_keyWrds
    print("writing product no." + str(i))
else:
    print("skipped product no. " + str(i))
    continue

The problem is that the program excludes all my products, and all I see is "skipped product no.". Why? Not all of them contain these words...

Tags: python beautifulsoup find web-crawler

2 answers

Explosion°爆炸

Reply #2 · 2019-09-20 18:38

You can change the if statement to (title.find('Arcserve') != -1 or title.find('OLP NL') != -1 or title.find('LicSAPk') != -1 or title.find('Symantec') != -1), or you can create a function that checks for the terms you want to find:

def TermFind(Title):
    # Return True if Title contains any of the excluded terms
    terms = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']
    disc = False
    for val in terms:
        if Title.find(val) != -1:
            disc = True
            break
    return disc
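As a quick sanity check, here is a sketch that puts the corrected if statement into a helper (the name `is_excluded` and the sample titles are hypothetical, chosen for illustration) and confirms it agrees with `TermFind` on a few inputs. Fetching and parsing are left out so it runs offline on plain strings.

```python
def TermFind(Title):
    # Same logic as the answer's TermFind: True if any excluded term occurs
    terms = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']
    disc = False
    for val in terms:
        if Title.find(val) != -1:
            disc = True
            break
    return disc


def is_excluded(title):
    # Hypothetical helper: the corrected if statement, with every
    # find() result explicitly compared against -1
    return (title.find('Arcserve') != -1 or title.find('OLP NL') != -1
            or title.find('LicSAPk') != -1 or title.find('Symantec') != -1)


# Hypothetical sample titles
samples = [
    'Arcserve Backup 18',      # term at index 0, so find() returns 0
    'Microsoft Office 2019',   # contains none of the terms
    'Norton by Symantec',      # term later in the string
    'Windows 10 Pro OLP NL',
]

for t in samples:
    print(t, '->', 'skip' if TermFind(t) else 'keep')
```

Both versions treat a match at index 0 correctly, which is the case a bare truthiness test on find() gets wrong.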

When I tried your original if statement, it always returned True regardless of the title value. I couldn't find an explanation for this behavior, but you can try checking the questions "Python != operation vs "is not"" and "nested "and/or" if statements". Hope it helps.
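The behavior does have a concrete cause. str.find() returns -1 when the substring is missing and the index otherwise; -1 is truthy and 0 is falsy. `or` returns the first truthy operand, and `is not` binds tighter than `or`, so `is not -1` only applies to the last find() call. The sketch below demonstrates this with a hypothetical title; it writes `!= -1` where the question wrote `is not -1`, because comparing to an int literal with `is` depends on CPython's small-int caching and triggers a SyntaxWarning since Python 3.8.

```python
title = 'Microsoft Office 2019'  # hypothetical title containing none of the terms

# find() returns -1 for "not found", and -1 is truthy
print(title.find('Arcserve'))        # -1
print(bool(title.find('Arcserve')))  # True

# The question's condition parses as
#   find('Arcserve') or find('OLP NL') or find('LicSAPk') or (find('Symantec') is not -1)
# `or` stops at the first truthy operand: the -1 from the first call
cond = (title.find('Arcserve') or title.find('OLP NL')
        or title.find('LicSAPk') or (title.find('Symantec') != -1))
print(cond)  # -1, which `if` treats as true, so every product is skipped

# Conversely, a term at the very start gives index 0, which is falsy
print(bool('Arcserve Backup'.find('Arcserve')))  # False
```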

欢心

Reply #3 · 2019-09-20 18:46

A similar idea, using any():

import requests
from bs4 import BeautifulSoup

url = 'https://www.cdsoft.co.il/index.php?id_product=300610&controller=product'
html = requests.get(url)
bsObj = BeautifulSoup(html.content, 'lxml')
title = str(bsObj.title).replace('<title>', '').replace('</title>', '')
items = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']

# Print the title only if none of the excluded terms occurs in it
if not any(item in title for item in items):
    print(title)
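One detail worth noting: the question lists "symantec" in lowercase while the code checks "Symantec", and both find() and `in` are case-sensitive. If the site's titles are not consistently capitalized (an assumption; the real titles aren't shown), a case-insensitive variant of the any() check may be safer. The names `EXCLUDED` and `should_skip` and the sample titles below are hypothetical.

```python
# Hypothetical list of excluded terms, matching the question's words
EXCLUDED = ('Arcserve', 'OLP NL', 'LicSAPk', 'Symantec')


def should_skip(title, terms=EXCLUDED):
    """Return True if any excluded term appears in title, ignoring case."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in terms)


# Hypothetical titles standing in for the scraped ones
titles = ['Symantec Endpoint Protection', 'symantec mail security', 'Office 2019']
kept = [t for t in titles if not should_skip(t)]
print(kept)  # ['Office 2019']
```

Returning a bool instead of the string 'null' also lets the calling loop test `if should_skip(title):` directly.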
