Iterating over a dictionary in python and strippin

Posted 2019-02-17 13:56

I am working with the web scraping framework Scrapy, and I am a bit of a noob when it comes to Python. How do I iterate over all of the scraped items, which seem to be in a dictionary, and strip the whitespace from each one?

Here is the code I have been playing with in my item pipeline:

for info in item:
   info[info].lstrip()

But this code does not work, because I cannot select items individually. So I tried to do this:

for key, value in item.items():
   value[1].lstrip()

This second method works to a degree, but the problem is that I have no idea how then to loop over all of the values.

I know this is probably such an easy fix, but I cannot seem to find it. Any help would be greatly appreciated. :)

7 Answers
我只想做你的唯一
#2 · 2019-02-17 13:58

Not a direct answer to the question, but I would suggest you look at Item Loaders and input/output processors. A lot of your cleanup can be taken care of there.

An example which strips each entry would be:

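from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose  # scrapy.loader.processors on older Scrapy

class StrippingItemLoader(ItemLoader):  # renamed so it does not shadow its base class
    default_output_processor = MapCompose(str.strip)  # unicode.strip on Python 2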
成全新的幸福
#3 · 2019-02-17 14:03

Try

for k, v in item.items():
   item[k] = v.replace(' ', '')

or as a dictionary comprehension, as suggested by monkut:

newDic = {k: v.replace(' ', '') for k, v in item.items()}
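
One thing worth checking before using replace(' ', '') is whether you want every space removed or only the ones at the ends. A small comparison, using made-up sample data (the item contents below are an assumption, not from the question):

```python
# Hypothetical sample data, for illustration only.
item = {"title": "  Hello World  ", "author": " Jane Doe"}

# replace(' ', '') deletes every space, including the ones between words.
no_spaces = {k: v.replace(" ", "") for k, v in item.items()}

# strip() removes only leading and trailing whitespace.
stripped = {k: v.strip() for k, v in item.items()}

print(no_spaces)  # {'title': 'HelloWorld', 'author': 'JaneDoe'}
print(stripped)   # {'title': 'Hello World', 'author': 'Jane Doe'}
```

If the goal is just to trim the edges of each value, strip() (or lstrip()/rstrip()) is probably the better fit.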
贪生不怕死
#4 · 2019-02-17 14:05

In a dictionary comprehension (available in Python >=2.7):

clean_d = {k: v.strip() for k, v in d.items()}  # use d.iteritems() on Python 2 to avoid building a list
欢心
#5 · 2019-02-17 14:06

Although @zquare had the best answer for this question, I feel I need to chime in with a Pythonic method that also accounts for dictionary values that are not strings. This is not recursive, mind you: it only works with one-dimensional (flat) dictionaries.

d.update({k: v.lstrip() for k, v in d.items() if isinstance(v, str) and v.startswith(' ')})

This updates the original dictionary value if the value is a string and starts with a space.

UPDATE: If you want to use regular expressions and avoid calling startswith and endswith, you can use this:

import re
rex = re.compile(r'^\s|\s$')
d.update({k: v.strip() for k, v in d.items() if isinstance(v, str) and rex.search(v)})

This version strips if the value has a leading or trailing white space character.
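
For nested data, a recursive variant is one option. This is only a sketch, not part of the answer above: the helper name strip_nested and the sample dictionary are made up, and it assumes the containers are plain dicts and lists.

```python
def strip_nested(value):
    """Return a copy of value with strings stripped at every nesting level."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, dict):
        return {k: strip_nested(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_nested(v) for v in value]
    return value  # ints, floats, None, etc. pass through unchanged


# Hypothetical sample data, for illustration only.
d = {"name": "  Ann ", "age": 30, "tags": [" a", "b "], "meta": {"city": " Oslo "}}
clean = strip_nested(d)
print(clean)
```

Unlike d.update(...), this builds a new structure and leaves the original dictionary untouched.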

做个烂人
#6 · 2019-02-17 14:08

What you should note is that lstrip() returns a copy of the string rather than modifying the object. To actually update your dictionary, you'll need to assign the stripped value back to the item.

For example:

for k, v in your_dict.iteritems():
    your_dict[k] = v.lstrip()

Note the use of .iteritems(), which returns an iterator instead of a list of key-value pairs. This makes it somewhat more efficient.

I should add that in Python 3, .items() has been changed to return "views", so .iteritems() is not required (and no longer exists there).
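
Put together for the item pipeline from the question, a Python 3 version might look roughly like this. It is a sketch under assumptions: the class name StripWhitespacePipeline is made up, the item is assumed to behave like a dict, and each field is assumed to hold a single string rather than a list of strings. process_item(self, item, spider) is the method Scrapy calls on a pipeline.

```python
class StripWhitespacePipeline:
    """Hypothetical pipeline: strips leading whitespace from string fields."""

    def process_item(self, item, spider):
        # list(...) snapshots the pairs first, as a precaution against
        # mutating the mapping while iterating over it.
        for key, value in list(item.items()):
            if isinstance(value, str):
                # lstrip() returns a new string, so assign it back.
                item[key] = value.lstrip()
        return item


# Quick check with a plain dict standing in for a scraped item.
pipeline = StripWhitespacePipeline()
result = pipeline.process_item({"title": "  Hello", "price": 10}, spider=None)
print(result)  # {'title': 'Hello', 'price': 10}
```

Swap lstrip() for strip() if trailing whitespace should go as well.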

趁早两清
#7 · 2019-02-17 14:10

Assuming you would like to strip the values of yourDict, creating a new dict called newDict:

newDict = dict(zip(yourDict.keys(), [v.strip() if isinstance(v,str) else v for v in yourDict.values()]))

This code can handle values of mixed types, so it will not try to strip int, float, etc.
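
The same result can probably be had more readably with a dictionary comprehension. The sketch below runs both forms on made-up sample data (the contents of yourDict are an assumption) and checks that they agree:

```python
# Hypothetical sample data, for illustration only.
yourDict = {"name": "  Ann  ", "age": 30, "score": 4.5}

# zip-based version from the answer above
newDict = dict(zip(yourDict.keys(),
                   [v.strip() if isinstance(v, str) else v for v in yourDict.values()]))

# equivalent dictionary comprehension
newDict2 = {k: v.strip() if isinstance(v, str) else v for k, v in yourDict.items()}

print(newDict == newDict2)  # True
print(newDict2)             # {'name': 'Ann', 'age': 30, 'score': 4.5}
```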
