I am working with the web scraping framework Scrapy and I am a bit of a noob when it comes to Python. So I am wondering: how do I iterate over all of the scraped items, which seem to be in a dictionary, and strip the whitespace from each one?
Here is the code I have been playing with in my item pipeline:
for info in item:
    info[info].lstrip()
But this code does not work, because I cannot select items individually. So I tried to do this:
for key, value in item.items():
    value[1].lstrip()
This second method works to a degree, but the problem is that I have no idea how then to loop over all of the values.
I know this is probably such an easy fix, but I cannot seem to find it. Any help would be greatly appreciated. :)
Not a direct answer to the question, but I would suggest you look at Item Loaders and input/output processors. A lot of your cleanup can be taken care of there.
An example which strips each entry would be:
Try
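The snippet for this suggestion appears to be missing. A minimal sketch of the loop-based approach, assuming the item behaves like a plain dict whose values are all strings (the sample data is made up for illustration):

```python
# Hypothetical stand-in for a scraped item; the keys and values are invented.
item = {"title": "  Widget  ", "price": " 9.99"}

# strip() returns a new string, so the result is written back under the same key.
# Reassigning existing keys while iterating is safe because the dict's size does not change.
for key, value in item.items():
    item[key] = value.strip()
```

If some values are not strings, `value.strip()` would raise an `AttributeError`, so this version only fits items whose fields are all text.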
or, using a comprehension as suggested by monkut:
In a dictionary comprehension (available in Python >=2.7):
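The comprehension itself is not shown here, so the following is only a sketch of what it might look like, assuming Python 3 and string-only values. The names `item` and `cleaned` are illustrative.

```python
# Hypothetical scraped item with string values only.
item = {"title": "  Widget  ", "price": " 9.99"}

# Dict comprehension: builds a new dict and leaves the original untouched.
cleaned = {key: value.strip() for key, value in item.items()}

# Equivalent form using dict() with a generator expression,
# which also works on Python versions older than 2.7.
cleaned_alt = dict((key, value.strip()) for key, value in item.items())
```

Unlike the in-place loop, both forms return a fresh dict, which can be handy if the raw values are still needed afterwards.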
Although @zquare had the best answer for this question, I feel I need to chime in with a Pythonic method that will also account for dictionary values that are not strings. This is not recursive, mind you, as it only works with one-dimensional dictionary objects.
This updates the original dictionary value if the value is a string and starts with a space.
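A rough sketch of that idea, assuming Python 3 (where `str` takes the place of Python 2's `basestring`). The sample dict is hypothetical, and the check is extended to a trailing space as well.

```python
# Hypothetical one-dimensional dict with mixed value types.
data = {"name": "  Alice ", "age": 30, "tags": [" a "]}

for key, value in data.items():
    # Only touch strings that actually begin or end with a space;
    # ints, lists and other types are left exactly as they are.
    if isinstance(value, str) and (value.startswith(" ") or value.endswith(" ")):
        data[key] = value.strip()
```

Note that the nested list is not cleaned, which matches the "not recursive" caveat above.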
UPDATE: If you want to use regular expressions and avoid using startswith and endswith, you can use this:
This version strips if the value has a leading or trailing whitespace character.
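The regular-expression variant might look something like the sketch below. The pattern and the variable names are assumptions for illustration, not the original code.

```python
import re

# Hypothetical dict with mixed value types.
data = {"name": "\tAlice\n", "city": "Paris", "age": 30}

# Matches a whitespace character at the very start or at the end of the string.
edge_whitespace = re.compile(r"^\s|\s$")

for key, value in data.items():
    # Non-string values are skipped; strings are stripped only when the pattern matches.
    if isinstance(value, str) and edge_whitespace.search(value):
        data[key] = value.strip()
```

Because `\s` covers tabs and newlines as well as spaces, this catches cases that a plain `startswith(" ")` check would miss.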
What you should note is that `lstrip()` returns a copy of the string rather than modifying the object. To actually update your dictionary, you'll need to assign the stripped value back to the item. For example:
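As a minimal sketch of this point, assuming Python 3 and a made-up item with string values (so `.items()` is used where Python 2 code would typically call `.iteritems()`):

```python
# Hypothetical scraped item.
item = {"title": "   Widget", "price": "  9.99"}

# Calling lstrip() without using the result leaves the dict untouched.
item["title"].lstrip()
title_after_bare_call = item["title"]  # still has its leading spaces

# Assigning the result back is what actually updates the dict.
for key, value in item.items():
    item[key] = value.lstrip()
```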
Note the use of `.iteritems()`, which returns an iterator instead of a list of key-value pairs. This makes it somewhat more efficient. I should add that in Python 3, `.items()` has been changed to return "views", so `.iteritems()` would not be required (it no longer exists there).

Assuming you would like to strip the values of `yourDict`, creating a new dict called `newDict`:

This code can handle multi-type values, so it will avoid stripping `int`, `float`, etc.
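The snippet this answer describes is not shown, so here is one way it could be written, assuming Python 3. The sample contents of `yourDict` are invented.

```python
# Hypothetical input with mixed value types.
yourDict = {"name": "  Alice ", "age": 30, "height": 1.7}

# Strip only the string values; ints, floats and anything else are copied unchanged.
newDict = {
    key: value.strip() if isinstance(value, str) else value
    for key, value in yourDict.items()
}
```

Since a new dict is built, `yourDict` keeps its original, unstripped values.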