(Python) Identify the missing character and replac

Posted 2020-06-30 03:18

Question:

my_string = "        Name         Last_Name              Place"
my_string_another = "Aman         Raparia                India"

I have the two strings shown above; they are not CSV output. At present I read the first string and convert it to a list like this:

my_string = my_string.strip("\r\n")
my_string = my_string.split(" ")
my_string[:] = [elem for elem in my_string if elem != ""]

which provides the output in the format of

my_string = ['Name', 'Last_Name', 'Place']

Similarly, I do this for my_string_another to produce another list:

my_another_string = ["Aman", "Raparia", "India"]

Hence I can easily create a dict object.

The problem occurs when my_string_another is missing one of the fields, for example:

my_string_another = "Aman                             India"

When I use the same logic to convert my_string_another to a list, it produces

my_string_another = ["Aman", "India"]

As a result, when I map the two lists together, "India" is mapped to Last_Name instead of Place.

Is there a way to get the output in this format:

 my_another_string = ["Aman", "NA", "India"]

so that when I map the two lists, the values are matched to the correct fields?

Answer 1:

You could use the re module:

>>> import re
>>> my_string = "        Name         Last_Name              Place"
>>> my_string_another = "Aman         Raparia                India"
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string).groups()
('Name', 'Last_Name', 'Place')
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', 'Raparia', 'India')
>>> my_string_another = "Aman                             India"
>>> re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
('Aman', '', 'India')

This roughly means: capture three groups of non-whitespace characters, separated by whitespace. The middle group is optional, so it may match an empty string.
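To make the pattern easier to read, here is a sketch of the same expression written with re.VERBOSE, which lets each part carry a comment. The name pattern is only illustrative; the regular expression itself is the one from the answer.

```python
import re

# Same expression as above, split over several lines with comments.
pattern = re.compile(r"""
    (\S+)   # first field: one or more non-whitespace characters
    \s+     # separator: at least one whitespace character
    (\S*)   # middle field: zero or more non-whitespace characters (may be empty)
    \s+     # separator: at least one whitespace character
    (\S+)   # last field: one or more non-whitespace characters
""", re.VERBOSE)

full = pattern.search("Aman         Raparia                India").groups()
partial = pattern.search("Aman                             India").groups()
print(full)     # ('Aman', 'Raparia', 'India')
print(partial)  # ('Aman', '', 'India')
```

When the middle field is absent, the two separators share the single run of spaces between the outer fields, which is why the middle group comes back empty.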

You can then use a list comprehension to replace the empty string with 'NA':

>>> m = re.search(r'(\S+)\s+(\S*)\s+(\S+)', my_string_another).groups()
>>> m = [i if i else 'NA' for i in m]
>>> m
['Aman', 'NA', 'India']
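
Putting the pieces together, a minimal sketch of how the header and a data row might be mapped into the dict the question mentions. The helper name parse_row and its placeholder parameter are hypothetical, introduced only for this illustration, and the sketch assumes exactly three columns where only the middle one can be missing.

```python
import re

# Same three-field pattern as in the answer; the middle group may be empty.
FIELD_PATTERN = re.compile(r'(\S+)\s+(\S*)\s+(\S+)')


def parse_row(line, placeholder='NA'):
    """Hypothetical helper: return the three fields of `line`,
    substituting `placeholder` when the middle field is empty."""
    match = FIELD_PATTERN.search(line)
    if match is None:
        raise ValueError(f'cannot parse line: {line!r}')
    return [field if field else placeholder for field in match.groups()]


header = parse_row("        Name         Last_Name              Place")
row = parse_row("Aman                             India")
record = dict(zip(header, row))
print(record)  # {'Name': 'Aman', 'Last_Name': 'NA', 'Place': 'India'}
```

Note that this relies on at least two whitespace characters separating the outer fields when the middle one is missing. A line such as "Aman India", with a single space, does not match the pattern at all, so the sketch raises ValueError rather than guessing which field is absent.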