How to replace all those Special Characters with w

How to replace all those special characters with white spaces in Python?

I have a list of company names.

Ex:-[myfiles.txt]

MY company.INC

Old Wine pvt

master-minds ltd

"apex-labs ltd"

"India-New corp"

Indo-American pvt/ltd

In the example above, I need every special character [-, ", /, .] in the file myfiles.txt to be replaced with a single white space, and the result saved into another text file, myfiles1.txt.

Can anyone please help me out?

Tags: python replace special-characters whitespace text-files

5 answers

smile是对你的礼貌

#2 · 2019-05-22 12:47

import re
strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
strs = re.sub(r'[?$.!]', '', strs)  # remove particular special characters
strs = re.sub(r'[^a-zA-Z0-9 ]', '', strs)  # remove everything except letters, digits and spaces
strs = re.sub(r'[0-9]', '', strs)  # remove digits
strs = re.sub(r' +', ' ', strs)  # collapse runs of spaces into one
print(strs)

Output: how much for the maple syrup Thats ridiculous
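Note that the snippet above deletes the characters, while the question asks for them to be replaced with a space. A minimal sketch of that variant follows; the function name `replace_specials` and its default character set are assumptions made for illustration. A run of adjacent special characters becomes one space, but a special character next to an existing space still leaves two spaces.

```python
import re

def replace_specials(text, specials='-"/.'):
    # Build a character class from the characters to replace; re.escape
    # keeps '-' and '.' from being read as regex syntax.
    pattern = '[' + re.escape(specials) + ']+'
    # Each run of adjacent special characters becomes a single space.
    return re.sub(pattern, ' ', text)

print(replace_specials('Indo-American pvt/ltd'))
print(replace_specials('MY company.INC'))
```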


叼着烟拽天下

#3 · 2019-05-22 12:49

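specials = '-"/.'  # etc.
# Python 3 spelling; on Python 2 this was string.maketrans
trans = str.maketrans(specials, ' ' * len(specials))
# translate every line of the input file and write it to the output file
with open('myfiles.txt') as src, open('myfiles1.txt', 'w') as dst:
    for line in src:
        cleanline = line.translate(trans)
        dst.write(cleanline)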

e.g.

>>> line = "Indo-American pvt/ltd"
>>> line.translate(trans)
'Indo American pvt ltd'
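On Python 3, `string.maketrans` no longer exists and the table is built with `str.maketrans` instead. Here is a sketch of the whole read-translate-write round trip under that assumption; the helper name `clean_file` and the UTF-8 encoding are illustrative choices, not something the question specifies.

```python
def clean_file(src_path, dst_path, specials='-"/.'):
    # Map each special character to a space; str.maketrans requires
    # both strings to have the same length.
    table = str.maketrans(specials, ' ' * len(specials))
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            # Newlines are not in the table, so line breaks are preserved.
            dst.write(line.translate(table))
```

For the files in the question, the call would be `clean_file('myfiles.txt', 'myfiles1.txt')`.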


叼着烟拽天下

#4 · 2019-05-22 12:57

Assuming you mean to change everything non-alphanumeric, you can do this on the command line:

sed 's/[^A-Za-z0-9]/ /g' foo.txt > bar.txt

Or in Python with the re module:

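import re

# Replace everything that is not a letter, a digit or a newline with a space
with open('foo.txt') as src:
    original_string = src.read()
new_string = re.sub(r'[^a-zA-Z0-9\n]', ' ', original_string)
with open('bar.txt', 'w') as dst:
    dst.write(new_string)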


太酷不给撩

#5 · 2019-05-22 13:06

At first I thought of providing a string.maketrans/translate example, but you may be using UTF-8 encoded strings, in which case the ord()-sorted translation table will blow up in your face, so I thought of another solution:

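conversion = '-"/.'
# read the whole input file into memory
with open('myfiles.txt') as f:
    text = f.read()
newtext = ''
for c in text:
    # swap each listed character for a space, keep everything else
    newtext += ' ' if c in conversion else c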

It's not the fastest way, but it is easy to grasp and modify.

So if your text is non-ASCII, you could decode both conversion and the text strings to unicode, and afterwards re-encode in whichever encoding you want.
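The decode, replace, re-encode round trip could be sketched as follows on Python 3. The function name `replace_in_bytes`, the UTF-8 default and the sample company name are all assumptions made for illustration.

```python
def replace_in_bytes(raw, conversion='-"/.', encoding='utf-8'):
    # Decode first so that each element of `text` is a whole character,
    # not one byte of a multi-byte sequence.
    text = raw.decode(encoding)
    newtext = ''.join(' ' if c in conversion else c for c in text)
    # Re-encode in the requested encoding.
    return newtext.encode(encoding)

# Hypothetical input containing non-ASCII letters
raw = 'Café-Crème ltd.'.encode('utf-8')
print(replace_in_bytes(raw).decode('utf-8'))
```

When a file is opened in text mode with an explicit `encoding` argument, Python 3 does the decoding and encoding itself, so the loop can work on `str` values directly.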


放荡不羁爱自由

#6 · 2019-05-22 13:11

While maketrans is the fastest way to do it, I never remember the syntax. Since speed is rarely an issue and I know regular expressions, I would tend to do this:

>>> line = "-[myfiles.txt] MY company.INC"
>>> import re
>>> re.sub(r'[^a-zA-Z0-9]', ' ',line)
'  myfiles txt  MY company INC'

This has the additional benefit of declaring the characters you accept instead of the ones you reject, which feels easier in this case.

Of course, if you are using non-ASCII characters, you'll have to go back to removing the characters you reject. If those are just punctuation signs, you can do:

>>> import string
>>> chars = re.escape(string.punctuation)
>>> re.sub(r'['+chars+']', ' ',line)
'  myfiles txt  MY company INC'

But you'll notice
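For the non-ASCII case mentioned above, one possible alternative on Python 3 is to keep the "declare what you accept" style with `\w`, which matches Unicode letters and digits in `str` patterns. This is only a sketch: the name `replace_non_word` and the sample input are made up, and `\w` also keeps the underscore.

```python
import re

def replace_non_word(line):
    # Keep word characters (Unicode letters, digits, underscore) and
    # whitespace; replace everything else with a space.
    return re.sub(r'[^\w\s]', ' ', line)

# Hypothetical company name with accented letters
print(replace_non_word('Müller-Söhne pvt/ltd'))
```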
