Replace punctuation symbols in Python
The function below replaces every ASCII punctuation symbol (the characters in string.punctuation) with a space, using a regular expression.
import re, string
def remove_punctuation(text):
    # Build a character class from the escaped punctuation and replace each match with a space.
    return re.sub('[%s]' % re.escape(string.punctuation), ' ', text)
Calling the previous function:
>>> remove_punctuation("El perro, de San Roque, no tiene rabo; ni nunca lo ha tenido.")
We will get this output:
'El perro  de San Roque  no tiene rabo  ni nunca lo ha tenido '
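Because each symbol is replaced by exactly one space, a symbol that is followed by a space leaves two consecutive spaces in the result, and a trailing symbol leaves a trailing space. If that matters, we could collapse the whitespace afterwards. The following is only a minimal sketch: the name remove_punctuation_collapsed and the PUNCTUATION constant are hypothetical and not part of the function above.

```python
import re
import string

# Character class matching any single ASCII punctuation symbol.
PUNCTUATION = re.compile('[%s]' % re.escape(string.punctuation))

def remove_punctuation_collapsed(text):
    # Hypothetical helper: replace each punctuation symbol with a space,
    # then squeeze every run of whitespace into one space and trim the ends.
    spaced = PUNCTUATION.sub(' ', text)
    return re.sub(r'\s+', ' ', spaced).strip()

print(remove_punctuation_collapsed(
    "El perro, de San Roque, no tiene rabo; ni nunca lo ha tenido."))
# prints: El perro de San Roque no tiene rabo ni nunca lo ha tenido
```

Note that this also collapses whitespace that was already in the input (tabs, newlines, repeated spaces), which may or may not be what we want.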
We can make the function more general so that it replaces punctuation symbols with any string we pass in.
import re, string
def replace_punctuation(text, replace):
    # Same pattern as before, but the replacement string is a parameter.
    return re.sub('[%s]' % re.escape(string.punctuation), replace, text)
Calling the function to replace punctuation symbols with "[stop]":
>>> replace_punctuation(
...     "El perro, de San Roque, no tiene rabo; ni nunca lo ha tenido.",
...     '[stop]')
# output
'El perro[stop] de San Roque[stop] no tiene rabo[stop] ni nunca lo ha tenido[stop]'
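One thing to keep in mind: re.sub processes backslash escapes in a replacement string, so a replacement such as r"\n" would be inserted as a newline, and an unknown escape such as r"\d" raises an error. If the replacement should always be inserted verbatim, one option is to pass a function instead of a string. This is a sketch under that assumption; the name replace_punctuation_literal is hypothetical.

```python
import re
import string

# Character class matching any single ASCII punctuation symbol.
PUNCTUATION = re.compile('[%s]' % re.escape(string.punctuation))

def replace_punctuation_literal(text, replace):
    # Hypothetical variant: when the replacement is a callable, re.sub uses
    # its return value as-is, without interpreting backslash escapes.
    return PUNCTUATION.sub(lambda match: replace, text)

print(replace_punctuation_literal("uno, dos.", "[stop]"))
print(replace_punctuation_literal("uno, dos.", r"\d"))
```

For plain replacements like "[stop]" both versions behave the same; the difference only shows up when the replacement contains backslashes.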