python - Print the words between two specific words in a given string

If a particular word is not closed by another particular word, it should be skipped. Here is my string:

x = 'john got shot dead. john with his .... ? , john got killed or died in 1990. john with his wife dead or died'

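I want to print and count all the words between john and dead, died, or death.
If a john is not closed by any of dead, died, or death, leave it alone and start again from the next john.

My code: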

import re

x = re.sub(r'[^\w]', ' ', x)  # replace dots, commas, and other special symbols with spaces

for i in re.findall(r'(?<=john)' + '(.*?)' + '(?=dead|died|death)', x):
    print(i)
    print(len(i.split()))

My output:

 got shot 
2
 with his          john got killed or 
6
 with his wife 
3

The output I want:

got shot
2
got killed or
3
with his wife
3

I don't know where I am going wrong.
This is just a sample input. I have to check 20,000 inputs at a time.

Best answer

You can use this regex with a negative lookahead:

>>> for i in re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x):
...     print(i.strip())
...     print(len(i.split()))
...

got shot
2
got killed or
3
with his wife
3

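Instead of your .*?, this regex uses (?:(?!john).)*?, which lazily matches 0 or more characters only as long as john does not appear inside the match. A john that is never closed by dead, died, or death therefore produces no match, and the search resumes at the next john.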
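
To make the difference concrete, here is a small side-by-side sketch using the string from the question. The variable names (lazy, tempered, lazy_words, tempered_words) are only illustrative and are not part of the original answer.

```python
import re

x = 'john got shot dead. john with his .... ? , john got killed or died in 1990. john with his wife dead or died'
x = re.sub(r'[^\w]', ' ', x)  # same cleanup step as in the question

# Plain lazy match: it may run across a later "john" to reach a terminator.
lazy = re.findall(r'(?<=john)(.*?)(?=dead|died|death)', x)

# Tempered match: a character is consumed only if "john" does not start there.
tempered = re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x)

lazy_words = [m.split() for m in lazy]
tempered_words = [m.split() for m in tempered]

print(lazy_words)      # second entry still contains 'john'
print(tempered_words)  # the unclosed 'john with his' span is skipped
```

The second john is followed by another john before any terminator, so the tempered pattern fails at that starting point and the search moves on to the third john.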

I would also suggest using word boundaries so that it only matches complete words:

re.findall(r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)', x)
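
Since the question mentions checking about 20,000 inputs, one possible way to apply the word-boundary version is to compile it once and reuse it. This is only a sketch: the helper name words_between and the second sample sentence are made up for illustration, and the punctuation cleanup is copied from the question.

```python
import re

# Compiled once so the same pattern object is reused for every input.
PATTERN = re.compile(
    r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)'
)

def words_between(text):
    """Return one list of words per john ... dead/died/death span (hypothetical helper)."""
    cleaned = re.sub(r'[^\w]', ' ', text)  # same cleanup as in the question
    return [m.split() for m in PATTERN.findall(cleaned)]

inputs = [
    'john got shot dead. john with his .... ? , john got killed or died in 1990.',
    'john missed the deadline but later died',  # made-up example for \b
]

for text in inputs:
    for words in words_between(text):
        print(' '.join(words))
        print(len(words))
```

In the second sentence the word boundaries keep the match from stopping at "deadline", so the span runs up to "died". Without \b it would end right after "the".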
