s = 'k1:text k2: more text k3:andk4: more yet'
key_list = ['k1','k2','k3']
(missing code)
# s_dict = {'k1':'text', 'k2':'more text', 'k3':'andk4: more yet'}
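In this case, a key must be preceded by a space or a newline, or be the first thing in the string, and it must be followed immediately by a colon; otherwise it is not parsed as a key. So in the example, k1, k2, and k3 are read as keys, while k4 is part of k3's value. I have also stripped trailing whitespace, but consider that optional.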
>>> import re
>>> dict(re.findall(r'(?:(?<=\s)|(?<=^))(\S+?):(.*?)(?=\s[^\s:]+:|$)', s))
{'k1': 'text', 'k2': ' more text', 'k3': 'andk4: more yet'}
The regex took some trial and error. Stare at it long enough and you will see what it does.
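The one-liner above accepts any run of non-space characters as a key and keeps the leading space in ' more text'. Here is a minimal sketch of a variant that only recognizes keys from key_list and strips the values, assuming that is what the question intends. The helper name parse_pairs is hypothetical, and the sketch assumes key_list is non-empty.

```python
import re

def parse_pairs(s, keys):
    """Split s into {key: value} using only the allowed keys (hypothetical helper)."""
    # Alternation of the allowed keys, escaped so they match literally.
    alt = '|'.join(re.escape(k) for k in keys)
    pattern = re.compile(
        rf'(?:(?<=\s)|(?<=^))'    # key starts the string or follows whitespace
        rf'({alt}):'              # an allowed key, immediately followed by a colon
        rf'(.*?)'                 # value, as short as possible
        rf'(?=\s(?:{alt}):|$)',   # stop before the next allowed key, or at the end
        re.DOTALL,                # let values span newlines
    )
    # Strip surrounding whitespace from each value (optional per the question).
    return {k: v.strip() for k, v in pattern.findall(s)}

s = 'k1:text k2: more text k3:andk4: more yet'
key_list = ['k1', 'k2', 'k3']
s_dict = parse_pairs(s, key_list)
print(s_dict)
```

With this variant, a token such as k4 that is not in key_list stays inside the preceding value even when it follows a space.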
Details
(?:
    (?<=\s)     # lookbehind: preceded by a whitespace character
    |           # regex OR
    (?<=^)      # lookbehind: at the start of the string (no re.MULTILINE here)
)
(\S+?)          # key: non-greedy run of non-whitespace characters
:               # literal colon
(.*?)           # value: non-greedy match (does not cross newlines without re.DOTALL)
(?=             # lookahead (this handles the third key's case)
    \s          # a whitespace character
    [^\s:]+     # the next key: anything that is not whitespace or a colon
    :           # colon
    |
    $           # end of the string
)
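
The annotated breakdown can be run directly by compiling it with re.VERBOSE, which makes Python ignore whitespace and `#` comments inside the pattern. This is a sketch of the same regex as the one-liner; the name PAIR_RE is only illustrative.

```python
import re

PAIR_RE = re.compile(r'''
    (?:
        (?<=\s)     # preceded by a whitespace character
      |             # or
        (?<=^)      # at the start of the string
    )
    (\S+?)          # key
    :               # literal colon
    (.*?)           # value, as short as possible
    (?=
        \s[^\s:]+:  # stop before whitespace followed by the next key and its colon
      |
        $           # or at the end of the string
    )
''', re.VERBOSE)

s = 'k1:text k2: more text k3:andk4: more yet'
result = dict(PAIR_RE.findall(s))
print(result)
# {'k1': 'text', 'k2': ' more text', 'k3': 'andk4: more yet'}
```

Keeping the comments inside the compiled pattern means the explanation and the regex cannot drift apart.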