I have the following Org-mode syntax:
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I want to extract the items under a given heading, for example:
getitems "Hardware"
I should get:
- [ ] adapt a programmable motor to a tripod to be used for panning
If I ask for "Reading - Health", I should get:
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I am using the following pattern:
pattern = re.compile("\*\* "+ head + " (.+?)\*?$", re.DOTALL)
The output when requesting "Reading - Technology" is:
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I also tried:
pattern = re.compile("\*\* "+ head + " (.+?)[\*|\z]", re.DOTALL)
This second pattern works for every heading except the last one.
Output when requesting "Reading - Health":
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
As you can see, it does not match the last line.
I am using Python 2.7 and findall.
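A short sketch of why the first pattern runs on into the next heading: without re.MULTILINE, $ matches only at the very end of the string (or just before a trailing newline), so the lazy (.+?) has to keep expanding until it gets there. Adding re.MULTILINE alone does not help either, because $ then matches at the end of the heading line itself. The two-heading sample text below is a shortened stand-in for the list above, and the variable names are only illustrative.

```python
import re

# Shortened stand-in for the Org list above (assumption: two headings suffice).
text = "** A [0/1]\n- [ ] one\n** B [1/1]\n- [X] two\n"
head = "A"

# Same shape as the first pattern in the question: DOTALL, no MULTILINE.
no_multiline = re.compile(r"\*\* " + head + r" (.+?)\*?$", re.DOTALL)
overshoot = no_multiline.search(text).group(1)

# With MULTILINE added, $ matches at the end of every line,
# so the lazy group stops at the end of the heading line.
with_multiline = re.compile(r"\*\* " + head + r" (.+?)\*?$",
                            re.DOTALL | re.MULTILINE)
too_short = with_multiline.search(text).group(1)

print(repr(overshoot))   # includes the "** B" heading and its items
print(repr(too_short))   # only the rest of the heading line
```

Neither variant stops at the next heading, which is why a pattern needs some explicit notion of "up to the next line that starts with **".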
You can use the following approach:
import re
string = """
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
"""
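def getitems(section):
    # Match the "** <section>" heading line, then capture everything
    # up to (but not including) the next line that starts with "**".
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    match = rx.search(string)
    # search() returns None when no such heading exists
    return match.group('block') if match else None

items = getitems('Reading - Technology')
print(items)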
The core of the code is the (condensed) expression:
^\*{2}.+[\n\r]      # match the beginning of the line, followed by two stars, anything else in between and a newline
(?P<block>          # open group "block"
    (?:             # non-capturing group
        (?!^\*{2})  # a negative lookahead, making sure no ** follows at the beginning of a line
        [\s\S]      # any character...
    )+              # ...at least once
)                   # close group "block"
In the actual code, the search string is inserted right after the **. See the demo for Reading - Technology on regex101.com.
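The commented expression can also be kept in the source as is by compiling it with re.VERBOSE. The following is a minimal sketch of that idea, not part of the original answer: build_pattern and ORG_TEXT are hypothetical names, and the sample text is shortened. Because verbose mode ignores unescaped whitespace, the literal space after the two stars is written as [ ]; the output of re.escape is safe here since it escapes spaces itself.

```python
import re

# Shortened sample (assumption: same layout as the full list above).
ORG_TEXT = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Convict Conditioning 1 and 2
"""

def build_pattern(section):
    """Hypothetical helper: the answer's regex, written in verbose mode."""
    return re.compile(r'''
        ^\*{2}[ ]''' + re.escape(section) + r'''  # heading: two stars, a space, the title
        .+[\n\r]            # rest of the heading line and its newline
        (?P<block>          # open group "block"
            (?:             # non-capturing group
                (?!^\*{2})  # stop before a line that starts with **
                [\s\S]      # any character, newlines included
            )+              # at least once
        )                   # close group "block"
    ''', re.MULTILINE | re.VERBOSE)

match = build_pattern('Reading - Health').search(ORG_TEXT)
block = match.group('block') if match else None
print(block)
```

Note that the heading is matched by prefix, so a shorter title such as "Reading" would pick up the first heading that starts with it.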
As a follow-up, you can also return only the checked items, like this:
def getitems(section, selected=None):
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    match = rx.search(string)
    if match is None:
        # no such heading
        return None
    items = match.group('block')
    if selected:
        # keep only the checked entries; leading indentation is optional
        rxi = re.compile(r'^[ \t]*- \[X\] (.+)', re.MULTILINE)
        return rxi.findall(items)
    return items

items = getitems('Reading - Health', selected=True)
print(items)
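If several headings are needed at once, a line-by-line parser is an alternative to the single regular expression. This is a different technique from the one above, sketched under the assumption that every heading starts with "** " and may end in a [n/m] counter; parse_sections, HEADING, ITEM and ORG_TEXT are hypothetical names.

```python
import re

# Shortened sample (assumption: same layout as the full list above).
ORG_TEXT = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
"""

# Heading title without the optional trailing "[n/m]" counter.
HEADING = re.compile(r'^\*{2} (?P<title>.+?)(?: \[\d+/\d+\])?$')
# Checkbox item; the mark is either a space (open) or X (done).
ITEM = re.compile(r'^\s*- \[(?P<mark>[ X])\] (?P<text>.+)$')

def parse_sections(text):
    """Return {heading title: [(checked, item text), ...]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = sections.setdefault(heading.group('title'), [])
            continue
        item = ITEM.match(line)
        if item and current is not None:
            current.append((item.group('mark') == 'X', item.group('text')))
    return sections

sections = parse_sections(ORG_TEXT)
checked = [text for done, text in sections['Reading - Health'] if done]
print(checked)
```

Here the heading is matched exactly rather than by prefix, and the last heading needs no special handling because the loop simply runs to the end of the text.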
