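Given a text file as input, I need to replace the words that appear in an input list. The output should be the same text file, except that each matched word is wrapped as <repl>matched_word</repl>. I built a series of nested loops for this, but I cannot reproduce the original file: with a 20-line test file, the output contained millions of duplicated lines.
Here is an example. The input text file could be: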
bucharest sdfsadf
sofia sdf sdf dsf
vienna etc
etc
can
sdfds
22
rdf
fd
paris
Paris
The code I tried is:
# input files
input_file = r"....\input_txt_test.txt"
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
out_file = r"....\output_txt_test.txt"

# Perform replacement
with open(out_file, 'w') as outfile:
    with open(input_file, 'r') as f:
        text = f.readlines()
        for line in text:
            line_sp = line.split(" ")
            for name in list_names:
                for word in line_sp:
                    if name in word:
                        strreplace = '''<repl>%s</repl>''' % name
                        repl = line.replace(name, strreplace)
                        outfile.write(repl)
                    else:
                        outfile.write(line)
I expected output like this:
<repl>bucharest</repl> sdfsadf
sofia sdf sdf dsf
<repl>vienna</repl> etc
etc
can
sdfds
22
rdf
fd
paris
<repl>Paris</repl>
But this is what I got:
bucharest sdfsadf
bucharest sdfsadf
<repl>bucharest</repl> sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
bucharest sdfsadf
<repl>buc</repl>harest sdfsadf
bucharest sdfsadf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
sofia sdf sdf dsf
vienna etc
vienna etc
vienna etc
vienna etc
<repl>vienna</repl> etc
vienna etc
vienna etc
vienna etc
vienna etc
vienna etc
etc
etc
etc
etc
etc
can
can
can
can
can
sdfds
sdfds
sdfds
sdfds
sdfds
22
22
22
22
22
rdf
rdf
rdf
rdf
rdf
fd
fd
fd
fd
fd
paris
paris
paris
paris
paris
ParisParisParis<repl>Paris</repl>Paris
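Also, list_names contains the string "buc". No whole word in the file matches it, yet it still ends up in the output file. How can I do this matching and write the file correctly? Thanks!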
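Consider the following approach:
Your code writes to the output file once for every (name, word) pair, so each input line is written many times, and "name in word" is a substring test, which is why "buc" matches inside "bucharest". Instead, read each line of the input file and, for every word that appears in list_names, replace that word in line with its tagged version. Once all words in the line have been checked, write line to the output file a single time and move on to the next line: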
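# input files
input_file = r"....\input_txt_test.txt"
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
out_file = r"....\output_txt_test.txt"

# Perform replacement
with open(out_file, 'w') as outfile:
    with open(input_file, 'r') as f:
        for line in f:
            # split() with no argument also drops the trailing newline,
            # so the last word on a line can still match; set() avoids
            # tagging a repeated word twice
            for word in set(line.split()):
                if word in list_names:
                    replaced_word = "<repl>{}</repl>".format(word)
                    # note: str.replace also rewrites the word inside longer words
                    line = line.replace(word, replaced_word)
            # write each line exactly once, after all its words are checked
            outfile.write(line)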
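
Because str.replace works on substrings, the loop above can still touch a listed name that sits inside a longer word on the same line. One possible alternative, shown here only as a minimal sketch, is a regular expression with word boundaries. The helper name tag_names and the sample string are made up for illustration and are not part of the original question or answer.

```python
import re


def tag_names(text, names):
    """Wrap each whole-word occurrence of a listed name in <repl>...</repl>."""
    if not names:
        # An empty alternation would match at every word boundary.
        return text
    # Longest names first, so a longer name is tried before a shorter prefix.
    alternatives = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in alternatives) + r")\b"
    )
    # A single pass over the text: already tagged words are never revisited.
    return pattern.sub(lambda m: "<repl>{}</repl>".format(m.group(0)), text)


list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
sample = "bucharest sdfsadf\nsofia sdf sdf dsf\nvienna etc\nparis\nParis"
print(tag_names(sample, list_names))
```

To apply this to files, the whole input could be read with f.read(), passed through tag_names, and written to the output file in one call. The match is case-sensitive, as in the question, so "paris" is left alone while "Paris" is tagged.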