Python: replacing matched strings in a text file

October 1, 2024 · zhoujg

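Given a text file as input, I need to replace the words that appear in an input list. The output should be the same text file, except that each word found is replaced with <repl>matched_word</repl>. I built a series of loops for this, but I cannot reproduce the same text file. I tried it with a 20-line text file, and the output has millions of duplicated lines.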

Here is an example. The input text file could be:

bucharest sdfsadf 
sofia sdf sdf dsf  
vienna etc 
etc 
can 
sdfds 
22 
rdf 
 
fd 
paris 
Paris 

The code I tried:

# input files 
input_file = r"....\input_txt_test.txt" 
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"] 
out_file = r"....\output_txt_test.txt" 
 
# Perform replacement 
with open(out_file, 'w') as outfile: 
    with open(input_file, 'r') as f: 
        text = f.readlines() 
        for line in text: 
            line_sp = line.split(" ") 
            for name in list_names: 
                for word in line_sp: 
                    if name in word: 
                        strreplace = '''<repl>%s</repl>''' % name 
                        repl = line.replace(name, strreplace) 
                        outfile.write(repl) 
                    else: 
                        outfile.write(line) 

I expected output like this:

<repl>bucharest</repl> sdfsadf 
sofia sdf sdf dsf  
<repl>vienna</repl> etc 
etc 
can 
sdfds 
22 
rdf 
 
fd 
paris 
<repl>Paris</repl> 

But this is what I got:

bucharest sdfsadf 
bucharest sdfsadf 
<repl>bucharest</repl> sdfsadf 
bucharest sdfsadf 
bucharest sdfsadf 
bucharest sdfsadf 
bucharest sdfsadf 
bucharest sdfsadf 
<repl>buc</repl>harest sdfsadf 
bucharest sdfsadf 
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
sofia sdf sdf dsf  
vienna etc 
vienna etc 
vienna etc 
vienna etc 
<repl>vienna</repl> etc 
vienna etc 
vienna etc 
vienna etc 
vienna etc 
vienna etc 
etc 
etc 
etc 
etc 
etc 
can 
can 
can 
can 
can 
sdfds 
sdfds 
sdfds 
sdfds 
sdfds 
22 
22 
22 
22 
22 
rdf 
rdf 
rdf 
rdf 
rdf 
 
 
 
 
 
fd 
fd 
fd 
fd 
fd 
paris 
paris 
paris 
paris 
paris 
ParisParisParis<repl>Paris</repl>Paris 

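Also, list_names contains the string "buc". No word in the file is equal to that string, yet it is still inserted into the output file. How can I do this matching and file writing? Thanks!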

Please refer to the following approach:
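It may help to first see where the extra lines come from. The sketch below is a hypothetical miniature of the question's loops: it collects the writes in a list (the name `writes` is introduced here for illustration) instead of writing to a file, and it assumes the first input line is "bucharest sdfsadf" followed by a newline. Because the write sits inside the innermost loop, one input line is written once per (name, token) pair, and because `name in word` is a substring test, "buc" matches inside "bucharest".

```python
# Hypothetical miniature of the question's loops: collect the writes in a
# list instead of a file so that the duplication can be counted.
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
line = "bucharest sdfsadf\n"

writes = []
for name in list_names:
    for word in line.split(" "):
        if name in word:  # substring test, not whole-word equality
            writes.append(line.replace(name, "<repl>%s</repl>" % name))
        else:
            writes.append(line)

print(len(writes))            # 10: 5 names x 2 tokens for a single input line
print("buc" in "bucharest")   # True: "buc" is a substring of "bucharest"
print("bucharest" == "buc")   # False: the whole words differ
```

This reproduces the first ten lines of the unexpected output shown in the question, including the `<repl>buc</repl>harest` line.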

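Here, you read the input file line by line. Whenever a word on the current line appears in list_names, that word is wrapped in <repl>...</repl>. After all words on the line have been checked, the line is written to the output file exactly once, and processing continues with the next line: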

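import re

# input files
input_file = r"....\input_txt_test.txt"
list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
out_file = r"....\output_txt_test.txt"

# Perform replacement
with open(input_file, 'r') as f, open(out_file, 'w') as outfile:
    for line in f:
        # Split on whitespace but keep the separators (capturing group),
        # so the line can be rebuilt exactly and a name at the end of a
        # line is not glued to its trailing newline
        tokens = re.split(r"(\s+)", line)
        for i, word in enumerate(tokens):
            # Whole-word comparison: "buc" does not match "bucharest"
            if word in list_names:
                tokens[i] = "<repl>{}</repl>".format(word)
        # Write each input line exactly once
        outfile.write("".join(tokens))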
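
As an alternative to splitting each line into words, whole-word matching can also be expressed as a single regular expression. The following is a minimal sketch, not part of the original answer: the function name `wrap_names` and the `sample` string are hypothetical, and it assumes that "word" means a run of letters, digits, or underscores delimited by `\b` boundaries. Under that assumption a name followed by punctuation (for example "Paris,") is also wrapped, which differs from splitting on whitespace.

```python
import re

def wrap_names(text, names):
    """Wrap every whole-word occurrence of a name in <repl>...</repl>."""
    if not names:
        return text
    # Try longer names first so "bucharest" is preferred over "buc".
    ordered = sorted(names, key=len, reverse=True)
    # \b anchors the match at word boundaries; re.escape protects names
    # that contain regex metacharacters.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    return pattern.sub(lambda m: "<repl>{}</repl>".format(m.group(0)), text)

list_names = ["Bucharest", "bucharest", "vienna", "Paris", "buc"]
sample = "bucharest sdfsadf\nsofia sdf sdf dsf\nvienna etc\nparis\nParis"
print(wrap_names(sample, list_names))
```

To process a file with it, the whole text can be read with `f.read()`, passed through `wrap_names`, and written back in one call. The match is case-sensitive, so "paris" stays untouched while "Paris" is wrapped, as in the expected output.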