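I have two kinds of files with the same filenames in two different folders. They contain different information that I need to preprocess and then merge. So far I have been doing it by hand: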
a = './location/ID01.csv'
df1 = pd.read_csv(a)
# ... rest of the code to preprocess a
and for the other file:
b = './log/ID01.csv'
df2 = pd.read_csv(b)
# ... rest of the code to preprocess b
Then I merge each pair by hand with:
new_df = df2.merge(df1, on=['hour'], how='outer')
new_df.to_csv('merged.csv')
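To make clear what the merge step does, here is a small sketch with made-up in-memory frames standing in for the preprocessed files (the `lat` and `event` columns are hypothetical). An outer merge on `hour` keeps every hour that appears in either file and fills the gaps with NaN:

```python
import pandas as pd

# Hypothetical stand-ins for the preprocessed location (df1) and log (df2) data
df1 = pd.DataFrame({'hour': [0, 1, 2], 'lat': [52.1, 52.2, 52.3]})
df2 = pd.DataFrame({'hour': [1, 2, 3], 'event': ['a', 'b', 'c']})

# Outer merge: union of the 'hour' keys from both frames;
# hours missing from one side get NaN in that side's columns
new_df = df2.merge(df1, on=['hour'], how='outer')
print(new_df)
```

Hour 0 exists only in the location data and hour 3 only in the log data, so each of those rows has one NaN. With `how='inner'` both rows would be dropped instead.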
Of course, this is time-consuming. How can I do this in a loop so that all the files in both folders are processed in one go?
You can take the filenames that exist in both folders and loop over them:
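import os
import pandas as pd

# CSV filenames present in each folder
files_in_log = {f for f in os.listdir('log') if f.endswith('.csv')}
files_in_location = {f for f in os.listdir('location') if f.endswith('.csv')}

# Create the output folder; don't fail if it already exists
os.makedirs('results', exist_ok=True)

# Only filenames found in both folders can be paired up
for filename in sorted(files_in_log & files_in_location):
    df1 = pd.read_csv(os.path.join('location', filename))
    # ... preprocess df1 here ...
    df2 = pd.read_csv(os.path.join('log', filename))
    # ... preprocess df2 here ...
    new_df = df2.merge(df1, on=['hour'], how='outer')
    new_df.to_csv(os.path.join('results', filename))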
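If the preprocessing is more than a few lines, it may help to move it into functions and wrap the loop so it can be reused and tested. The sketch below assumes that layout; `preprocess_location`, `preprocess_log` and `merge_folders` are hypothetical names, and the preprocessing bodies are placeholders for your own code. It also returns the filenames that exist in only one folder, since the intersection silently skips them, and writes without the index (`index=False`) so the merged CSVs don't gain an extra unnamed column:

```python
import os
import tempfile

import pandas as pd


def preprocess_location(df):
    # Hypothetical placeholder: put the location preprocessing here
    return df


def preprocess_log(df):
    # Hypothetical placeholder: put the log preprocessing here
    return df


def merge_folders(location_dir, log_dir, out_dir):
    """Merge same-named CSVs from two folders on 'hour'.

    Returns the filenames that exist in only one of the folders."""
    loc = {f for f in os.listdir(location_dir) if f.endswith('.csv')}
    log = {f for f in os.listdir(log_dir) if f.endswith('.csv')}
    os.makedirs(out_dir, exist_ok=True)
    for filename in sorted(loc & log):
        df1 = preprocess_location(pd.read_csv(os.path.join(location_dir, filename)))
        df2 = preprocess_log(pd.read_csv(os.path.join(log_dir, filename)))
        new_df = df2.merge(df1, on=['hour'], how='outer')
        new_df.to_csv(os.path.join(out_dir, filename), index=False)
    # Symmetric difference: files that could not be paired
    return sorted(loc ^ log)


# Small demo on throwaway folders with made-up data
with tempfile.TemporaryDirectory() as root:
    loc_dir = os.path.join(root, 'location')
    log_dir = os.path.join(root, 'log')
    os.makedirs(loc_dir)
    os.makedirs(log_dir)
    pd.DataFrame({'hour': [0, 1], 'lat': [1.0, 2.0]}).to_csv(
        os.path.join(loc_dir, 'ID01.csv'), index=False)
    pd.DataFrame({'hour': [1, 2], 'event': ['a', 'b']}).to_csv(
        os.path.join(log_dir, 'ID01.csv'), index=False)
    # ID02.csv exists only in the log folder, so it is reported, not merged
    pd.DataFrame({'hour': [5]}).to_csv(
        os.path.join(log_dir, 'ID02.csv'), index=False)

    unmatched = merge_folders(loc_dir, log_dir, os.path.join(root, 'results'))
    merged = pd.read_csv(os.path.join(root, 'results', 'ID01.csv'))

print(unmatched)
print(merged)
```

For the real data you would call `merge_folders('location', 'log', 'results')` and check the returned list for files that are missing a partner.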