
Python: reshaping a DataFrame in pandas

2025-04-02 · 20lyj

I have a DataFrame like this:

+----------+---------------+---------------+-------------+ 
| Old_City | New_City_Code | New_City_Name |Old_City_Code| 
+----------+---------------+---------------+-------------+ 
| a        | 101           | A             | 001         | 
+----------+---------------+---------------+-------------+ 
| b        | 101           | A             | 002         | 
+----------+---------------+---------------+-------------+ 
| c        | 102           | B             | 003         | 
+----------+---------------+---------------+-------------+ 
| d        | 103           | C             | 004         | 
+----------+---------------+---------------+-------------+ 
| e        | 103           | C             | 005         | 
+----------+---------------+---------------+-------------+ 
| f        | 103           | C             | 006         | 
+----------+---------------+---------------+-------------+ 

I want to reshape it with pandas. The reshaped table should look like this:

+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+ 
| New_City_Code | New_City_Name | Old_City1 | Old_City2 | Old_City3 | Old_Code1 | Old_Code2 | Old_Code3 | 
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+ 
| 101           | A             | a         | b         |           | 001       | 002       |           | 
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+ 
| 102           | B             | c         |           |           | 003       |           |           | 
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+ 
| 103           | C             | d         | e         | f         | 004       | 005       | 006       | 
+---------------+---------------+-----------+-----------+-----------+-----------+-----------+-----------+ 

Is there a way to do this transformation in pandas (or, if not in pandas, in R)? I tried pivot, but it didn't work (I got the error ValueError: cannot label index with a null key).
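The exact pivot call isn't shown, and the quoted message comes from an older pandas release. What follows is therefore only a plausible reconstruction of the problem. DataFrame.pivot needs every (index, columns) pair to be unique. Nothing in the raw frame says which old city is the 1st, 2nd or 3rd of its new city. The pivot call below is hypothetical and was chosen only to trigger that problem:

```python
import pandas as pd

# sample data from the question (codes kept as strings to preserve leading zeros)
df = pd.DataFrame({
    'Old_City': ['a', 'b', 'c', 'd', 'e', 'f'],
    'New_City_Code': [101, 101, 102, 103, 103, 103],
    'New_City_Name': ['A', 'A', 'B', 'C', 'C', 'C'],
    'Old_City_Code': ['001', '002', '003', '004', '005', '006'],
})

# Hypothetical call: the pair (New_City_Code, New_City_Name) repeats, e.g. (101, 'A')
# twice, so pandas cannot decide which Old_City belongs in that single cell.
try:
    df.pivot(index='New_City_Code', columns='New_City_Name', values='Old_City')
    msg = None
except ValueError as exc:
    msg = str(exc)

print(msg)
print(df.duplicated(['New_City_Code', 'New_City_Name']).any())  # True
```

A per-group counter (for example from groupby().cumcount()) supplies the missing unique key that pivoting needs.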

Here is one approach:

You can use groupby with cumcount to create a helper column cols, then pivot_table with aggfunc='first', and finally fillna with '' and reset_index:

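import pandas as pd

# sample data from the question; codes are strings so the leading zeros survive
df = pd.DataFrame({
    'Old_City': ['a', 'b', 'c', 'd', 'e', 'f'],
    'New_City_Code': [101, 101, 102, 103, 103, 103],
    'New_City_Name': ['A', 'A', 'B', 'C', 'C', 'C'],
    'Old_City_Code': ['001', '002', '003', '004', '005', '006'],
})
print(df)
#   Old_City  New_City_Code New_City_Name Old_City_Code
# 0        a            101             A           001
# 1        b            101             A           002
# 2        c            102             B           003
# 3        d            103             C           004
# 4        e            103             C           005
# 5        f            103             C           006

# create column names for pivoting: number the rows 1, 2, 3, ... within each new city
# (cast to str so they can be joined with the value names below)
df['cols'] = (df.groupby(['New_City_Name', 'New_City_Code']).cumcount() + 1).astype(str)

print(df)
#   Old_City  New_City_Code New_City_Name Old_City_Code cols
# 0        a            101             A           001    1
# 1        b            101             A           002    2
# 2        c            102             B           003    1
# 3        d            103             C           004    1
# 4        e            103             C           005    2
# 5        f            103             C           006    3

df = pd.pivot_table(df,
                    index=['New_City_Name', 'New_City_Code'],
                    columns=['cols'],
                    values=['Old_City', 'Old_City_Code'],
                    aggfunc='first')

# flatten the MultiIndex columns: ('Old_City', '1') -> 'Old_City1'
df.columns = [''.join(col) for col in df.columns.values]
# replace NaN with '' and turn the index levels back into columns
df = df.fillna('').reset_index()
print(df)
#   New_City_Name  New_City_Code Old_City1 Old_City2 Old_City3 Old_City_Code1  \
# 0             A            101         a         b                      001
# 1             B            102         c                                003
# 2             C            103         d         e         f            004
#
#   Old_City_Code2 Old_City_Code3
# 0            002
# 1
# 2            005            006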
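The question asks for New_City_Code as the first column and for headers named Old_Code1..3 rather than Old_City_Code1..3. The variant below produces exactly that layout. It assumes each New_City_Code maps to a single New_City_Name. It replaces pivot_table with set_index/unstack: the counter makes every (code, name, n) triple unique, so no aggfunc is needed. The helper column n and the PREFIX mapping are illustrative names and are not part of the original answer.

```python
import pandas as pd

# sample data from the question (codes kept as strings to preserve leading zeros)
df = pd.DataFrame({
    'Old_City': ['a', 'b', 'c', 'd', 'e', 'f'],
    'New_City_Code': [101, 101, 102, 103, 103, 103],
    'New_City_Name': ['A', 'A', 'B', 'C', 'C', 'C'],
    'Old_City_Code': ['001', '002', '003', '004', '005', '006'],
})

# position of each old city within its new city: 1, 2, 3, ... (hypothetical helper "n")
df['n'] = df.groupby(['New_City_Code', 'New_City_Name']).cumcount() + 1

# move the keys into the index, then spread n across the columns;
# unstack fills the missing positions with '' directly
wide = (df.set_index(['New_City_Code', 'New_City_Name', 'n'])[['Old_City', 'Old_City_Code']]
          .unstack('n', fill_value=''))

# flatten ('Old_City_Code', 2) -> 'Old_Code2' to match the requested headers
PREFIX = {'Old_City': 'Old_City', 'Old_City_Code': 'Old_Code'}
wide.columns = [f'{PREFIX[name]}{i}' for name, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

If one code could carry several names, group and index by New_City_Code alone and handle the name separately.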