import pandas as pd
df = pd.read_csv("data3.csv", index_col="DateTime")
df = df.reindex(pd.date_range("11-1-2014 12:00:00", "11-1-2014 12:10:00", freq="1min"), fill_value="NaN")
df.to_csv("test3.csv")
The file I am reading:
NSERC_CB04_A0401
DateTime
11/1/2014 0:00 1.121889
11/1/2014 0:01 1.121889
11/1/2014 0:02 1.121889
11/1/2014 0:03 1.121889
11/1/2014 0:04 1.118503
11/1/2014 0:05 1.121889
11/1/2014 0:06 1.121889
11/1/2014 0:07 1.121889
11/1/2014 0:09 1.121889
11/1/2014 0:10 1.121889
The file I am writing:
NSERC_CB04_A0401
2014-11-01 12:00:00 NaN
2014-11-01 12:01:00 NaN
2014-11-01 12:02:00 NaN
2014-11-01 12:03:00 NaN
2014-11-01 12:04:00 NaN
2014-11-01 12:05:00 NaN
2014-11-01 12:06:00 NaN
2014-11-01 12:07:00 NaN
2014-11-01 12:08:00 NaN
2014-11-01 12:09:00 NaN
2014-11-01 12:10:00 NaN
What I want is:
NSERC_CB04_A0401
DateTime
11/1/2014 0:00 1.121889
11/1/2014 0:01 1.121889
11/1/2014 0:02 1.121889
11/1/2014 0:03 1.121889
11/1/2014 0:04 1.118503
11/1/2014 0:05 1.121889
11/1/2014 0:06 1.121889
11/1/2014 0:07 1.121889
2014-11-01 12:08:00 NaN
11/1/2014 0:09 1.121889
11/1/2014 0:10 1.121889
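You need to add the parameter parse_dates=True to read_csv so that the index is first converted to a DatetimeIndex, and only then call reindex.
Also change the start time from 11-1-2014 12:00:00 to 11-1-2014 00:00:00 so that it matches the data, and change the end time in the same way.
In addition, the string "NaN" is not a missing value. You need np.nan, which is already the default fill value for missing data in reindex, so fill_value can simply be dropped.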
df = pd.read_csv("data3.csv", index_col="DateTime", parse_dates=True)
df = df.reindex(pd.date_range("11-1-2014 00:00:00", "11-1-2014 00:10:00", freq="1min"))
print (df)
NSERC_CB04_A0401
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
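To see why the string "NaN" is a problem, here is a minimal sketch comparing both fill values. The in-memory CSV is a hypothetical stand-in for data3.csv (the comma separator is an assumption, since the real file layout is not shown), and the variable names are only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for data3.csv; the comma separator is an assumption.
csv_text = """DateTime,NSERC_CB04_A0401
11/1/2014 0:07,1.121889
11/1/2014 0:09,1.121889
"""

# parse_dates=True turns the index column into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="DateTime", parse_dates=True)
idx = pd.date_range("2014-11-01 00:07", "2014-11-01 00:09", freq="1min")

as_string = df.reindex(idx, fill_value="NaN")  # inserts the text "NaN"
as_missing = df.reindex(idx)                   # inserts a real np.nan (the default)

# Count how many values pandas recognises as missing in each result.
string_missing_count = int(as_string["NSERC_CB04_A0401"].isna().sum())
real_missing_count = int(as_missing["NSERC_CB04_A0401"].isna().sum())
print(string_missing_count, real_missing_count)
```

With the string fill value, isna() does not report the inserted 00:08 row as missing, so later calls such as fillna or interpolate would skip it.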
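A more general solution is to reindex from the minimum to the maximum datetime of the index, but whether this is appropriate depends on your data: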
df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
print (df)
NSERC_CB04_A0401
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
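If you also want to know which timestamps were actually missing, something like the following sketch may help. It builds a small hypothetical frame (not your real file) and uses Index.difference to list the gaps. Note that the min/max approach can only reveal gaps between the first and last reading, not rows missing before or after them.

```python
import pandas as pd

# Hypothetical sample with the 00:08 reading missing, mirroring the data above.
idx = pd.to_datetime(["2014-11-01 00:07", "2014-11-01 00:09", "2014-11-01 00:10"])
df = pd.DataFrame({"NSERC_CB04_A0401": [1.121889] * 3}, index=idx)

# Complete one-minute grid spanning the first to the last reading.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")

missing = full_range.difference(df.index)  # timestamps absent from the data
filled = df.reindex(full_range)            # same data with NaN rows for the gaps
print(missing)
```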
If the index contains duplicates, reindex raises an error, and a solution is resample
with an aggregate function such as mean or sum (see also the resample docs):
print (df)
NSERC_CB04_A0401
DateTime
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889 <- duplicated index
2014-11-01 00:07:00 1.121889 <- duplicated index
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
df = df.resample('1min').mean()
print (df)
NSERC_CB04_A0401
DateTime
2014-11-01 00:00:00 1.121889
2014-11-01 00:01:00 1.121889
2014-11-01 00:02:00 1.121889
2014-11-01 00:03:00 1.121889
2014-11-01 00:04:00 1.118503
2014-11-01 00:05:00 1.121889
2014-11-01 00:06:00 1.121889
2014-11-01 00:07:00 1.121889
2014-11-01 00:08:00 NaN
2014-11-01 00:09:00 1.121889
2014-11-01 00:10:00 1.121889
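As a rough illustration of the duplicate case, the sketch below uses a small hypothetical frame with made-up values (so that the aggregation is visible) and compares three things: reindex on the duplicated index, resample with mean, and dropping duplicates first. Keeping the first row with index.duplicated is only one possible choice and is my assumption, not something the question asks for.

```python
import pandas as pd

# Hypothetical data: 00:07 appears twice with different values, 00:08 is missing.
idx = pd.to_datetime(["2014-11-01 00:07", "2014-11-01 00:07", "2014-11-01 00:09"])
df = pd.DataFrame({"NSERC_CB04_A0401": [1.0, 3.0, 2.0]}, index=idx)
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")

# reindex refuses to work on an index with duplicate labels.
try:
    df.reindex(full_range)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# Option 1: aggregate duplicates per minute; empty minutes become NaN.
averaged = df.resample("1min").mean()

# Option 2: keep only the first row per timestamp, then reindex as before.
deduped = df[~df.index.duplicated(keep="first")].reindex(full_range)
print(averaged)
print(deduped)
```

Which option fits depends on what the duplicated rows mean in your data: resample combines them, while dropping duplicates discards all but one.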