python - Speeding up Loading of Pandas Sparse DataFrame
I have generated a large pickled sparse DataFrame. It was too big to hold in memory, so I had to append to it incrementally as it was generated, as follows:

```python
with open('data.pickle', 'ab') as output:
    pickle.dump(df.to_sparse(), output, pickle.HIGHEST_PROTOCOL)
```

Then, to read the file back, I do the following:

```python
df_2 = pd.DataFrame([]).to_sparse()
with open('data.pickle', 'rb') as pickle_file:
    try:
        while True:
            test = pickle.load(pickle_file)
            df_2 = pd.concat([df_2, test], ignore_index=True)
    except EOFError:
        pass
```

Given the size of the file (20 GB), this method works, but it takes a really long time. Is it possible to parallelize the `pickle.load`/`pd.concat` steps to get a quicker loading time? Or are there other suggestions for speeding this process up, specifically the loading part of the code?

Note: the generation step is done on a computer with fewer resources. That's why the load step, which is done on a more powerful machine, can hold the DataFrame in memory.

Thanks!

Don't concat in a loop! There is a note about this in the docs; maybe it should be a warning.

```python
df_list = []
with open('data.pickle', 'rb'
```
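The answer's code is cut off above. A minimal sketch of the list-then-concat approach it describes might look like the following. It assumes pandas 1.0 or later, so it uses ordinary dense DataFrames (`DataFrame.to_sparse()` was removed in pandas 1.0). The helper names `append_chunk` and `load_chunks` and the toy data are hypothetical. The idea is that `pd.concat` copies its inputs, so calling it once per chunk re-copies everything accumulated so far, while collecting the chunks in a list and concatenating once copies each row only once.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd


def append_chunk(path, df):
    # Hypothetical helper: append one pickled DataFrame to the end of the
    # file (mode 'ab'), mirroring the incremental write in the question.
    with open(path, "ab") as output:
        pickle.dump(df, output, pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    # Hypothetical helper: read every pickled chunk into a list first...
    df_list = []
    with open(path, "rb") as pickle_file:
        while True:
            try:
                df_list.append(pickle.load(pickle_file))
            except EOFError:
                # pickle.load raises EOFError once the file is exhausted
                break
    if not df_list:
        # pd.concat raises ValueError on an empty list, so handle that case
        return pd.DataFrame()
    # ...then concatenate once, instead of once per chunk
    return pd.concat(df_list, ignore_index=True)


# Usage with toy data standing in for the real chunks
path = os.path.join(tempfile.mkdtemp(), "data.pickle")
for i in range(3):
    chunk = pd.DataFrame(np.zeros((4, 2)), columns=["a", "b"])
    chunk.iloc[0, 0] = i + 1
    append_chunk(path, chunk)

df_2 = load_chunks(path)
print(df_2.shape)  # (12, 2)
```

Because each chunk's position in the file is only known after the previous one has been unpickled, a single appended pickle file like this is read sequentially. The speed-up here comes from doing the concatenation once, not from parallelism.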