python - Parallelism Speed
I'm trying to get a handle on Python parallelism. This is the code I'm using:
import time
from concurrent.futures import ProcessPoolExecutor

def listmaker():
    for i in xrange(10000000):
        pass

#without duo core
start = time.time()
listmaker()
end = time.time()
nocore = "Total time, no core, %.3f" % (end - start)

#with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2) #I have 2 cores
results = list(pool.map(listmaker()))
end = time.time()
core = "Total time, core, %.3f" % (end - start)

print nocore
print core
I was under the assumption that because I'm using 2 cores the speed would be close to double. However, when I run this code, most of the time the nocore output is faster than the core output. This is true even if I change
def listmaker():
    for i in xrange(10000000):
        pass
to
def listmaker():
    for i in xrange(10000000):
        print
In fact, in some runs the no core version is still faster. Can someone shed some light on the issue? Is my setup correct? Am I doing something wrong?
You're using pool.map() incorrectly. Take a look at the pool.map documentation. It expects an iterable argument, and it will pass each of the items from the iterable to the pool individually. Since your function only returns None, there is nothing for it to do. However, you're still incurring the overhead of spawning processes, which takes time.
Your usage of pool.map should look like this:
results = pool.map(function_name, some_iterable)
Notice a couple of things:

- Since you're using the print statement rather than the print function, I'm assuming you're using a Python 2 variant. In Python 2, pool.map returns a list anyway, so there is no need to convert it to a list again.
- The first argument should be the function name without parentheses. That identifies the function the pool workers should execute. When you include the parentheses, the function gets called right there, instead of in the pool.
- pool.map is intended to call a function on every item in an iterable, so your test cases need to create some iterable for it to consume, instead of a function that takes no arguments as in your current example (see the sketch below).
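For instance, here is a minimal sketch (assuming Python 2 with the futures backport installed, as in your snippet) of how your original trial could be restructured so that pool.map gets a function name plus an iterable to consume; the helper name count_to and the work list are just illustrative:

import time
from concurrent.futures import ProcessPoolExecutor

def count_to(n):
    # CPU-bound busy loop; the pool calls this once per item in the iterable
    for i in xrange(n):
        pass
    return n

if __name__ == '__main__':
    work = [10000000, 10000000]  # one chunk of work per worker

    # single process
    start = time.time()
    for n in work:
        count_to(n)
    print "total time, no core, %.3f" % (time.time() - start)

    # two worker processes
    start = time.time()
    pool = ProcessPoolExecutor(max_workers=2)
    results = list(pool.map(count_to, work))  # function name (no parentheses), then the iterable
    print "total time, core, %.3f" % (time.time() - start)

With a genuinely CPU-bound function and one item of work per core, the two-process run at least has a chance to come out ahead, although the process-spawning overhead discussed below still applies.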
Try to run your trial again with some actual input into the function, and retrieve the output. Here's an example:
import time
from concurrent.futures import ProcessPoolExecutor

def read_a_file(file_name):
    with open(file_name) as fi:
        text = fi.read()
    return text

file_list = ['t1.txt', 't2.txt', 't3.txt']

#without duo core
start = time.time()
single_process_text_list = []
for file_name in file_list:
    single_process_text_list.append(read_a_file(file_name))
end = time.time()
nocore = "total time, no core, %.3f" % (end - start)

#with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2) #i have 2 cores
multiprocess_text_list = pool.map(read_a_file, file_list)
end = time.time()
core = "total time core, %.3f" % (end - start)

print(nocore)
print(core)
results:
total time, no core, 0.047
total time core, 0.009
The text files are 150,000 lines of gibberish each. Notice how much work had to be done before the parallel processing was worth it. When I ran the trial with 10,000 lines in each file, the single process approach was still faster because it didn't have the overhead of spawning the processes. The more work there is to do, the more the extra processes become worth the effort.
And by the way, this functionality is available with multiprocessing pools in Python 2, so you can avoid importing futures if you want to.
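If you go that route, a rough sketch of the same parallel call using the standard library's multiprocessing module might look like this (the file names are the same illustrative ones as in the example above):

from multiprocessing import Pool

def read_a_file(file_name):
    with open(file_name) as fi:
        return fi.read()

if __name__ == '__main__':
    file_list = ['t1.txt', 't2.txt', 't3.txt']
    pool = Pool(processes=2)  # 2 worker processes
    multiprocess_text_list = pool.map(read_a_file, file_list)  # blocks until every file has been read
    pool.close()
    pool.join()

multiprocessing.Pool.map blocks until all results are in and returns a plain list, so the timing logic from the earlier examples carries over unchanged.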