python - Parallelism Speed -


I'm trying to get a handle on Python parallelism. This is the code I'm using:

import time
from concurrent.futures import ProcessPoolExecutor

def listmaker():
    for i in xrange(10000000):
        pass

# without duo core
start = time.time()
listmaker()
end = time.time()
nocore = "total time, no core, %.3f" % (end - start)

# with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2)  # I have 2 cores
results = list(pool.map(listmaker()))
end = time.time()
core = "total time core, %.3f" % (end - start)

print nocore
print core

I was under the assumption that because I'm using 2 cores, the speed would be close to double. However, when I run this code, most of the time the nocore output is faster than the core output. This is true even if I change

def listmaker():
    for i in xrange(10000000):
        pass

to

def listmaker():
    for i in xrange(10000000):
        print

In fact, there are runs where the no-core run is faster. Can anyone shed light on this issue? Is my setup correct? What am I doing wrong?

You're using pool.map() incorrectly. Take a look at the pool.map documentation. It expects an iterable argument, and it will pass each of the items from the iterable to the pool individually. Since your function only returns None, there is nothing for it to do. However, you're still incurring the overhead of spawning extra processes, which takes time.

Your usage of pool.map should look like this:

results = pool.map(function_name, some_iterable) 

Notice a couple of things:

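  • Since you're using the print statement rather than the function, I'm assuming you're using some Python2 variant. In Python2, map on a multiprocessing pool returns a list anyway, so there's no need to convert it to a list again. (Executor.map from concurrent.futures returns an iterator instead, so list() is only needed there when you want all the results collected.)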
  • The first argument should be the function name without parentheses. This identifies the function that the pool workers should execute. When you include the parentheses, the function is called right there, instead of in the pool.
  • pool.map is intended to call a function on every item in an iterable, so your test case needs to create some iterable for it to consume, instead of a function that takes no arguments like your current example.
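
The points above can be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the asker's code: square and numbers are made-up names, and ThreadPoolExecutor is swapped in for ProcessPoolExecutor purely because both share the same Executor.map interface and threads keep the sketch independent of how processes are started. The call shape is the same for either executor.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical worker: takes one item from the iterable, returns a result."""
    return n * n


numbers = [1, 2, 3, 4]  # hypothetical input; any iterable works

with ThreadPoolExecutor(max_workers=2) as pool:
    # Correct shape: pass the function object itself, then the iterable.
    # The pool calls square(item) once per item and keeps the input order.
    results = list(pool.map(square, numbers))

# Shape from the question: the parentheses call the function immediately
# in the main program, so map() would receive the return value, not a function.
called_too_early = square(3)

print(results)
print(called_too_early)
```

The list() call is there because Executor.map hands back an iterator; iterating over it directly works just as well if the results are only needed once.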

Try to run your trial again with actual input to the function, and retrieve the output. Here's an example:

import time
from concurrent.futures import ProcessPoolExecutor

def read_a_file(file_name):
    with open(file_name) as fi:
        text = fi.read()
    return text

file_list = ['t1.txt', 't2.txt', 't3.txt']

# without duo core
start = time.time()
single_process_text_list = []
for file_name in file_list:
    single_process_text_list.append(read_a_file(file_name))
end = time.time()
nocore = "total time, no core, %.3f" % (end - start)

# with duo core
start = time.time()
pool = ProcessPoolExecutor(max_workers=2)  # I have 2 cores
multiprocess_text_list = pool.map(read_a_file, file_list)
end = time.time()
core = "total time core, %.3f" % (end - start)

print(nocore)
print(core)

Results:

total time, no core, 0.047
total time core, 0.009

The text files were 150,000 lines of gibberish each. Notice how much work had to be done before the parallel processing was worth it. When I ran the trial with 10,000 lines in each file, the single process approach was still faster because it didn't have the overhead of spawning extra processes. With that much work to do, though, the extra processes become worth the effort.
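
If you want to see where that break-even point sits on your own machine, a rough sketch like this one might help. It is written in Python 3 syntax, and count_up, time_serial, time_parallel and the two workload sizes are all made-up for illustration. It wraps the map call in list() so the timer only stops once every result has actually been collected, since Executor.map hands back an iterator. No timings are shown here because they depend entirely on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def count_up(n):
    """Hypothetical CPU-bound task: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


def time_serial(jobs):
    """Run every job in the current process and return (results, seconds)."""
    start = time.time()
    results = [count_up(n) for n in jobs]
    return results, time.time() - start


def time_parallel(jobs, workers=2):
    """Run the jobs in a process pool and return (results, seconds)."""
    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # list() forces every result to be collected before the clock stops.
        results = list(pool.map(count_up, jobs))
    return results, time.time() - start


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this file.
    for size in (10000, 2000000):  # hypothetical small and large workloads
        jobs = [size] * 4
        _, serial = time_serial(jobs)
        _, parallel = time_parallel(jobs)
        print("n=%d  serial %.3f s  parallel %.3f s" % (size, serial, parallel))
```

The parallel timing deliberately includes creating and shutting down the pool, because that start-up cost is exactly the overhead being weighed against the work.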

And by the way, this functionality is available with multiprocessing pools in Python2, so you can avoid importing anything from futures if you want to.
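
A rough sketch of that alternative, using only the standard multiprocessing module. text_length and lengths_with_pool are hypothetical names for illustration, and the try/finally form is used instead of a with block so that the same code should run on both Python 2 and Python 3.

```python
from multiprocessing import Pool


def text_length(text):
    """Hypothetical worker: return the number of characters in a string."""
    return len(text)


def lengths_with_pool(texts, workers=2):
    """Map text_length over texts using a multiprocessing pool."""
    pool = Pool(processes=workers)
    try:
        # Pool.map blocks until all items are done and returns a list.
        return pool.map(text_length, texts)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this file.
    print(lengths_with_pool(["a", "bb", "ccc"]))
```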

