python - Idiomatic way to process lists and dicts


Recently I've been discovering "the Python way" of processing lists, dicts, sets, etc. To that end I've modified a function that calculates the first n prime numbers; let's call it the 'pure dict' version:

def findprimeuptog(x):
    n_primes = {}
    for i in range(2, x):
        if i not in n_primes.values():
            n_primes[i] = i
        for k, v in n_primes.items():
            if i > v:
                n_primes[k] += k
    return sorted(n_primes)

This function works as follows (a quick check of its output is shown after the list):

  1. Maintain a list of primes and a list of integer multiples of those same primes in a dict.

  2. Those multiples should be bigger than or equal to the integer i.

  3. If a number is not present in the list of integer multiples of the existing primes, it must be prime and is added to the list of primes.

  4. Increase i starting from 2 (the smallest prime) up to x.

  5. Return the list of primes.
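A quick check of the output (illustrative only, not part of the benchmark below):

print(findprimeuptog(20))   # -> [2, 3, 5, 7, 11, 13, 17, 19]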

I've rewritten the function several times using lists and sets, but this version seems the most idiomatic one. It is short and reads easily.
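(Those rewrites are not shown here, but a set-based variant of the same idea might look roughly like the sketch below; the benchmarked 'dict, set' version may well differ in its details.)

def findprimeupto_set(x):
    # Sketch: keep primes in a list, the current multiple of each prime in a dict,
    # and every multiple generated so far in a set for O(1) membership tests.
    # (Already-passed multiples are never pruned, to keep the sketch short.)
    primes = []
    multiples = {}      # prime -> its current multiple
    seen = set()        # all multiples generated so far
    for i in range(2, x):
        if i not in seen:
            primes.append(i)
            multiples[i] = i
            seen.add(i)
        for p, m in multiples.items():
            if i > m:
                multiples[p] = m + p
                seen.add(m + p)
    return primes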

If you are kind enough to let me know whether this could be written more clearly, please comment; I would love to read it.

And my question: the first version of the function was not as clean and was more C-like:

def findprimeupto(x):
    primes = []
    n_primes = []
    for i in range(2, x):
        if i not in n_primes:
            primes.append(i)
            n_primes.append(i)
        for j in range(len(primes)):
            if i > n_primes[j]:
                n_primes[j] += primes[j]
    return primes

But this first version is by far the fastest when run under the PyPy compiler:

Python 3:

primes to: 100000
algo: clean version       , primes: 9592, time: 102.74523687362671
algo: dict, set           , primes: 9592, time: 58.230621337890625
algo: **firstversion**    , primes: 9592, time: 59.945680379867554
algo: list of list        , primes: 9592, time: 71.41077852249146
algo: list of mutablenum  , primes: 9592, time: 292.3777365684509
algo: **pure dict**       , primes: 9592, time: 56.381882667541504

PyPy (ver 2.3.1):

primes to: 100000
algo: clean version       , primes: 9592, time: 29.3849189281
algo: dict, set           , primes: 9592, time: 85.8557658195
algo: **firstversion**    , primes: 9592, time: 1.11557507515
algo: list of list        , primes: 9592, time: 42.2394959927
algo: list of mutablenum  , primes: 9592, time: 38.764893055
algo: **pure dict**       , primes: 9592, time: 110.416568995
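(A minimal timing harness along these lines produces this kind of output; this is only a sketch, not the actual benchmark script, and the format string and single-run timing are assumptions.)

import time

def benchmark(functions, limit=100000):
    # Time each implementation once on the same limit and print one line per algorithm.
    print("primes to: %d" % limit)
    for name, func in functions:
        start = time.time()
        primes = func(limit)
        elapsed = time.time() - start
        print("algo: %-20s, primes: %d, time: %s" % (name, len(primes), elapsed))

# Example (assumes the functions above are defined):
# benchmark([("firstversion", findprimeupto), ("pure dict", findprimeuptog)])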

I understand that the performance hit the 'pure dict' version took is due to the fact that I did not use iterators in the loops, but the speedup the 'firstversion' got is still phenomenal.

As most of our code ends up being compiled in production, should we write code in a more C-like fashion and not in idiomatic Python?

Edit:

To remove the confusion about whether I should have used lists instead of a dict, I'm submitting a version of the function I call the 'clean version'. This version uses no direct access to the nth element of a list; instead it iterates over the lists in what I believe is a Pythonic way (btw, this version is similar to a Lisp version of the same code :)

def findprimeuptob(x):
    primes = []
    n_primes = []
    for i in range(2, x):
        if not (i in n_primes):
            primes.append(i)
            n_primes.append(i)
        new_n_primes = []
        for prime, n_prime in zip(primes, n_primes):
            if i > n_prime:
                new_n_primes.append(prime + n_prime)
            else:
                new_n_primes.append(n_prime)
        n_primes = new_n_primes
    return primes

Yes, if you care about performance, the 'first version' is the way to go. You can see what's going on using cProfile.

For reference, on PyPy 2.5.0, running python -m cProfile -s cumulative x.py, the 'first version' gives me:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.727    0.727 x.py:1(<module>)
     1    0.724    0.724    0.727    0.727 x.py:29(findprimeupto)
 99999    0.002    0.000    0.002    0.000 {len}
 99999    0.001    0.000    0.001    0.000 {range}
 19184    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

OTOH, for the 'pure dict' version, I get:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   16.864   16.864 x.py:1(<module>)
     1    1.441    1.441   16.864   16.864 x.py:1(findprimeuptog)
 99998   12.376    0.000   12.376    0.000 {method 'items' of 'dict' objects}
 99998    3.047    0.000    3.047    0.000 {method 'values' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {range}

which shows that most of the time is wasted creating the temporary lists for n_primes.items() and n_primes.values().

Now, there is an easy fix for this: replacing .items() and .values() with their respective iterator versions, .iteritems() and .itervalues(). However, the result is still slower than the list version, because dicts are a more complicated structure than lists, and the low-level dict operations are therefore slower than the equivalent list operations:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    3.155    3.155 x.py:1(<module>)
     1    3.147    3.147    3.155    3.155 x.py:15(findprimeuptoh)
 99998    0.006    0.000    0.006    0.000 {method 'itervalues' of 'dict' objects}
 99998    0.002    0.000    0.002    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {range}
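(findprimeuptoh itself is not shown above; it would presumably just be the 'pure dict' version with the two methods swapped, roughly like the sketch below, which is Python 2 / PyPy 2 only.)

def findprimeuptoh(x):
    # Same algorithm as findprimeuptog, but using iteritems()/itervalues()
    # so no temporary lists are built on every iteration.
    n_primes = {}
    for i in range(2, x):
        if i not in n_primes.itervalues():
            n_primes[i] = i
        for k, v in n_primes.iteritems():
            if i > v:
                n_primes[k] += k
    return sorted(n_primes)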

Finally, the 'clean version' is rather bad, since it creates a new n_primes list on every iteration. Indeed, it comes in at 21.795 seconds.

Conclusion: creating new containers (lists, dicts, ...) is slow, so avoid it whenever you can. Also, dicts are slower than lists. For this problem you don't need dicts, so you should use lists.
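To illustrate that conclusion, a list-based version can stay fairly idiomatic by using enumerate() instead of range(len(...)) while still updating n_primes in place; this is only a sketch, not one of the benchmarked versions above:

def findprimeupto_inplace(x):
    # Sketch: list-based, updates n_primes in place, no new containers per iteration.
    primes = []
    n_primes = []
    for i in range(2, x):
        if i not in n_primes:
            primes.append(i)
            n_primes.append(i)
        for j, prime in enumerate(primes):
            if i > n_primes[j]:
                n_primes[j] += prime
    return primes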

