python - Idiomatic way to process lists and dicts -
recently i've been discovering "the python way" of processing lists, dicts, sets, etc. extent i've modified function calculate first n
prime numbers, let's call 'pure dict' version:
def findprimeuptog(x): n_primes = {} in range(2, x): if not in n_primes.values(): n_primes[i]=i k, v in n_primes.items(): if > v: n_primes[k] += k return sorted(n_primes)
this function works follows:
maintain list of primes , list of integer multiples of same primes in dict
those multiples should bigger or equal integer
i
if number not present in list of integer multiples of existing primes must prime , added list of primes
increase
i
starting2
(smallest prime),x
return list of primes
i've rewritten function several times using lists, sets, version seems idiomatic one. short , reads easily.
if kind enough let me know if written more clearly, please comment love read it.
and question: first version of function not clean , more c-like:
def findprimeupto(x): primes = [] n_primes = [] in range(2, x): if not in n_primes: primes.append(i) n_primes.append(i) j in range(len(primes)): if > n_primes[j]: n_primes[j] += primes[j] return primes
but first version absolutely fastest when run pypy
compiler:
python3:
primes to: 100000 algo: clean version , primes: 9592, time: 102.74523687362671 algo: dict, set , primes: 9592, time: 58.230621337890625 algo: **firstversion** , primes: 9592, time: 59.945680379867554 algo: list of list[1] , primes: 9592, time: 71.41077852249146 algo: list of mutablenum , primes: 9592, time: 292.3777365684509 algo: **pure dict** , primes: 9592, time: 56.381882667541504
pypy (ver 2.3.1):
primes to: 100000 algo: clean version , primes: 9592, time: 29.3849189281 algo: dict, set , primes: 9592, time: 85.8557658195 algo: **firstversion** , primes: 9592, time: 1.11557507515 algo: list of list , primes: 9592, time: 42.2394959927 algo: list of mutablenum , primes: 9592, time: 38.764893055 algo: **pure dict** , primes: 9592, time: 110.416568995
i understand performance hit 'pure dict' version got due fact did not use iterators in loops, still speedup 'firstversion' got phenomenal.
as of our code end being compiled in production, should write code in more c-like fashion , not idiomatic python?
edit:
to remove confusion whether should have used lists instead of dict i'm submitting version of function call 'clean version'. version uses no direct access nth element of list, instead iterates on lists in believe pythonistic way (btw version similar lisp version of same code :)
def findprimeuptob(x): primes = [] n_primes = [] in range(2, x): if not (i in n_primes): primes.append(i) n_primes.append(i) new_n_primes = [] prime, n_prime in zip(primes, n_primes): if > n_prime: new_n_primes.append(prime + n_prime) else: new_n_primes.append(n_prime) n_primes = new_n_primes return primes
yes, if care performance, 'first version' way go. can see what's going on using cprofile.
for reference, on pypy 2.5.0, running python -m cprofile -s cumulative x.py
'first version' gives me:
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.727 0.727 x.py:1(<module>) 1 0.724 0.724 0.727 0.727 x.py:29(findprimeupto) 99999 0.002 0.000 0.002 0.000 {len} 99999 0.001 0.000 0.001 0.000 {range} 19184 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.profiler' objects}
otoh, 'pure dict', get:
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 16.864 16.864 x.py:1(<module>) 1 1.441 1.441 16.864 16.864 x.py:1(findprimeuptog) 99998 12.376 0.000 12.376 0.000 {method 'items' of 'dict' objects} 99998 3.047 0.000 3.047 0.000 {method 'values' of 'dict' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.profiler' objects} 1 0.000 0.000 0.000 0.000 {len} 1 0.000 0.000 0.000 0.000 {range}
which shows of time wasted creating temporary lists n_primes.items()
, n_primes.values()
.
now, there easy fix this: replacing .items()
, .values()
respective iterator versions, .iteritems()
, .itervalues()
. however, result still slower list version, because dicts more complicated structure lists, , low-level dict operations therefore slower equivalent list operations:
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 3.155 3.155 x.py:1(<module>) 1 3.147 3.147 3.155 3.155 x.py:15(findprimeuptoh) 99998 0.006 0.000 0.006 0.000 {method 'itervalues' of 'dict' objects} 99998 0.002 0.000 0.002 0.000 {method 'iteritems' of 'dict' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.profiler' objects} 1 0.000 0.000 0.000 0.000 {len} 1 0.000 0.000 0.000 0.000 {range}
finally, 'clean version' rather bad, since creates new n_primes
list on every iteration. indeed, time @ 21.795 seconds.
conclusion: creating new containers (lists, dicts, ...) slow, avoid whenever can. also, dicts slower lists. in problem, don't need dicts, should use lists.
Comments
Post a Comment