python - How do I load data in many-to-many relation using pony orm? -


here entities:

class article(db.entity):     id = primarykey(int, auto=true)     creation_time = required(datetime)     last_modification_time = optional(datetime, default=datetime.now)     title = required(str)     contents = required(str)     authors = set('author')   class author(db.entity):     id = primarykey(int, auto=true)     first_name = required(str)     last_name = required(str)     articles = set(article) 

and here code i'm using data:

return left_join((article, author) article in entities.article                  author in article.authors).prefetch(entities.author)[:] 

whether i'm using prefetch method or not, generated sql looks same:

select distinct "article"."id", "t-1"."author" "article" "article"   left join "article_author" "t-1"     on "article"."id" = "t-1"."article" 

and when iterated on results, pony issuing yet query (queries):

select "id", "creation_time", "last_modification_time", "title", "contents" "article" "id" = %(p1)s  select "id", "first_name", "last_name" "author" "id" in (%(p1)s, %(p2)s) 

the desired behavior me if orm issue 1 query load data needed. how achieve that?

author of ponyorm here. don't want load objects using 1 query, because inefficient.

the benefit of using single query load many-to-many relation reduce number of round-trips database. if replace 3 queries one, not major improvement. when database server located near application server these round-trips fast, comparing processing resulted data in python.

on other side, when both sides of many-to-many relation loaded using same query, inevitable same object's data repeated on , on in multiple rows. has many drawbacks:

  1. the size of data transferred database became larger compared situation when no duplicate information transferred. in example, if have ten articles, , each written 3 authors, single query return thirty rows, large fields article.contents duplicated multiple times. separate queries transfer minimum amount of data possible, difference in size may order of magnitude depending on specific many-to-many relation.

  2. the database server written in compiled language c , works fast. same true networking layer. python code interpreted, , time consumed python code (contrary opinions) more time spent in database. can see profiling tests performed sqlalchemy author mike bayer after came conclusion:

    a great misconception seem encounter notion communication database takes majority of time spent in database-centric python application. perhaps common wisdom in compiled languages such c or maybe java, not in python. python slow, compared such systems (...) whether database driver (dbapi) written in pure python or in c incur significant additional python-level overhead. dbapi alone, can as order of magnitude slower.

    when data of many-to-many relation loaded using same query , same data repeated in many rows, it necessary parse of repeated data in python throw out of them. python slowest part of process, such "optimization" may lead decreased performance.

    as support words can point django orm. orm has 2 methods can used query optimization. first one, called select_related loads related objects in single query, while more added method called prefetch_related loads objects in way pony default. according django users second method works much faster:

    in scenarios, have found 30% speed improvement.

  3. the database required perform joins consume precious resources of database server.

    while python code slowest part when processing single request, database server cpu time shared resource used parallel requests. can scale python code starting multiple python processes on different servers, harder scale database. because of this, in high-load application better offload useful work database server application server, work can done in parallel multiple application servers.

    when database performs join needs spend additional time doing it. pony irrelevant if database make join or not, because in case object interlinked inside orm identity map. so work database doing when perform join useless spend of database time. on other hand, using identity map pattern pony can link objects equally fast regardless of whether provided in same database row or not.

returning number of round-trips, pony have dedicated mechanism eliminate "n+1 query" problem. "n+1 query" anti-pattern arises when orm sends hundreds of similar queries each of them loads separate object database. many orms suffers problem. pony can detect , replace repeated n queries single query loads necessary objects @ once. mechanism efficient , can reduce number of round-trips. when speak loading many-to-many relation, there no n queries here, there 3 queries more efficient when executed separately, there no benefit in trying execute single query instead.

to summarize, need orm performance important us, pony orm developers. , because of that, don't want implement loading many-to-many relation in single query, slower our current solution.

so, answer question, cannot load both side of many-to-many relation in single query. , think thing.


Comments

Popular posts from this blog

c++ - Delete matches in OpenCV (Keypoints and descriptors) -

java - Could not locate OpenAL library -

sorting - opencl Bitonic sort with 64 bits keys -