Neo4j Cypher query efficiency and syntax


I am attempting to query an ontology of health, represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept that are not caused by a particular bacterium or any of its subtypes:

MATCH p = (a:objectconcept{name:"disease"})<-[:isa*]-(b:objectconcept),
      q = (c:objectconcept{name:"bacteria"})<-[:isa*]-(d:objectconcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN DISTINCT b.sctid, b.fsn
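To make the semantics of [:isa*] concrete, here is a small in-memory sketch in Python (not Neo4j) of which nodes a pattern like (a)<-[:isa*]-(b) reaches. The edge list and concept names are hypothetical toy data. One detail worth noting: an unbounded * means one or more hops, so the root concept itself is not among the b nodes, which is presumably why the query tests (c) separately from its subtypes (d).

```python
from collections import defaultdict, deque

# Hypothetical toy isa edges (child, parent), i.e. (child)-[:isa]->(parent).
ISA_EDGES = [
    ("pneumonia", "disease"),
    ("bacterial_pneumonia", "pneumonia"),
    ("viral_pneumonia", "pneumonia"),
    ("strep_pneumoniae", "strep"),
    ("strep", "bacteria"),
]

def subsumed_by(isa_edges, root):
    """Nodes x with an isa path of length >= 1 from x up to root.

    Mirrors (root)<-[:isa*]-(x): the unbounded * means one or more hops,
    so root itself is not part of the result (assuming an acyclic graph).
    """
    children = defaultdict(list)
    for child, parent in isa_edges:
        children[parent].append(child)
    seen, queue = set(), deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(subsumed_by(ISA_EDGES, "disease")))
# ['bacterial_pneumonia', 'pneumonia', 'viral_pneumonia']
print(sorted(subsumed_by(ISA_EDGES, "bacteria")))
# ['strep', 'strep_pneumoniae']
```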

This query runs in under 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). For example:

MATCH p = (a:objectconcept{name:"disease"})<-[:isa*]-(b:objectconcept),
      q = (c:objectconcept{name:"bacteria"})<-[:isa*]-(d:objectconcept),
      t = (e:objectconcept{name:"bacteria"})<-[:isa*]-(f:objectconcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
  AND NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN DISTINCT b.sctid, b.fsn

I am new to Cypher, and I imagine there is a better way to write this query so that it runs more efficiently. How could collections improve this?

Thanks
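On the collections question, one thing to keep in mind (my reading of Cypher's row model, sketched here rather than tested on 2.1.5): MATCH p = ..., q = ..., t = ... produces one row per combination of paths, WHERE is evaluated per row, and RETURN DISTINCT b keeps b if any row survives. So NOT (b)-->()-->(d) only removes the rows for that particular d, not b itself. Collecting the subtypes first (in Cypher, roughly with collect() and a NONE(...) predicate) shrinks the row count and gives "caused by none of them" semantics. The Python toy below uses hypothetical names and a hypothetical CAUSED_BY map standing in for (b)-->()-->(x); it compares the two readings.

```python
# Hypothetical toy data: disease b -> agents reachable via (b)-->()-->(x).
CAUSED_BY = {
    "strep_pneumonia": {"strep_pneumoniae"},
    "viral_pneumonia": {"influenza_virus"},
}
diseases = ["strep_pneumonia", "viral_pneumonia"]      # b candidates
strep_subtypes = ["strep_pneumoniae", "strep_pyogenes"]  # d rows
staph_subtypes = ["staph_aureus", "staph_epidermidis"]   # f rows

# Row-per-combination reading: one row per (b, d, f) triple, the NOT
# predicates are checked row by row, and DISTINCT b keeps b if ANY row passes.
rows = [(b, d, f) for b in diseases for d in strep_subtypes for f in staph_subtypes]
per_row = sorted({
    b for b, d, f in rows
    if d not in CAUSED_BY.get(b, set()) and f not in CAUSED_BY.get(b, set())
})

# Collection reading: gather all excluded agents once, then keep b only
# if NONE of them is a causative agent of b.
excluded = set(strep_subtypes) | set(staph_subtypes)
collected = sorted(b for b in diseases if not (CAUSED_BY.get(b, set()) & excluded))

print(len(rows), per_row, collected)
# 8 ['strep_pneumonia', 'viral_pneumonia'] ['viral_pneumonia']
```

In the per-row reading, strep_pneumonia survives because the row pairing it with strep_pyogenes passes the filter; the collected version excludes it.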

I answered on the Google group:

Hi Scott,

I presume you have created indexes or constraints on :objectconcept(name)?

I am working with an acyclic, directed graph (an ontology) that models human health, and I need to identify diseases (for example, pneumonia) that are infectious but not caused by particular bacteria (Staph or Streptococcus). Concepts are nodes defined as objectconcepts. Objectconcepts are connected by relationships such as [isa], [pathological_process], [causative_agent], etc.

The query requires:

a) identification of all concepts subsumed by the concept pneumonia, as follows:

MATCH p = (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept)

This returns a number of paths, potentially millions; you can check with:

MATCH p = (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept) RETURN count(*)
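Counting paths and counting nodes diverge quickly in an ontology where concepts have several parents: every extra parent multiplies the number of distinct isa paths. The Python sketch below uses a hypothetical six-node DAG to show the effect. In Cypher, RETURN count(DISTINCT b) would count the nodes rather than the paths.

```python
from functools import lru_cache

# Hypothetical isa DAG (child -> parents) in which every concept has two
# parents, forming stacked "diamonds" below pneumonia.
PARENTS = {
    "a1": ["pneumonia"], "a2": ["pneumonia"],
    "b1": ["a1", "a2"], "b2": ["a1", "a2"],
    "c1": ["b1", "b2"], "c2": ["b1", "b2"],
}

@lru_cache(maxsize=None)
def paths_to(node, root):
    """Number of distinct isa paths from node up to root (0 if unreachable)."""
    if node == root:
        return 1
    return sum(paths_to(parent, root) for parent in PARENTS.get(node, []))

descendants = [n for n in PARENTS if paths_to(n, "pneumonia") > 0]
total_paths = sum(paths_to(n, "pneumonia") for n in descendants)
print(len(descendants), total_paths)  # 6 14
```

Each additional layer of two-parent concepts doubles the paths per node, while the node count only grows linearly.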

b) identification of all concepts subsumed by genus Staph and genus Strep (including the concepts genus Staph and genus Strep themselves), as follows:

WITH b
MATCH (b), q = (c:objectconcept{name:"strep"})<-[:isa*]-(d:objectconcept),
      h = (e:objectconcept{name:"staph"})<-[:isa*]-(f:objectconcept)

This is a cross product of the paths "p", "q" and "h": if each of them returns 1000 paths, you're at 1000^3 = 1 billion paths!!
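The arithmetic behind that estimate, as a tiny Python sketch: when the patterns share no variables, the row count is the product of the per-pattern counts, whereas handling each pattern on its own (for instance collecting q and h into sets first) costs something closer to their sum. The 1000-per-pattern figure is the illustrative number from the text, not a measurement.

```python
def cartesian_rows(*path_counts):
    """Rows produced when independent patterns are matched together (their product)."""
    rows = 1
    for count in path_counts:
        rows *= count
    return rows

def separate_work(*path_counts):
    """Rough work when each pattern is expanded on its own (their sum)."""
    return sum(path_counts)

print(cartesian_rows(1000, 1000, 1000))  # 1000000000
print(separate_work(1000, 1000, 1000))   # 3000
```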

c) identify the nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)), as follows:

WITH b, c, d, e, f
MATCH (b), (c), (d), (e), (f)
WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f)
RETURN DISTINCT b.name;

You don't need the MATCH (b),(c),(d),(e),(f); those identifiers are already bound by the WITH.

What connections are there between b and the other nodes? Do you have concrete relationship types? For one, the first pattern, (b)--()-->(c), is missing a direction.
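Why the missing direction matters: an undirected hop such as (b)--() also matches incoming relationships, so (b)--()-->(c) can match when some node merely points to both b and c. This Python toy, with hypothetical node names and a plain edge list rather than Neo4j, shows a match that only exists when the first hop is undirected.

```python
# Hypothetical edges (start, end), each read as (start)-->(end).
EDGES = [("y", "b"), ("y", "c")]

def two_hop(edges, start, end, directed_first=True):
    """True if start-->()-->end exists; with directed_first=False, start--()-->end."""
    mids = {e for s, e in edges if s == start}       # outgoing neighbours of start
    if not directed_first:
        mids |= {s for s, e in edges if e == start}  # also incoming neighbours
    return any(s in mids and e == end for s, e in edges)

print(two_hop(EDGES, "b", "c"))                        # False
print(two_hop(EDGES, "b", "c", directed_first=False))  # True
```

Besides producing unintended matches, the undirected hop also makes the engine expand relationships in both directions, which is extra work on a large graph.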

The WHERE clause can be a problem. In general, I want to show that the query is perhaps better expressed as a UNION of simpler matches.

For example:

MATCH (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept)-->()-->(c:objectconcept{name:"strep"})
RETURN b.name
UNION
MATCH (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept)-->()-->(e:objectconcept{name:"staph"})
RETURN b.name
UNION
MATCH (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept)-->()-->(d:objectconcept)-[:isa*]->(c:objectconcept{name:"strep"})
RETURN b.name
UNION
MATCH (a:objectconcept{name:"pneumonia"})<-[:isa*]-(b:objectconcept)-->()-->(d:objectconcept)-[:isa*]->(c:objectconcept{name:"staph"})
RETURN b.name
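As I read it, each UNION branch returns the diseases that are caused by Strep or Staph (directly or via a subtype), and UNION (without ALL) concatenates the branches and drops duplicates. The diseases originally asked for would then be the pneumonia subtypes minus that set; that subtraction step is my assumption about how the result would be used, not part of the query above. A Python sketch with hypothetical branch results:

```python
# Hypothetical b.name values returned by each of the four UNION branches.
strep_direct = ["pneumonia_a"]
staph_direct = ["pneumonia_b"]
strep_subtype = ["pneumonia_a", "pneumonia_c"]
staph_subtype = ["pneumonia_d"]
# Hypothetical result of the plain subsumption query for pneumonia.
all_pneumonia = ["pneumonia_a", "pneumonia_b", "pneumonia_c", "pneumonia_d", "pneumonia_e"]

def union(*branches):
    """Model of UNION (not UNION ALL): concatenate results, drop duplicates.

    Cypher does not guarantee row order; first-seen order is kept here.
    """
    seen, out = set(), []
    for branch in branches:
        for name in branch:
            if name not in seen:
                seen.add(name)
                out.append(name)
    return out

caused = union(strep_direct, staph_direct, strep_subtype, staph_subtype)
caused_set = set(caused)
not_caused = [n for n in all_pneumonia if n not in caused_set]
print(caused)      # ['pneumonia_a', 'pneumonia_b', 'pneumonia_c', 'pneumonia_d']
print(not_caused)  # ['pneumonia_e']
```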

Another option is to use the shortestPath() function to find one or all shortest path(s) between pneumonia and the bacteria, constrained by relationship type and direction.
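Conceptually, a shortest-path query asks for the shortest chain of allowed relationships from one concept to another. A breadth-first search over a typed, directed edge list illustrates the idea; Neo4j's own implementation may differ (for example, by searching from both ends). The relationship type role_group and all node names below are hypothetical; causative_agent and isa are taken from the text.

```python
from collections import deque

# Hypothetical typed edges (start, rel_type, end), read as (start)-[:rel_type]->(end).
EDGES = [
    ("strep_pneumonia", "isa", "pneumonia"),
    ("strep_pneumonia", "role_group", "g1"),
    ("g1", "causative_agent", "strep_pneumoniae"),
    ("strep_pneumoniae", "isa", "strep"),
]

def shortest_path(edges, start, goal, allowed_types):
    """Breadth-first search over outgoing edges of the allowed types only.

    Returns one shortest node list from start to goal, or None if unreachable.
    """
    adjacency = {}
    for s, rel_type, e in edges:
        if rel_type in allowed_types:
            adjacency.setdefault(s, []).append(e)
    previous = {start: None}  # node -> predecessor on the BFS tree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

print(shortest_path(EDGES, "strep_pneumonia", "strep",
                    {"isa", "role_group", "causative_agent"}))
# ['strep_pneumonia', 'g1', 'strep_pneumoniae', 'strep']
print(shortest_path(EDGES, "strep_pneumonia", "strep", {"isa"}))  # None
```

Restricting the allowed relationship types, as in the second call, is what keeps such a search from wandering through unrelated parts of a 2-million-node graph.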

Perhaps you can share your dataset and the expected result.

