sql - Number of palindromes in character strings
I'm trying to gather a list of 6-letter palindromes and the number of times they occur, using Postgres 9.3.5.
This is the query I've tried:
    select word, count(*)
    from  (
       select regexp_split_to_table(read_sequence, '([atcg])([atcg])([atcg])(\3)(\2)(\1)') as word
       from   reads
       ) t
    group  by word;
However, this brings up results that a) aren't palindromic and b) are greater or less than 6 letters long.
    \d reads
            Table "public.reads"
        Column     |  Type   | Modifiers
    ---------------+---------+-----------
     read_header   | text    | not null
     read_sequence | text    |
     option        | text    |
     quality_score | text    |
     pair_end      | text    | not null
     species_id    | integer |
    Indexes:
        "reads_pkey" PRIMARY KEY, btree (read_header, pair_end)
read_sequence contains DNA sequences, 'atgctgatgcggcgtagctggatcga' for example.
I'd like to see the number of palindromes in each sequence, so that example would contain 1, another sequence might have 4, another 3, and so on.
Count per row:
    select read_header, pair_end, substr(read_sequence, i, 6) as word, count(*) as ct
    from   reads r
         , generate_series(1, length(r.read_sequence) - 5) i
    where  substr(read_sequence, i, 6) ~ '([atcg])([atcg])([atcg])\3\2\1'
    group  by 1,2,3
    order  by 1,2,3,4 desc;
Count per read_header and palindrome:
    select read_header, substr(read_sequence, i, 6) as word, count(*) as ct
    ...
    group  by 1,2
    order  by 1,2,3 desc;
Count per read_header:
    select read_header, count(*) as ct
    ...
    group  by 1
    order  by 1,2 desc;
Count per palindrome:
    select substr(read_sequence, i, 6) as word, count(*) as ct
    ...
    group  by 1
    order  by 1,2 desc;
Explain
A palindrome can start at any position up to 5 characters shy of the end, to allow a length of 6. And palindromes can overlap. So:
1. Generate a list of possible starting positions with generate_series() in a lateral join and, based on that, all possible 6-character strings.
2. Test for a palindrome with a regular expression with back references, similar to what you had. regexp_split_to_table() is not the right function here: it returns the text between the matches, not the matches themselves. Use a regular expression match (~).
3. Aggregate, depending on what you want.
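To make the steps above concrete, here is a minimal sketch that runs both approaches against the example sequence from the question. The one-row reads stand-in and its read_header and pair_end values ('read_1', '1') are made up for illustration; only the read_sequence value comes from the question. The start_pos column is likewise just an illustrative name.

```sql
-- Hypothetical one-row stand-in for the reads table. Only read_sequence is
-- taken from the question; read_header and pair_end are invented values.
with reads(read_header, pair_end, read_sequence) as (
    values ('read_1'::text, '1'::text, 'atgctgatgcggcgtagctggatcga'::text)
)
select r.read_header,
       i                             as start_pos,  -- 1-based start of the window
       substr(r.read_sequence, i, 6) as word
from   reads r
     , generate_series(1, length(r.read_sequence) - 5) i  -- implicit lateral join
where  substr(r.read_sequence, i, 6) ~ '([atcg])([atcg])([atcg])\3\2\1';
-- one row: read_1 | 9 | gcggcg

-- For comparison, splitting on the same pattern returns the text AROUND the
-- match, which is why the original query produced non-palindromic words of
-- arbitrary length.
select regexp_split_to_table('atgctgatgcggcgtagctggatcga',
                             '([atcg])([atcg])([atcg])\3\2\1') as word;
-- two rows: atgctgat and tagctggatcga
```

Because each window is exactly 6 characters long, the pattern can only match the whole window, so no anchors are needed in the regular expression.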