parsing - How can I extract some data out of the middle of a noisy file using Perl 6?


I would like to do this using idiomatic Perl 6.

I found a wonderful contiguous chunk of data buried in a noisy output file.

I would like to print out the header line starting with "cluster unique" and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:

    </path/to/projects/projectname/parametersweep/1000.1.7.dir> used working directory.
    ....

    cluster unique sequences    reads   rpm
    1   31  3539    3539
    2   25  2797    2797
    3   17  1679    1679
    4   21  1636    1636
    5   14  1568    1568
    6   13  1548    1548
    7   7   1439    1439

    input file: "../../filename.count.fa"
    ...

Here's what I want parsed out:

    cluster unique sequences    reads   rpm
    1   31  3539    3539
    2   25  2797    2797
    3   17  1679    1679
    4   21  1636    1636
    5   14  1568    1568
    6   13  1548    1548
    7   7   1439    1439

"I would like to do this using idiomatic Perl 6."

In Perl, the idiomatic way to locate the chunk in the file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file, much less to perform an if test on every line in the file.

In Perl 6, you can read a paragraph at a time like this:

    my $fname = 'data.txt';

    my $infile = open(
        $fname,
        nl => "\n\n",   # Set what Perl considers the end of a line.
    );  # Removed die() per Brad Gilbert's comment.

    for $infile.lines() -> $para {
        if $para ~~ /^ 'cluster unique'/ {
            say $para.chomp;
            last;   # Quit reading the file.
        }
    }

    $infile.close;

    #    ^                   Match the start of the string.
    #   'cluster unique'     By default, whitespace is insignificant in a Perl 6 regex.
    #                        Quotes are one way to make whitespace significant.

However, in Perl 6 on Rakudo/MoarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
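Without paragraph mode, one possible workaround is to read line by line and still stop at the first empty line after the header. The following is only a minimal sketch under stated assumptions: the helper name extract-chunk and its $header parameter are hypothetical (not from the original post), and the file name data.txt is the one assumed in the snippet above.

```raku
# Hypothetical helper (not from the original post): collect the header line
# and every line after it, stopping before the first empty line.
sub extract-chunk(@lines, Str $header = 'cluster unique') {
    my @chunk;
    my $in-chunk = False;
    for @lines -> $line {
        if $in-chunk {
            if $line.trim eq '' {
                last;               # First empty line: stop reading.
            }
            @chunk.push($line);
        }
        elsif $line.starts-with($header) {
            $in-chunk = True;       # Header found: start collecting.
            @chunk.push($line);
        }
    }
    return @chunk;
}

my $fname = 'data.txt';             # Assumed file name, as above.
if $fname.IO.e {
    say extract-chunk($fname.IO.lines).join("\n");
}
```

Because .IO.lines hands out lines lazily, leaving the loop with last should mean the rest of a large file is not read, although every line before the header still gets tested.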

Also, there are certain idioms that some people consider bad practice, like:

  1. Postfix if statements, e.g. say 'hello' if $y == 0.

  2. Relying on the implicit $_ variable in your code, e.g. .say

So, depending on which side of the fence you live on, that would be considered bad practice in Perl.

