parsing - How can I extract some data out of the middle of a noisy file using Perl 6?
I would like to do this using idiomatic Perl 6.
I found a wonderful contiguous chunk of data buried in a noisy output file.
I would like to simply print out the header line starting with "cluster unique", and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:
    </path/to/projects/projectname/parametersweep/1000.1.7.dir> was used as the working directory.
    ....
    cluster unique sequences reads rpm
    1 31 3539 3539
    2 25 2797 2797
    3 17 1679 1679
    4 21 1636 1636
    5 14 1568 1568
    6 13 1548 1548
    7 7 1439 1439

    input file: "../../filename.count.fa"
    ...
Here's what I want parsed out:
    cluster unique sequences reads rpm
    1 31 3539 3539
    2 25 2797 2797
    3 17 1679 1679
    4 21 1636 1636
    5 14 1568 1568
    6 13 1548 1548
    7 7 1439 1439
I would like to do this using idiomatic Perl 6.
In Perl, the idiomatic way to locate the chunk in a file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10 GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file, much less perform an if test on every line in the file.
In Perl 6, you can read a paragraph at a time like this:
    my $fname = 'data.txt';

    my $infile = open(
        $fname,
        nl => "\n\n",   # Set what Perl considers the end of a line.
    );  # Removed die() per Brad Gilbert's comment.

    for $infile.lines() -> $para {
        if $para ~~ /^ 'cluster unique'/ {
            say $para.chomp;
            last;   # Quit reading the file.
        }
    }

    $infile.close;

    #   ^                  Match the start of the string.
    #   'cluster unique'   By default, whitespace is insignificant in a Perl 6 regex.
    #                      Quotes are one way to make whitespace significant.
However, in Perl 6 on Rakudo/MoarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
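Without paragraph mode, one workaround is to read the file lazily, line by line, and stop at the first empty line after the header. The following is only a minimal sketch, assuming the header line starts with the literal text 'cluster unique' exactly as in the sample above; the sub name extract-chunk is made up for illustration.

```raku
# Hypothetical helper: returns the header line and the rows that follow it,
# up to (but not including) the first empty line.
sub extract-chunk(Str $path) {
    my @chunk;
    my $in-chunk = False;

    # .IO.lines reads lazily, so `last` avoids reading the rest of the file.
    for $path.IO.lines -> $line {
        if $in-chunk {
            last if $line.trim eq '';   # an empty line ends the chunk
            @chunk.push($line);
        }
        elsif $line.starts-with('cluster unique') {
            $in-chunk = True;           # header found, start collecting
            @chunk.push($line);
        }
    }
    return @chunk;
}
```

With the file name from the example above, `.say for extract-chunk('data.txt');` would print the chunk. Lines before the header still cost one test each, but nothing after the terminating empty line is read.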
Also, there are certain idioms that some people consider bad practice, such as postfix if statements (e.g. say 'hello' if $y == 0) and relying on the implicit $_ variable in your code (e.g. .say).
So, depending on which side of the fence you live on, such code would be considered bad practice in Perl.
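For comparison, here is a small sketch that puts each of the two idioms next to a more explicit equivalent. The variable names and list values are made up for the example; each pair behaves the same way.

```raku
my $y = 0;

# Postfix if, followed by the equivalent block form.
say 'hello' if $y == 0;
if $y == 0 { say 'hello' }

# Implicit topic variable $_, followed by the same loop with a named parameter.
.say for <a b c>;
for <a b c> -> $item { say $item }
```

Which form reads better is a matter of taste; the explicit forms only spell out what the short ones leave implied.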