Sed to replace a variable-length string between two known patterns
I'd like to be able to replace a string between two known patterns. The catch is that I want to replace it with a string of the same length composed of 'x'.
Let's say I have a file containing:

    hello.stringtobereplaced.secondstring
    hello.shortstring.secondstring

I'd like the output to be this:

    hello.xxxxxxxxxxxxxxxxxx.secondstring
    hello.xxxxxxxxxxx.secondstring
Using sed loops
You can use sed, though the thinking required is not wholly obvious:

    sed ':a;s/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/;t a'

This works with GNU sed; BSD (Mac OS X) sed, and other versions, may be fussier and require:

    sed -e ':a' -e 's/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/' -e 't a'

The logic is identical in both:
- Create the label a.
- Substitute the lead string and a sequence of x's (capture 1), followed by a non-x, and arbitrary other data plus the second string (capture 2), and replace it with the contents of capture 1, an x, and the content of capture 2.
- If the s/// command made a change, go back to label a.
It stops substituting when there are no non-x's between the two marker strings.
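To see the loop at work, here is a small sketch that prints the pattern space before each substitution attempt. It uses the portable -e form of the anchored command; mask_trace is just a hypothetical wrapper name:

```shell
#!/bin/sh
# mask_trace is a hypothetical helper name. It runs the same
# label/substitute/branch loop as the anchored command, but with -n and a
# p before the s command, so the pattern space is printed once per pass.
mask_trace() {
    sed -n -e ':a' -e 'p' \
        -e 's/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/' \
        -e 't a'
}

# Prints four lines: the original, then one line per character replaced.
echo hello.abc.secondstring | mask_trace
```

Each pass replaces exactly one non-x character. When s/// finally fails, t does not branch, so the last line printed is the fully masked one.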
Two tweaks to the regex allow the code to recognize two copies of the pattern on a single line. Lose the ^ anchor that matches the beginning of the line, and change .* to [^.]* (so the regex is not quite so greedy):

    $ echo hello.stringtobereplaced.secondstring hello.stringtobereplaced.secondstring |
    > sed ':a;s/\(hello\.x*\)[^x]\([^.]*\.secondstring\)/\1x\2/;t a'
    hello.xxxxxxxxxxxxxxxxxx.secondstring hello.xxxxxxxxxxxxxxxxxx.secondstring
    $

Using the hold space
hek2mgl suggests an alternative approach in sed using the hold space. It can be implemented using:

    $ echo hello.stringtobereplaced.secondstring |
    > sed 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
    > h
    > s/.*@@//
    > s/./x/g
    > G
    > s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
    > '
    hello.xxxxxxxxxxxxxxxxxx.secondstring
    $

This script is not as robust as the looping version, but it works OK as written when each line matches the lead-middle-tail pattern. It first splits the line into three sections: the first marker, the bit to be mangled, and the second marker. It reorganizes these so that the two markers are separated by @, followed by @@ and the bit to be mangled. h copies the result to the hold space. Then it removes everything up to and including @@, replaces each character in the bit to be mangled with x, and (with G) appends the material in the hold space after the x's in the pattern space, with a newline separating them. Finally, it recognizes and captures the x's, the lead marker, and the tail marker, ignoring the newline, the @ and @@ plus the trailing material, and reassembles them as lead marker, x's, and tail marker.
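One way to watch those hold-space steps is to print the pattern space with sed's l command after each of them. This is a sketch that assumes GNU sed, which writes an embedded newline as \n and marks each line end with $; trace_hold is a hypothetical name, and the middle string is shortened to abc to keep the output small:

```shell
#!/bin/sh
# trace_hold is a hypothetical helper: the same hold-space script, with an
# l command after each step so the intermediate pattern space is shown.
# Assumes GNU sed's l output format (\n for an embedded newline, $ at end).
trace_hold() {
    sed -n '
s/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
l
h
s/.*@@//
l
s/./x/g
l
G
l
s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
l
'
}

echo hello.abc.secondstring | trace_hold
```

The fourth line of output is the interesting one: the x's, then the newline added by G, then the copy that h saved in the hold space.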
To make it robust, you'd recognize the pattern and group the commands shown inside { and } so that they're executed only when the pattern is recognized:

    sed '/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/{
    s/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
    h
    s/.*@@//
    s/./x/g
    G
    s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
    }'

Adjust to suit your needs...
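A sketch of the grouped version wrapped in a shell function (mask_hello is a hypothetical name), fed one line that matches and one that does not:

```shell
#!/bin/sh
# mask_hello is a hypothetical name for the grouped hold-space script:
# the commands inside { } run only on lines that match the
# lead-middle-tail pattern; every other line is printed unchanged.
mask_hello() {
    sed '/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/{
s/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
h
s/.*@@//
s/./x/g
G
s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
}'
}

printf '%s\n' hello.abc.secondstring 'no markers here' | mask_hello
```

Only the first line is rewritten. Without the braces, the second line would come out as a row of x's followed, on a new line, by the original text, because the final s/// needs an @ to match and never finds one.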
Adjusting to suit your needs
[I tried one of the solutions and it worked fine.] But when I try to replace 'hello' with the real string (which is '1.2.840.') and the second string (which is just a dot, '.'), things stop working. I guess these dots confuse the sed command. What I'm trying to achieve is to transform '1.2.840.10008.' into '1.2.840.xxxxx.', and this pattern happens several times in the file, with a variable number of characters to be replaced between '1.2.840.' and the next dot '.'.
There are times when it is important that the question is close enough to the real scenario, and this may be one such time. Dot is a metacharacter in sed regular expressions and in other dialects of regular expression (shell globbing being a noticeable exception). If the 'bit to be mangled' is always digits, you can tighten the regular expressions, though (when you look at the code ahead) the tightening isn't imposing much in the way of restriction.
Pretty much any solution using regular expressions is a balancing act that has to pit convenience and abbreviation against reliability and precision.
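To see why unescaped dots cause trouble, compare a prefix written with plain dots against one with escaped dots. The two helper functions are hypothetical names used only for this sketch:

```shell
#!/bin/sh
# Hypothetical helpers that differ only in whether the dots are escaped.
prefix_loose()  { sed 's/1.2.840./<prefix>/'; }     # '.' matches any character
prefix_strict() { sed 's/1\.2\.840\./<prefix>/'; }  # '\.' matches only a dot

echo 'id 1a2b840c' | prefix_loose    # matches: the dots match a, b and c
echo 'id 1a2b840c' | prefix_strict   # no literal dots, so no match
echo 'id 1.2.840.' | prefix_strict   # a real prefix still matches
```

The same applies to a bare '.' used as the tail marker: unescaped, it matches the very first character it is offered.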
Revised code plus data

    cat <<EOF |
    transform '1.2.840.10008.' into '1.2.840.xxxxx.' OK, and hence 1.2.840.21. and 1.2.840.20992. should lose the 21 and 20992.
    EOF
    sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'

Example output:

    transform '1.2.840.xxxxx.' into '1.2.840.xxxxx.' OK, and hence 1.2.840.xx. and 1.2.840.xxxxx. should lose the 21 and 20992.

The changes in the script are:

    sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'

- Add 1\.2\.840\. as the start pattern.
- Revise the 'character to replace' expression to 'not x or .'.
- Use \. as the tail pattern.
You could replace [^x.] with [0-9] if you're sure you only want digits matched, in which case you won't have to worry about the spaces discussed below.
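A sketch of that digits-only variant, keeping the same loop structure as the revised command; mask_oid_digits is a hypothetical name and the two input lines are made up for illustration:

```shell
#!/bin/sh
# mask_oid_digits is a hypothetical name for the [0-9] variant of the
# revised command: only digits that follow 1.2.840. become x's.
mask_oid_digits() {
    sed -e ':a' -e 's/\(1\.2\.840\.x*\)[0-9]\([^.]*\.\)/\1x\2/' -e 't a'
}

# The OID is masked; the prose line is left alone because a space,
# not a digit, follows the prefix.
printf '%s\n' "transform '1.2.840.10008.' now" 'see 1.2.840. for details.' |
    mask_oid_digits
```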
You may decide that you don't want spaces matched, so that a casual comment like:

    The net prefix is 1.2.840. and there are other prefixes too.

does not end up as:

    The net prefix is 1.2.840.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

In that case, you need to use:

    sed ':a;s/\(1\.2\.840\.x*\)[^x. ]\([^ .]*\.\)/\1x\2/;t a'

The changes continue until you've got it precise enough to do what you want without doing what you don't want on your current data set. Writing bullet-proof regular expressions requires a precise specification of what you want matched, and can be quite hard.
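A side-by-side sketch of the two versions of the revised command, run on the casual comment and on a real OID; mask_oid_loose and mask_oid_strict are hypothetical names:

```shell
#!/bin/sh
# Hypothetical names for the two versions of the revised command:
# loose uses [^x.] and [^.]*; strict also excludes spaces.
mask_oid_loose()  { sed -e ':a' -e 's/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/' -e 't a'; }
mask_oid_strict() { sed -e ':a' -e 's/\(1\.2\.840\.x*\)[^x. ]\([^ .]*\.\)/\1x\2/' -e 't a'; }

line='The net prefix is 1.2.840. and there are other prefixes too.'
echo "$line" | mask_oid_loose    # the text after the prefix becomes x's
echo "$line" | mask_oid_strict   # unchanged: a space follows the prefix
echo "transform '1.2.840.10008.' OK" | mask_oid_strict  # OIDs are still masked
```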