Sed to replace variable length string between 2 known patterns
I'd like to be able to replace a string between 2 known patterns. The catch is that I want to replace it with a string of the same length that is composed only of 'x'.
Let's say I have a file containing:
hello.stringtobereplaced.secondstring
hello.shortstring.secondstring
I'd like the output to look like this:
hello.xxxxxxxxxxxxxxxxxx.secondstring
hello.xxxxxxxxxxx.secondstring
Using sed loops
You can use sed, though the thinking required is not wholly obvious:
sed ':a;s/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/;t a'
This works with GNU sed; BSD (Mac OS X) sed and other versions may be fussier and require:
sed -e ':a' -e 's/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/' -e 't a'
The logic is identical in both:
- Create a label a.
- Substitute the lead string and a sequence of x's (capture 1), followed by a non-x, and arbitrary other data plus the second string (capture 2), and replace it with the contents of capture 1, an x, and the content of capture 2.
- If the s/// command made a change, go back to the label a.
It stops substituting when there are no non-x's left between the two marker strings.
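To see what each trip round the loop does, it can help to run the substitution one pass at a time. The following is a minimal sketch rather than part of the original answer: the helper name one_pass and the short sample string are assumptions made for illustration, and the label and branch are deliberately left out so that each call performs a single substitution.

```shell
# Hypothetical helper: one substitution only, with no ':a' label and no 't a' branch.
one_pass() {
    sed 's/^\(hello\.x*\)[^x]\(.*\.secondstring\)/\1x\2/'
}

line='hello.abc.secondstring'
pass1=$(printf '%s\n' "$line"  | one_pass)   # hello.xbc.secondstring
pass2=$(printf '%s\n' "$pass1" | one_pass)   # hello.xxc.secondstring
pass3=$(printf '%s\n' "$pass2" | one_pass)   # hello.xxx.secondstring
pass4=$(printf '%s\n' "$pass3" | one_pass)   # no non-x left, so nothing changes

printf '%s\n' "$pass1" "$pass2" "$pass3" "$pass4"
```

The fourth pass changes nothing, which is the condition under which 't a' would stop branching back to the label.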
Two tweaks to the regex allow the code to recognize two copies of the pattern on a single line. Lose the ^ that anchors the match to the beginning of the line, and change the .* to [^.]* (so the regex is not quite so greedy):
$ echo hello.stringtobereplaced.secondstring hello.stringtobereplaced.secondstring |
> sed ':a;s/\(hello\.x*\)[^x]\([^.]*\.secondstring\)/\1x\2/;t a'
hello.xxxxxxxxxxxxxxxxxx.secondstring hello.xxxxxxxxxxxxxxxxxx.secondstring
$
Using the hold space
hek2mgl suggests an alternative approach in sed using the hold space. It can be implemented using:
$ echo hello.stringtobereplaced.secondstring |
> sed 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
> h
> s/.*@@//
> s/./x/g
> G
> s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
> '
hello.xxxxxxxxxxxxxxxxxx.secondstring
$
This script is not as robust as the looping version, but it works OK as written when each line matches the lead-middle-tail pattern. It first splits the line into three sections: the first marker, the bit to be mangled, and the second marker. It reorganizes these so that the two markers are separated by @, followed by @@ and the bit to be mangled. The h copies the result to the hold space. Next, remove everything up to and including the @@; replace each character in the bit to be mangled with x, then append the material in the hold space after the x's in the pattern space, with a newline separating them (this is the upper-case G command). Finally, recognize and capture the x's, the lead marker, and the tail marker, ignoring the newline, the @, and the @@ plus trailing material, and reassemble them as the lead marker, the x's, and the tail marker.
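The intermediate state is easier to follow with a short trace. This is a sketch under stated assumptions: the helper names stage1 and mask_hold and the sample string are hypothetical, each command is passed with its own -e for portability, and the append step uses the upper-case G command, which is what the description (hold space appended after a newline) calls for.

```shell
# Stage 1 only: rearrange the line to lead@tail@@middle (hypothetical helper).
stage1() {
    sed 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/'
}

# All stages; G (upper case) appends the hold space after a newline.
mask_hold() {
    sed -e 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/' \
        -e 'h' \
        -e 's/.*@@//' \
        -e 's/./x/g' \
        -e 'G' \
        -e 's/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/'
}

after_stage1=$(printf '%s\n' 'hello.abc.secondstring' | stage1)
final=$(printf '%s\n' 'hello.abc.secondstring' | mask_hold)
printf '%s\n' "$after_stage1" "$final"
# hello.@.secondstring@@abc
# hello.xxx.secondstring
```

A lower-case g at the append step would overwrite the x's with the hold space instead of appending to them, so the case of that one letter matters.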
To make it more robust, you'd recognize the pattern and group the commands shown inside { and } so that they're executed only when the pattern is recognized:
sed '/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/{
s/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/
h
s/.*@@//
s/./x/g
G
s/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/
}'
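The difference the grouping makes shows up on a line that does not match the pattern. The sketch below is illustrative only: the helper names unguarded and guarded and the sample line are assumptions, and the script is split across -e options, which both GNU and BSD sed accept.

```shell
# The hold-space commands with no address guard (hypothetical helper name).
unguarded() {
    sed -e 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/' \
        -e 'h' -e 's/.*@@//' -e 's/./x/g' -e 'G' \
        -e 's/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/'
}

# The same commands, run only on lines that match the address.
guarded() {
    sed -e '/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/{' \
        -e 's/^\(hello\.\)\([^.]\{1,\}\)\(\.secondstring\)/\1@\3@@\2/' \
        -e 'h' -e 's/.*@@//' -e 's/./x/g' -e 'G' \
        -e 's/\(x*\)\n\([^@]*\)@\([^@]*\)@@.*/\2\1\3/' \
        -e '}'
}

other='some other line'
bad=$(printf '%s\n' "$other" | unguarded)    # a run of x's, a newline, then the line
good=$(printf '%s\n' "$other" | guarded)     # the line, untouched
match=$(printf '%s\n' 'hello.abc.secondstring' | guarded)
printf '%s\n' "$bad" "$good" "$match"
```

Without the guard, the non-matching line is turned into x's and then has its original text appended, so it comes out as two lines; with the guard it passes through unchanged.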
Adjust to suit your needs...
Adjusting to suit your needs
[I tried one of your solutions and it worked fine.] When I try to replace 'hello' with my real string (which is '1.2.840.') and my second string (which is a dot '.'), things stop working. I guess these dots confuse the sed command. What I try to achieve is to transform '1.2.840.10008.' into '1.2.840.xxxxx.', and this pattern happens several times in my file with a variable number of characters to be replaced between the '1.2.840.' and the next dot '.'.
There are times when it is important to get your question close enough to the real scenario, and this may be one such. Dot is a metacharacter in sed regular expressions (and in most other dialects of regular expression; shell globbing is the noticeable exception). If the 'bit to be mangled' is always digits, we can tighten up the regular expressions, though (when I look at the code ahead) the tightening really isn't imposing much in the way of a restriction.
Pretty much any solution using regular expressions is a balancing act that has to pit convenience and abbreviation against reliability and precision.
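A quick check makes the point about the dot concrete. This is a small illustrative sketch; the sample text '1a2b840c' and the replacement word MATCH are made up for the demonstration.

```shell
# Unescaped: each '.' matches any single character, so this unrelated text matches.
loose=$(printf '%s\n' '1a2b840c' | sed 's/1.2.840./MATCH/')

# Escaped: '\.' matches only a literal dot, so the same text is left alone.
strict=$(printf '%s\n' '1a2b840c' | sed 's/1\.2\.840\./MATCH/')

printf '%s\n' "$loose" "$strict"
# MATCH
# 1a2b840c
```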
Revised code plus data
cat <<EOF |
transform '1.2.840.10008.' to '1.2.840.xxxxx.'
OK, and hence 1.2.840.21. and 1.2.840.20992. should lose 21 and 20992.
EOF
sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'
Example output:
transform '1.2.840.xxxxx.' to '1.2.840.xxxxx.'
OK, and hence 1.2.840.xx. and 1.2.840.xxxxx. should lose 21 and 20992.
The changes in the script are:
sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'
- Add 1\.2\.840\. as the start pattern.
- Revise the 'character to replace' expression to 'not x or .'.
- Use \. as the tail pattern.
You could replace [^x.] with [0-9] if you're sure you only want digits matched, in which case you won't have to worry about spaces as discussed below.
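A minimal sketch of the digits-only variant follows. The helper name mask_digits and both sample lines are assumptions made for illustration; the expression is the script above with [0-9] in place of [^x.], written in the portable multi -e form.

```shell
# Hypothetical helper: mask only digits that follow the 1.2.840. prefix.
mask_digits() {
    sed -e ':a' -e 's/\(1\.2\.840\.x*\)[0-9]\([^.]*\.\)/\1x\2/' -e 't a'
}

ids=$(printf '%s\n' 'uid 1.2.840.10008. and 1.2.840.21.' | mask_digits)
prose=$(printf '%s\n' 'The prefix is 1.2.840. and so on.' | mask_digits)

printf '%s\n' "$ids" "$prose"
# uid 1.2.840.xxxxx. and 1.2.840.xx.
# The prefix is 1.2.840. and so on.
```

Because a space is not a digit, a line where the prefix is followed by ordinary text is left untouched.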
You may decide you don't want spaces matched, so that a casual comment like:
The net prefix is 1.2.840. and there are other prefixes too.
does not end up as:
The net prefix is 1.2.840.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
In which case, you probably need to use:
sed ':a;s/\(1\.2\.840\.x*\)[^x. ]\([^ .]*\.\)/\1x\2/;t a'
And the changes continue until you've got it precise enough to do what you want without doing anything you don't want on your current data set. Writing bullet-proof regular expressions requires a precise specification of what you want matched, and that can be quite hard.