I have a file, e.g. `Inventory.conf`, with lines like:

    Int/domain—home.dir=/etc/int

I need to replace `/` and `—` before the `=`, but not after.

The result should be:

    Int_domain_home_dir=/etc/int

I have tried several `sed` commands, but none seem to fit my need.
With GNU sed:
Output:
See: `man sed`. I assume you want to replace dots too.

Sed with a `t` loop (BRE):

When one of the `-/—.` characters is found, it's replaced with a `_`. The following text up to `=` is captured and output using a backreference. If the previous substitution succeeds, the `t` command loops to label `:a` to check for further replacements.

Edit: If you're under BSD/Mac OS X (thanks @mklement0):
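A minimal sketch of such a `t` loop that should run under both GNU and BSD `sed`, because the label and the branch are passed as separate `-e` expressions. It is not necessarily the command from this answer: this variant captures the text before the matched character (anchored at the start of the line) rather than the text after it, it assumes ASCII `-` rather than `—`, and the function name `keys_to_underscores` is made up for illustration.

```shell
# Hypothetical helper: replace '/', '.' and '-' with '_' before the first '=' only.
keys_to_underscores() {
  # :a   defines the label
  # s    replaces the last matching character that precedes the first '='
  # ta   jumps back to :a whenever the substitution succeeded
  sed -e ':a' -e 's|^\([^=]*\)[/.-]|\1_|' -e 'ta' "$@"
}

printf '%s\n' 'Int/domain-home.dir=/etc/int' | keys_to_underscores
# prints: Int_domain_home_dir=/etc/int
```

Using `|` as the `s` delimiter avoids having to escape `/`, and because `[^=]*` cannot cross an `=`, nothing after the first `=` is touched.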
If a `perl` solution is okay:

- `^[^=]+` matches the string from the start of the line up to, but not including, the first occurrence of `=`
- `$&=~s|[/.-]|_|gr` performs another substitution on the matched string, replacing `/`, `.` or `-` characters with `_`
- the `r` modifier returns the modified string
- the `e` modifier allows an expression to be used instead of a string in the replacement section
- `#` is used as the delimiter to avoid having to escape `/` inside the character class `[/.-]`

Also, as suggested by @mklement0, we can use translate (`tr`) instead of the inner substitute.
Note that I've changed the sample input: `-` is used instead of `—`, which is what the OP seems to want based on comments.

You're asking for a
`sed` solution, but an `awk` solution is simpler and performs better in this case, because you can easily split the line into 2 fields by `=` and then selectively apply `gsub()` to only the 1st field in order to replace the characters of interest:

- `-F=` tells `awk` to split the input into fields by `=`, which with the input at hand results in `$1` (1st field) containing the first half of the line, before the `=`, and `$2` (2nd field) the 2nd half, after the `=`; using the `-F` option sets variable `FS`, the input field separator.
- `gsub("[./-]", "_", $1)` globally replaces all characters in set `[./-]` with `_` in `$1`, i.e., all occurrences of either `.`, `/` or `-` in the 1st field are replaced with a `_` each.
- `print $1 FS $2` prints the result: the modified 1st field (`$1`), followed by `FS` (which is `=`), followed by the (unmodified) 2nd field (`$2`).

Note that I've used ASCII char. `-` (HYPHEN-MINUS, codepoint `0x2d`) in the `awk` script, even though your sample input contains the Unicode char. `—` (EM DASH, `U+2014`, UTF-8 encoding `0xe2 0x80 0x94`).

If you really want to match that, simply substitute it in the command above, but note that the `awk` version on macOS won't handle that properly.

Another option is to use
`iconv` with ASCII transliteration, which translates the em dash into a regular ASCII `-`:

`perl` allows for an elegant solution too:

- `-F=`, just like with Awk, tells Perl to use `=` as the separator when splitting lines into fields.
- `-ane` activates field splitting (`a`), turns off implicit output (`n`), and `e` tells Perl that the next argument is an expression (command string) to execute.
- The fields that each line is split into are stored in array `@F`, where `$F[0]` refers to the 1st field.
- `$F[0] =~ tr|-/.|_|` translates (replaces) all occurrences of `-`, `/`, and `.` to `_`.
- `print join("=", @F)` rebuilds the input line from the fields, with the 1st field now modified, and prints the result.

Depending on the Awk implementation used, this may actually be faster (see below).
That `sed` isn't the best tool for this job is also reflected in the relative performance of the solutions:

Sample timings from my macOS 10.12 machine (GNU `sed` 4.2.2, Mawk `awk` 1.3.4, `perl` v5.18.2, using input file `file`, which contains 1 million copies of the sample input line); take them with a grain of salt, but the ratios of the numbers are of interest; fastest solutions first:

As you can see, the `awk` solution is fastest by far, with the line-internal-loop `sed` solution predictably performing worst, by a factor of about 12.
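For reference, the `awk` command can be assembled from the three pieces explained earlier (`-F=`, `gsub()` on `$1`, `print $1 FS $2`). A minimal sketch, with a made-up wrapper name and the assumption that each line contains exactly one `=`:

```shell
# Hypothetical wrapper (name is illustrative): normalize only the key part.
# Assumption: one '=' per line. Only $1 and $2 are printed, so anything after
# a second '=' would be dropped, and a line without '=' would gain one.
awk_keys() {
  awk -F= '{ gsub("[./-]", "_", $1); print $1 FS $2 }' "$@"
}

printf '%s\n' 'Int/domain-home.dir=/etc/int' | awk_keys
# prints: Int_domain_home_dir=/etc/int
```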