I want to take an XML file and replace an element's value. For example, if my XML file looks like this:
<abc>
<xyz>original</xyz>
</abc>
I want to replace the xyz element's original value, whatever it may be, with another string so that the resulting file looks like this:
<abc>
<xyz>replacement</xyz>
</abc>
How would you do this? I know I could write a Java program to do this but I assume that that's overkill for replacing a single element's value and that this could be easily done using sed to do a substitution using a regular expression. However I'm less than novice with that command and I'm hoping some kind soul reading this will be able to spoon feed me the correct regular expression for the job.
One idea is to do something like this:
sed 's/<xyz>.*<\/xyz>/<xyz>replacement<\/xyz>/' <original.xml >new.xml
Maybe it's better for me to just replace the entire line of the file with what I want it to be, since I will know the element name and the new value I want to use? But this assumes that the element in question is on a single line and that no other XML data is on the same line. I'd rather have a command which will basically replace element xyz's value with a new string that I specify and not have to worry if the element is all on one line or not, etc.
If sed is not the best tool for this job then please dial me in to a better approach.
If anyone can steer me in the right direction I'll really appreciate it, you'll probably save me hours of trial and error. Thanks in advance!
--James
OK so I bit the bullet and took the time to write a Java program which does what I want. Below is the operative method called by my main() method which does the work, in case this will be helpful to someone else in the future:
I run the program like so:
I hate to be a naysayer, but XML is anything but regular. A regular expression will probably be more trouble than it's worth. See here for more insight: Using C# Regular expression to replace XML element content
Your thought of a simple Java program might be nice after all. An XSLT transform may be easier if you know XSLT pretty well. If you know Perl ... that's the way to go IMHO.
Having said that, if you choose to go with a regex, be aware that sed works on one line at a time, so your pattern only matches when the opening and closing tags are on the same line. Putting g at the end of the substitution (s/.../.../g) does not change that; it replaces every match on a line instead of only the first one.
Also, the regex you proposed is "greedy". It will grab the biggest group of characters it can, because ".*" will match from the first occurrence of <xyz> to the last </xyz>. In regex flavors that support it (Perl, for example), you can make it "lazy" by changing the wildcard to ".*?". Putting the question mark after the asterisk tells it to match only one set of <xyz> to </xyz>. sed's regular expressions have no lazy quantifier, so there you would use something like [^<]* instead.
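To see the greedy behavior concretely, here is a small sketch using sed on a made-up line that holds two xyz elements (the sample line is an assumption for illustration, not taken from the question). Because sed has no lazy quantifier, the second command uses [^<]* as a stand-in, which only works when the element's value contains no '<'.

```shell
# Hypothetical sample: two xyz elements on the same line.
line='<abc><xyz>one</xyz><xyz>two</xyz></abc>'

# Greedy: .* runs from the first <xyz> to the last </xyz>,
# so both elements collapse into a single one.
greedy=$(printf '%s\n' "$line" | sed 's|<xyz>.*</xyz>|<xyz>replacement</xyz>|')

# Bounded: [^<]* stops at the next '<', so each element is matched
# separately; the g flag replaces every match on the line.
bounded=$(printf '%s\n' "$line" | sed 's|<xyz>[^<]*</xyz>|<xyz>replacement</xyz>|g')

printf '%s\n%s\n' "$greedy" "$bounded"
```

Neither form helps when the element spans several lines, since sed still only sees one line at a time.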
I was trying to do the same thing and came across this [gu]awk script that achieves it.
sed is not going to be an easy tool to use for multi-line replacements. It's possible to implement them using its N command and some recursion, checking after reading in each line if the close of the tag has been found... but it's not pretty and you'll never remember it.
Of course, actually parsing the XML and replacing tags is going to be the safest thing, but if you know you won't run into any problems, you could try this:
Breaking this down:
-p tells it to loop through the input and print
-0777 tells it to use the end of file as the input separator, so that it gets the whole thing in one slurp
-e means here comes the stuff I want you to do
And the substitution itself:
@ as a delimiter so you don't have to escape /
*?, the non-greedy version, to match as little as possible, so we don't go all the way to the last occurrence of </xyz> in the file
s modifier to let . match newlines (to get the multiple-line tag values)
g modifier to match the pattern multiple times
Tada! This prints the result to stdout - once you verify it does what you want, add the -i option to tell it to edit the file in place.