I'm not very good at RegEx; can someone give me a regex (to use in Java) that will select all whitespace that isn't between two quotes? I am trying to remove all such whitespace from a string, so any solution that does so will work.
For example:
(this is a test "sentence for the regex")
should become
(thisisatest"sentence for the regex")
I have absolutely no idea how the top-voted answer works, and its regex is huge, so I'm submitting this somewhat simpler answer:
It (in theory) works by using a lookahead match to ensure single quotes (') are balanced all the way to the end of the string before testing to see if the whitespace is a valid place to break.
It does work, but pretty slowly. As other answers have likely noted, using such an expression to split a potentially quoted string is using a hammer to remove a rivet. In my case, I'm feeding this regex to a program that takes a regex to split on (fzf).
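A minimal Java sketch of that lookahead idea, used with String.split as described. The pattern is an assumed, commonly used form of the technique rather than necessarily the exact regex from this answer: a whitespace run counts as a break only if an even number of single quotes follows it up to the end of the input. It assumes quotes are balanced and never escaped; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class QuoteAwareSplit {
    // Assumed pattern: a whitespace run followed by zero or more complete
    // '...' pairs and then quote-free text up to the end of the input,
    // i.e. an even number of single quotes remains, so we are outside quotes.
    static final String SPLIT_OUTSIDE_SINGLE_QUOTES =
            "\\s+(?=(?:[^']*'[^']*')*[^']*$)";

    static String[] split(String input) {
        return input.split(SPLIT_OUTSIDE_SINGLE_QUOTES);
    }

    public static void main(String[] args) {
        // prints [a, 'b c', d]
        System.out.println(Arrays.toString(split("a 'b c' d")));
    }
}
```

Each candidate whitespace run re-scans the rest of the string, which is why this gets slow on long inputs.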
This just isn't something regexes are good at. Search-and-replace functions with regexes are always a bit limited, and any sort of nesting/containment at all becomes difficult and/or impossible.
I'd suggest an alternate approach: Split your string on quote characters. Go through the resulting array of strings, and strip the spaces from every other substring (whether you start with the first or second depends on whether your string started with a quote or not). Then join them back together, using quotes as separators. That should produce the results you're looking for.
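A sketch of the split-on-quotes approach in Java (class and method names are hypothetical). Passing a limit of -1 to split keeps empty pieces, so a string that starts with a quote yields an empty first piece; the pieces at even indices are then always the ones outside quotes, which removes the need to decide where to start. It assumes double quotes are balanced and never escaped.

```java
public class StripOutsideQuotes {
    static String stripOutsideQuotes(String input) {
        // Limit -1 keeps leading/trailing empty pieces, so even indices
        // are always outside quotes and odd indices are always inside.
        String[] pieces = input.split("\"", -1);
        for (int i = 0; i < pieces.length; i += 2) {
            // No quotes left in these pieces, so a plain regex is safe here.
            pieces[i] = pieces[i].replaceAll("\\s+", "");
        }
        // Put back the quote characters that split() consumed.
        return String.join("\"", pieces);
    }

    public static void main(String[] args) {
        // prints (thisisatest"sentence for the regex")
        System.out.println(stripOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```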
Hope that helps!
PS: Note that this won't handle nested strings, but since you can't make nested strings with the ASCII double-quote character, I'm gonna assume you don't need that behaviour.
PPS: Once you're dealing with your substrings, it's a good time to use regexes to kill those spaces: there are no containing quotes to worry about. Just remember to use the /.../g modifier to make sure it's a global replacement and not just the first match. (In Java, that means using replaceAll rather than replaceFirst.)
Groups of whitespace outside of quotes are separated by stuff that's a) not whitespace, or b) inside quotes.
Perhaps something like:
The first part matches a sequence of spaces; the second part matches non-spaces (and non-quotes), or some stuff in quotes, either repeated any number of times. The second part is the separator.
This will give you two groups for each item in the result; just ignore the second element. (We need the parentheses for precedence rather than match grouping there.) Or, you could say, concatenate all the second elements -- though you need to match the first non-space word too, or in this example, make the spaces optional:
(I haven't done much regex in Java so expect bugs.)
Finally, this is how I'd do it if regexes were compulsory. ;-)
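A sketch of the "concatenate the separators" variant in Java, using an assumed pattern that follows the description above: optional spaces, then one or more chunks that are either non-space/non-quote runs or complete quoted strings. Java's String.split does not return captured groups, so this version collects group 1 from each Matcher.find() instead. Names are hypothetical, and balanced double quotes are assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConcatSeparators {
    // \s* : optional leading whitespace (dropped)
    // group 1 : one or more chunks, each a non-space/non-quote run or a
    //           complete "..." string (kept). Assumes balanced quotes.
    private static final Pattern CHUNK =
            Pattern.compile("\\s*((?:[^\\s\"]+|\"[^\"]*\")+)");

    static String removeWhitespaceOutsideQuotes(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            out.append(m.group(1)); // keep the separator part only
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints (thisisatest"sentence for the regex")
        System.out.println(removeWhitespaceOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```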
As well as Xavier's technique, you could simply do it the way you'd do it in C: just iterate over the input characters, and copy each to the new string if either it's non-space, or you've counted an odd number of quotes up to that point.
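A minimal Java sketch of that character-by-character approach (names are hypothetical). It makes one linear pass, toggling a flag at each double quote, and assumes quotes are never escaped.

```java
public class QuoteCountingCopy {
    static String removeWhitespaceOutsideQuotes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean insideQuotes = false; // true after an odd number of quotes
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;
            }
            // Copy non-whitespace always; copy whitespace only inside quotes.
            if (insideQuotes || !Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints (thisisatest"sentence for the regex")
        System.out.println(removeWhitespaceOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```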
Here's a single regex-replace that works:
which will replace:
with:
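For reference, a sketch of a single replaceAll call of this kind in Java. It assumes the common even-number-of-quotes lookahead (which may differ from the exact regex in this answer) and balanced, unescaped double quotes; names are hypothetical.

```java
public class LookaheadReplace {
    // Assumed pattern: whitespace is removed only if the rest of the input
    // holds zero or more complete "..." pairs followed by quote-free text,
    // i.e. an even number of double quotes remains.
    private static final String OUTSIDE_QUOTES_WS =
            "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String removeWhitespaceOutsideQuotes(String input) {
        return input.replaceAll(OUTSIDE_QUOTES_WS, "");
    }

    public static void main(String[] args) {
        // prints (thisisatest"sentence for the regex")
        System.out.println(removeWhitespaceOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```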
Note that if the quotes can be escaped, the even more verbose regex will do the trick:
which replaces the input:
with:
(Note that it also works with escaped backslashes:
(thisisatest"sentence \\\"for the regex"foobar)
)
Needless to say (?), this really shouldn't be used for such a task: it makes one's eyes bleed, and it runs in quadratic time, while a simple linear solution exists.
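A sketch of an escape-aware variant in Java. The token pattern is an assumption: it treats a backslash plus any following character as one unit, which covers both \\ and \" (a stricter version might accept only those two escapes). Names are hypothetical, and the same quadratic-time caveat applies.

```java
public class EscapedQuoteReplace {
    // One logical character: a backslash escape (backslash + any char) or
    // anything that is neither a backslash nor a double quote.
    private static final String TOKEN = "(?:\\\\.|[^\\\\\"])";

    // Whitespace followed by an even number of *unescaped* quotes up to the
    // end of the input. (?s) lets '.' also match a newline after a backslash.
    private static final String OUTSIDE_QUOTES_WS =
            "(?s)\\s+(?=(?:" + TOKEN + "*\"" + TOKEN + "*\")*" + TOKEN + "*$)";

    static String removeWhitespaceOutsideQuotes(String input) {
        return input.replaceAll(OUTSIDE_QUOTES_WS, "");
    }

    public static void main(String[] args) {
        String in = "(this is a test \"sentence \\\\\\\"for the regex\" foo bar)";
        // prints (thisisatest"sentence \\\"for the regex"foobar)
        System.out.println(removeWhitespaceOutsideQuotes(in));
    }
}
```

With this input, the escaped backslash and escaped quote inside the string do not end it, so the spaces inside it survive.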
EDIT
A quick demo:
Here is the regex that works for both single and double quotes (assuming that all strings are delimited properly):
It won't work with strings that have quotes inside them.
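One way to get this behaviour for both quote kinds in Java 9+ is a tokenizing alternation, which may not be the regex this answer had in mind: match a complete "..." string, a complete '...' string, or a whitespace run; keep the quoted strings verbatim and drop the whitespace. It assumes every string is delimited properly; names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BothQuoteKinds {
    // The matcher scans left to right, so a quoted string is consumed whole
    // as soon as its opening quote is reached; whitespace inside it is
    // never matched on its own.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|'[^']*'|\\s+");

    static String removeWhitespaceOutsideQuotes(String input) {
        Matcher m = TOKEN.matcher(input);
        // Matcher.replaceAll(Function) needs Java 9+. quoteReplacement stops
        // '$' or '\' inside a quoted string being read as replacement syntax.
        return m.replaceAll(r -> Character.isWhitespace(r.group().charAt(0))
                ? ""
                : Matcher.quoteReplacement(r.group()));
    }

    public static void main(String[] args) {
        // prints (thisisa'test'"sentence for 'the' regex")
        System.out.println(removeWhitespaceOutsideQuotes("(this is a 'test' \"sentence for 'the' regex\")"));
    }
}
```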
This isn't an exact solution, but you can accomplish your goal by doing the following:
STEP 1: Match the two segments
STEP 2: Remove spaces
STEP 3: Rebuild your string
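A Java sketch of those three steps, generalized to any number of quoted segments. The segment pattern and names are assumptions, and double quotes are assumed balanced and unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentRebuild {
    // Group 1: a run of unquoted text. Group 2: a complete "..." string.
    private static final Pattern SEGMENT =
            Pattern.compile("([^\"]+)|(\"[^\"]*\")");

    static String removeWhitespaceOutsideQuotes(String input) {
        StringBuilder rebuilt = new StringBuilder();
        Matcher m = SEGMENT.matcher(input);
        while (m.find()) {                              // STEP 1: match segments
            if (m.group(1) != null) {
                // STEP 2: remove spaces from unquoted segments only
                rebuilt.append(m.group(1).replaceAll("\\s+", ""));
            } else {
                rebuilt.append(m.group(2));             // quoted: keep as-is
            }
        }
        return rebuilt.toString();                      // STEP 3: rebuilt string
    }

    public static void main(String[] args) {
        // prints (thisisatest"sentence for the regex")
        System.out.println(removeWhitespaceOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```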