Possible Duplicate:
How to do unfolding RFC 822
Parsing e-mail-like headers (similar to RFC822)
I have some input data that is similar to e-mail data, in that long lines are wrapped to the next line. For example:
robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot is used to verify availability of the ABCdatos
directory entries (http://www.abcdatos.com), checking
HTTP HEAD. Robot runs twice a week. Under HTTP 5xx
error responses or unable to connect, it repeats
verification some hours later, verifiying if that was a
temporary situation.
The robot-description field is "too long" for one line, and is wrapped onto the following lines. To help parse this data, I would like to come up with a RegEx that can be used with preg_replace() to replace newlines under the following conditions:
- Replace newline characters that are followed by whitespace
- Do not replace newline characters that are followed by another newline character
Example output:
robot-description: This robot is used to verify availability of the ABCdatos directory entries (http://www.abcdatos.com), checking HTTP HEAD. Robot runs twice a week. Under HTTP 5xx error responses or unable to connect, it repeats verification some hours later, verifiying if that was a temporary situation.
I am new to RegEx. How can I build such an expression? If you choose to answer, please include a brief explanation of the components in the expression. I'd really like to learn how to do these.
I've started with this: \n([^\S])*
It is close. http://codepad.org/iMObpgFX
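Maybe you could try:
\r?\n[ \t]+
and replace each match with a single space. Here \r?\n matches a line break (with an optional carriage return for Windows line endings), and [ \t]+ matches one or more spaces or tabs at the start of the continuation line. Note that (\r|\n)\s+ is not quite enough: \s also matches newline characters, so a newline followed by another newline would be replaced as well, which breaks your second condition.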
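For illustration, here is a minimal sketch of the unfolding with preg_replace(). It assumes that continuation lines in the real data begin with spaces or tabs (the indentation is not visible in the sample as posted), and it uses a shortened version of the description. The variable names $input and $unfolded are made up for this example. The pattern is limited to spaces and tabs after the line break so that blank lines between records are left alone.

```php
<?php
// Shortened sample input; the leading spaces on continuation lines are an
// assumption about how the real data is indented.
$input = "robot-language: basic\n"
       . "robot-description: This robot is used to verify availability of the ABCdatos\n"
       . "  directory entries (http://www.abcdatos.com), checking\n"
       . "  HTTP HEAD.\n"
       . "\n"
       . "robot-useragent: ABCdatos BotLink/1.0.2 (test links)\n";

// \r?\n   a line break, with an optional carriage return before it
// [ \t]+  one or more spaces or tabs starting the continuation line
// Each match (line break plus indentation) becomes a single space.
$unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $input);

echo $unfolded;
```

With this input, the three description lines are joined into one, while the blank line and the unindented robot-useragent line stay where they are, because neither line break is followed by a space or tab.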