The contents of stdin are getting corrupted with word wrapping and trailing "=" characters throughout, which breaks the URL that I need to post.
I need to extract a URL from an email and then post it. So I'm piping my email to a PHP script in cPanel using a standard code snippet I've seen all over the internet:
$fd = fopen("php://stdin", "r");
$email = ""; // This will be the variable holding the data.
while (!feof($fd)) { $email .= fread($fd, 1024); } // no trim() here: trimming each chunk strips whitespace at arbitrary 1024-byte boundaries
fclose($fd);
I then dump the contents of the email to a file, "pipemail.txt", for now so I can inspect it and make sure it's all working properly:
$fdw = fopen("pipemail.txt", "w+");
fwrite($fdw, $email);
fclose($fdw);
The output is looking like this:
...
<table style=3D"width:100%" cellpadding=3D"0" cellspacing=3D"0" border=3D"0=
"><tbody><tr><td><table style=3D"background-color:#ffffff;color:#3c445a;fon=
t-family:arial;font-size:10px;font-weight:bold;width:100%" cellpadding=3D"0=
" cellspacing=3D"0">
...
I have been working on this for over a day now and I'm completely stumped. I've tried trimming the trailing "=" from incoming lines, but that does not give me the expected result. Instead, it seems to remove "=" characters from seemingly random locations in the content. I'm guessing it isn't really random and only seems so because it's not what I expect: it's probably only removing the "=" when it happens to be the last character of a 1024-byte chunk. But if that is true, then where else is the word wrapping coming from? I don't know enough about how this works to troubleshoot it myself.
Why is it wrapping? Where are the "=" coming from? Does anyone have any suggestions?
Emails are commonly encoded in the quoted-printable format (http://en.wikipedia.org/wiki/Quoted-printable). You can decode it using quoted_printable_decode() (http://www.php.net/manual/en/function.quoted-printable-decode.php). This decoding is done automatically by your email client, which is why it looks like PHP is adding those characters.
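As a rough illustration of what that decoding does, here is a minimal sketch. The sample string is made up for this example and only imitates the output shown in the question: "=3D" is the encoded form of "=", and a "=" immediately before a line break is a soft line break that the decoder removes.

```php
<?php
// Hypothetical sample imitating the question's output, not the real email.
$encoded = "<table style=3D\"width:100%\" cellpadding=3D\"0\" border=3D\"0=\r\n\"><tbody>";

// "=3D" becomes "=", and "=" followed by a line break (a soft break) is dropped.
$decoded = quoted_printable_decode($encoded);

echo $decoded, PHP_EOL;
```

This only makes sense on the encoded body, so it assumes the headers have already been separated from the rest of the message.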
Your data is in quoted-printable format; use quoted_printable_decode() to decode it.
The "=3D" sequences and the trailing "=" at the end of each line mean that the email is probably quoted-printable encoded. You need to parse the message properly and run the body part through quoted_printable_decode(). There are Content-Type and Content-Transfer-Encoding headers that will tell you what kind of content and encoding are used, and probably MIME parts and boundaries that you'll need to deal with.
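A minimal sketch of that idea for the simplest case, a single-part message, might look like the following. The function name decode_simple_message, the sample message, and the example.com URL are all invented for illustration. It deliberately ignores multipart boundaries and charsets, so a real message with several MIME parts would need a proper MIME parser instead.

```php
<?php
// Hypothetical helper: decode the body of a single-part message.
// Assumption: no multipart boundaries; only the top-level headers matter.
function decode_simple_message(string $raw): string
{
    // Normalize line endings, then split headers from body at the first blank line.
    $raw = str_replace("\r\n", "\n", $raw);
    $parts = explode("\n\n", $raw, 2);
    $headerBlock = $parts[0];
    $body = $parts[1] ?? '';

    // Unfold headers that continue on the next line with leading whitespace.
    $headerBlock = preg_replace('/\n[ \t]+/', ' ', $headerBlock);

    // Content-Transfer-Encoding names the body encoding; treat a missing header as 7bit.
    $encoding = '7bit';
    if (preg_match('/^Content-Transfer-Encoding:\s*(\S+)/mi', $headerBlock, $m)) {
        $encoding = strtolower($m[1]);
    }

    switch ($encoding) {
        case 'quoted-printable':
            return quoted_printable_decode($body);
        case 'base64':
            return (string) base64_decode($body);
        default:
            return $body; // 7bit, 8bit, binary: nothing to decode
    }
}

// Made-up message standing in for what arrives on php://stdin.
$raw = "Subject: Test\r\n"
     . "Content-Type: text/html; charset=UTF-8\r\n"
     . "Content-Transfer-Encoding: quoted-printable\r\n"
     . "\r\n"
     . "<a href=3D\"http://example.com/confirm?id=3D12345&token=3Dabc=\r\n"
     . "def\">Confirm</a>\r\n";

$html = decode_simple_message($raw);

// Extract the first href value from the decoded HTML.
$url = preg_match('/href="([^"]+)"/i', $html, $m) ? $m[1] : null;
echo $url, PHP_EOL;
```

Decoding before searching for the link matters here: in the encoded text the URL is split across two lines by a soft line break, so a pattern run against the raw input would not see it as one URL.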