iCalendar RFC 2445 Section 4.1 Content Folding

Posted 2019-07-31 08:04

I am creating a simple iCalendar file using C# and have found the content folding described in Section 4.1 of RFC 2445 to be quite a headache (for me :-).

http://www.apps.ietf.org/rfc/rfc2445.html#sec-4.1

For long lines, you are to escape certain characters (backslash, semicolon, comma and newline, I believe) and then fold the text so that no line is longer than 75 octets. I found several straightforward ways of doing this on the web. The simplest is to replace the characters in question with their escaped versions and then insert a CRLF followed by a space after every 75th character. Something like:

// too simple, could break at an escape sequence boundary or multi-byte character may overflow 75 octets
txt = txt.Replace(@"\", "\\\\").Replace(";", "\\;").Replace(",", "\\,").Replace("\r\n", "\\n");
var regex = new System.Text.RegularExpressions.Regex( ".{75}");
var escape_and_folded = regex.Replace( txt, "$0\r\n ");

I see two issues. First, the CRLF may be inserted in the middle of an escape sequence: for example, an escaped newline “\n” can become “\CRLF”, with the “n” pushed onto the next line. Second, the count is per character rather than per octet, so a line containing multi-byte characters may end up longer than 75 octets.
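Both issues are easy to reproduce. The following is only a small sketch (the class and method names are made up for illustration): it applies the regex fold from above and counts UTF-8 octets with Encoding.UTF8.GetByteCount.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FoldPitfalls
{
    // The naive fold from the question: CRLF + space after every 75 characters.
    public static string NaiveFold(string text)
    {
        return Regex.Replace(text, ".{75}", "$0\r\n ");
    }

    // Number of octets the string occupies once encoded as UTF-8.
    public static int Utf8Octets(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

Calling NaiveFold on 74 'a' characters followed by the two-character escape \n leaves the backslash at the end of the first line and the n on the continuation line. Utf8Octets on a string of 75 'é' characters returns 150, so a 75-character line can be twice the allowed length.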

A simple solution is to walk the string character by character, escaping and folding as you go, but this seems rather brute force. Does anybody have a more elegant solution?

3 Answers
手持菜刀,她持情操
#2 · 2019-07-31 08:26

I tried your solution. It works, except that it also folded some lines whose lengths were less than 75 octets. I therefore rewrote the code the traditional way (i.e. without regular expressions, though I do miss them), as shown below.

    public static string FoldLines(this string value, int max, string newline = "\r\n")
    {
        var lines = value.Split(new string[]{newline}, System.StringSplitOptions.RemoveEmptyEntries);
        using (var ms = new System.IO.MemoryStream(value.Length))
        {
            var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
            var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
            foreach (var line in lines)
            {
                var bytes = Encoding.UTF8.GetBytes(line);
                var len = Encoding.UTF8.GetByteCount(line);
                if (len <= max)
                {
                    ms.Write(bytes, 0, len);
                    ms.Write(crlf, 0, crlf.Length); 
                }
                else
                {
                    var blen = len / max; //calculate block length
                    var rlen = len % max; //calculate remaining length
                    var b = 0;
                    while (b < blen)
                    {
                        ms.Write(bytes, (b++) * max, max);
                        ms.Write(crlfs, 0, crlfs.Length); 
                    }
                    if (rlen > 0)
                    {
                        ms.Write(bytes, blen * max, rlen);
                        ms.Write(crlf, 0, crlf.Length);
                    }
                }
            }

            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

Notes:

  1. I tried to be as elegant as possible, i.e. I did not parse the string character by character but in blocks of octets (whose size is determined by max).
  2. The function is best called on the resulting VCALENDAR object, so that all content lines are checked and folded where necessary.
  3. Special characters are escaped only in TEXT-related properties such as DESCRIPTION, SUMMARY etc. The escaping is implemented in the following extension methods:

    public static string Replace(this string value, IEnumerable<Tuple<string, string>> pairs)
    {
        foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
        return value;
    }
    
    public static string EscapeStrings(this string value)
    {
        return value.Replace(new List<Tuple<string, string>> 
        { 
            new Tuple<string, string>(@"\", "\\\\"),
            new Tuple<string, string>(";",  @"\;"),
            new Tuple<string, string>(",",  @"\,"),
            new Tuple<string, string>("\r\n",  @"\n"),
        });
    }
    
#3 · 2019-07-31 08:49

reexmonkey's solution writes 76 octets on the middle folded lines, because it doesn't subtract the extra space character added with crlfs.

I rewrote the folding function to correct this:

public static string FoldLines(string value, int max, string newline = "\r\n")
{
    var lines = value.Split(new string[] { newline }, System.StringSplitOptions.RemoveEmptyEntries);
    using (var ms = new System.IO.MemoryStream(value.Length))
    {
        var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
        var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
        foreach (var line in lines)
        {
            var bytes = Encoding.UTF8.GetBytes(line);
            var len = Encoding.UTF8.GetByteCount(line);
            if (len <= max)
            {
                ms.Write(bytes, 0, len);
                ms.Write(crlf, 0, crlf.Length);
            }
            else
            {
                var offset = 0; //current offset position (in octets)
                var count = max; //octets to take; the first line may use all of max
                while (offset + count < len)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlfs, 0, crlfs.Length);
                    offset += count;
                    count = max - 1; //continuation lines start with a space, which uses one octet
                }
                count = len - offset; //remaining octets
                if (count > 0)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlf, 0, crlf.Length);
                }
            }
        }

        return Encoding.UTF8.GetString(ms.ToArray());
    }
}
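One caveat with cutting at fixed byte offsets: a cut can land in the middle of a multi-byte UTF-8 character, and decoding the result with Encoding.UTF8.GetString then turns the split halves into replacement characters. The following is a hedged sketch rather than a drop-in fix. The class name Utf8Folding and the method FoldLine are hypothetical, and it assumes the input is a single, already unfolded content line without its trailing CRLF. It backs the cut up until it sits on a character boundary.

```csharp
using System;
using System.Text;

public static class Utf8Folding
{
    // Folds one content line so that no physical line exceeds max octets,
    // never cutting inside a multi-byte UTF-8 sequence.
    public static string FoldLine(string line, int max = 75)
    {
        // A UTF-8 character is at most 4 octets; continuation lines have max - 1.
        if (max < 5) throw new ArgumentOutOfRangeException(nameof(max));

        var bytes = Encoding.UTF8.GetBytes(line);
        if (bytes.Length <= max) return line;

        var sb = new StringBuilder(line.Length + 3 * (bytes.Length / max + 1));
        int offset = 0;
        int budget = max; // the first line may use all max octets
        while (bytes.Length - offset > budget)
        {
            int end = offset + budget;
            // Bytes of the form 10xxxxxx are continuation bytes: back up
            // until 'end' points at the first byte of a character.
            while (end > offset && (bytes[end] & 0xC0) == 0x80) end--;
            sb.Append(Encoding.UTF8.GetString(bytes, offset, end - offset));
            sb.Append("\r\n ");
            offset = end;
            budget = max - 1; // the leading space takes one octet
        }
        sb.Append(Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset));
        return sb.ToString();
    }
}
```

Because the cut only ever moves backwards, a line may come out a few octets shorter than the limit, but never longer. Escape sequences can still be split across a fold; that is harmless as long as the reader unfolds before unescaping.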

I also added an extra Tuple to the EscapeStrings function, so that a bare "\n" is escaped as well:

public static string ReplaceText(string value, IEnumerable<Tuple<string, string>> pairs)
{
    foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
    return value;
}
public static string EscapeStrings(string value)
{
    return ReplaceText(value, new List <Tuple<string, string>>
    {
        new Tuple<string, string>(@"\", "\\\\"),
        new Tuple<string, string>(";",  @"\;"),
        new Tuple<string, string>(",",  @"\,"),
        new Tuple<string, string>("\r\n",  @"\n"),
        new Tuple<string, string>("\n",  @"\n"),
    });
}
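The chained replacements above only work because the backslash is replaced first; in any other order the backslashes introduced by the later escapes would themselves be doubled. As an alternative, here is a minimal single-pass sketch that does not depend on ordering. The names TextEscaping and EscapeText are hypothetical, and treating a bare CR as a line break is an assumption on my part.

```csharp
using System.Text;

public static class TextEscaping
{
    // Escapes a TEXT property value in one pass over the input.
    public static string EscapeText(string value)
    {
        var sb = new StringBuilder(value.Length + 8);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case ';': sb.Append(@"\;"); break;
                case ',': sb.Append(@"\,"); break;
                case '\r':
                    // CRLF counts as one line break; a bare CR is
                    // also written as \n (assumption, see above).
                    if (i + 1 < value.Length && value[i + 1] == '\n') i++;
                    sb.Append(@"\n");
                    break;
                case '\n': sb.Append(@"\n"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

Each input character is examined exactly once, so an already emitted backslash can never be escaped a second time.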
The star\"
#4 · 2019-07-31 08:52

First off, make sure you look at RFC 5545 instead; RFC 2445 is obsolete. You can find my PHP implementation here:

https://github.com/fruux/sabre-vobject/blob/master/lib/Property.php#L252

In PHP we have the mb_strcut function. I'm not sure whether there's a .NET equivalent, but it would at the very least make things a lot simpler. I've had no issues so far with folding escape sequences (a backslash followed by a character) in half. A good parser will first unfold the lines and only then deal with unescaping, especially since which characters must be escaped depends on the actual property (sometimes , or ; gets escaped, sometimes not).
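To illustrate the "unfold first, then unescape" order on the reading side, here is a rough C# sketch. The names ContentLineReader, Unfold and UnescapeText are hypothetical, and it assumes CRLF line endings and a TEXT-valued property.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class ContentLineReader
{
    // Unfolding: drop every CRLF that is immediately followed by
    // a single space or tab.
    public static string Unfold(string text)
    {
        return Regex.Replace(text, "\r\n[ \t]", "");
    }

    // Unescaping a TEXT value: \n or \N becomes a newline; any other
    // escaped character (backslash, semicolon, comma) stands for itself.
    public static string UnescapeText(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c == '\\' && i + 1 < value.Length)
            {
                char next = value[++i];
                sb.Append(next == 'n' || next == 'N' ? '\n' : next);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Because Unfold runs before UnescapeText, an escape sequence that was split by a fold (a backslash at the end of one line and the n at the start of the next) is joined back together before it is interpreted.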
