How to split a string while preserving whole words?

Posted 2019-01-15 00:58

Question:

I need to split a long sentence into parts while preserving whole words. Each part should have at most a given number of characters (including spaces, periods, etc.). For example:

int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

Output:

1 part: "Silver badges are awarded for"
2 part: "longer term goals. Silver badges are"
3 part: "uncommon."

Answer 1:

Try this:

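    // Requires: using System; using System.Collections.Generic;
    static void Main(string[] args)
    {
        int partLength = 35;
        string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var parts = new Dictionary<int, string>();
        string part = string.Empty;
        int partCounter = 0;
        foreach (var word in words)
        {
            if (part.Length == 0)
            {
                // First word of a part; a word longer than partLength still gets its own part.
                part = word;
            }
            else if (part.Length + 1 + word.Length <= partLength)
            {
                // The word and its separating space still fit.
                part += " " + word;
            }
            else
            {
                // The part is full: store it and start the next one with this word.
                parts.Add(partCounter, part);
                part = word;
                partCounter++;
            }
        }
        // Store the last, partially filled part.
        parts.Add(partCounter, part);
        foreach (var item in parts)
        {
            Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
        }
        Console.ReadLine();
    }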


Answer 2:

I knew there had to be a nice LINQ-y way of doing this, so here it is for the fun of it:

var input = "The quick brown fox jumps over the lazy dog.";
var charCount = 0;
var maxLineLength = 11;

var lines = input.Split(' ', StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
    .Select(g => string.Join(" ", g));

// That's all :)

foreach (var line in lines) {
    Console.WriteLine(line);
}

Obviously this code works only as long as the query is not parallel, since it depends on charCount being incremented "in word order".
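
To see how the grouping key behaves, it can help to list the value that GroupBy computes for each word. The following is only an illustrative sketch: GroupKeyDemo and GroupKeys are hypothetical names that are not part of the answer, and the loop simply reproduces the running charCount and the integer-division key from the query above.

```csharp
using System;
using System.Collections.Generic;

public static class GroupKeyDemo
{
    // Hypothetical helper (not from the answer): returns each word together
    // with the running character count and the key count / maxLineLength.
    public static List<(string Word, int Count, int Key)> GroupKeys(string input, int maxLineLength)
    {
        var result = new List<(string Word, int Count, int Key)>();
        var charCount = 0;
        foreach (var w in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            charCount += w.Length + 1; // the word plus one separator
            result.Add((w, charCount, charCount / maxLineLength));
        }
        return result;
    }

    public static void Main()
    {
        // Same sample input and limit as in the answer above.
        foreach (var (word, count, key) in GroupKeys("The quick brown fox jumps over the lazy dog.", 11))
        {
            Console.WriteLine($"{word,-6} count={count,2} key={key}");
        }
    }
}
```

For the sample input the keys are 0, 0, 1, 1, 2, 2, 3, 3, 4, which gives five lines. The key depends only on the running total, so unused space at the end of a line is never accounted for. Note also that the query is lazy and charCount is a captured variable: enumerating lines a second time would continue from the old count, so materializing the result once with ToList() is probably the safer choice.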



Answer 3:

I've been testing Jon's and Lessan's answers, but they don't work properly if your max length needs to be absolute, rather than approximate. As their counter increments, it doesn't count the empty space left at the end of a line.

Running their code against the OP's example, you get:

1 part: "Silver badges are awarded for " - 29 Characters
2 part: "longer term goals. Silver badges are" - 36 Characters
3 part: "uncommon. " - 13 Characters

The "are" on line two should be on line three. This happens because the counter does not include the 6 unused characters at the end of line one.

I came up with the following modification of Lessan's answer to account for this:

public static class ExtensionMethods
{
    public static string[] Wrap(this string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max) 
                        ? max - (charCount % max) : 0) + w.Length + 1) / max)
                    .Select(g => string.Join(" ", g.ToArray()))
                    .ToArray();
    }
}
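
A quick way to check whether any wrapper honors an absolute limit is to measure the parts it returns. The following is a minimal sketch under that assumption; WrapChecks and FindOverlongParts are hypothetical names that do not appear in any answer here. It is run against the expected output given in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WrapChecks
{
    // Hypothetical helper: returns the parts whose length exceeds max.
    public static List<string> FindOverlongParts(IEnumerable<string> parts, int max)
    {
        return parts.Where(p => p.Length > max).ToList();
    }

    public static void Main()
    {
        // The expected output from the question, checked against its own limit of 35.
        var parts = new[]
        {
            "Silver badges are awarded for",
            "longer term goals. Silver badges are",
            "uncommon."
        };
        foreach (var p in FindOverlongParts(parts, 35))
        {
            Console.WriteLine($"{p.Length} characters: \"{p}\"");
        }
    }
}
```

Only the second part is reported, at 36 characters, which matches the count given above. The same check can be pointed at the result of Wrap or of any other answer in this thread.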


Answer 4:

Split the string on spaces, then build up new strings from the resulting array, stopping before your limit for each new segment.

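Untested sketch (assumes sentence is already defined):

// Requires: using System; using System.Collections.Generic;
int myLimit = 35; // maximum number of characters per part
string[] words = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
IList<string> sentenceParts = new List<string>();
sentenceParts.Add(string.Empty);

int partCounter = 0;

foreach (var word in words)
{
  string current = sentenceParts[partCounter];

  // Start a new part when the word plus its separating space would not fit.
  if (current.Length > 0 && current.Length + 1 + word.Length > myLimit)
  {
     partCounter++;
     sentenceParts.Add(string.Empty);
     current = string.Empty;
  }

  // Join words with a single space, without a trailing space at the end of a part.
  sentenceParts[partCounter] = current.Length == 0 ? word : current + " " + word;
}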


Answer 5:

At first I was thinking this might be a Regex kind of thing but here's my shot at it:

List<string> parts = new List<string>();
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");

foreach(var piece in pieces)
{
    if(piece.Length + tempString.Length + 1 > partLength) 
    {
        parts.Add(tempString.ToString());
        tempString.Clear();        
    }
    tempString.Append(" " + piece); 
}


Answer 6:

Expanding on Jon's answer above: I needed to replace g with g.ToArray(), and also change max to (max + 2), to get exact wrapping at the max-th character.

public static class ExtensionMethods
{
    public static string[] Wrap(this string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
                    .Select(g => string.Join(" ", g.ToArray()))
                    .ToArray();
    }
}

And here is sample usage as NUnit tests:

[Test]
public void TestWrap()
{
    Assert.AreEqual(2, "A B C".Wrap(4).Length);
    Assert.AreEqual(1, "A B C".Wrap(5).Length);

    Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
    Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);

    Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
    Assert.AreEqual(2, "  TEST TEST TEST TEST  ".Wrap(10).Length);
    Assert.AreEqual("TEST TEST", "  TEST TEST TEST TEST  ".Wrap(10)[0]);
}


Answer 7:

Joel, there is a small bug in your code, which I've corrected here:

// Requires: using System; using System.Collections.Generic; using System.Text;
public static string[] StringSplitWrap(string sentence, int MaxLength)
{
    List<string> parts = new List<string>();

    string[] pieces = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    StringBuilder tempString = new StringBuilder("");

    foreach (var piece in pieces)
    {
        // Flush the current part when the next word plus a separating space would not fit.
        if (tempString.Length > 0 && piece.Length + tempString.Length + 1 > MaxLength)
        {
            parts.Add(tempString.ToString());
            tempString.Clear();
        }
        // Put a space only between words, never at the start of a part.
        tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
    }

    // Add the final, partially filled part.
    if (tempString.Length > 0)
        parts.Add(tempString.ToString());

    return parts.ToArray();
}


Answer 8:

This works:

int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
List<string> lines =
    sentence
        .Split(' ')
        .Aggregate(new [] { "" }.ToList(), (a, x) =>
        {
            var last = a[a.Count - 1];
            if ((last + " " + x).Length > partLength)
            {
                a.Add(x);
            }
            else
            {
                a[a.Count - 1] = (last + " " + x).Trim();
            }
            return a;
        });

It gives me:

Silver badges are awarded for 
longer term goals. Silver badges 
are uncommon. 


Answer 9:

While CsConsoleFormat† was primarily designed to format text for the console, it also supports generating plain text.

var doc = new Document().AddChildren(
  new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
    TextWrap = TextWrapping.WordWrap
  }
);
var bounds = new Rect(0, 0, 35, Size.Infinity);
string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);

And, if you actually need trimmed strings like in your question:

List<string> lines = text.Trim()
  .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
  .Select(s => s.Trim())
  .ToList();

In addition to word wrap on spaces, you get proper handling of hyphens, zero-width spaces, no-break spaces etc.

† CsConsoleFormat was developed by me.