Ways to break text after a certain number of words

Posted 2019-05-11 16:15

Question:

Given a string:

"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut", break it after

  1. 4 words
  2. 40 characters

using a maximum language version of C# 4 (in order to be compatible with the Mono platform).


Update/Edit:

Regex Implementations:

Re #2: split after 40 characters (see this gist)

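using System;
using System.Linq; // needed for Where() and ToArray()
using System.Text.RegularExpressions;

// Because the pattern is a capturing group, Regex.Split also returns each
// 40-character match; the empty strings between matches are filtered out.
string[] chunks = Regex.Split(
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut"
, "(.{40})"
, RegexOptions.Multiline)
.Where(s => !string.IsNullOrEmpty(s))
.ToArray();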
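
For comparison, here is a minimal sketch of the same 40-character split without Regex, using a plain Substring loop. The names Chunker and SplitEvery are made up for this example and are not from the original post. It sticks to C# 4 syntax, and unlike the "." pattern above it does not treat newline characters specially.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Hypothetical helper: cuts text into pieces of at most `size` characters.
    public static List<string> SplitEvery(string text, int size)
    {
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException("size");
        }

        var chunks = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return chunks; // nothing to split
        }

        for (int i = 0; i < text.Length; i += size)
        {
            // The last piece may be shorter than `size`.
            chunks.Add(text.Substring(i, Math.Min(size, text.Length - i)));
        }
        return chunks;
    }

    public static void Main()
    {
        string text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut";

        // Brackets make the trailing spaces of each piece visible.
        foreach (string chunk in SplitEvery(text, 40))
        {
            Console.WriteLine("[" + chunk + "]");
        }
        // [Lorem ipsum dolor sit amet, consectetur ]
        // [adipisicing elit, sed do eiusmod tempor ]
        // [incididunt ut]
    }
}
```

The sample string is 93 characters long, so this yields two full pieces of 40 characters and a remainder of 13.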

This post serves as a community wiki.

Answer 1:

4 words

As O. R. Mapper said in his comment, this really depends on your ability to define a "word" in a given string and what the delimiters are between words. However, assuming you can define the delimiter as whitespace, then this should work:

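using System.Text.RegularExpressions;

// Assumed input: the sample sentence from the question
string text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut";
string delimiterPattern = @"\s+"; // I'm using whitespace as a delimiter here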

// find all spaces between words
MatchCollection matches = Regex.Matches(text, delimiterPattern);

// if we found at least 4 delimiters, cut off the string at the 4th (index = 3)
// delimiter. Else, just keep the original string
string firstFourWords = (matches.Count >= 4)
    ? (text.Substring(0, matches[3].Index))
    : (text);

40 characters

string firstFortyCharacters = text.Substring(0, Math.Min(text.Length, 40));

Both

Combining both, we can get the shorter one:

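using System; // for Math.Min
using System.Text.RegularExpressions;

// Assumed input: the sample sentence from the question
string text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut";
string delimiterPattern = @"\s+"; // I'm using whitespace as a delimiter here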

// find all spaces between words
MatchCollection matches = Regex.Matches(text, delimiterPattern);

// if we found at least 4 delimiters, cut off the string at the 4th (index = 3)
// delimiter. Else, just keep the original string
string firstFourWords = (matches.Count >= 4)
    ? (text.Substring(0, matches[3].Index))
    : (text);

string firstFortyCharacters = text.Substring(0, Math.Min(text.Length, 40));

string result = (firstFourWords.Length > 40) ? (firstFortyCharacters) : (firstFourWords);
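
The snippets above only keep the first four words. If the goal is to break the whole text after every four words, something along these lines might work. This is a rough sketch under the same assumption that words are separated by whitespace; WordBreaker and BreakAfterWords are names invented for the example, and it only uses C# 4 features.

```csharp
using System;
using System.Collections.Generic;

public static class WordBreaker
{
    // Hypothetical helper: groups the words of text into lines of at most
    // wordsPerLine words, joined by single spaces.
    public static List<string> BreakAfterWords(string text, int wordsPerLine)
    {
        if (wordsPerLine <= 0)
        {
            throw new ArgumentOutOfRangeException("wordsPerLine");
        }

        var lines = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return lines;
        }

        // A null separator array means "split on whitespace";
        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < words.Length; i += wordsPerLine)
        {
            // The last line may hold fewer than wordsPerLine words.
            int count = Math.Min(wordsPerLine, words.Length - i);
            lines.Add(string.Join(" ", words, i, count));
        }
        return lines;
    }

    public static void Main()
    {
        string text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut";

        foreach (string line in BreakAfterWords(text, 4))
        {
            Console.WriteLine(line);
        }
        // Lorem ipsum dolor sit
        // amet, consectetur adipisicing elit,
        // sed do eiusmod tempor
        // incididunt ut
    }
}
```

Note that punctuation stays attached to the word it follows, and any run of whitespace in the input is collapsed to a single space in the output.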


Answer 2:

Answer to your question #2: place this in a static class and you get a nice extension method that inserts a string at given intervals in another string.

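// Requires "using System;" and "using System.Text;" at the top of the file.
public static string InsertAtIntervals(this string s, int interval, string value)
{
    if (interval <= 0) {
        throw new ArgumentOutOfRangeException("interval"); // avoids dividing by zero below
    }
    if (s == null || s.Length <= interval) {
        return s; // nothing to insert
    }
    var sb = new StringBuilder(s);
    // Start at the last multiple of interval inside the string and walk backwards,
    // so that earlier insert positions are not shifted by later insertions.
    for (int i = interval * ((s.Length - 1) / interval); i > 0; i -= interval) {
        sb.Insert(i, value);
    }
    return sb.ToString();
}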
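
As a usage example, here is a small self-contained sketch that applies the method to the sample string from the question. The class names StringExtensions and Program are placeholders chosen for this example, and a newline is just one possible value to insert.

```csharp
using System;
using System.Text;

public static class StringExtensions
{
    // Same logic as the method above, repeated so the example compiles on its own.
    public static string InsertAtIntervals(this string s, int interval, string value)
    {
        if (s == null || s.Length <= interval)
        {
            return s;
        }
        var sb = new StringBuilder(s);
        for (int i = interval * ((s.Length - 1) / interval); i > 0; i -= interval)
        {
            sb.Insert(i, value);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut";

        // Insert a line break after every 40 characters.
        string broken = text.InsertAtIntervals(40, "\n");
        Console.WriteLine(broken);
    }
}
```

For the 93-character sample, the breaks land at positions 40 and 80, so the output has three lines; the first two end with the space that happened to sit at the cut position.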