I have a string with multiple sentences. How do I Capitalize the first letter of first word in every sentence. Something like paragraph formatting in word.
eg ."this is some code. the code is in C#. " The ouput must be "This is some code. The code is in C#".
one way would be to split the string based on '.' and then capitalize the first letter and then rejoin.
Is there a better solution?
I don't know why, but I decided to give yield return a try, based on what LBushkin had suggested. Just for fun.
To use it:
Here's a regex solution that uses the punctuation category to avoid having to specify .!?" etc. although you should certainly check if it covers your needs or set them explicitly. Read up on the "P" category under the "Supported Unicode General Categories" section located on the MSDN Character Classes page.
If you decide not to use the
\p{P}
class you would have to specify the characters yourself, similar to:EDIT: below is an updated example to demonstrate 3 patterns. The first shows how all punctuations affect casing. The second shows how to pick and choose certain punctuation categories by using class subtraction. It uses all punctuations while removing specific punctuation groups. The third is similar to the 2nd but using different groups.
The MSDN link doesn't spell out what some of the punctuation categories refer to, so here's a breakdown:
_
-
(
[
{
)
]
}
,
,:
,;
,\
,/
Carefully compare how the results are affected by these groups. This should grant you a great degree of flexibility. If this doesn't seem desirable then you may use specific characters in a character class as shown earlier.
Notice that "Dash" is not capitalized using the last pattern and it's on a new line. One way to make it capitalized is to use the
RegexOptions.Multiline
option. Try the above snippet with that to see if it meets your desired result.Also, for the sake of example, I didn't use RegexOptions.Compiled in the above loop. To use both options OR them together:
RegexOptions.Compiled | RegexOptions.Multiline
.In my opinion, when it comes to potentially complex rules-based string matching and replacing - you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion - you'll be surprised at just how fast this'll be.
I'd use the Regex.Replace overload that accepts an input string, regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a
Match
object as input and returns a string replacement.Here's the code:
The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict captures only to a-z characters preceded by the start of the string, or the punctuation marks you want. In the
[.;:]
bit you can add the extra ones you want (e.g.[.;:?."]
to add ? and " characters.This means, also, that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).
All the other stuff mentioned by one of the other answerers about using the RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
Like I say - I'll be surprised if any of the other non-regex solutions here will work better and be as fast.
EDIT
Have put this solution up against Ahmad's as he quite rightly pointed out that a look-around might be less efficient than doing it his way.
Here's the crude benchmark I did:
In a release build, the my solution was about 12% faster than the Ahmad's (1.48 seconds as opposed to 1.68 seconds).
Interestingly, however, if it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
You have a few different options:
IEnumerable<char>
with the first letter after a period in upper case. May offer benefit of a streaming solution.The code below uses an iterator: