I'm looking for a function to convert a string of text that is in UpperCase to SentenceCase. All the examples I can find turn the text into TitleCase.
Sentence case in a general sense describes the way that capitalization is used within a sentence. Sentence case also describes the standard capitalization of an English sentence, i.e. the first letter of the sentence is capitalized, with the rest being lower case (unless requiring capitalization for a specific reason, e.g. proper nouns, acronyms, etc.).
Can anyone point me in the direction of a script or function for SentenceCase?
If your input string is not a sentence, but many sentences, this becomes a very difficult problem.
Regular expressions will prove an invaluable tool, but (1) you'll have to know them quite well to be effective, and (2) they might not be up to doing the job entirely on their own.
Consider a sentence that doesn't start with a letter and has a digit, various punctuation, a proper name, and a "." in the middle. The complexities are enormous, and that is just one sentence.
One of the most important things when using RegEx is to "know your data." If you know the breadth of types of sentences you'll be dealing with, your task will be more manageable.
In any event, you'll have to toy with your implementation until you are satisfied with your results. I suggest writing some automated tests with some sample input -- as you work on your implementation, you can run the tests regularly to see where you're getting close and where you're still missing the mark.
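As a rough illustration of that kind of test loop, here is a minimal sketch that needs no test framework. The `ToSentenceCase` method is a hypothetical placeholder for whatever implementation you are iterating on, and the sample strings are made up for the example; you would replace them with inputs that reflect your real data.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceCaseChecks
{
    // Hypothetical placeholder implementation: lower-cases everything,
    // then capitalizes only the very first character of the string.
    public static string ToSentenceCase(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        string lower = input.ToLowerInvariant();
        return char.ToUpperInvariant(lower[0]) + lower.Substring(1);
    }

    // Runs every (input, expected) pair through the converter,
    // prints each mismatch, and returns the number of mismatches.
    public static int CountFailures(
        Func<string, string> convert,
        IEnumerable<KeyValuePair<string, string>> samples)
    {
        int failures = 0;
        foreach (var sample in samples)
        {
            string actual = convert(sample.Key);
            if (actual != sample.Value)
            {
                failures++;
                Console.WriteLine(
                    $"FAIL: \"{sample.Key}\" -> \"{actual}\" (expected \"{sample.Value}\")");
            }
        }
        return failures;
    }

    public static void Main()
    {
        // Made-up sample data; key = input, value = expected output.
        var samples = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("HELLO WORLD.", "Hello world."),
            new KeyValuePair<string, string>("FIRST ONE. SECOND ONE.", "First one. Second one."),
        };

        int failures = CountFailures(ToSentenceCase, samples);
        Console.WriteLine(failures + " failing sample(s)");
    }
}
```

With the placeholder implementation, the second sample fails because the text after the first period is never capitalized, which is exactly the sort of gap a regular run of the samples makes visible.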
There is a built-in ToTitleCase() function that will be extended to support multiple cultures in future. Example from MSDN:
While it is generally useful, it has some important limitations:
Source: http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
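For reference, a minimal sketch of calling ToTitleCase() might look like the following. This is not the MSDN sample itself; the class and method names are illustrative assumptions. One behavior that matters for all-uppercase input: ToTitleCase() leaves words that are entirely uppercase unchanged (it treats them like acronyms), so the sketch lower-cases the text first. Note that the result is still title case, not sentence case.

```csharp
using System;
using System.Globalization;

public static class TitleCaseSketch
{
    // Illustrative helper (not part of .NET): lower-case first, because
    // TextInfo.ToTitleCase leaves all-uppercase words untouched.
    public static string UpperToTitleCase(string input)
    {
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        return textInfo.ToTitleCase(textInfo.ToLower(input));
    }

    public static void Main()
    {
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;

        // All-uppercase words come back unchanged.
        Console.WriteLine(textInfo.ToTitleCase("WAR AND PEACE"));   // WAR AND PEACE

        // Lower-casing first gives title case: every word is capitalized.
        Console.WriteLine(UpperToTitleCase("WAR AND PEACE"));       // War And Peace
    }
}
```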
A solution in F#:
There isn't anything built in to .NET - however, this is one of those cases where regular expression processing actually may work well. I would start by first converting the entire string to lower case, and then, as a first approximation, you could use regex to find all sequences like [a-z]\.\s+(.), and use ToUpper() to convert the captured group to upper case. The Regex class has an overloaded Replace() method which accepts a MatchEvaluator delegate, which allows you to define how to replace the matched value. Here's a code example of this at work:
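A minimal sketch of the lowercase-then-regex approach, under a few assumptions: the class name, method name, and exact pattern are illustrative, and the pattern is a variation on the one above that also capitalizes the first letter of the string and treats "!" and "?" as sentence terminators. Because punctuation and whitespace are unaffected by upper-casing, the evaluator can simply upper-case the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceCaseSketch
{
    // Matches either the first lowercase letter of the string (after any
    // leading whitespace), or a terminator (. ! ?) followed by whitespace
    // and a lowercase letter.
    private static readonly Regex SentenceStart =
        new Regex(@"(^\s*\p{Ll})|([.!?]\s+\p{Ll})");

    public static string ToSentenceCase(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        // Step 1: convert the entire string to lower case.
        string lower = input.ToLowerInvariant();

        // Step 2: Regex.Replace with a MatchEvaluator; upper-casing the
        // whole match only changes the letter, not the punctuation.
        return SentenceStart.Replace(lower, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(ToSentenceCase("THIS IS A GROUP. OF CAPITALIZED. LETTERS."));
        // This is a group. Of capitalized. Letters.
    }
}
```

This sketch makes no attempt to preserve proper nouns or acronyms, and it will wrongly capitalize the word after an abbreviation such as "e.g. " when it is followed by whitespace.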
This could be refined in a number of different ways to better match a broader variety of sentence patterns (not just those ending in a letter+period).
I found this sample on MSDN.