How to Upper, Lower, Title, Sentence Case and More with JavaScript
A few weeks ago I shared an article on how to control text casing using the CSS text-transform property. Today I want to continue the topic, but looking at how to use JavaScript to manage text casing, like lower, upper, title case and more.
JavaScript has become one of the most widely used programming languages in recent years. This means it is used more and more to manage data, including transforming text between upper and lower case.
For years a best practice has been to include some sort of CSS reset in your application's styles. This is done to normalize the base styles, eliminating the differences between different browser default style sheets.
You should do the same with your application's data. One area common to all applications is text capitalization.
This may sound unimportant, but it can make your life easier as your application changes. Plus, strings may be used in multiple places and have a variety of capitalization requirements.
Some common presentation options are:
- lower case
- upper case
- sentence case
- title case
But presentation is not the only place capitalization matters. You often need to compare string values to trigger logic flows. For example, I recently shared how to use the JavaScript switch statement.
Since JavaScript string comparison is case sensitive, two strings can compare as unequal even when the text is 'the same'. This is why you want to normalize the text casing before comparing.
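As a minimal sketch of that idea, the comparison below lower cases both sides before testing equality. The helper names equalsIgnoreCase and describeRole, and the role values, are made up for illustration and are not part of any library.

```javascript
// Hypothetical helper: compare two strings without regard to casing.
function equalsIgnoreCase(a, b) {
  return a.toLowerCase() === b.toLowerCase();
}

var strictMatch = 'Admin' === 'admin';               // false
var looseMatch = equalsIgnoreCase('Admin', 'admin'); // true

// The same idea applied to a switch statement: normalize once, then branch.
function describeRole(role) {
  switch (role.toLowerCase()) {
    case 'admin':
      return 'full access';
    case 'editor':
      return 'can edit content';
    default:
      return 'read only';
  }
}

describeRole('ADMIN'); // "full access"
```

Lower casing inside the helper keeps the callers simple, at the cost of converting both strings on every comparison.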
Also by persisting normalized string values you have control over how they are rendered for presentation. You can adjust the casing as needed before they are used to render content. This is where the string prototype methods and extra techniques included in this article can make your applications better.
Characters are represented by numerical values. These values are managed as a table. Not only do a character's lower and upper case versions have different numeric values, specific languages often include unique characters.
Today we also see custom fonts, like Font Awesome and the material design glyphs, and emojis using Unicode characters and code points to represent glyphs and pictures in text.
JavaScript toLowerCase
The toLowerCase() method converts all of a string's characters to lowercase. There are no parameters, and this method does just what you would think.
The toLowerCase() method converts every character that has a lowercase mapping in Unicode to its lowercase equivalent. For the basic Latin letters, those with a decimal Unicode value between 65 ("A") and 90 ("Z"), this is the same as adding 32 to the value, which gives 97 ("a") through 122 ("z"). If you have ever worked with the ASCII character set then this should be familiar.
var testString = "i AM a CrAzy StrINg, MAkE mE nORMal!";
testString.toLowerCase(); // output - "i am a crazy string, make me normal!"
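The relationship between the upper and lower case code values described above can be checked directly with charCodeAt and String.fromCharCode, both standard string methods. This is only a sketch for a single ASCII letter plus one accented letter; the variable names are illustrative.

```javascript
var upperA = 'A'.charCodeAt(0); // 65
var lowerA = 'a'.charCodeAt(0); // 97

// For ASCII letters, the lower case code is the upper case code plus 32.
var shifted = String.fromCharCode(upperA + 32); // "a"

// toLowerCase also converts letters outside A-Z, such as accented ones.
var accented = '\u00C9'.toLowerCase(); // "\u00E9", the letter é
```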
The toLocaleLowerCase() method returns the calling string value converted to lower case, according to any locale-specific case mappings. For the most part you should get the same value as toLowerCase.
The difference is that some languages, like Turkish, have case mappings that differ from the default Unicode mappings.
By default the method uses the host environment's current locale. You can also pass a specific locale to the method to get the local variation:
'İ'.toLocaleLowerCase('tr') === 'i'; // true
'İ'.toLocaleLowerCase('en-US') === 'i'; // false
JavaScript toUpperCase
JavaScript's toUpperCase() method converts all of a string's characters to uppercase. There are no parameters, and this method also does just what you would think. This is an all caps converting method.
The toUpperCase() method does the opposite of the toLowerCase method. It converts every character that has an uppercase mapping in Unicode to its uppercase equivalent. For the basic Latin letters, those with a decimal Unicode value between 97 ("a") and 122 ("z"), this is the same as subtracting 32 from the value, which gives 65 ("A") through 90 ("Z").
var testString = "i AM a CrAzy StrINg, MAkE mE nORMal!";
testString.toUpperCase(); // output - "I AM A CRAZY STRING, MAKE ME NORMAL!"
The toLocaleUpperCase() method returns the calling string value converted to upper case, according to any locale-specific case mappings, just like its lower case counterpart.
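Here is a small sketch of the upper case side of the Turkish example. It assumes a runtime with locale-aware case mapping, which current browsers and Node.js builds generally provide; the variable names are illustrative.

```javascript
// Default mapping: "i" becomes the plain Latin capital "I".
var defaultUpper = 'i'.toUpperCase();

// Turkish mapping: "i" becomes the dotted capital "İ" (U+0130).
var turkishUpper = 'i'.toLocaleUpperCase('tr');

// Going the other way, Turkish "I" becomes the dotless "ı" (U+0131).
var turkishLower = 'I'.toLocaleLowerCase('tr');
```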
The toLowerCase and toUpperCase methods can serve as your base to perform more specific case manipulations.
Normalize
The normalize() method returns the Unicode Normalization Form of a string. If the value isn't a string, the method converts it to a string before normalizing.
The method returns a new string and does not change the original string's value, so the data is not corrupted.
Unicode characters are a way to 'normalize' the difference between language character sets.
The Unicode Standard provides a unique number for every character, no matter what platform, device, application or language. It has been adopted by all modern software providers and now allows data to be transported through many different platforms, devices and applications without corruption.
There are four Unicode normalization forms:
- NFC: Characters are decomposed and then recomposed by canonical equivalence.
- NFD: Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
- NFKD: Characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.
- NFKC: Characters are decomposed by compatibility, then recomposed by canonical equivalence.
The normalize method has a single, optional, parameter used to determine what form the characters are normalized to. If no value is supplied the default is NFC. If a value other than the four options is supplied a RangeError is thrown.
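To make the form parameter concrete, here is a short sketch that runs a few strings through different forms and triggers the RangeError. The sample characters and variable names are my own choices for illustration.

```javascript
var ligature = '\uFB01'; // the single-character "fi" ligature

// Canonical forms leave the ligature alone; compatibility forms split it.
var nfc = ligature.normalize('NFC');   // still "\uFB01"
var nfkc = ligature.normalize('NFKC'); // "fi"

// NFD splits é into "e" plus a combining accent, so the length grows to 5.
var nfdLength = 'caf\u00E9'.normalize('NFD').length;

// With no argument, NFC is used and the accent is recomposed.
var defaultForm = 'cafe\u0301'.normalize(); // "caf\u00E9"

// Any other form name throws a RangeError.
var errorName;
try {
  'abc'.normalize('XYZ');
} catch (err) {
  errorName = err.name; // "RangeError"
}
```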
You can see how the normalize method works with this simple demonstration:
var cafe4 = 'caf\u00E9'; // é as a single precomposed character
var cafe5 = 'cafe\u0301'; // e followed by a combining acute accent
console.log(
  cafe4 + ' ' + cafe4.length, // café 4
  cafe5 + ' ' + cafe5.length, // café 5
  cafe4 === cafe5, // false
  cafe4.normalize(),
  cafe5.normalize(),
  cafe4.normalize() === cafe5.normalize() // true
);
// output: café 4 café 5 false café café true
This feature was added in ECMAScript 6.
The normalize method is not supported by Internet Explorer or the old stock Android browser, which should be expected since neither of these browsers has been updated in recent years. I realize there may be cases where you want to use this in the browser, but I really feel like the normalize method would be used more in the Node.js space to, well, normalize data.
By normalizing the characters you are flattening potential character differences, which means you have more control over how you format your text. Capitalization is just one of the areas where properly normalized Unicode is helpful.
It might be a good idea to normalize a string to a single Unicode form before using either toLowerCase or toUpperCase to ensure you have a standard character set.
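A minimal sketch of that suggestion follows. The helper name normalizedLower is hypothetical, and NFC is just one reasonable choice of form.

```javascript
// Hypothetical helper: normalize to NFC first, then lower case.
function normalizedLower(str) {
  return str.normalize('NFC').toLowerCase();
}

var composed = 'CAF\u00C9';    // É as one precomposed character
var decomposed = 'CAFE\u0301'; // E followed by a combining acute accent

// Lower casing alone keeps the two different representations.
var plainEqual = composed.toLowerCase() === decomposed.toLowerCase(); // false

// Normalizing first makes the two strings compare as equal.
var normalizedEqual = normalizedLower(composed) === normalizedLower(decomposed); // true
```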
JavaScript Title Case
Title case is a stylized form of capitalization used mainly for titles. This is where the first letter of most words is capitalized.
This means you may need to transform an object's title field or any string value to title case if you plan on rendering it as a title, subtitle, headline or heading. Generally only 'major' words are capitalized, but there is debate as to what a major word is, and I will leave that to the grammar folks to fight about.
Let's assume you want to title case every word in a string. There are several ways you can achieve this goal. Most involve normalization, splitting the string into an array and manipulating individual characters.
I am a fan of using regular expressions to make manipulating strings concise.
In this example, the titleCase method uses JavaScript to convert a string to title case by matching each word and replacing its first letter with the upper case equivalent. The rest of the word is left as it was.
function titleCase(str) {
  // Match each word, then upper case only its first character.
  return str.replace(/\w\S*/g, function (t) {
    return t.charAt(0).toUpperCase() + t.slice(1);
  });
}
The only problem with this method is the potential for mixed case words. But I am not sure this method should be responsible for that level of character conversion.
For example, if the sentence included McDonalds, lower casing all the characters prior to title casing would create a spelling error.
This is why managing string casing can be complex.
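If you do want to leave 'minor' words alone, one possible sketch is below. The function name titleCaseMajor and the MINOR_WORDS list are assumptions of mine; style guides disagree on the list, so treat it as a placeholder. Like the function above, it leaves inner capitalization such as McDonalds untouched.

```javascript
// Hypothetical list of minor words; real style guides differ.
var MINOR_WORDS = ['a', 'an', 'and', 'the', 'of', 'in', 'on', 'to', 'for', 'or', 'but'];

function titleCaseMajor(str) {
  return str.split(' ').map(function (word, index) {
    var isMinor = MINOR_WORDS.indexOf(word.toLowerCase()) !== -1;
    // The first word is always capitalized, even when it is a minor word.
    if (index > 0 && isMinor) {
      return word.toLowerCase();
    }
    // Upper case the first character and keep the rest as written.
    return word.charAt(0).toUpperCase() + word.slice(1);
  }).join(' ');
}

titleCaseMajor('the lord of the rings'); // "The Lord of the Rings"
```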
JavaScript Sentence Case
Sentence casing is where the first word of a sentence and proper nouns are capitalized. Again we could make a complex solution. But let's just focus on the task at hand.
Again I chose to use a regular expression, but this time a function is used to perform the actual character replacement. The function replaces the first letter with an upper case version. You should provide a string that has been 'cleaned' beforehand. That could mean a lot of things. For my purposes I chose to just lower case the source string.
function sentenceCase(str) {
  return str.replace(/[a-z]/i, function (letter) {
    return letter.toUpperCase();
  }).trim();
}
sentenceCase(testString.toLowerCase()); // "I am a crazy string, make me normal!"
If you are using Node, then you might consider the to-sentence-case module.
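The function above only touches the first letter of the whole string. For text with several sentences, one rough sketch is to capitalize the first letter after each sentence-ending mark. The name sentenceCaseAll is hypothetical, and the approach assumes sentences end with a period, exclamation point or question mark followed by whitespace, so abbreviations like "e.g. this" would be capitalized by mistake.

```javascript
// Hypothetical helper: capitalize the first letter of every sentence.
function sentenceCaseAll(str) {
  // Group 1 is the start of the string or the punctuation plus whitespace;
  // group 2 is the lower case letter that opens the sentence.
  return str.replace(/(^\s*|[.!?]\s+)([a-z])/g, function (match, lead, letter) {
    return lead + letter.toUpperCase();
  });
}

sentenceCaseAll('hello world. how are you? fine!');
// "Hello world. How are you? Fine!"
```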
JavaScript Invert Casing
I can't think of a practical scenario for this, but let's say you wanted to invert a string's casing. This can be done by either looping through the characters or using the array's map method.
The logic tests each character's casing. It then applies the opposite casing and appends it to a new string.
function caseAlter(str) {
  var output = "";
  for (var i = 0; i < str.length; i++) {
    var ch = str[i];
    if (ch === ch.toUpperCase()) {
      output += ch.toLowerCase();
    } else {
      output += ch.toUpperCase();
    }
  }
  return output;
}
The following is a little more succinct. It uses the array map method to loop through the characters. It inverts the casing with a ternary operator.
For my money, this is the cooler way to do it!
var a = "Hi, Progressive Web Apps Rock!";
var ans = a.split('').map(function (c) {
  return c === c.toUpperCase() ? c.toLowerCase() : c.toUpperCase();
}).join('');
Can You Use JavaScript to Auto-Correct?
Of course you can, and it involves one of those fancy, academic algorithms. Rather than reinvent the wheel, I think a Node module is the perfect solution for most cases. That is what autocorrect can do for you. This could be your answer to proper capitalization for those 'non-standard' words like McDonalds.
You could apply one of the casing functions above, then run the string through a module like autocorrect to fix any remaining capitalization issues.
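To show the shape of that second pass without guessing at the autocorrect module's API, here is a sketch that uses a small hand-written lookup table instead. The KNOWN_WORDS table and the fixKnownWords function are hypothetical stand-ins, not part of any module.

```javascript
// Hypothetical lookup table of words with inner capitalization.
var KNOWN_WORDS = {
  mcdonalds: 'McDonalds',
  javascript: 'JavaScript',
  iphone: 'iPhone'
};

// Replace any word found in the table with its preferred spelling.
function fixKnownWords(str) {
  return str.replace(/[A-Za-z]+/g, function (word) {
    var key = word.toLowerCase();
    return Object.prototype.hasOwnProperty.call(KNOWN_WORDS, key)
      ? KNOWN_WORDS[key]
      : word;
  });
}

// Lower case first, then restore the known words.
fixKnownWords('I LOVE MCDONALDS AND JAVASCRIPT'.toLowerCase());
// "i love McDonalds and JavaScript"
```

A real auto-correct tool would also handle misspellings; this table only restores casing for words it already knows.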
Summary
You would think upper and lower casing would be simple. And in most cases it is. However, there are many potential characters you need to account for. The examples here should help you in most cases.
However, as I demonstrated, words can have inner-character capitalization. You may need to account for these cases, which can be quite complex. This is the problem spell check and auto-correct systems solve. They know common words that have varied capitalization requirements.