Let's look at a simple example: a string containing just one character, the HEAR-NO-EVIL MONKEY character (U+1F649): 🙉.
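You might expect the String.length property to return the number of characters in the string. However, according to the MDN docs on String.length, the property actually returns the "number of code units in the string." JavaScript strings are encoded as UTF-16, and characters outside the Basic Multilingual Plane (non-BMP characters) are stored as two 16-bit code units, called a surrogate pair. As a result, the length property returns 2 instead of 1. Try it out for yourself.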
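A quick check in a JavaScript console (Node.js or a browser) shows the difference. Besides length, this sketch uses charCodeAt, codePointAt, and the spread operator, none of which the text above mentions, to show where the two code units come from:

```javascript
// The HEAR-NO-EVIL MONKEY (U+1F649) lies outside the BMP,
// so JavaScript stores it as a surrogate pair of two UTF-16 code units.
const monkey = "🙉";

console.log(monkey.length); // 2: length counts code units, not characters

// The two code units are the high and low surrogates.
console.log(monkey.charCodeAt(0).toString(16)); // "d83d"
console.log(monkey.charCodeAt(1).toString(16)); // "de49"

// codePointAt(0) decodes the full code point that starts at index 0.
console.log(monkey.codePointAt(0).toString(16)); // "1f649"

// Spreading a string iterates by code point, giving the expected count.
console.log([...monkey].length); // 1
```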
When you iterate over a string by index, you are actually iterating code units, not characters. If you aren't careful, this can cause your code to produce unexpected results. Take, for example, a function that reverses a string one code unit at a time.
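The post's original snippet did not survive, so here is a hypothetical stand-in: a common naive reverse built on split(""), which breaks a string into UTF-16 code units. The function name naiveReverse is illustrative only.

```javascript
// Hypothetical naive reverse: split("") yields UTF-16 code units,
// so a surrogate pair is treated as two separate "characters".
function naiveReverse(str) {
  return str.split("").reverse().join("");
}

console.log(naiveReverse("abc")); // "cba": fine for BMP-only strings

const reversed = naiveReverse("a🙉b");
// The low surrogate now comes before the high surrogate, so the pair is broken.
console.log(reversed === "b🙉a"); // false
console.log(reversed.length); // 4
```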
For strings that contain only BMP characters, each of which is a single code unit, the function works fine. But when a character is represented by two code units, reversing the string puts the two halves of the surrogate pair in the wrong order and the character becomes unreadable. The key to reversing strings correctly is to detect non-BMP characters, swap the two code units of each surrogate pair first, and then reverse the entire string. Check out @mathias's great esrever module, inspired by rapper/computer scientist Missy Elliott.
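The two-step technique described above can be sketched as follows. This is a minimal illustration, not esrever's implementation; the helper names isHighSurrogate, isLowSurrogate, and reverseString are hypothetical.

```javascript
// Returns true if the UTF-16 code unit is a high (leading) surrogate.
function isHighSurrogate(code) {
  return code >= 0xd800 && code <= 0xdbff;
}

// Returns true if the UTF-16 code unit is a low (trailing) surrogate.
function isLowSurrogate(code) {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Step 1: swap the two halves of every surrogate pair.
// Step 2: reverse the whole string by code unit, which flips each pair
// back into high-then-low order while reversing the character order.
function reverseString(str) {
  const units = str.split("");
  for (let i = 0; i < units.length - 1; i++) {
    if (
      isHighSurrogate(units[i].charCodeAt(0)) &&
      isLowSurrogate(units[i + 1].charCodeAt(0))
    ) {
      // Swap the pair, then skip past its second half.
      [units[i], units[i + 1]] = [units[i + 1], units[i]];
      i++;
    }
  }
  return units.reverse().join("");
}

console.log(reverseString("a🙉b") === "b🙉a"); // true
console.log(reverseString("abc")); // "cba"
```

This sketch only handles surrogate pairs; it does not keep combining marks attached to their base characters, which esrever also takes care of. In modern engines, [...str].reverse().join("") handles surrogate pairs as well, because spreading a string iterates by code point.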