Character
A single extended grapheme cluster that approximates a user-perceived character.
@frozen struct Character
Overview
The Character type represents a character made up of one or more Unicode scalar values, grouped by a Unicode boundary algorithm. Generally, a Character instance matches what the reader of a string will perceive as a single character. Strings are collections of Character instances, so the number of visible characters is generally the most natural way to count the length of a string.
```swift
let greeting = "Hello! 🐥"
print("Length: \(greeting.count)")
// Prints "Length: 8"
```
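Because a String is a collection of Character values, iterating over it visits one user-perceived character at a time. The sketch below reuses the greeting from the example above and collects its characters into an array. The variable names are illustrative only.

```swift
let greeting = "Hello! 🐥"

// Iterating over a String yields one Character (extended grapheme cluster) per step.
var characters: [Character] = []
for character in greeting {
    characters.append(character)
}

print(characters.count)  // 8, the same as greeting.count
print(characters.last!)  // 🐥
```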
Because each character in a string can be made up of one or more Unicode scalar values, the number of characters in a string may not match the length of the Unicode scalar value representation or the length of the string in a particular binary representation.
```swift
print("Unicode scalar value count: \(greeting.unicodeScalars.count)")
// Prints "Unicode scalar value count: 8"
print("UTF-8 representation count: \(greeting.utf8.count)")
// Prints "UTF-8 representation count: 11"
```
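In the greeting above only the UTF-8 count differs, because the emoji needs four bytes. The character count can also diverge from the scalar count. The sketch below spells "é" two ways: as the single precomposed scalar U+00E9, and as "e" followed by U+0301 COMBINING ACUTE ACCENT. Both strings contain four characters, but their scalar and UTF-8 lengths differ.

```swift
// "é" as one precomposed scalar versus "e" plus a combining accent.
let precomposed = "caf\u{E9}"   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed = "cafe\u{301}"  // "e" + U+0301 COMBINING ACUTE ACCENT

print(precomposed.count, decomposed.count)                                // 4 4
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)  // 4 5
print(precomposed.utf8.count, decomposed.utf8.count)                      // 5 6
```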
Every Character instance is composed of one or more Unicode scalar values that are grouped together as an extended grapheme cluster. The way these scalar values are grouped is defined by a canonical, localized, or otherwise tailored Unicode segmentation algorithm.
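A Character exposes the scalar values it is built from through its unicodeScalars property. The sketch below creates a single Character from two scalars, "e" and U+0301 COMBINING ACUTE ACCENT, and lists the scalars in U+ notation. The accented character and the formatting closure are illustrative choices.

```swift
// One Character (one grapheme cluster) made of two Unicode scalar values.
let accented: Character = "e\u{301}"

// Format each scalar's value as hexadecimal in U+ notation.
let scalarCodes = accented.unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
print(scalarCodes)  // ["U+65", "U+301"]
```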
For example, a country’s Unicode flag character is made up of two regional indicator scalar values that correspond to that country’s ISO 3166-1 alpha-2 code. The alpha-2 code for the United States is “US”, so its flag character is made up of the Unicode scalar values "\u{1F1FA}" (REGIONAL INDICATOR SYMBOL LETTER U) and "\u{1F1F8}" (REGIONAL INDICATOR SYMBOL LETTER S). When placed next to each other in a string literal, these two scalar values are combined into a single grapheme cluster, represented by a Character instance in Swift.
```swift
let usFlag: Character = "\u{1F1FA}\u{1F1F8}"
print(usFlag)
// Prints "🇺🇸"
```
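Regional indicator letters are laid out in alphabetical order starting at U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A). That means "U" maps to U+1F1E6 + 20 = U+1F1FA and "S" maps to U+1F1E6 + 18 = U+1F1F8. The sketch below applies that offset to build a flag Character from an uppercase alpha-2 code. The helper flagCharacter(forRegionCode:) is hypothetical and not part of the standard library. It also does not check whether the code belongs to a real country, so invalid codes may not render as flags.

```swift
// Hypothetical helper: maps an uppercase two-letter code such as "US"
// to the Character formed by the matching regional indicator scalars.
func flagCharacter(forRegionCode code: String) -> Character? {
    let letters = Array(code.unicodeScalars)
    guard letters.count == 2 else { return nil }

    let regionalIndicatorA: UInt32 = 0x1F1E6  // REGIONAL INDICATOR SYMBOL LETTER A
    let letterA: UInt32 = 0x41                // "A"
    let letterZ: UInt32 = 0x5A                // "Z"

    var flag = ""
    for letter in letters {
        // Accept only the ASCII capital letters A through Z.
        guard letter.value >= letterA, letter.value <= letterZ,
              let indicator = Unicode.Scalar(regionalIndicatorA + (letter.value - letterA)) else {
            return nil
        }
        flag.unicodeScalars.append(indicator)
    }
    // A pair of regional indicators forms a single grapheme cluster.
    return Character(flag)
}

if let usFlag = flagCharacter(forRegionCode: "US") {
    print(usFlag)  // 🇺🇸
    print(usFlag.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) })
    // ["1F1FA", "1F1F8"]
}
```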
For more information about the Unicode terms used in this discussion, see the Unicode.org glossary. In particular, this discussion mentions extended grapheme clusters and Unicode scalar values.