In this article, we will explore how to manipulate strings in Go, the internal representation of characters, the challenges of working with UTF-8 characters, and how Go resolves these challenges with the concept of runes.
The article is divided into the following sections:
- Basic String.
- Internal Representation of Characters.
- UTF-8 and Runes.
- Immutability.
- Length.
- Comparison.
- Concatenation.
If you’re new to the world of Go, I recommend the following previous articles:
- Go basics
- Go functions
- Go unit tests
- Go pointers
- Go interfaces
- Go structs
- Go generics
- Go errors
- Go routines
Basic String:
In Go, a string is a read-only slice of bytes. A string literal is created by writing a sequence of characters between double quotes: " ". Let’s see a simple example.
package main

import (
	"fmt"
)

func main() {
	name := "Hello World"
	fmt.Println(name)
}
If we run the code, we will see the following output:
Hello World
Internal Representation of Characters:
The character data of a string is stored byte by byte, so we can iterate over the string to obtain each byte. In the following example, we print each byte both as a character and as a hexadecimal value.
package main

import (
	"fmt"
)

func printChars(s string) {
	fmt.Printf("Characters: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
	fmt.Printf("\n")
}

func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Printf("\n")
}

func main() {
	name := "Hello World"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	printBytes(name)
}
If we run the code, we will see the following output:
String: Hello World
Characters: H e l l o W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
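As a complementary sketch, the same bytes can be obtained by converting the string to a []byte, and the whole hexadecimal dump can be produced with a single format verb. The helper name hexBytes is hypothetical and only used for illustration.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper: it returns the bytes of s
// formatted as space-separated hexadecimal values.
func hexBytes(s string) string {
	// The "% x" verb prints every byte in hex with a space between them.
	return fmt.Sprintf("% x", s)
}

func main() {
	name := "Hello World"
	b := []byte(name) // the conversion copies the string's bytes into a new slice

	fmt.Println(len(b))         // 11
	fmt.Println(b[0], name[0])  // 72 72
	fmt.Println(hexBytes(name)) // 48 65 6c 6c 6f 20 57 6f 72 6c 64
}
```

Indexing a string with s[i] yields a byte, which is why the loop-based version above prints one byte at a time.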
UTF-8 and Runes:
Go strings are normally encoded in UTF-8. ASCII characters (such as unaccented English letters) occupy 1 byte, but any other character occupies 2, 3, or 4 bytes. If we try to display a string containing such characters byte by byte, it will not work correctly.
package main

import (
	"fmt"
)

func printChars(s string) {
	fmt.Printf("Characters: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
	fmt.Printf("\n")
}

func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Printf("\n")
}

func main() {
	name := "Hello señor"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	printBytes(name)
}
If we run the code, we will see the following output:
String: Hello señor
Characters: H e l l o s e Ã ± o r
Bytes: 48 65 6c 6c 6f 20 73 65 c3 b1 6f 72
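To make the variable width visible, the following sketch decodes a string one character at a time with utf8.DecodeRuneInString and records how many bytes each character occupies. The helper name runeWidths is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths is a hypothetical helper: it returns how many bytes each
// character of s occupies when encoded as UTF-8.
func runeWidths(s string) []int {
	var widths []int
	for len(s) > 0 {
		// DecodeRuneInString returns the first rune and its width in bytes.
		_, size := utf8.DecodeRuneInString(s)
		widths = append(widths, size)
		s = s[size:] // advance past the rune that was just decoded
	}
	return widths
}

func main() {
	fmt.Println(runeWidths("señor")) // [1 1 2 1 1]
	fmt.Println(runeWidths("a€😀"))   // [1 3 4]
}
```

The ñ takes two bytes (c3 b1), which is exactly the pair that the byte-by-byte loop printed as two separate, wrong characters.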
We can solve this problem using runes. A rune is an alias for int32 that holds a single Unicode code point. In the following code, we generate a slice of runes from a string, so when we iterate over the slice we visit each character rather than each byte.
package main

import (
	"fmt"
)

func printChars(s string) {
	fmt.Printf("Characters: ")
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		fmt.Printf("%c ", runes[i])
	}
	fmt.Printf("\n")
}

func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Printf("\n")
}

func main() {
	name := "Hello World"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	printBytes(name)
	fmt.Printf("\n\n")
	name = "Señor"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	printBytes(name)
}
If we run the code, we will see the following output:
String: Hello World
Characters: H e l l o W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
String: Señor
Characters: S e ñ o r
Bytes: 53 65 c3 b1 6f 72
If we use a Go for range loop, it is not necessary to use a rune slice, since the loop itself decodes the string one rune at a time.
package main

import (
	"fmt"
)

func printChars(s string) {
	fmt.Printf("Characters: ")
	for _, char := range s {
		fmt.Printf("%c", char)
	}
	fmt.Printf("\n")
}

func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Printf("\n")
}

func main() {
	name := "Hello señor"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	printBytes(name)
}
If we run the code, we will see the following output:
String: Hello señor
Characters: Hello señor
Bytes: 48 65 6c 6c 6f 20 73 65 c3 b1 6f 72
As a general rule, whenever a string may contain non-ASCII characters, it is advisable to work with runes rather than bytes to avoid unpleasant surprises.
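One detail of for range over a string is worth a short sketch: the index it yields is the byte offset where each rune starts, not the position of the character. The helper name runeOffsets is hypothetical.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at
// which each rune of s starts, as reported by for range.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	for i, r := range "señor" {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
	// Prints 0:s 1:e 2:ñ 4:o 5:r (offset 3 is skipped because ñ uses two bytes).

	fmt.Println(runeOffsets("señor")) // [0 1 2 4 5]
}
```

If a character position is needed instead of a byte offset, a separate counter incremented on every iteration does the job.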
Immutability:
Strings in Go are immutable, meaning their content cannot be altered. For example, this function would produce an error:
func mutate(s string) string {
	s[0] = 'a'
	return s
}

The compiler rejects it with the following message:

./strings05.go:8:5: cannot assign to s[0] (neither addressable nor a map index expression)
If we want to modify the content, we need to convert the string to a slice of runes, modify the slice, and convert it back into a new string.
package main

import (
	"fmt"
)

func mutate(s string) string {
	runes := []rune(s)
	runes[0] = 'a'
	return string(runes)
}

func main() {
	s := "hello"
	fmt.Println(mutate(s))
}
If we run the code, we will see the following output:
aello
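The sketch below illustrates that this approach never touches the original string: the conversion to a rune slice copies the data, and the result is a new string. The helper name replaceFirst is hypothetical, and the empty-string guard is an addition that the example above does not have.

```go
package main

import "fmt"

// replaceFirst is a hypothetical helper: it returns a copy of s whose
// first character is replaced by r. The original string is not modified.
func replaceFirst(s string, r rune) string {
	runes := []rune(s) // the conversion copies the data
	if len(runes) == 0 {
		return s // nothing to replace; avoids an index out of range panic
	}
	runes[0] = r
	return string(runes)
}

func main() {
	original := "ñandú"
	changed := replaceFirst(original, 'N')

	fmt.Println(original) // ñandú
	fmt.Println(changed)  // Nandú
}
```

Working with runes rather than bytes matters here: the first character of "ñandú" is two bytes long, so overwriting only the first byte would leave invalid UTF-8 behind.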
Length:
Another issue we might encounter is counting the length of strings. By viewing the hexadecimal representation of the characters, we can see that “señor” occupies 6 bytes even though it has only 5 characters.
name := "Hello World"
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
"Hello " -> 48 65 6c 6c 6f 20
"World" -> 57 6f 72 6c 64
name := "Hello señor"
Bytes: 48 65 6c 6c 6f 20 73 65 c3 b1 6f 72
"Hello " -> 48 65 6c 6c 6f 20
"señor" -> 73 65 c3 b1 6f 72
The len() function always returns the number of bytes in a string, in this case 12 (48 65 6c 6c 6f 20 73 65 c3 b1 6f 72). If we want to obtain the number of runes, we must use utf8.RuneCountInString.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	name := "Hello señor"
	fmt.Printf("String len: %v\n", len(name))
	fmt.Printf("String utf8.RuneCountInString: %v\n", utf8.RuneCountInString(name))
}
If we run the code, we will see the following output:
String len: 12
String utf8.RuneCountInString: 11
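The difference between bytes and runes also matters when cutting a string. As a minimal sketch, the hypothetical helper firstRunes below keeps the first n characters without splitting a multi-byte character, something a plain byte slice expression such as name[:3] cannot guarantee.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes is a hypothetical helper: it returns the first n characters
// (runes) of s, never cutting a multi-byte character in half.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			return s[:i] // i is the byte offset where the next rune starts
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	name := "señor"
	fmt.Println(len(name), utf8.RuneCountInString(name)) // 6 5
	fmt.Println(firstRunes(name, 3))                     // señ
	fmt.Println(name[:3] == "señ")                       // false: name[:3] stops in the middle of ñ
}
```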
Comparison:
String comparison is straightforward; we only need to use the == operator. The comparison is performed byte by byte, so it is case-sensitive.
package main

import (
	"fmt"
)

func compareStrings(str1 string, str2 string) {
	if str1 == str2 {
		fmt.Printf("%s and %s are equal\n", str1, str2)
		return
	}
	fmt.Printf("%s and %s are not equal\n", str1, str2)
}

func main() {
	string1 := "Go"
	string2 := "Go"
	compareStrings(string1, string2)
	string3 := "hello"
	string4 := "world"
	compareStrings(string3, string4)
}
If we run the code, we will see the following output:
Go and Go are equal
hello and world are not equal
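Beyond ==, two related cases come up often: comparing while ignoring case, and ordering strings. The sketch below uses strings.EqualFold for the first and the < operator for the second. The helper name sameIgnoringCase is hypothetical and simply wraps the standard library call.

```go
package main

import (
	"fmt"
	"strings"
)

// sameIgnoringCase is a hypothetical helper: it reports whether a and b
// are equal when letter case is ignored.
func sameIgnoringCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	a, b := "Go", "go"
	fmt.Println(a == b)                 // false: == is case-sensitive
	fmt.Println(sameIgnoringCase(a, b)) // true

	// The ordering operators compare byte by byte, so every uppercase
	// ASCII letter sorts before every lowercase one.
	x, y := "Zebra", "apple"
	fmt.Println(x < y) // true: 'Z' (0x5a) is smaller than 'a' (0x61)
}
```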
Concatenation:
Concatenation is also very simple; we just need to join the two strings using the + operator.
package main

import (
	"fmt"
)

func main() {
	string1 := "Go"
	string2 := "is awesome"
	result := string1 + " " + string2
	fmt.Println(result)
}
If we run the code, we will see the following output:
Go is awesome
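Because strings are immutable, every + creates a new string. When many pieces are joined, for example inside a loop, strings.Builder or strings.Join is usually a better fit. The following is a minimal sketch; the helper name joinWords is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords is a hypothetical helper: it concatenates words with a
// single space between them using strings.Builder.
func joinWords(words []string) string {
	var sb strings.Builder
	for i, w := range words {
		if i > 0 {
			sb.WriteByte(' ')
		}
		sb.WriteString(w)
	}
	return sb.String()
}

func main() {
	words := []string{"Go", "is", "awesome"}
	fmt.Println(joinWords(words))         // Go is awesome
	fmt.Println(strings.Join(words, " ")) // Go is awesome
}
```

For a fixed, small number of strings, the + operator shown above remains the simplest option.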