type _string struct {
	elements *byte // underlying bytes
	len      int   // number of bytes
}
From the declaration, we can see that a string is actually a byte sequence wrapper. In fact, we can view a string as an (element-immutable) byte slice.
Note that byte is a built-in alias of type uint8.
The zero value of a string type is the blank string, which can be written as "" or `` in literal form.
Strings can be concatenated with the + and += operators.
String types are all comparable (with the == and != operators). And like integer and floating-point values, two values of the same string type can also be compared with the >, <, >= and <= operators. When two strings are compared, their underlying bytes are compared one by one. If one string is a prefix of the other and the other is longer, then the longer one is considered the larger one.
package main

import "fmt"

func main() {
	const World = "world"
	var hello = "hello"

	// Concatenate strings.
	var helloWorld = hello + " " + World
	helloWorld += "!"
	fmt.Println(helloWorld) // hello world!

	// Compare strings.
	fmt.Println(hello == "hello")   // true
	fmt.Println(hello > helloWorld) // false
}
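The ordering rule described above (byte-by-byte comparison, with a proper prefix sorting first) can be sketched by hand. The less function below is a hypothetical helper written only for illustration; it is not how the Go runtime implements the < operator, but it should agree with it.

```go
package main

import (
	"fmt"
	"strings"
)

// less reports whether a sorts before b. It compares the two
// strings byte by byte, which mirrors the rule for a < b.
func less(a, b string) bool {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	// All shared bytes are equal, so the shorter string is smaller.
	return len(a) < len(b)
}

func main() {
	fmt.Println("abc" < "abd") // true: 'c' (0x63) < 'd' (0x64)
	fmt.Println("ab" < "abc")  // true: "ab" is a proper prefix of "abc"
	fmt.Println("Z" < "a")     // true: 'Z' (0x5a) < 'a' (0x61)

	fmt.Println(less("ab", "abc"))         // true
	fmt.Println(strings.Compare("b", "a")) // 1
}
```

Because the comparison works on bytes rather than on characters, uppercase ASCII letters sort before lowercase ones.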
The built-in string type has no methods (just like most other built-in types in Go), but we can
- use functions provided in the strings standard package to do all kinds of string manipulations.
- call the built-in len function to get the length of a string (the number of bytes stored in the string).
- use the element access syntax aString[i] introduced in container element accesses to get the ith byte value stored in aString. The expression aString[i] is not addressable. In other words, the value aString[i] can't be modified.
- use the subslice syntax aString[start:end] to get a substring of aString. Here, start and end are both indexes of bytes stored in aString.
The result of a substring expression aString[start:end] shares the same underlying byte sequence with the base string aString in memory.
package main

import (
	"fmt"
	"strings"
)

func main() {
	var helloWorld = "hello world!"

	var hello = helloWorld[:5] // substring
	// 104 is the ASCII code (and Unicode code point) of char 'h'.
	fmt.Println(hello[0])         // 104
	fmt.Printf("%T \n", hello[0]) // uint8

	// hello[0] is unaddressable and immutable,
	// so the following two lines fail to compile.
	/*
		hello[0] = 'H'         // error
		fmt.Println(&hello[0]) // error
	*/

	// The next statement prints: 5 12 true
	fmt.Println(len(hello), len(helloWorld),
		strings.HasPrefix(helloWorld, hello))
}
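The memory sharing between a substring and its base string can be observed, at least with the standard Go compiler, by comparing data pointers. The following is a rough sketch that assumes Go 1.20 or later (for unsafe.StringData); makeBase and offsetWithin are hypothetical helpers introduced only for this illustration, and the result reflects an implementation detail rather than a language guarantee.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// makeBase builds a 16-byte string at run time,
// so that it is not a compile-time constant.
func makeBase() string {
	return strings.Repeat("ab", 8)
}

// offsetWithin returns how many bytes after the start of base the
// data of sub begins. It assumes sub was produced by slicing base.
func offsetWithin(base, sub string) uintptr {
	b := uintptr(unsafe.Pointer(unsafe.StringData(base)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return p - b
}

func main() {
	base := makeBase()
	sub := base[4:10]
	// If sub had its own copy of the bytes, the offset
	// would be an unrelated number instead of 4.
	fmt.Println(sub, offsetWithin(base, sub)) // ababab 4
}
```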
Note that if aString and the indexes in the expressions aString[i] and aString[start:end] are all constants, then out-of-range constant indexes make compilation fail. Please also note that the evaluation results of such expressions are always non-constants (this may or may not change in a later Go version). For example, the following program prints 4 0.
package main

import "fmt"

const s = "Go101.org" // len(s) == 9

// len(s) is a constant expression,
// whereas len(s[:]) is not.
var a byte = 1 << len(s) / 128
var b byte = 1 << len(s[:]) / 128

func main() {
	fmt.Println(a, b) // 4 0
}
For why the variables a and b are evaluated to different values, please read the special type deduction rule in bitwise shift operator operation and which function calls are evaluated at compile time.
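A string value can be explicitly converted to a byte slice, and vice versa. A byte slice is a slice whose underlying type is []byte.
A string value can be explicitly converted to a rune slice, and vice versa. A rune slice is a slice whose underlying type is []rune.
In a conversion from a rune slice to a string, each rune is UTF-8 encoded as one to four bytes. A rune value outside the range of valid Unicode code points is treated as 0xFFFD, the code point for the Unicode replacement character. 0xFFFD will be UTF-8 encoded as three bytes (0xef 0xbf 0xbd).
In a conversion from a string to a rune slice, bad UTF-8 encoding representations in the string are converted to the rune value 0xFFFD.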
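A small sketch of how 0xFFFD shows up in conversions in both directions. The helper names runesToString and stringToRunes are made up for this example; they only wrap the built-in conversions.

```go
package main

import "fmt"

// runesToString wraps the built-in []rune -> string conversion.
func runesToString(rs []rune) string {
	return string(rs)
}

// stringToRunes wraps the built-in string -> []rune conversion.
func stringToRunes(s string) []rune {
	return []rune(s)
}

func main() {
	// 0x110000 is just past the largest valid code point (0x10FFFF),
	// so it is encoded as the replacement character 0xFFFD.
	s := runesToString([]rune{'G', 0x110000, 'o'})
	fmt.Printf("% x\n", s) // 47 ef bf bd 6f

	// The byte 0xff never appears in valid UTF-8,
	// so it is decoded as the rune 0xFFFD.
	rs := stringToRunes("G\xffo")
	fmt.Printf("%x\n", rs) // [47 fffd 6f]
}
```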
We can use the Runes function in the bytes standard package to convert a []byte value to a []rune value. There is no function in this package to convert a rune slice to a byte slice.
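package main

import (
	"bytes"
	"unicode/utf8"
)

// Runes2Bytes UTF-8 encodes the runes in rs into a new byte slice.
// Note: utf8.RuneLen returns -1 for invalid runes, so rs is
// assumed to hold only valid Unicode code points.
func Runes2Bytes(rs []rune) []byte {
	// First pass: compute the total encoded length.
	n := 0
	for _, r := range rs {
		n += utf8.RuneLen(r)
	}
	// Second pass: encode each rune into the result.
	n, bs := 0, make([]byte, n)
	for _, r := range rs {
		n += utf8.EncodeRune(bs[n:], r)
	}
	return bs
}

func main() {
	s := "Color Infection is a fun game."
	bs := []byte(s) // string -> []byte
	s = string(bs)  // []byte -> string
	rs := []rune(s) // string -> []rune
	s = string(rs)  // []rune -> string
	rs = bytes.Runes(bs) // []byte -> []rune
	bs = Runes2Bytes(rs) // []rune -> []byte
}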
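The standard Go compiler avoids copying the underlying bytes in some conversions between strings and byte slices. Two such cases are a conversion from string to byte slice that follows the range keyword in a for-range loop, and a conversion from byte slice to string that is used as a map key in a map element lookup.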
package main

import "fmt"

func main() {
	var str = "world"
	// Here, the []byte(str) conversion will
	// not copy the underlying bytes of str.
	for i, b := range []byte(str) {
		fmt.Println(i, ":", b)
	}

	key := []byte{'k', 'e', 'y'}
	m := map[string]string{}
	// The string(key) conversion copies the bytes in key.
	m[string(key)] = "value"
	// Here, this string(key) conversion doesn't copy
	// the bytes in key. The optimization will still be
	// made, even if key is a package-level variable.
	fmt.Println(m[string(key)]) // value (very possible)
}
Note that the last line might not print value if there are data races in evaluating string(key). However, such data races will never cause panics.
package main

import (
	"fmt"
	"testing"
)

var s string
var x = []byte{1023: 'x'}
var y = []byte{1023: 'y'}

func fc() {
	// None of the below 4 conversions will
	// copy the underlying bytes of x and y.
	// Surely, the underlying bytes of x and y will
	// still be copied in the string concatenation.
	if string(x) != string(y) {
		s = (" " + string(x) + string(y))[1:]
	}
}

func fd() {
	// Only the two conversions in the comparison
	// will not copy the underlying bytes of x and y.
	if string(x) != string(y) {
		// Please note the difference between the
		// following concatenation and the one in fc.
		s = string(x) + string(y)
	}
}

func main() {
	fmt.Println(testing.AllocsPerRun(1, fc)) // 1
	fmt.Println(testing.AllocsPerRun(1, fd)) // 3
}
for-range on Strings
The for-range loop control flow applies to strings. But please note that for-range iterates the Unicode code points (as rune values), instead of bytes, in a string. Bad UTF-8 encoding representations in the string are interpreted as the rune value 0xFFFD.
package main

import "fmt"

func main() {
	// The first character is 'e' followed by the
	// combining acute accent U+0301.
	s := "éक्षिaπ囧"
	for i, rn := range s {
		fmt.Printf("%2v: 0x%x %v \n", i, rn, string(rn))
	}
	fmt.Println(len(s))
}
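 0: 0x65 e
 1: 0x301 ́
 3: 0x915 क
 6: 0x94d ्
 9: 0x937 ष
12: 0x93f ि
15: 0x61 a
16: 0x3c0 π
18: 0x56e7 囧
21
From the output, we can see that the iteration index is not always continuous, because a rune may occupy more than one byte, and that:
- the first character, é, is composed of two runes (3 bytes total).
- the second character, क्षि, is composed of four runes (12 bytes total).
- the English character, a, is composed of one rune (1 byte).
- the character, π, is composed of one rune (2 bytes).
- the Chinese character, 囧, is composed of one rune (3 bytes).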
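To see how for-range treats bad UTF-8, the following sketch records the (index, rune) pairs produced for a string containing a truncated encoding. The step type and the decodeSteps function are hypothetical names used only here. Each invalid byte yields one 0xFFFD rune, and the loop advances by a single byte.

```go
package main

import "fmt"

// step is one (byte index, rune) pair yielded by a for-range loop.
type step struct {
	index int
	r     rune
}

// decodeSteps collects every pair that ranging over s produces.
func decodeSteps(s string) []step {
	var steps []step
	for i, r := range s {
		steps = append(steps, step{i, r})
	}
	return steps
}

func main() {
	// "\xe5\x9b" is the encoding of 囧 (e5 9b a7) with its
	// last byte missing, so both bytes are invalid here.
	for _, st := range decodeSteps("a\xe5\x9bb") {
		fmt.Printf("%d: 0x%x\n", st.index, st.r)
	}
	// Output:
	// 0: 0x61
	// 1: 0xfffd
	// 2: 0xfffd
	// 3: 0x62
}
```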
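To iterate over the bytes of a string instead, we can index the string in an ordinary for loop:
package main

import "fmt"

func main() {
	s := "éक्षिaπ囧"
	for i := 0; i < len(s); i++ {
		fmt.Printf("The byte at index %v: 0x%x \n", i, s[i])
	}
}
We can also range over []byte(s), relying on the compiler optimization mentioned above:
package main

import "fmt"

func main() {
	s := "éक्षिaπ囧"
	// Here, the underlying bytes of s are not copied.
	for i, b := range []byte(s) {
		fmt.Printf("The byte at index %v: 0x%x \n", i, b)
	}
}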
len(s) returns the number of bytes in string s. The time complexity of len(s) is O(1). How do we get the number of runes in a string? One way is to use a for-range loop to iterate over and count all the runes, and another is to call the RuneCountInString function in the unicode/utf8 standard package. The two ways are almost equally efficient. A third way is to use len([]rune(s)). Since Go Toolchain 1.11, the standard Go compiler optimizes the third way to avoid an unnecessary deep copy, so that it is as efficient as the former two ways. Please note that the time complexities of these ways are all O(n).
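The three ways of counting runes can be put side by side. This is a minimal sketch; countByRange and countByConversion are illustrative names, not standard library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countByRange counts runes by iterating with for-range.
func countByRange(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

// countByConversion counts runes through a []rune conversion.
func countByConversion(s string) int {
	return len([]rune(s))
}

func main() {
	s := "aπ囧" // 1 + 2 + 3 bytes

	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(countByRange(s))           // 3
	fmt.Println(utf8.RuneCountInString(s)) // 3
	fmt.Println(countByConversion(s))      // 3
}
```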
Besides using the + operator to concatenate strings, we can also use the following ways.
- The Sprintf/Sprint/Sprintln functions in the fmt standard package can be used to concatenate values of any types, including string types.
- The Join function in the strings standard package.
- The Buffer type in the bytes standard package (or the built-in copy function) can be used to build byte slices, which can afterwards be converted to string values.
- The Builder type in the strings standard package can be used to build strings. Compared with the bytes.Buffer way, this way avoids making an unnecessary copy of the underlying bytes for the result string.
The standard Go compiler makes optimizations for string concatenations that use the + operator. So generally, using the + operator to concatenate strings is convenient and efficient if all of the strings to concatenate are present in a single concatenation statement.
We can use the built-in copy and append functions to copy and append slice elements. In fact, as a special case, if the first argument of a call to either of the two functions is a byte slice, then the second argument can be a string (if the call is an append call, then the string argument must be followed by three dots ...). In other words, a string can be used as a byte slice in this special case.
package main

import "fmt"

func main() {
	hello := []byte("Hello ")
	world := "world!"

	// The normal way:
	// helloWorld := append(hello, []byte(world)...)
	helloWorld := append(hello, world...) // sugar way
	fmt.Println(string(helloWorld))

	helloWorld2 := make([]byte, len(hello)+len(world))
	copy(helloWorld2, hello)
	// The normal way:
	// copy(helloWorld2[len(hello):], []byte(world))
	copy(helloWorld2[len(hello):], world) // sugar way
	fmt.Println(string(helloWorld2))
}
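For == and != comparisons, if the lengths of the two compared strings are not equal, then the two strings must also be unequal (there is no need to compare their bytes).
For two equal strings, the time complexity of comparing them depends on whether or not their underlying byte sequence pointers are equal. If the pointers are equal, the time complexity is O(1); otherwise, the time complexity is O(n), where n is the length of the two strings.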
package main

import (
	"fmt"
	"time"
)

func main() {
	bs := make([]byte, 1<<26)
	s0 := string(bs)
	s1 := string(bs)
	s2 := s1

	// s0, s1 and s2 are three equal strings.
	// The underlying bytes of s0 is a copy of bs.
	// The underlying bytes of s1 is also a copy of bs.
	// The underlying bytes of s0 and s1 are two
	// different copies of bs.
	// s2 shares the same underlying bytes with s1.

	startTime := time.Now()
	_ = s0 == s1
	duration := time.Since(startTime)
	fmt.Println("duration for (s0 == s1):", duration)

	startTime = time.Now()
	_ = s1 == s2
	duration = time.Since(startTime)
	fmt.Println("duration for (s1 == s2):", duration)
}
duration for (s0 == s1): 10.462075ms
duration for (s1 == s2): 136ns