
How to canonicalize strings to save memory?

At run time, equal strings in a Go program do not always share the same underlying byte memory block, even though they could share a single common one.

The process of making them share a single common byte memory block is called string canonicalization. There are several ways to implement string canonicalization in Go.

Way 1: canonicalize two strings when they are found equal

The logic and implementation are simple:

	if str1 == str2 {
		str1 = str2 // str1 now shares str2's block; str1's old block is given up
	}
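
The effect of that assignment can be observed by comparing the data pointers of the two strings. The following is a minimal sketch, not part of the original text: sameBlock and canonicalizePair are hypothetical helper names, and the sketch assumes Go 1.20 or later for unsafe.StringData.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sameBlock is a hypothetical helper. It reports whether a and b start
// at the same address. For equal, non-empty strings this means that
// they share one underlying byte memory block.
func sameBlock(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

// canonicalizePair is a hypothetical helper that applies the rule
// shown above to two string variables.
func canonicalizePair(str1, str2 *string) {
	if *str1 == *str2 {
		*str1 = *str2 // str1 now points at str2's block
	}
}

func main() {
	str1 := strings.Repeat("ab", 4)
	str2 := strings.Clone(str1) // equal content in a separately allocated block

	fmt.Println(str1 == str2, sameBlock(str1, str2)) // true false

	canonicalizePair(&str1, &str2)
	fmt.Println(str1 == str2, sameBlock(str1, str2)) // true true
}
```

Once no other string refers to str1's old block, the garbage collector is free to reclaim it.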

Here is a clumsy implementation that canonicalizes the strings in a slice:

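func CanonicalizeStrings(ss []string) {
	// Pair each string with its original position, so that the
	// strings can be regrouped and later written back in place.
	type S struct {
		str   string
		index int
	}
	var temp = make([]S, len(ss))
	for i := range temp {
		temp[i] = S{
			str:   ss[i],
			index: i,
		}
	}

	// temp[i] is the representative of the current group. Every later
	// element equal to it adopts its byte memory block and is swapped
	// forward to position k, so that temp[i:k] holds one whole group.
	for i := 0; i < len(temp); {
		var k = i + 1
		for j := k; j < len(temp); j++ {
			if temp[j].str == temp[i].str {
				temp[j].str = temp[i].str
				temp[k], temp[j] = temp[j], temp[k]
				k++
			}
		}
		i = k // continue with the first element of the next group
	}

	// Write the canonicalized strings back to their original positions.
	for i := range temp {
		ss[temp[i].index] = temp[i].str
	}
}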

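For this specific case (canonicalizing all the strings in one slice), this way is more performant than the following one. It allocates only the temporary slice and no new byte memory blocks, though its nested loops take time roughly proportional to len(ss) multiplied by the number of distinct strings.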

Way 2: use unique.Handle, introduced in Go 1.23

The unique.Handle type, introduced in Go 1.23, provides a convenient way to canonicalize strings.

import "unique"

func CanonicalizeString(s string) string {
	return unique.Make(s).Value()
}

func CanonicalizeStrings(ss []string) {
	for i, s := range ss {
		ss[i] = CanonicalizeString(s)
	}
}

This way is more flexible. Just apply the above CanonicalizeString function to every string used at run time, and all equal strings will share the same underlying byte memory block.
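
To check that strings canonicalized through unique.Make really end up on one block, their data pointers can be compared. The following is a minimal sketch under stated assumptions: sameBlock and canonicalizeBoth are hypothetical helper names that do not appear in the original text, and the sketch keeps both handles alive in variables so that the canonical entry certainly still exists when the second handle is made.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
	"unsafe"
)

// sameBlock is a hypothetical helper. It reports whether a and b start
// at the same address, which for equal, non-empty strings means that
// they share one underlying byte memory block.
func sameBlock(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

// canonicalizeBoth is a hypothetical helper. It canonicalizes a and b
// through unique.Make and also reports whether the two handles are equal.
func canonicalizeBoth(a, b string) (ca, cb string, equalHandles bool) {
	ha, hb := unique.Make(a), unique.Make(b)
	return ha.Value(), hb.Value(), ha == hb
}

func main() {
	a := strings.Repeat("xy", 4)
	b := strings.Clone(a) // equal content in a separately allocated block
	fmt.Println(sameBlock(a, b)) // false

	ca, cb, equalHandles := canonicalizeBoth(a, b)
	fmt.Println(equalHandles)      // true
	fmt.Println(sameBlock(ca, cb)) // true
}
```

Handles made from equal values compare equal, so code that keeps the handles around can also compare them directly instead of comparing the strings byte by byte.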

Note: the unique.Make way is not suitable for every situation. The unique.Make function allocates a backing byte memory block for each distinct string. So if some unequal strings to be canonicalized share the same backing byte memory block (for example, when they are substrings of one larger string), the unique.Make function will allocate a new backing byte memory block for each of them. Doing this actually allocates more memory than keeping the single shared memory block.
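
The situation described in the note can be reproduced with substrings of one string. The following is a minimal sketch, not taken from the original text: inside, countInside and canonicalizeAll are hypothetical helper names, and the address arithmetic is used only to inspect where the bytes live.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
	"unsafe"
)

// inside is a hypothetical helper. It reports whether sub's bytes lie
// within base's byte memory block.
func inside(base, sub string) bool {
	if len(base) == 0 || len(sub) == 0 {
		return false
	}
	b := uintptr(unsafe.Pointer(unsafe.StringData(base)))
	s := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return s >= b && s+uintptr(len(sub)) <= b+uintptr(len(base))
}

// countInside counts the strings in ss whose bytes lie within base's block.
func countInside(base string, ss []string) int {
	n := 0
	for _, s := range ss {
		if inside(base, s) {
			n++
		}
	}
	return n
}

// canonicalizeAll applies the unique.Make way to every string in ss.
func canonicalizeAll(ss []string) {
	for i, s := range ss {
		ss[i] = unique.Make(s).Value()
	}
}

func main() {
	base := "alpha,beta,gamma,delta"
	words := strings.Split(base, ",") // substrings that reuse base's block

	fmt.Println(countInside(base, words)) // 4

	canonicalizeAll(words)
	fmt.Println(countInside(base, words)) // 0: each word now has its own block
}
```

Before the call, the four unequal words cost no byte memory beyond base itself. After it, each word occupies a separately allocated block, and base stays alive for as long as anything else still refers to it.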



Tapir, the author of Go 101, has been writing the Go 101 series books and maintaining the go101.org website since July 2016.