Chase Mao's blog

Exploring Go: Concurrency Issues with Entire Map Replacement

2023-05-08

Introduction

Maps are a commonly used caching data structure in Go. If you replace an entire map while concurrently reading from it, will there be concurrency issues? This article explores that question.

Problem Statement

This issue was initially raised during a code review (CR). The simplified version of the source code is as follows:

type impl struct {
    cache map[int]int
}

// Read returns the cached value for the given key.
func (i *impl) Read(key int) int {
    return i.cache[key]
}

// Fresh replaces the entire cache with a newly created map.
func (i *impl) Fresh() {
    newCache := make(map[int]int)
    i.cache = newCache
}

These two methods are executed concurrently:

The Read method concurrently reads data from the cache, while the Fresh method replaces the entire cache with a newly created map. Given this scenario, could there be concurrency issues that cause the program to run abnormally? With this question in mind, the author conducted a series of investigations, which are recorded below.
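
For reference, here is a minimal, self-contained sketch of this usage pattern. The main function and goroutine layout are illustrative assumptions, not code from the original review.

package main

import "time"

type impl struct {
    cache map[int]int
}

func (i *impl) Read(key int) int {
    return i.cache[key]
}

func (i *impl) Fresh() {
    i.cache = make(map[int]int)
}

func main() {
    svc := &impl{cache: make(map[int]int)}

    // One goroutine keeps reading from the cache.
    go func() {
        for {
            _ = svc.Read(1)
        }
    }()

    // Another goroutine keeps replacing the entire cache.
    go func() {
        for {
            svc.Fresh()
        }
    }()

    time.Sleep(time.Second)
}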

Analysis Process

What is Replaced During an Entire Map Replacement

In Go, a map is actually a pointer to the runtime's hmap structure. The relevant source code is listed below.

// runtime/map.go
func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()

    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
} 

Therefore, a map is not an hmap structure itself but a pointer to one. Assigning to the entire map is essentially assigning to this pointer. The next question is whether such an assignment is atomic.
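
This can also be checked directly: a map variable occupies a single machine word, and copying it copies only that word, so both variables refer to the same underlying hmap. A minimal sketch (the size of 8 bytes assumes a 64-bit platform):

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    m1 := make(map[int]int)

    // On a 64-bit platform a map variable is one word, i.e. 8 bytes,
    // because it is just a pointer to the runtime hmap structure.
    fmt.Println(unsafe.Sizeof(m1)) // 8

    // Copying the map variable copies only the pointer, so m1 and m2
    // share the same underlying hmap and see each other's writes.
    m2 := m1
    m2[1] = 42
    fmt.Println(m1[1]) // 42
}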

Atomicity of Pointer Assignment

Assuming a 64-bit system, a map is effectively an 8-byte pointer. The author used the following test code to investigate concurrency issues with maps.

package main

import "time"

type test struct {
    m map[int]int
}

func main() {
    m := make(map[int]int)
    t := &test{m: make(map[int]int)}

    // One goroutine keeps reading from both maps.
    go func() {
        for {
            _ = m[1]
            _ = t.m[1]
        }
    }()

    // Another goroutine keeps replacing both maps entirely.
    go func() {
        for {
            m = make(map[int]int)
            t.m = make(map[int]int)
        }
    }()

    time.Sleep(time.Minute)
}

Here, a standalone map and a map stored in a struct field were tested separately: one goroutine reads from the map while another performs an entire map replacement. After multiple runs, no runtime exceptions were encountered, so map assignment appears to be atomic in practice.

However, the author came across discussions indicating that assignment atomicity is not unconditionally guaranteed at the hardware level. The relevant guidance from Intel's documentation is summarized below.

The documentation states that Intel P6 processors (released in 1995) and newer CPUs guarantee atomicity for reads and writes of 1, 2, 4, and 8 bytes within the same cache line. However, atomicity is not guaranteed for assignments that span cache lines. With this in mind, the author devised a test case with the following approach:

  1. Initialize a byte array of three cache lines in size. Note that on the author’s 64-bit system, a cache line is 64 bytes.
  2. Identify a cache line boundary within the byte array and take the address 4 bytes before it.
  3. Initialize an 8-byte map (essentially a pointer) at this address, so that it spans the cache line boundary.
  4. Concurrently replace and read from the map to observe if any runtime exceptions occur.
package main

import (
	"fmt"
	"runtime/debug"
	"time"
	"unsafe"
)

const (
	cacheLineSize = 64
	// alignFromCacheLine is the offset back from a cache line boundary.
	alignFromCacheLine = 4
)

func main() {
	// Since a fault from one map's concurrent replacement and read is rare,
	// spawning 8 test goroutines increases reproducibility.
	for i := 0; i < 8; i++ {
		go testMap()
	}
	time.Sleep(time.Minute)
}

func testMap() {
	// Disable garbage collection to prevent the map stored in raw memory from being reclaimed.
	debug.SetGCPercent(-1)

	// Allocate enough memory to hold three cache lines.
	mem := make([]byte, cacheLineSize*3)

	// Calculate a starting address that spans a cache line boundary.
	endAddr := uintptr(unsafe.Pointer(&mem[cacheLineSize*3-1]))
	addr := endAddr - (endAddr % cacheLineSize) - cacheLineSize - alignFromCacheLine

	// Check whether the 8-byte value at addr crosses a cache line boundary.
	if addr/cacheLineSize != (addr+7)/cacheLineSize {
		fmt.Println("The map address crosses a cache line boundary.")
	} else {
		fmt.Println("The map address does not cross a cache line boundary.")
	}

	// Construct a map in memory at this address.
	// mapPointer is a pointer to a map located at addr; assigning
	// make(map[int]int) to *mapPointer places the map header there.
	mapPointer := (*map[int]int)(unsafe.Pointer(addr))
	*mapPointer = make(map[int]int)

	// Concurrently read from the map.
	go func() {
		for {
			_ = (*mapPointer)[1]
		}
	}()

	// Concurrently replace the map.
	go func() {
		for {
			*mapPointer = make(map[int]int)
		}
	}()

	// Prevent exit.
	time.Sleep(time.Minute)
}

In the above test case, an address 4 bytes before a cache line boundary was identified and a map was initialized there. As mentioned earlier, the map is effectively an 8-byte pointer, so it spanned the cache line boundary. The result is shown below.

# Output indicating that all 8 test instances placed the map at an address crossing a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
The map addr crosses a cache line boundary.
# Output indicating an access to an invalid address.
unexpected fault address 0xc14c322330
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0xc14c322330 pc=0x76fa49]

goroutine 82 [running]:
runtime.throw({0x8084f1?, 0xc0ca368840?})
        C:/Program Files/Go/src/runtime/panic.go:1047 +0x65 fp=0xc0005a9f48 sp=0xc0005a9f18 pc=0x794c45
runtime.sigpanic()
        C:/Program Files/Go/src/runtime/signal_windows.go:261 +0x125 fp=0xc0005a9f90 sp=0xc0005a9f48 pc=0x7a6725

The results indicate that the program accessed the invalid address 0xc14c322330, producing a fatal fault. This suggests that writing an 8-byte pointer at an address spanning cache lines is indeed non-atomic and can lead to runtime faults.

To verify this hypothesis, the author changed alignFromCacheLine to 8 so that the map no longer spanned cache lines; upon retesting, the error no longer occurred. This essentially confirms the hypothesis.
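
The boundary check used in the test can be expressed as a small helper. The sketch below is illustrative (the crossesCacheLine function is not part of the original test) and assumes 64-byte cache lines: an 8-byte value placed 4 bytes before a boundary straddles it, while one placed 8 bytes before a boundary does not.

package main

import "fmt"

const cacheLineSize = 64

// crossesCacheLine reports whether an object of the given size, starting at
// offset (relative to a cache-line-aligned base), straddles a cache line boundary.
func crossesCacheLine(offset, size uintptr) bool {
    return offset/cacheLineSize != (offset+size-1)/cacheLineSize
}

func main() {
    fmt.Println(crossesCacheLine(cacheLineSize-4, 8)) // true: crosses the boundary
    fmt.Println(crossesCacheLine(cacheLineSize-8, 8)) // false: fits in one cache line
}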

To pinpoint more precisely where non-atomicity causes problems, the author replaced the map with a uint64 and repeated the process. The code is as follows:

package main

import (
	"fmt"
	"time"
	"unsafe"
)

const (
	cacheLineSize = 64
	// alignFromCacheLine is the offset back from a cache line boundary.
	alignFromCacheLine = 4
)

func main() {
	// Allocate enough memory to hold three cache lines.
	mem := make([]byte, cacheLineSize*3)

	// Calculate a starting address for the uint64 that spans a cache line boundary.
	endAddr := uintptr(unsafe.Pointer(&mem[cacheLineSize*3-1]))
	addr := endAddr - (endAddr % cacheLineSize) - cacheLineSize - alignFromCacheLine

	// Check whether the uint64 at addr crosses a cache line boundary.
	if addr/cacheLineSize != (addr+7)/cacheLineSize {
		fmt.Println("The address crosses a cache line boundary.")
	} else {
		fmt.Println("The address does not cross a cache line boundary.")
	}

	// Treat the boundary-crossing address as a *uint64.
	uint64Addr := (*uint64)(unsafe.Pointer(addr))

	// Initialize the uint64.
	nums := []uint64{0x1122334455667788, 0x9900112233445566}
	*uint64Addr = nums[0]

	// Concurrently read the uint64 and report any value that is neither of
	// the two values ever written, i.e. a torn read.
	go func() {
		for {
			res := *uint64Addr
			if res != nums[0] && res != nums[1] {
				fmt.Printf("res:%#x\n", res)
			}
		}
	}()

	// Concurrently write the two values in alternation.
	go func() {
		for {
			for _, num := range nums {
				*uint64Addr = num
			}
		}
	}()

	time.Sleep(time.Minute)
}

In the above test, two operations run concurrently:

  1. Continuously writing the uint64 values 0x1122334455667788 and 0x9900112233445566 in alternation.
  2. Reading the uint64 value and printing it if it matches neither of the two valid values.

The result is shown below.

res:0x1122334433445566
res:0x9900112255667788
res:0x9900112255667788
res:0x1122334433445566 

The results include values such as 0x1122334433445566 and 0x9900112255667788, intermediate states in which the 4-byte halves of the two valid values are mixed together. Changing alignFromCacheLine to 8, so that the uint64 no longer crosses a cache line, eliminated these intermediate states. This confirms the findings from the map tests: a write that spans cache lines is not atomic, so a concurrent read can observe a torn, invalid value, which explains the invalid-address faults observed earlier.
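
The observed values are exactly what a torn write would produce: the upper 4 bytes of one valid value combined with the lower 4 bytes of the other. A quick arithmetic check (illustrative only, not part of the original test):

package main

import "fmt"

func main() {
    a := uint64(0x1122334455667788)
    b := uint64(0x9900112233445566)

    // Upper half of a combined with lower half of b, and vice versa,
    // reproduce the two intermediate values observed in the output.
    fmt.Printf("%#x\n", a&0xFFFFFFFF00000000|b&0x00000000FFFFFFFF) // 0x1122334433445566
    fmt.Printf("%#x\n", b&0xFFFFFFFF00000000|a&0x00000000FFFFFFFF) // 0x9900112255667788
}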

Map Initialization Memory Location

According to the previous analysis, placing a map at a memory location that crosses cache lines makes map replacement non-atomic. The earlier experiments used the unsafe package to force maps to such locations. Under normal circumstances, would map initialization ever place a map at an address that crosses cache lines?

At this point, Go's memory alignment mechanism needs consideration. Every variable is aligned according to its alignment guarantee, meaning its starting address is always a multiple of that value. The alignment guarantee of a type can be queried with unsafe.Alignof, as shown below.

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    fmt.Println(unsafe.Alignof(uint64(1)))     // 8
    fmt.Println(unsafe.Alignof(float32(32)))   // 4
    fmt.Println(unsafe.Alignof(""))            // 8
    fmt.Println(unsafe.Alignof([]int{}))       // 8
    fmt.Println(unsafe.Alignof([2]int64{}))    // 8
    fmt.Println(unsafe.Alignof(map[int]int{})) // 8
}

Both maps and uint64 values are aligned on 8-byte boundaries. Since the 64-byte cache line size is a multiple of 8, an 8-byte value starting at an 8-byte-aligned address always lies entirely within a single cache line, so these types do not encounter the concurrency issues observed earlier.

The author conducted a simple test to verify the 8-byte alignment of maps:

package main

import (
    "fmt"
    "unsafe"
)

type test struct {
    m map[int]int
}

func main() {
    for {
        m := make(map[int]int)
        t := &test{m: make(map[int]int)}
        // Reinterpret each map variable as a uint64 to inspect its pointer value.
        mAddr := *(*uint64)(unsafe.Pointer(&m))
        if mAddr%8 != 0 {
            fmt.Println("map addr is not aligned to 8")
        }
        tmAddr := *(*uint64)(unsafe.Pointer(&t.m))
        if tmAddr%8 != 0 {
            fmt.Println("test struct map addr is not aligned to 8")
        }
    }
}

The test did not find any map whose address was not aligned on an 8-byte boundary, indicating that Go's memory allocator ensures map headers never span cache lines. In fact, Go's alignment guarantees mean that no basic type spans a cache line; only structs and arrays, whose total size can exceed one machine word, can straddle cache lines. Interested readers can refer to the official Go documentation for further details.
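
To illustrate the last point, consider a hypothetical 24-byte struct (this example is not from the original article). The struct is aligned only to the 8-byte alignment of its widest field, so the struct as a whole can straddle a cache line boundary, while each individual 8-byte field remains within a single line.

package main

import (
    "fmt"
    "unsafe"
)

const cacheLineSize = 64

// payload is a hypothetical struct: three uint64 fields, 24 bytes, 8-byte alignment.
type payload struct {
    a, b, c uint64
}

func main() {
    size := unsafe.Sizeof(payload{}) // 24 bytes

    // Suppose the struct starts 16 bytes before a cache line boundary
    // (offset 48 within a line); 48 is a multiple of 8, so this placement
    // respects the struct's alignment.
    offset := uintptr(cacheLineSize - 16)

    // The struct as a whole straddles the boundary...
    structCrosses := offset/cacheLineSize != (offset+size-1)/cacheLineSize
    fmt.Println("struct crosses a cache line:", structCrosses) // true

    // ...but each 8-byte field is still 8-byte aligned and therefore
    // stays within a single cache line.
    for i := uintptr(0); i < 3; i++ {
        fieldOffset := offset + i*8
        fieldCrosses := fieldOffset/cacheLineSize != (fieldOffset+7)/cacheLineSize
        fmt.Println("field", i, "crosses a cache line:", fieldCrosses) // false
    }
}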

Conclusion

Without using the unsafe package (which is not recommended in business logic anyway), concurrently replacing an entire map while reading from it does not pose concurrency issues. However, if map values are manipulated through unsafe casts, there is no guarantee that their addresses comply with Go's memory alignment rules, which can lead to concurrency issues.

However, these conclusions are based solely on testing conducted on the 64-bit Intel platform. Further testing on other hardware platforms is necessary to confirm these findings.