Go Concurrency and the Memory Model

1. Why locks are needed

Imagine two goroutines modifying the same variable at the same time:

var count int

go func() { count++ }()
go func() { count++ }()

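count++ is three steps under the hood: read → add one → write back. If two goroutines run it at the same time, this interleaving is possible:

goroutine 1: reads count = 0
goroutine 2: reads count = 0
goroutine 1: writes count = 1
goroutine 2: writes count = 1  ← should be 2, but the result is 1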

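This is a data race: two goroutines access the same variable concurrently without synchronization, and at least one of the accesses is a write.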

2. sync.Mutex: the mutual-exclusion lock

var mu sync.Mutex
var count int

func increment() {
    mu.Lock()
    defer mu.Unlock()
    count++
}

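mu.Lock() guarantees that only one goroutine at a time can run the code the lock protects. defer mu.Unlock() ensures the lock is released when the function returns, even if the function panics partway through.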
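
To see the guarantee in action, here is a minimal, self-contained sketch. The counter type and the countTo helper are made up for this illustration, and sync.WaitGroup is used only to wait for the goroutines to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type for this sketch: mu guards n.
type counter struct {
	mu sync.Mutex
	n  int
}

// increment adds one to n while holding the lock.
func (c *counter) increment() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// value reads n under the same lock, so the read is synchronized too.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countTo starts `workers` goroutines that each increment once and
// returns the final value after all of them have finished.
func countTo(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.increment()
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(countTo(1000)) // 1000
}
```

Without the lock in increment, the same program could print a number below 1000, because concurrent increments can overwrite each other.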

How it is used in Raft:

type Raft struct {
    mu   sync.Mutex
    // ... other fields
}

func (rf *Raft) GetState() (int, bool) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    return rf.currentTerm, rf.state == Leader
}

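3. Lock discipline: when to lock

The rule: any shared state accessed by more than one goroutine must be read and written only while holding the lock.

In Raft, rf.currentTerm, rf.votedFor, rf.log, and rf.state are all shared state, so every access to them needs the lock.

A common mistake: locking for writes but not for reads.

// Wrong: the read is not locked
func (rf *Raft) isLeader() bool {
    return rf.state == Leader  // no lock!
}

// Correct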
func (rf *Raft) isLeader() bool {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    return rf.state == Leader
}

4. Deadlock: the most common concurrency bug

A deadlock happens when two goroutines each wait for the other to release a lock:

// goroutine 1
mu1.Lock()
mu2.Lock()  // waits for goroutine 2 to release mu2

// goroutine 2
mu2.Lock()
mu1.Lock()  // waits for goroutine 1 to release mu1

Each goroutine is waiting for the other, so the program hangs forever.
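
One common way to avoid this cycle is to make every goroutine take the locks in the same order. The sketch below is only an illustration: the account type, its id field, and the transfer and runTransfers functions are invented for the example and have nothing to do with the lab code.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical type; id fixes a global lock order.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer moves amount from one account to another. It always locks the
// account with the smaller id first, so two opposite transfers cannot
// each hold one lock while waiting for the other.
// It assumes from and to are different accounts.
func transfer(from, to *account, amount int) {
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

// runTransfers performs n transfers in each direction concurrently and
// returns the two final balances.
func runTransfers(n int) (int, int) {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	return a.balance, b.balance
}

func main() {
	x, y := runTransfers(1000)
	fmt.Println(x, y) // 100 100
}
```

If transfer locked from first and to second instead, the two directions would take the locks in opposite orders, which is exactly the pattern shown above.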

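A deadlock that is more common in the lab: calling an RPC while holding the lock, when the RPC handler on the receiving peer also needs rf.mu. Each peer has its own rf.mu, so the cycle forms when two peers each hold their own lock and wait for the other peer's handler to run.

// Wrong
func (rf *Raft) sendHeartbeat(server int) {
    rf.mu.Lock()
    // ... prepare the arguments
    rf.peers[server].Call("Raft.AppendEntries", args, reply)  // send the RPC while holding rf.mu
    // The peer's handler needs that peer's rf.mu. If the peer is also
    // holding its lock while calling us, neither call can finish: deadlock!
    rf.mu.Unlock()
}

The correct approach: release the lock before sending the RPC, and take it again once the reply arrives.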

func (rf *Raft) sendHeartbeat(server int) {
    rf.mu.Lock()
    args := rf.buildArgs()  // prepare the arguments
    rf.mu.Unlock()          // release the lock before sending the RPC

    reply := &AppendEntriesReply{}
    ok := rf.peers[server].Call("Raft.AppendEntries", args, reply)

    if ok {
        rf.mu.Lock()
        // handle the reply (state may have changed while the lock was released, so re-check it first)
        rf.mu.Unlock()
    }
}
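
The pattern can be tried without the lab framework. The following is a rough simulation under stated assumptions: node, handleHeartbeat, and exchange are hypothetical names, and a direct method call stands in for the real Call(). Two nodes send heartbeats to each other at the same time; because neither holds its lock across the call, both finish.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for a Raft peer. mu guards term and beats.
type node struct {
	mu    sync.Mutex
	term  int
	beats int   // heartbeats this node has handled
	peer  *node // the other node; set once before any goroutine starts
}

// handleHeartbeat plays the role of an RPC handler: it needs the
// receiver's own lock.
func (n *node) handleHeartbeat(term int) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.beats++
	return term >= n.term
}

// sendHeartbeat snapshots state under the lock, releases the lock for
// the simulated RPC, then re-locks and re-checks before using the reply.
func (n *node) sendHeartbeat() bool {
	n.mu.Lock()
	term := n.term // copy what the call needs
	n.mu.Unlock()  // never hold mu across the call

	ok := n.peer.handleHeartbeat(term) // direct call standing in for Call()

	n.mu.Lock()
	defer n.mu.Unlock()
	// State may have changed while the lock was released.
	return ok && n.term == term
}

// exchange lets two nodes send `rounds` heartbeats to each other at once
// and returns how many each one handled.
func exchange(rounds int) (int, int) {
	a, b := &node{term: 1}, &node{term: 1}
	a.peer, b.peer = b, a
	var wg sync.WaitGroup
	for _, n := range []*node{a, b} {
		wg.Add(1)
		go func(n *node) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				n.sendHeartbeat()
			}
		}(n)
	}
	wg.Wait()
	return a.beats, b.beats
}

func main() {
	fmt.Println(exchange(1000)) // 1000 1000
}
```

Moving n.mu.Unlock() to after the handleHeartbeat call would let each node hold its own lock while waiting for the other's, and the program could hang.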

5. Channels: communication between goroutines

A channel is a pipe for passing data between goroutines:

ch := make(chan int)

go func() {
    ch <- 42  // send a value
}()

value := <-ch  // receive a value
fmt.Println(value)  // 42

A buffered channel:

ch := make(chan int, 10)  // holds up to 10 values; a send blocks only when the buffer is full
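
A small sketch of what buffering buys you. The squares function is made up for this example, and close and range are additions not covered above: the sender fills the buffer before anyone receives, then the values are drained in order.

```go
package main

import "fmt"

// squares sends the squares of 1..n on a buffered channel, closes it,
// then collects everything with range. Because the buffer holds n values,
// no send blocks even though nobody is receiving yet.
func squares(n int) []int {
	ch := make(chan int, n)
	for i := 1; i <= n; i++ {
		ch <- i * i
	}
	close(ch) // signals "no more values"; range stops after draining

	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(squares(4)) // [1 4 9 16]
}
```

With an unbuffered channel (make(chan int)) the first send in this function would block forever, since the receiving loop only starts after all sends are done.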

In the lab: Raft uses applyCh to deliver committed log entries to the application layer above it:

// Inside Make()
go func() {
    for {
        // Whenever a new log entry is committed (entry and index refer to that entry)
        rf.applyCh <- ApplyMsg{
            CommandValid: true,
            Command:      entry.Command,
            CommandIndex: index,
        }
    }
}()

6. sync.Cond: condition variables

Sometimes you need to wait for a condition to become true before continuing. That is what sync.Cond is for.

var mu sync.Mutex
cond := sync.NewCond(&mu)
var ready bool

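// goroutine 1: waits for the condition
go func() {
    mu.Lock()
    for !ready {
        cond.Wait()  // releases the lock, waits for a notification, re-locks once woken
    }
    // the condition holds; carry on
    mu.Unlock()
}()

// goroutine 2: makes the condition true, then notifies
mu.Lock()
ready = true
cond.Signal()  // wakes one waiting goroutine
mu.Unlock()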

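Note: cond.Wait() must be called in a loop (for !ready). In Go, Wait returns only after a Signal or Broadcast, but the lock is not held while waiting, so by the time the woken goroutine re-acquires it another goroutine may have changed the condition. The condition has to be checked again after every wake-up.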
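
As one possible application, a condition variable can drive the goroutine that feeds applyCh. This is only a sketch under assumptions, not the lab's reference design: the applier type, its fields, the 1-based log indexing, and the commit and applyAll helpers are all hypothetical, and commands are plain strings to keep it short.

```go
package main

import (
	"fmt"
	"sync"
)

// applier is a hypothetical, stripped-down piece of a Raft peer.
// mu guards log, commitIndex and lastApplied; cond is tied to mu.
type applier struct {
	mu          sync.Mutex
	cond        *sync.Cond
	log         []string // log[i-1] holds the command at index i (1-based)
	commitIndex int
	lastApplied int
	applyCh     chan string
}

func newApplier(log []string) *applier {
	a := &applier{log: log, applyCh: make(chan string)}
	a.cond = sync.NewCond(&a.mu)
	return a
}

// run waits until something new is committed, then delivers it.
// It never returns; in this sketch it simply ends with the program.
func (a *applier) run() {
	for {
		a.mu.Lock()
		for a.lastApplied >= a.commitIndex {
			a.cond.Wait() // releases mu while waiting, re-locks on wake-up
		}
		a.lastApplied++
		cmd := a.log[a.lastApplied-1]
		a.mu.Unlock()

		a.applyCh <- cmd // send without holding mu
	}
}

// commit advances commitIndex and wakes the applier.
func (a *applier) commit(index int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if index > a.commitIndex {
		a.commitIndex = index
		a.cond.Broadcast()
	}
}

// applyAll commits the whole log and returns the commands in delivery order.
func applyAll(log []string) []string {
	a := newApplier(log)
	go a.run()
	a.commit(len(log))
	var got []string
	for range log {
		got = append(got, <-a.applyCh)
	}
	return got
}

func main() {
	fmt.Println(applyAll([]string{"x=1", "y=2", "z=3"})) // [x=1 y=2 z=3]
}
```

The send on applyCh happens after the unlock for the same reason RPCs are sent outside the lock: the receiver may be slow, and nothing else should be stuck behind it.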

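7. The Go memory model: happens-before

Go's memory model specifies when a write made by one goroutine is guaranteed to be visible to another goroutine.

The key rule: without synchronization (a lock, a channel), a write made by one goroutine is not guaranteed to be visible to another.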

var x int
var done bool

go func() {
    x = 42
    done = true
}()

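for !done {}    // data race on done: this loop is not even guaranteed to exit
fmt.Println(x)  // may print 0! Nothing orders this read after the write x = 42

The correct approach: synchronize with a lock or a channel.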
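
A minimal sketch of the channel version. The publish function is a made-up wrapper, and the done flag becomes a channel that is closed once the write has been made.

```go
package main

import "fmt"

// publish writes x in one goroutine and reads it in another, using a
// channel close as the synchronization point.
func publish() int {
	var x int
	done := make(chan struct{})

	go func() {
		x = 42
		close(done) // the write to x happens before this close
	}()

	<-done // a receive that returns because of the close happens after it
	return x
}

func main() {
	fmt.Println(publish()) // 42
}
```

The memory model orders the close before the receive that observes it, so the read of x is guaranteed to see 42.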

8. Detecting data races with -race

Go ships with a built-in race detector:

go test -race ./...

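With -race, the program is instrumented to watch memory accesses at run time and reports a data race as soon as one actually occurs. It only sees code paths that execute, so a clean run does not prove the program is free of races.

Run every lab test with -race: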

cd labs/src/raft
go test -race -run 2A

If there is a data race, you will see output along these lines:

WARNING: DATA RACE
Write at 0x... by goroutine 7:
  main.(*Raft).RequestVote(...)
Read at 0x... by goroutine 12:
  main.(*Raft).ticker(...)

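This tells you which two goroutines made the conflicting accesses and where in the code each access happened; the variable itself is identified by its address.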

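Quick check

1. Why can sending an RPC while holding the lock cause a deadlock?
2. Why does cond.Wait() go inside a for loop rather than an if?
3. What might you miss if you run the tests without -race?

Reference answers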

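1. When you send an RPC while holding the lock, the current goroutine blocks waiting for the network response with the lock still held. The peer's RPC handler (for example Raft's AppendEntries handler) needs that peer's rf.mu. If the peer is at the same moment holding its own lock while waiting on an RPC to you, a cycle forms: A waits for B's handler to finish, B waits for A's handler to finish, and neither handler can get its lock. The fix: send the RPC outside the lock, then lock again to process the reply.

2. Being woken from Wait() does not mean the condition holds. Go's Wait returns only after a Signal or Broadcast, but another goroutine may have consumed or changed the condition before the woken goroutine re-acquired the lock. A for loop re-checks the condition after every wake-up and continues only when it really holds.

3. Data races. -race detects conflicting concurrent reads and writes at run time. Without it, a data race usually does not crash right away; it produces random errors that are hard to reproduce and may show up perhaps once in a hundred runs.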
