COMP5270 Week 13 总结：Recap 与 Final Exam Review（Sample Exam

Posted on 2026-05-27 In Course Notes , COMP5270 Word count in article: 3.5k Reading time ≈ 13 mins.

课程: COMP5270 - Randomness, Probability, and Algorithms
学期: S1 2026
来源: Week 13 - Recap, Week 13 transcript, Sample Exam #1

这节课在讲什么

Week 13 是全课 recap 和 exam review。主要内容有四部分：

Final exam 的结构、允许材料和答题策略
Sample Exam #1 的 MCQ 讲解
Sample Exam #1 的 Problem 2-4 证明题思路
最后用 two apples 的故事引出 zero-knowledge proof

核心复习建议是：不要背完整证明，而是要知道每个工具什么时候用、怎么开头、怎样拿到部分分数。

Exam Information

Final Exam Structure

Final exam:

日期：Thursday, June 18, 2026, 5pm
写作时间：2.5 hours
阅读时间：10 minutes
总分：60 marks
占总成绩：60%
题型结构：和 sample exam 一样，4 problems
School of CS policy：final exam 有 40% exam barrier

也就是说，final exam 至少需要：

所以需要至少 24/60 才能通过 exam barrier。

What Is Examinable?

原则上：

lectures
tutorials
quizzes
assignments

都可以考。

但 lecturer 强调不是要你把 162 页 notes 全部背下来，而是要：

知道重要概念是什么
知道工具什么时候用
看到题目时能判断应使用哪类方法
能写出清楚、短而正确的证明

例如：

看到：想到 probability amplification / majority vote
看到 expectation + variance：想到 Chebyshev
看到 many events：想到 union bound
看到 expected count：想到 indicator variables + linearity of expectation
看到 repeated sampling until all seen：想到 coupon collector

Restricted Open Book Rules

允许带：

官方 printed lecture notes，不能 annotate
一张 double-sided A4 notes，可以手写、打印，或两者混合

不允许带：

slides
tutorial solutions
assignment solutions
extra materials
electronic devices

答题时：

写 student ID，不写 name
不要 verbatim copy lecture notes
即使参考 allowed material，也要用自己的话写

Exam Strategy

1. 先读完整份卷子

阅读时间里不要只盯第一题。先看四个 problems：

哪些题能稳拿分
哪些题需要时间推导
哪些小问可以独立做

sample exam 里很多小问是可以独立拿分的。例如 LP relaxation 这种题，即使前面没有完全做出，也能直接拿。

2. 先做容易拿分的部分

不要线性硬做到卡住。优先写：

定义题
简短解释题
直接套公式题
previous subquestion 的直接 deduction

如果一个小问要用前面结果，即使前面没证明出来，也可以写：

Assuming the previous claim, we get ...

这样仍然能拿后续分数。

3. 写清楚，不要堆所有知识点

Lecturer 强调：分数来自正确点，不来自写得多。

如果不确定完整证明，可以写：

应该计算 expectation
再 bound variance
然后用 Chebyshev
或者用 Chernoff / union bound / linearity

这比乱写一大堆不相关内容更有价值。

Problem 1: MCQ 重点

Sample Exam #1 的 MCQ 是 10 分：

correct answer: 1 mark
wrong answer: 0 mark
no answer: 0.5 mark

所以如果一道题有 3 个选项，随机猜的期望分数是：

因此不确定时，留空比随机猜更好。

MCQ Knowledge Points

题目主题	关键答案 / 复习点
Randomised algorithms	worst-case runtime bounded but may err = Monte Carlo
Universal hashing	universal 不保证 uniform；strongly universal 才有更强性质
Coupon collector	看到所有个 distinct balls 的期望 draws 是
Karger's algorithm	用于 Min-Cut
CountMinSketch / CountSketch / Misra-Gries	用于 Frequent Elements
Bloom filter	可能有 false positives，不会有 false negatives
LP relaxation	可以高效求解，但不一定给出 ILP 的 same optimal solution
Strongly universal hashing	不保证 no collisions
LSH	approximation factor 变小表示要求更强，会变大，结构更贵

Problem 2: MIS 的 LP Relaxation 与 Randomised Rounding

题目背景

Maximum Independent Set (MIS):

给定无向图，找最大点集，使得中任意两个点之间没有边。

也就是：

正确的 ILP

正确的是左边的 ILP：

subject to:

解释：

表示 vertex 被选进 independent set
对每条边，不能同时选和
objective 最大化选中顶点数

MIS 与 ILP 的互相转换

从 independent set 到 ILP solution：

检查：

若，因为 independent，所以
value 相同：

从 ILP solution 到 independent set：

若且，则：

矛盾。因此是 independent set。

LP Relaxation

把 binary constraint 放宽：

变成：

这就是 LP relaxation。其他 constraints 保持不变。

Algorithm 思路

算法先求 LP relaxation 的最优解，value 为。

若：

直接返回任意一个 vertex。

否则随机生成集合：

然后对每条 edge ，若都在中，就把它们加入 removal set ，最后输出：

为什么输出一定是 independent set?

要分两种情况：

Line 3 返回单个 vertex：单点集合一定 independent
Line 10 返回：所有仍连接着 edge 的 pair 都被移除了，所以剩下的集合没有内部边

Line 3 的 approximation

如果算法在 Line 3 返回一个点，则。

因为 LP 是 relaxation：

而进入 Line 3 说明：

所以：

因此：

即得到 -approximation。

Expected Size of

用 linearity of expectation：

Bounding Removed Vertices

若 edge 两端都被选入，最多会导致移除两个点。

所以：

这里通常不是 equality，因为可能 double count。例如 triangle 中三个点都被选入时，一个 vertex 可能因多条边被重复计入，但实际只能移除一次。

对每条 edge ：

原因：

第一不等式是 AM-GM /
第二步用，所以
第三步用 LP constraint

因此：

Final Bound

当执行 Lines 5-10 时，已知：

所以：

于是：

又因为：

得到：

所以算法在 expectation 下给出 -approximation。

本题知识点

知识点	用法
ILP modelling	用 binary variable 表示 vertex 是否被选
LP relaxation	把放宽成
Randomised rounding	按 LP 解给出的概率随机选点
Linearity of expectation	算
Double counting	bound removal set，而不是精确等式
AM-GM /	把 product 转成 sum constraint

Problem 3: Triangle Counting with Query Access

Query Model

给定图：

知道 vertex set ，
知道 edge 数
不知道完整 edge set

可以使用三种 query：

Edge query: 返回 uniformly random edge
Edge check: 给定，检查
Vertex query: 返回 uniformly random vertex

目标是估计 triangle 数量。

Baseline 1: Exact Algorithm

对所有 unordered pairs 做 edge check。

这样可以恢复整张图，然后精确数 triangle。

query complexity:

Baseline 2: High-Probability Exact Algorithm

用 edge query 重复抽样，直到高概率见过所有 edges。

这是 coupon collector：

次抽样后，以高概率收集到所有 edges，然后恢复图并精确数 triangle。

Fixing an Order Without Knowing

题目要求先 fix an order on edges。

即使不知道，也可以对所有 possible vertex pairs 使用 lexicographic order：

例如先比较较小端点，再比较较大端点。这个顺序只依赖 vertex labels，不需要 query。

Sampling Estimator

Algorithm 1:

固定 edge order
uniformly sample edge
uniformly sample vertex
如果：
- 是该 triangle 三条边里 order 最小的边
返回：

否则返回：

为什么 ?

对每个 triangle：

先 sample 到它的一条边：概率
再 sample 到第三个 vertex：概率
sample 到三条边中 order 最小的那条：概率

所以每个 triangle 被成功识别的概率是：

总共有个 triangles，所以成功事件概率：

因此：

这个 estimator 是 unbiased estimator。

Variance Bound

因为只取两个值：

成功时
失败时

所以：

而：

所以：

Repetition and Chebyshev

重复 Algorithm 1 共次，取平均：

则：

用 Chebyshev：

若已知 lower bound ，令：

就可以让失败概率变成常数级，例如至多。

Dense Graph Case

若图是 dense：

且 triangle 数：

要得到 2-approximation，即。

三种算法 query complexity：

算法	Query Complexity
exact all-pairs
coupon collector edges
sampling estimator

所以 dense graph 且 triangle 很多时，使用 sampling estimator 最好。

本题知识点

知识点	用法
Coupon collector	用 random edge query 收集所有 edges
Unbiased estimator	设计使
Symmetry breaking	用固定 order 避免 triangle 被三条边重复计数
Variance bound	用
Chebyshev	用 repetition + averaging 降低误差概率

Problem 4: Streaming Reservoir Sampling

Problem Setting

图以 edge stream 的形式到来：

每条 edge 恰好出现一次。

已知：

vertex set
一个很粗的 upper bound

未知：

edge set
stream length

目标：one-pass streaming algorithm，用很少内存，在 stream 结束时输出一条 uniformly random edge。

If Were Known

如果知道，可以一开始随机选：

然后输出第条 edge。

space complexity:

存： bits
存 edge： bits

总空间：

Unknown : Reservoir Sampling

算法：

存第一条 edge：
对
抛一个 Bernoulli coin：

若，则替换：

最后输出。

所以题目里的应该是：

不是，也不是或。

Correctness Proof by Induction

要证明：在任意 step 结束时，已看过的每条 edge 都以概率被存为当前 candidate。

Base case:

只有，概率为。

Induction step:

新 edge 被选中的概率：

旧 edge ，，在 step 后仍被保留，需要：

step 后它是 candidate
step 不替换

所以：

因此在 stream 结束时，，每条 edge 被输出的概率都是。

如何用 fair random bits 生成 Bernoulli?

一种做法：

用个 fair bits 生成一个整数，近似 uniform over
若，输出
否则输出

若不是 2 的幂，可以生成到最近的 2 的幂范围里，再用 rejection sampling。

期望 random bits:

Space Complexity

需要存：

当前 candidate edge： bits
counter ： bits

所以总空间：

用 Problem 3 做 Streaming Triangle Approximation

Problem 3 的 triangle estimator 需要：

sample random edge
sample random vertex
edge check

在 streaming setting 中：

random vertex 可以直接从已知中抽
random edge 可以用 reservoir sampling 实现
edge check 可以在 stream 中观察目标 edge 是否出现

所以可以运行 Problem 3 中需要的个 independent estimator，每个 estimator 用 reservoir sampling 维护需要的 random edge。

若：

则空间大约是：

再加上每个 estimator 需要保存常数个 vertices / edges / counters，仍然是同阶。

本题知识点

知识点	用法
Reservoir sampling	不知道 stream length 时 uniform sample 一个元素
Induction invariant	step 后每个 seen item 概率都是
Random bits simulation	用 fair bits 生成 Bernoulli
Streaming space complexity	只存 candidate、counter 和少量状态
Reduction	用 streaming edge sampler 实现 Problem 3 的 edge query

Zero-Knowledge Proof 引子

最后 lecturer 用 two apples 的故事引出 zero-knowledge proof。

故事：

有两颗苹果，一红一绿
verifier 是色盲，看不出颜色区别
prover 声称自己知道哪颗是红、哪颗是绿
目标是让 verifier 相信两颗苹果颜色不同，但不透露哪颗是红、哪颗是绿

核心思想：

verifier 可以把两颗苹果拿到背后，随机决定是否交换，然后让 prover 判断是否被交换。若 prover 真能分辨颜色，就总能答对；若不能分辨，只能猜。

重复多轮后：

如果两颗苹果颜色不同，诚实 prover 能让 verifier 高概率相信
如果两颗苹果颜色相同，骗子 prover 不能持续骗过 verifier
verifier 最后仍不知道哪颗是红、哪颗是绿

这就是 zero knowledge 的味道：

convince someone that a statement is true, without revealing the proof itself.

Final Revision Checklist

考前建议重点复习：

Linearity of expectation：indicator variables
Markov / Chebyshev / Chernoff bounds
Union bound
Coupon collector
Universal vs strongly universal hashing
Bloom filter false positive
CountMinSketch / CountSketch / Misra-Gries
LSH and
LP relaxation
Randomised rounding
Streaming algorithms
Reservoir sampling
Approximation via unbiased estimator + variance reduction

答题模板：

先写清楚变量定义
如果是期望题，用 indicator + linearity
如果要 high probability，看是否需要 Markov / Chebyshev / Chernoff
如果是 approximation，明确比较对象是、还是真实 optimum
如果用了 previous result，直接说明 “by previous part”
不要照抄 notes，用自己的话写短证明

Written for COMP5270 S1 2026