Introduction to modern Cryptography

中英混杂的笔记。大部分照抄PPT+少部分注释
课程概览 | Coursera
关键词大概是：密码协议or安全性证明

Module 1

1.1 以凯撒密码引入

本课程使用术语：private-key, Alice, Bob

private-key encryption 不仅可以加密两个个体之间的消息，还可以加密自己给自己的消息（例如用户将密码加密后存储在自己的笔记本电脑上）

本课程的几个符号：

←-：表示函数的输出是不确定的
:=：表示函数的输出是确定的
=：表示数学相等性，与分配相反
在本课程，我们假设 Enc 函数的输出是不确定的，Dec 函数的输出是确定的

从形式上讲，加密算法是由：

a message space M: 消息空间 M
Gen (key-generation algorithm): generates k. 密钥产生算法
Enc (encryption algorithm): input-key/message, output-ciphertext.
c <- Enc_k (m) （我们允许加密算法对同一个明文的多次加密结果是不同的）
Dec (decryption algorithm): input-key/ciphertext, output-message or "error"
m := Dec_k (c)

对于任意加密算法，都需要满足一个最基本的条件：对于消息空间 M 中的任意消息 m，使用由密钥产生算法的任意 key 加密，并且使用相同 key 进行解密后，必须得到和最初 m 相同的 m'

例如，对于凯撒密码来讲：

不过这个加密方法有大问题：因为密钥空间太小，很容易通过遍历来攻击它

Kerchkhoffs's principle: 密码系统的安全性应该建立在密钥上，而不是加密算法上
（a cryptosystem should be secure, even if everything about the system, except the key, is public knowledge.）

前面我们有提到，凯撒密码因为密钥空间太小，很容易遭遇遍历攻击，一个直观的想法就是扩大密钥空间来提高安全性。
但是密钥空间大，不意味着加密算法安全。因为遍历攻击不是唯一的攻击方法。

1.2 Hex and ASCII

Hex：

Hex （16进制）是一种描述整数的方式，跟十进制相似
Hex 包括0~9，A~F
一个 Hex 数字，刚好对应4个Bits，也就是一个nibble（半字节）
一个字节(word)= 8 Bits
通常在 Hex 数字前加上 0x 来表示这是一个 Hex 的数字
eg: 0xAF，0x10

ASCII：

ASCII 用于表示英文字母或字符。在ASCII表示中，每个字符都用一个字节表示，或者说是 8 bits，2个Hex数字
字符“1”和数字"1"不是一回事

1.3 现代密码学原理的三个核心

现代密码学的三个核心原则：

安全性定义 (Formal definitions)：提供了一个精确的数学模型，以及安全性含义的正式定义
假设 (Assumptions)
安全性证明 (Proofs of security)

密码学定义通常有两个组成部分：

威胁模型 (Threat model)：假设真实世界的攻击者的能力
安全保证 (Security guarantee/goal)：协议的安全目标

几种威胁模型（攻击强度逐渐增加）：

Ciphertext-only attack（唯密文攻击）
- One ciphertext or many?
- one key or many keys?
Known-plaintext attack（已知明文攻击）
- 攻击者具有唯密文攻击的能力，即观察到一个或多个密文
- 攻击者还能够获得一堆使用相同密钥以及对相同明文加密得到的密文
Chosen-plaintext attack（选定明文攻击）
- 攻击者具有已知明文攻击的能力
- 攻击者还能够获得使用相同密钥以及攻击者选定明文加密得到的密文
Chosen-ciphertext attack（选定密文攻击）
- 攻击者具有选定明文攻击的能力
- 攻击者还能够让参与方解密攻击者选定的密文，得到明文

1.4 完全保密(Perfect Secrecy)

Secure Goal： Regardless of any prior info. the attacker has about the plaintext, the ciphertext should leak no additional information about the plaintext.

也就是说，只要看到密文不会让攻击者更容易猜出明文的字符，则不应被视为违反安全规定。

概率论复习：

random variable (随机变量)
event (事件)：Pr[E] = 事件E 发生的概率
Conditional probability (条件概率)：Pr[A | B] = Pr[A and B]/Pr[B]，即，在事件B 发生的情况下，事件A 发生的概率
X, Y are independent：Pr[X=x | Y=y] = Pr[X=x]
E₁, ... , E_n are a partition of all possibilities：

Recall：

a message space M: 消息空间 M
Gen (key-generation algorithm): generates k. 密钥产生算法
Enc (encryption algorithm): input-key/message, output-ciphertext.
c <- Enc_k (m) （我们允许加密算法对同一个明文的多次加密结果是不同的）
Dec (decryption algorithm): input-key/ciphertext, output-message or "error"
m := Dec_k (c)
K (key space): set of all possible keys
C (ciphertext space): set of all possible ciphertexts

Random variables M and K are independent.

完美保密(Perfect secrecy)的定义：

Informal: Perfect secrecy means that obseving the ciphertext should not change the attacker's knowledge about the distribution of the plaintext.
观察密文不能更改攻击者对明文概率分布的了解，即观察密文之前，攻击者对明文的分布有一定的假设，观察密文后，攻击者对明文的分布有一定的假设，这两个分布应该完全相等。
Formal: Encryption scheme (Gen, Enc, Dec) with message space M and ciphertext space C is perfectly secret if for every distribution over M, every m ∈ M, and every c ∈ C with Pr[C=c]>0, it holds that: $Pr[M=m | C=c] = Pr[M=m]$

以 Shift Cipher 为例：不符合完美保密

以 One-Time Pad 为例：符合完美保密

1.5 实现 One-Time Pad

Data representation :

Plaintext - ASCII
Key/ciphertext - hex digits, written in ASCII

Random-number generation :

Continually collect a "pool" of high-entropy (i.e., "unpredictable") data
- External inputs: Delays between network events/ Hard-disk access times/ Keystroke/ mouse movements/Other...
- Hardware random-number generation: e.g., Intel)
When random bits needed, process this data to generate independent, uniform sequence of bits
- May "block" if insufficient entropy available

how to generation :

Precise details depend on the system - Unix: /dev/random
Crypto libraries

Encryption :

Plaintext = sequence of ASCII characters
Key = sequence of hex digits, written in ASCII
Read them; XOR them to get the ciphertext

Decryption :

Reverse encryption
Read ciphertext and key; XOR them to recover the message

‍

Module 2

2.1 One-Time Pad的缺陷

limitations: Parties must share keys of (total) length equal to the (total) length of all the messages they send.

The key is as long as the message
Only secure if each key is used to encrypt a single message

但一个很不符合直觉的事情是：在这个定义下，One-Time Pad在密钥长度上已经是最佳方案了。

即，任何 key length < M length 的方案都不可能符合这个定义的安全性要求。证明如下：

‍

但这太不符合实际了，对吧。
因此，接下来我们要做的是放宽完美保密的定义，来获得一个更有意义的、在实践中有用的，并且规避了 One-Time Pad 缺陷的完美保密

‍

2.2 计算保密(Computational secrecy)

Would be ok if a scheme leaked information with tiny probability ** to eavesdroppers with bounded computational resources. i.e., relax **perfect secrecy by :

Allowing security to fail with tiny probability
- Say security fails with probability 2^-60
- Something that occurs with probability 2^-60/sec is expected to occur once every 100 billion years.
- So you have bigger problems to worry about than failure of the encryption scheme.
Only considering "efficient" attackers
- Consider brute-force search of key space
- Supercomputer ~ 2¹¹² keys/year, while Modern key space ~ 2¹²⁸ keys or more...

完美保密太严格了，其实可以“在很小的概率上泄露一些信息给计算资源有限的攻击者”，换句话说，我们可以放宽完美保密的定义：

在非常小的概率上允许安全性失效
只考虑计算资源有限的攻击者

在定义 Computational Secrecy 之前，我们先要定义完美不可区分和计算不可区分 (Perfect/Computational indistinguishability)

Perfect indistinguishability :

Let π=(Gen, Enc, Dec) be an encryption scheme, and A as an adversary
define a randomized exp't PrivK_A,π :
1. A outputs m₀, m₁ ∈ M
2. k <- Gen, b <- {0,1}, c <- Enc_k(m_b)
3. b' <- A(c); A succeeds if b' = b, and A will outputs 1 in this case
it's easy to succeed with probability 0.5
π is perfectly indistinguishable if for all attackers A, it holds that Pr[ PrivK_A,π =1 ] = 0.5
we Claim: π is perfectly indistinguishable if and only if it is perfectly secret

i.e., perfect indistinguishability is just an alternate definition of perfect secrecy

Computational indistinguishability (concrete version): π is (t, ε) - indistinguishable if for all attackers A running in time at most t, it holds that: $Pr[ PrivK_{A,π}=1] ≤ \frac{1}{2}+ε$

t 是攻击者的运行时间，ε 是一个非常小的值。它们都可以被随机选择
虽然这样定义，使用户能够自由调节 secrecy level
但这个定义有一点点缺陷：在现实世界中 t 和 ε 的选择太多种多样了，因此我们希望能改进这个理论，让用户不受某种 t 和 ε 的特定选择的约束。

Asymptotic Security(渐进安全性) : 一个加密算法或协议被认为是渐近安全的，意味着随着攻击者的计算能力增加或攻击技术的进步，该算法或协议仍然能够保护数据的机密性和完整性。

引入安全参数"n" - security parameter n∈Z⁺:

Fixed by honest parties at initialization
represent the security level: in this case, can view as the key length
Known by adversary
渐近定义(Asymptotic Definition)： Asymptotic Definition 通常用于描述函数在某种极限情况下的行为。它关注的是当某个变量（如 n）趋近于某个值（如无穷大）时，函数的行为。在这种情况下，我们不关心具体的数值，而是更注重在特定条件下（如 n→∞）的相对增长率。
具体定义(Concrete Definition)： Concrete Definition 则是针对某个对象或概念的精确定义，通常是明确的数值或公式。这种定义通常会包含具体的条件和属性，用于清晰地描述一个数学对象。例如，定义一个集合、一个函数的性质，或是某个数学结构的特征。

Computational indistinguishability (asymptotic):

Security may fail with probability negligible in n
- A function f: Z⁺ -> R^+,0 is negligible if for every polynomial p there is an N such that $f(n)<\frac{1}{p(n)}$ for n>N
- $\frac{1}{p(n)}$ = 多项式 p(n) 的倒数。意味着当n->∞时，可忽略函数趋近于0
- f(n) 要小于所有多项式函数的倒数，才是可忽略函数
- Typical example: f(n) = poly(n) · 2^-cn
- negligible function has some properties: Poly negligible = negligible*
Restrict attention to attackers running in time polynomial in n

Computational indistinguishability (asymptotic version):

Fix π, A
define a randomized exp't PrivK_A,π(n) :
1. A(1ⁿ) outputs m₀, m₁ ∈ {0,1} of equal length*
2. k <- Gen(1ⁿ), b <- {0,1}, c <- Enc_k(m_b)
3. b' <- A(c); A succeeds if b' = b, and A will outputs 1 in this case
π is indistinguishable if for all PPT attackers A, there is a negligible function ε such that: $Pr[ PrivK_{A,π}(n)=1] ≤ \frac{1}{2}+ε(n)$
- ε(n) is negligible
- Example: A runs in time t(n), then the probability of success is $Pr[ PrivK_{A,π}(n)=1] ≤ \frac{1}{2}+\frac{t(n)}{2^n}$. So this schema is indistinguishable (Poly negligible = negligible*)
- PPT = "Probabilistic Polynomial Time"，“概率多项式时间”。
This definition require the attacke to output two messages of equal length
- In general, encryption does not hide the plaintext length (short message -> short ciphertext): in general, we are willing to allow the attacker to learn that information.
- This definition takes this into account by restricting the attackers to outputting equal length messages. so our definition is limited.
- Beware that leaking plaintext length can often lead to problems: e.g. - database searches

From now on, we will assume the computational setting by default (usually, the asymptotic setting)

2.3 伪随机性(Pseudorandomness)

What does "random" mean? (or "uniform" mean?)

"Randomness" is not a property of a string, but a property of a distribution
A distribution on n-bit strings is a function D: {0,1}ⁿ -> [0,1] such that ∑_xD(x) = 1
The uniform distribution on n-bit strings is the distribution U_n where U_n(x) = 2^-n for all x ∈ {0,1}ⁿ

What does "pseudorandom" mean?

Informal: cannot be distinguished from uniform (i.e, random)
Pseudorandomness is a property of a distribution

Pseudorandomness (take 1) :

Fix some distribution D on n-bit strings: x<- D * means "sample x according to D"*
Historically, D was considered pseudorandom if it "passed a bunch of statistical tests"
- Pr_x<-D[1^st bit of x is 1] ≈ 0.5
- Pr_x<-D[parity bit of x is 1] ≈ 0.5
- Pr_x<-D[A_i(x)=1] ≈ Pr_x<-Un[A_i(x)=1] for i=1,...,20
But this is not sufficient in cryptography (We donnot know what statistical test an attacker will come up with, maybe the attacker can find a specific test that our distinguishes cannot pass).

Pseudorandomness (take 2) :

Cryptographic def'n of pseudorandomness: D is pseudorandom if it passes all efficient statistical test

Pseudorandomness (concrete) :

D be a distribution on n-bit strings
D is (t,ε)-pseudorandom if for all A running in time≤t :
$|Pr{x\leftarrow D}[A(x)=1] - Pr{x\leftarrow {U_n}}[A(x)=1]|≤ε$

Pseudorandomness (asymptotic) :

Introduction of the security parameter n, polynomial p
Let D_n be a distribution over p(n)-bit strings. e.g., p(n)=n²
Pseudorandomness is a property of a sequence of distributions {D_n} = {D₁, D₂, ...}
伪随机性是分布序列 Dn 的一个属性
{D_n} is pseudorandom if for every probabilistic, polynomial-time A, there is a negligible funcion ε such that:
$|Pr_{x\leftarrow {Dn}}[A(x)=1] - Pr{x\leftarrow {U_p(n)}}[A(x)=1]|≤ε(n)$

2.4 伪随机生成器(PRGs)

Pseudorandom (number) generators (PRGs/PRNGs) :

A PRG is an efficient, deterministic algorithm that expands a short, uniform seed into a longer, pseudorandom output

Useful whenever you have a "small" number of true random bits, and want lots of "random-looking" bits
Let G be a deterministic, poly-time algorithm
G is expanding: $|G(x)|=p(|x|) ＞|x|$
G defines a sequence of ditributions :
- D_n = the distribution on p(n)-bit strings defined by choosing x<-U_n and outputting G(x)
1.从均匀分布 U_n 中选择一个字符串 x; 2.使用一个函数或算法 G 对选择的字符串 x 进行处理，输出结果 G(x)
- Pr_Dn[y] = Pr_Un[G(x) = y] = ∑_x:G(x)=yPr_Un[x]
  = ∑_x:G(x)=y2^-n
  = | {x:G(x)=y} |/2ⁿ

PRGs :

G is a pseudorandom generator if {D_n} is pseudorandom
i.e., for all PPT attackers A, there is a negligible function ε such that:
$|Pr_{x\leftarrow {Un}}[A(G(x))=1] - Pr{y\leftarrow {U_p(n)}}[A(y)=1]|≤ε(n)$
i.e., no efficient A can distinguish whether it is given G(x) (for uniform x) or a uniform string y

Do PRGs exist ? :

We don't know, because this is one of those things that we can't hope to prove the existence of without resolving
In practice, we can assume certain G's are PRGs
Can construct PRGs from weaker assumptions (we don't know why either)

2.5 “Pseudo” one-time pad 的计算保密性证明

"Pseudo" one-time pad 加密方案：

Let G be a deterministic, poly-time function, with |G(k)| = p(|k|)
Gen(1ⁿ): output uniform n-bit key k

Security parameter n => message space {0,1}^p(n)
Enc_k(m): output G(k) ⊕ m
Dec_k(m): output G(k) ⊕ c

Our goal is to prove that the pseudo OTP meets the definition we've defined before (Computational secrecy)

Unfortunately, we are unable to prove this unconditionally. Anyway, security depends on G.

Can hope to prove security assuming that G is a pseudorandom generator

Proof by reduction(归约证明): 基本思想是通过将一个问题简化或转化为另一个已知问题来进行证明。

Assume G is a pseudorandom generator
Say there is an efficient attacker A who 'breaks' the pseudo-OTP scheme
Use A as a subroutine to build an efficient D that 'breaks' pseudorandomness of G
subroutine = 子程序
But we know that no such D exists! => No such A can exist!

归约：构造生成器D 到 attacker A 的映射，如果存在attacker A，则存在能破坏 G 的生成器 D

Alternately :

Assume G is a pseudorandom generator
Fix some arbitrary, efficient A attacking the pseudo-OTP scheme
Fix some arbitrary = 确定某个任意的
Use A as a subroutine to build an efficient D attacking G (Relate the distinguish probability of D to the success probability of A)
By assumption, the distinguishing probability of D must be negligible (Bound the success probability of A)

Security theorem(安全定理): IF G is a pseudorandom generator, then the pseudo one-time pad is (computationally) indistinguishable

Analysis:

The only way the scheme can be broken is:

If a weakness is found in G
If the definition isn't sufficiently strong...

‍

Module 3

3.1 CPA-security

Recall that security definitions have two parts:

Security goal
Threat model

不可区分性根据定义可以很很容易发现，是针对单条信息的定义。显然不符合实际需求。因此我们需要更强大的威胁模型和安全性定义。

‍

Multiple-message secrecy :

We are not going to formally define a notion of multiple-message secrecy
Instead, define something stronger : security against chosen-plaintext attacks (CPA-security)

Nowadays, this is the minimal notion of security an encryption scheme should satisfy.

Is CPA-security too strong?

In practice, there are many ways an attacker can influence what gets encrypted
- Chosen-plaintext attacks encompass any such influence
- Unfortunately, not clear how best to model
Moreover, in certain cases an attacker may have significant control over what is encrypted

CPA-security :

Fix π, A
Define a randomized experiment PrivCPA_A,π (n):
1. k <- Gen(1ⁿ)
2. A(1ⁿ) interacts with an encryption oracle Enc_k(·), and then outputs m₀, m₁ of the same length
3. b <- {0,1}, c <- Enc_k(m_b), give c to A
4. A can continue to interact with Enc_k(·)
5. A outputs b'; A succeeds if b=b', and experiment evaluates to 1 in this case
π is secure against chosen-plaintext attacks(CPA-secure) if for all PPT attackers A, there is a negligible function ε such that :
$Pr[ PrivCPA_{A,π}(n)=1] ≤ \frac{1}{2}+ε(n)$

the exp't is Impossible?

Consider the following attacker A :
- Obtain c₀=Enc_k(m₀) and c₁=Enc_k(m₁) using chosen-plaintext attack
- Output m₀, m₁; get challenge ciphertext c
- IF c=c₀ output '0'; if c=c₁ output '1'
- A succeeds with probability 1?
This attack only works if encryption is deterministic!
Moral: randomized encryption must be used!

Randomized encryption :

The issue is not an artifact of our definition
It really is a problem if an attacker can tell when the same message is encrypted twice

3.2 PRFs (Pseudorandom functions，伪随机函数)

Pseudorandom functions :

Informally, a pseudorandom function "looks like" a random funtion
Func_n = all funtions mapping {0,1}ⁿ to {0,1}ⁿ
How big is Func_n?
- Can represent a function in Func_n using n·2ⁿ bits => $| Func_n | =2^{n*2^n}$
- $| Func_n | =|Y|^{|X|}$，X和Y分别是输入和输出空间

Random function :

Uniform function
Choose uniform f ∈ Func_n
Equivalently, for each x ∈ {0,1}ⁿ, choose f(x) uniformly in {0,1}ⁿ
- I.e., fill up the function table with uniform values
- Can also think of this being done "on-the-fly"

Keyed functions :

As in our discussion of PRGs, it does not make sense to talk about any fixed function being pseudorandom. We look instead at keyed functions.
Let F: {0,1} x {0,1} -> {0,1}* be an efficiently computable function
- Define F_k(x) = F(k, x)
- The first input is called the key
Assume F is length preserving: F(k, x) only defined if |k| = |x|, in which case | F(k, x) | = | k | = | x |
Choosing a uniform k ∈ {0,1}ⁿ is equivalent to choosing the function F_k: {0,1}ⁿ -> {0,1}ⁿ

I.e., F defines a distribution over functions in Func_n

PRF的安全模型：

F is a pseudorandom function if F_k, for uniform key k ∈ {0,1}ⁿ, is indistinguishable from a uniform function f ∈ Func_n

也就是说，攻击者不能区分 F_k 是伪随机函数，还是一个真正的随机函数
攻击者模型：
- 考虑两个挑战者，每个挑战者都控制着一个函数 f，只不过不同的挑战者选择f时候的方法不同：
- 一个挑战者随机选择一个密钥 k，令 f := F (k, ·)；
- 另一个挑战者控制的f则是一个随机函数 (从 Funs[X, Y] 中随机选择一个函数并令之为 f)。
- 攻击者A可以进行多次探测：A 可以向挑战者发送一个元素 x_i∈X，挑战者将相应的 y_i:= f(x_i) 返回给 A
- 如果攻击者A 不能区分两个挑战者，则PRF是安全的
Formally, for all poly-time D:

$$
|Pr_{k \leftarrow {{0,1}}^n} [D^{Fk(·)=1}] - Pr{f \leftarrow {Func_n}} [D^{F_k(·)=1}]| ≤ε(n)
$$

PRFs vs. PRGs

PRF F immediately implies a PRG G:
- Define G(k) = F_k(0...0) | F_k(0...1)
- Or G(k) = F_k(0) | F_k(1) | F_k(2) | ...
PRF can be viewed as a PRG with random accesss to exponentially long output: F_{k <=>} F_k(0...0) | ... | F_k(1...1)
这句话的意思是，伪随机函数（PRF）可以被视为一种伪随机生成器（PRG），PRF具有对输出的随机访问（指的是能够在不知道完整输出的情况下，直接访问输出的某个特定部分），且输出的长度可以是指数级的（k, x都可变）。

Pseudorandom permutations (伪随机置换)：

Let F be a length-preserving, keyed function
F is a keyed permutation if :
- F_k is a bijection for every k (bijection=双射)
- F_k^-1 is efficiently computable (where $F_k^{-1}(F_k(x)) = x$，即F_k 存在逆函数)
F is a pseudorandom permutation if F_k, for uniform key k∈{0,1}ⁿ is indistinguishable from a uniform permutation f ∈ Perm_n

这是 PRP 的定义。PRP 的安全性要求：其与随机置换在计算上不可区分
通过定义可以知道，PRF 和 PRP（分组密码）之间最主要的区别就是，PRP 是有逆函数的，而 PRF 未必有逆函数。

Block ciphers :

Block ciphers are practical constructions of pseudorandom permutations
No asymptotics: $F:{0,1}^n * {0,1}^m \rightarrow {0,1}^m$
- n = "key length"
- m = "block length"
Hard to distinguish F_k from uniform f ∈ Perm_m even for attackers running in time ≈ 2ⁿ

AES :

Advanced encryption standard (AES)
- Standardized by NIST in 2000 based on a public, worldwide competition lasting over 3 years
- Block length = 128 bits
- Key length = 128, 192 or 256 bits
No real reason to use anything else

3.3 一个 CPA-secure encryption 的计算保密性证明

CPA安全方案的构建与安全性证明 | 辛未羊的网络日志 (panqiincs.me)

Note on block ciphers :

For large enough n, a random permutation is indistinguishable from a random function
So block ciphers are good PRFs, also

CPA-secure encryption :

Let F be a keyed function
Gen(1ⁿ): choose a uniform key k ∈ {0,1}ⁿ
Enc_k(m), for | m | = | k | :
- Choose uniform r ∈ {0,1}ⁿ
- Output ciphertext <r, F_k(r) ⊕ m>
Dec_k(<c₁, c₂>): output c₂ ⊕ F_k(c₁)
The key is as long as the message. But the point is that the scheme can be used to safely encrypt multiple messages

Security?

Theorem: if F is a pseudorandom function, then this scheme, π, is CPA-secure.
Proof by reduction. 构造生成器D 到 attacker A 的映射，如果存在attacker A，则存在能破坏 pseudorandom function 的生成器 D
攻击者可以多次与加密预言机进行选择明文的交互(p1)。攻击时的交互如同p2，当加密算法返回加密结果后，攻击者可以继续p1的选择明文交互

Analysis :

μ(n)为攻击者成功概率

Let $μ(n)=Pr[PrivCPA_{Adv,π}(n)=1]$，$μ(n)$为攻击者成功概率
Let $q(n)$ be a bound on the number of encryption queries made by attacker
If f = F_k for uniform k, then the view of Adv is exactly as in PrivCPA_Adv,π(n)
- $Pr_{k \leftarrow{0,1}^n}[D^{Fk(·)}=1]=Pr[PrivCPA{Adv,π}(n)=1]=μ(n)$
If f is uniform, there are two sub-cases
- 根据随机值r* 是否被使用可分为两种情况
- r* was queried some other time (event Repeat)
- r* was not queried some other time (event ¬Repeat)
$Pr_f[D^{f(·)}=1] \leq Pr_f[D^{f(·)}=1 | \neg \textbf{Repeat}]+Pr[\textbf{Repeat}]$
- $Pr[\textbf{Repeat}] \leq \frac{q(n)}{2^n}$：当随机值r 被使用过时，攻击者A能轻易的判断出哪条消息被加密。而 r 在攻击者与加密预言机交互过程中被使用的概率为 $\frac{q(n)}{2^n}$
- $Pr_f[D^{f(·)}=1 | \neg \textbf{Repeat}] = \frac{1}{2}$：当随机值r* 没有被使用时，我们不知道任何关于 F_k(r) ⊕ m 的信息，因此此时攻击者A只能进行猜测，猜对的概率为 $\frac{1}{2}$
Since F is pseudorandom
- $|μ(n)-Pr_f[D^{f(·)}=1]| \leq ε(n)$
- $μ(n) \leq Pr_f[D^{f(·)}=1] + ε(n) \leq \frac{1}{2}+\frac{q(n)}{2^n}+ε(n)$
For any polynomial q, the term $\frac{q(n)}{2^n}$ is negligible
该方案的安全性关键在于： 避免生成挑战密文c 时所用到的随机值r* 在加密预言机回答敌手询问时被使用，但即使是重复使用了，概率也是可以忽略的。

3.4 Modes of encryption

CPA-secure encryption :

Enc_k(m) = (r, F_k(r) ⊕ m)
- A 1-block plaintext results in a 2-block ciphertext
- Only defined for encryption of n-bit messages
Recall that CPA-security => secrecy for the encryption of multiple messages. So, can encrypt the message m₁, ..., m_t as Enc_k(m₁), Enc_k(m₂), ..., Enc_k(m_t)
- This is also CPA-secure

Drawback :

The ciphertext is twice the length of the plaintext
I.e., ciphertext expansion of a factor of two

Can we do better?

Modes of operation :

Stream cipher modes of operation (流密码安全模式)
- Synchronized (同步)
- Unsynchronized (非同步)

CTR mode :

Enc_k(m₁, ..., m_t)
- Choose ctr <- {0,1}ⁿ , set c₀ = ctr
- For i=1 to t :
  c_i = m_i ⊕ F_k(ctr+i)
- Output c₀, c₁, ..., c_t
Ciphertext expansion is just 1 block
CPA-secure proof

CBC mode :

Enc_k(m₁, ..., m_t)
- Choose random c₀<- {0,1}ⁿ , also called the IV
- For i=1 to t :
  c_i = F_k(m_i ⊕ c_i-1 )
- Output c₀, c₁, ..., c_t
Decryption requires F to be invertible
Ciphertext expansion is just 1 block
CPA-secure proof

ECB mode :

Enc_k(m₁, ..., m_t) = F_k(m₁), ..., F_k(m_t)
Deterministic => Not CPA-secure!
Can tell from the ciphertext whether m_i = m_j
- Does not even have indistinguishable encryptions!

3.5 CCA-secure

So far...

Have been assuming only a passive, eavesdropping attacker
What if the attacker can be active?
- E.g., interfering with the communication channel
- E.g., "impersonating" the sender; injecting communication on the channel

Malleability :

(Informal: ) A scheme is malleable if it is possible to modify a ciphertext and thereby cause a predictable change to the plaintext
Malleability can be dangerous!
- Encrypted email
- Encrypted transactions
All the schemes we have seen so far are malleable!
- E.g., the one-time pad
- Perfect secrecy does not imply non-malleability

Chosen-ciphertext attacks :

Models settings in which the attacker can influence what gets decrypted, and observe the effects
Allow attacker to submit ciphertexts of its choice to the receiver, and learn the corresponding plaintext
- In addition to being able to carry out a chosen-plaintext attack!

CCA-security :

Define a randomized experiment PrivCCA_A,π (n):
1. k <- Gen(1ⁿ)
2. A(1ⁿ) interacts with an encryption oracle Enc_k(·), and a decryption oracle Dec_k(·), and then outputs m₀, m₁ of the same length
3. b <- {0,1}, c <- Enc_k(m_b)
4. A can continue to interact with Enc_k(·), Dec_k(·), but may not request decryption of c
5. A outputs b'; A succeeds if b=b', and experiment evaluates to 1 in this case
π is secure against chosen-ciphertext attacks(CCA-secure) if for all PPT attackers A, there is a negligible function ε such that :
$Pr[ PrivCCA_{A,π}(n)=1] ≤ \frac{1}{2}+ε(n)$

Chosen-ciphertext attacks and malleability :

If a scheme is malleable, then it cannot be CCA-secure
- Modify c, submit modified ciphertext c' to the decryption oracle
CCA-security implies non-malleability

3.6 Padding-Oracle Attacks

Overview :

In the definition of CCA-security, the attacker can obtain the decryption of any ciphertext of its choice (besides the challenge ciphertext)
We show a real-world scenario where:
- One bit about decrypted ciphertexts is leaked
- This can be exploited to learn the entire plaintext

Arbitrary-length messages? (for sender: how to fill it when there is no complete block)

Message -> encoded data -> ciphertext
- Assume message is an integral # of bytes (假设mes是字节的整数倍，但不必是块大小的整数倍)
PKCS #5 encoding:
- Let $L$ be the block length (in bytes) of the cipher
- Let $b$ be the # of bytes that need to be appended to the messages to get length a multiple of $L$:
```
                                                                1≤$b$≤$L$, note $b$≠0
```
- Append $b$ (encoded in 1 byte), $b$ times
I.e., if 3 bytes of padding are needed, append 0x030303

Decryption? (for receiver: how to remove those padding data)

Use CBC-mode decryption to obtain encoded data
Say the final byte of encoded data has value $b$
- If $b$=0 or $b$>$L$, return "error"
- If final $b$ bytes of encoded data are not all equal to $b$, return "error"
- Otherwise, strip off the final $b$ bytes of the encoded data, and output what is left as the message

Padding oracles :

Padding oracles frequently present in web applications
Even if an error is not explicily returned, an attacker might be able to detect differences in timing, behavior, etc.

Main idea of the attack :

Given two-block ciphertext IV, c
- Encoded data = F_k^-1(c) ⊕ IV
Main observation: If an attacker modifies the $i$th byte of IV, this causes a predictable change (only) to the $i$th byte of the encoded data
we can predicate the original messages' length by "error" message

Attack complexity?

≤$L$ tries to learn the # of padding bytes
≤ 2⁸=256 tries to learn each plaintext byte

CCA-security: a summary

Chosen-ciphertext attacks represent a significant, real-world threat
Modern encryption schemes designed to be CCA-secure
Will see an example of a CCA-secure scheme next

Module 4

4.1 Message Integrity（消息完整性）

Secrecy vs. integrity :

So far we have been concerned with ensuring secrecy of communication
What about integrity?
- I.e., ensuring that a received message originated from the intended party, and was not modified. Even if an attacker controls the channel!
- Standard error-correction techniques not enough! (Standard error-correction techniques: 标准纠错技术)
- The right tool is a message authentication code (or MAC)
Secrecy and integrity are orthogonal concerns (保密性和完整性是两个相互独立的问题)
- Possible to have either one without the other
- Encryption does not (in general) provide any integrity
- Related to the issue of malleability (issue of malleability: 方案的可扩展性)

Message authentication code (MAC):

A message authentication code is defined by three PPT algorithms (Gen, Mac, Vrfy):

Gen: takes as input 1ⁿ; outputs k. (Assume |k|≥n.)

Mac: takes as input key k and message m∈{0,1}*; outputs tag t

                              $t:= Mac_k(m)$

Vrfy: takes key k, message m, and tag t as input; outputs 1 ("accept") or 0 ("reject")

Security :

Only one standard definition
Threat model
- "Adaptive chosen-message attack"
- Assume the attacker can induce the sender to authenticate messages of the attacker's choice
  假设攻击者可以诱导 sender 对攻击者选择的信息生成 t
Security goal
- "Existential unforge ability"
- Attacker should be unable to forge a valid tag on any message not authenticated by the sender
  攻击者应无法在未经 sender 验证的任何信息上伪造有效的 t

Security?

Is the definition too strong?
- We don't want to make any assumptions about what the sender might authenticate
- We don't want to make any assumptions about what forgeries are "meaningful"
A MAC satisfying our definition can be used anywhere integrity is needed

Formal definition :

Fix A, π
Define randomized experiment Forge_A,π(n) :
1. k <- Gen(1ⁿ)
2. A interacts with an oracle Mac_k(·), let M be the set of messages submitted to this oracle
3. A outputs (m, t)
4. A succeeds, and the experiment evaluates to 1, if Vrfy_k(m, t)=1 and m∉M

π is secure if for all PPT attackers A, there is a negligible function ε such that :

                             $Pr[Forge_{A,π}(n)=1] ≤ ε(n)$

Replay attacks are not considered :

Note that replay attacks are not prevented
- No stateless mechanism can prevent them (任何无状态机制都无法阻止它们)
Replay attacks can be a significant real-world concern
Need to protect against replay attacks at a higher level (需要在更高层次上防范重放攻击)

4.2 A fixed-length MAC

Intuition :

We need a keyed function Mac such that
- Given Mac_k(m₁), Mac_k(m₂), ...,
- ...it is infeasible to predict the value Mac_k(m) for any m∉{m₁, ..., }
Let Mac be a pseudorandom function !

Construction :

Let F be a length-preserving pseudorandom function
Construct the following MAC π:
- Gen - choose a uniform key k for F
- Mac_k(m) - output F_k(m)
- Vrfy_k(m, t) - output 1 if F_k(m)=t
Theorem: π is a secure MAC

Proof by reduction :

When $D$ interacts with $F_k$ for uniform $k$, the view of the adversary is identical to its view in the real MAC experiment

                                       $Pr[D^{F_k} outputs~1] = Pr[Forge_{Adv,π}(n)=1]$

When $D$ interacts with uniform $f$, then seeing $f(m_1)$, ..., $f(m_i)$ does not help predict $f(m)$ for any $m \notin {m_1,~...,m_i}$
- $Pr[D^f~outputs~1] = 2^{-n}$
Since $F$ is a pseudorandom function,
- $|Pr[D^{F_k} outputs~1] - Pr[D^f outputs~1]| \lt negl(n)$
- $\Rightarrow Pr[Forge_{Adv,π}(n)=1]=Pr[D^F~outputs~1] \le 2^{-n}+negl(n)$

Drawbacks :

In practice, block ciphers (aka PRFs) have short, fixed-length block size.
PRFs will double the length of messages
- E.g., AES has a 128-bit block size
- So the previous construction is limited to authenticating short, fixed-length messages

4.3 CBC-MAC

Recall :

We showed how to construct a secure MAC for short, fixed-length messages based on any PRF/block cipher
We want to extend this to a secure MAC for arbitrary-length messages
- Here: CBC-MAC

CBC-MAC vs. CBC-mode :

CBC-MAC is deterministic (no IV)
In CBC-MAC, **only the final value is output
- Verification is done by re-computing the result
Both of these are essential for security

Security of (basic) CBC-MAC :

If $F$ is a length-preserving pseudorandom function, then for any fixed $l$. basic CBC-MAC is a secure MAC for messages of length $l*n$
- $l$ represented the number of blocks in the message that we were authenticating
I.e., the sender and receiver must agree on the length parameter $l$ in advance
- Basic CBC-MAC is not secure if this is not done!

CBC-MAC extensions :

Several ways to handle variable length messages

One of the simplest: prepend the message length before applying (basic) CBC-MAC
Can also be adapted to handle messages whose length is not a multiple of the block length

4.4 Hash functions

Hash functions :

(Cryptographic) hash function: maps arbitrary length inputs to a short, fixed-length digest
Can define keyed or unkeyed hash funcitons\
- Formally, keyed hash functions are needed
- In practice, hash functions are unkeyed (We will work with unkeyed hash functions, and be less formal)

Collision-resistence (抗碰撞性):

Let $H: {0,1}^*->{0,1}^n$ be a hash function
A collision is a pair of distinct inputs $x$, $x'$ such that $H(x) = H(x')$
$H$ is collision-resistant if it is infeasible to find a collision in $H$

Generic hash-function attacks :

What is the best "generic" collision attack on a hash function $H: {0,1}^*->{0,1}^n$ ?
If we compute $H(x1)$, ..., $H(x{2^n+1})$, we are guaranteed to find a collision
- Is it possible to do better?
"Birthday" attacks and Hash collision:
Compute $H(x1)$, ..., $H(x{2^{n/2}})$
- What is the probability of a collision?
- Related to so-called birthday paradox: How many people are needed to have a 50% chance that some two people share a birthday?
When the number of balls is O(N^1/2 ), the probability of a collision is 50%
- Birthdays: 23 people suffice
- Hash functions: O(N^1/2) hash function-evalutions
Hash functions need 2n-bit output length to get security against attackers running in 2ⁿ time
- Note: twice the length of block cipher keys

Hash functions in practice :

MD5
- Developed in 1991
- 128-bit output length
- Collisions found in 2004, should no longer be used
SHA-1
- Introduced in 1995
- 160-bit output length
- Theoretical analyses indicate some weaknesses, so trend to migrate to SHA-2
SHA-2
- 256-bit or 512-bit output lengths
- No known significant weaknesses
SHA-3/Keccak
- Result of a public competition from 2008-2012
- Very different design than SHA family. If we find some weakness in the SHA family, it won't implicate it
- Supports 224, 256, 384, and 512-bit outputs

4.5 HMAC

Recall :

We showed how to construct a secure MAC for short, fixed-length messages based on any PRF/block cipher
We want to extend this to a secure MAC for arbitrary-length messages
- Here: using hash functions

Two construction :

Security?

If the MAC is secure for fixed-length messages, and H is collision-resistant, then the second construction is a secure MAC for arbitrary-length messages

Proof sketch :

Say the sender authenticats M₁, M₂, ...
- Let $h_i=H(M_i)$
Attacker outputs forgery $(M,t)$, $M\neq M_i$ for all $i$
Two cases:
- $H(M)=H(M_i)$ for some $i$: Collision in $H$! -- Impossible for collision-resistent MAC
- $H(M) \neq h_i$ for all $i$: Forgery in the underlying, fixed-length MAC -- Impossible for collision-resistent MAC

Instantiate in practice?

Hash function + block cipher-based MAC?

Directly couple them together: block length mismatch
Pair well together: need to implement two crypto primitives

So we have HMAC :

Constructed entirely from (certain type of) hash functions
- MD5, SHA-1, SHA-2
- Not SHA-3
Can be viewed as following the hash-and-MAC paradigm
- With (part of the) hash function being used as a block cipher

4.6 Authenticated Encryption

Secrecy + integrity?

We have shown primitives for achieving secrecy and integrity in the private-key setting
What if we want to achieve both?

Construct 1: Encrypt and authenticate

The tag t might leak information about m !
- Nothing in the definition of security for a MAC implies that it hides information about m
If the MAC is deterministic (as are CBC-MAC, HMAC), it leaks whether the same message is encrypted twice

Construct 2: Encrypt then authenticate

If the encryption scheme is CPA-secure and the MAC is secure then :
- The combination is CPA-secure
- The combination is a secure MAC

Authenticated encryption :

The combination achieves something stronger:
- Given ciphertexts corresponding to (chosen) plaintexts m₁, ..., m_k, it is infeasible for an attacker to generate any new, valid ciphertext
Authenticated encryption scheme
- Infeasible to generate any new, valid ciphertexts
In combination with CPA-security, this implies CCA-security
The encrypt-then-authenticate approach (with independent keys) is a sound way to construct an authenticated encryption scheme
Other, more efficient constructions have been proposed and are an active area of research and standardization

4.7 Secure Communication Sessions

Secure sessions?

Consider parties who wish to communicate securely over the course of a session :
- "Securely" = secrecy and integrity
  “安全"=保密性+完整性
- "Session" = period of time over which parties are willing to maintain state
  “会话"=各方愿意维持对话状态的时间段
Can use authenticated encryption

加密图示： (此图的 Enc 代表的是上一节提到的 Encryption+Authentication 方案)

上图情况不能防范的攻击：

Secure sessions

These attacks (and others) can be prevented using counters and a identities. (使用计数器和身份验证来防止)
One can define a notion of secure session and prove that this construction realizes it

Summary :

We have finished our treatment of the private-key setting. (Though it will come up again later)
Primitives
- Privacy: private-key encryption schemes
- Integrity: message authentication codes
- Both: authenticated encryption
- Collision-resistant hash functions

Module 5

5.1 Algorithmic number theory

Why now?

We have not needed any number theory or "advanced math" until now
- Practical private-key cryptography is based on stream ciphers, block ciphers, and hash functions
- Lots of non-trivial crypto can be done without any number theory
But the public-key world is different !

This module :

Covaer basic number theory quickly
Goal is to cover the minimum needed for the applications we will study
- Some facts stated without proof

Algorithmic number theory :

Efficient algorithms for various computations are critical for efficient implementation of cryptography
- Asymptotics measured in terms of the input length
- $||a|| = O(log~a);~a=2^{||a||}$
Will not dwell extensively on this
- Goal: classify problems as either "easy" or "hard"
Efficient computations:
- Integer addition / subtraction / multiplication / division with remainder (整数加法/减法/乘法/带余除法)
- Modular addition / subtraction / multiplication / reduction (模的加法/减法/乘法/减法)
- (Modular) exponentiation? (模的)指数计算？
- Important computation; good illustration of a non-trivial algorithm

callback-Modular arithmetic :

Notation :
- $[a~mod~N]$ is the remainder of a when divided by $N$
- Note $0 \leq [a~mod~N] \leq N-1$
$a=b ~mod~N \Longleftrightarrow [a~mod~N]=[b~mod~N]$

Exponentiation :

Compute $a^b$ ?
- $||a|| = O(b*||a||)$
- Just writing down the answer takes exponential time
Compute $[a^b~mod~N]$
- Size of the answer $< ||N||$
- Do not compute $a^b$ and then reduce modulo $N$

Efficient exponentiation :

Consider the following algorithm:

exp (a, b, N){
// assume b≥0
ans = 1;
for (i=1; i≤b; i++)
    ans = [ans*a mod N];
return ans;
}

It is running for a total of B times.
But we would like an algorithm that runs in time polynomial in the length of B

‍

Efficient exponentiation :

Assume $b=2^k$ for simplicity
- The preceding algorithm roughly corresponds to computing $aaa...a$
- Better: compute $((((a)^2)^2)^2...)^2$
- $2^k$ multiplications vs. $k$ squarings. Note $k=O(||b||)$

Consider the following algorithm :

exp (a, b, N){
// assume b≥0
x=a, t=1;
while (b>0){
    if (b odd)
        t = [t*x mod N], b=b-1;
    x = [x^2 mod N], b=b/2; }
return t;
}

Why does this work?
- Invariant: answer is $[t*x^b~mod~N]$
Running time is polynomial in $||a||$, $||b||$, $||N||$

5.2 Groups

This part :

Introduce the notion of a group
Provides a way of reasoning about objects that share the same mathematical structure
- Not absolutely needed to understand crypto applications, but does make it conceptually easier

Groups :

An abelian group is a set $G$ and a binary operation $\circ$ defined on $G$ such that:
- There is an identity $e∈G$ such that $e \circ g=g$ for $g∈G$ (有单位元)
- Every $g∈G$ has an inverse $h∈G$ such that $h \circ g=e$ (有逆元)
- (Associativity) For all $f$, $g$, $h∈G$, such that $f \circ(h \circ g)=(f \circ h) \circ g$ (符合结合律)
- (Commutativity) For all $g$, $h∈G$, such that $h \circ g=g \circ h$ (符合交换律)
The order of a finite group $G$ is the number of elements in $G$
(有限群 G 的阶数是 G 中元素的个数)

Representation :

The group operation can be written additively or multiplicatively
- I.e., instead of $g \circ h$, write $g+h ~or ~gh$
- Does not mean that the group operation corresponds to (integer) addition or multiplication
Identity denoted by 0 or 1, respectively
(加法和乘法的单位元分别用0和1表示)
Inverse of $g$ denoted by $-g$ or $g^{-1}$, respectively
Group exponentiation: $m \cdot a$ or $a^m$
- a is a group element, and m is an integer.
- what we're referring to is repeated application of the group operation to the group element a, a total of m times.

Computations in groups

Assume it is possible to efficiently recognize group elements
Assume the group operation can be computed efficiently
- Group exponentiation can be computed efficiently

‍

Example :

$Z_N={0,~...,~N-1}$ under addition modulo $N$ (完全剩余系)
- Identity is $0$
- Inverse of a is $[-a~mod~N]$
- Associativity, commutativity obvious
- Order $N$
- $m \cdot a=a+...+a~mod~N$, can be computed efficiently
What happens if we consider multiplication modulo $N$
- ${0,~...,~N-1}$ is not a group under this operation
- $1$ would be the identity
- but $0$ has no inverse
- Even if we exclude 0, there is, e.g., no inverse of 2 modulo 4

Invertible :

Say $b$ is invertible modulo $N$ if there is an element, $b^{-1}$, such that $b \cdot b^{-1}=1~mod~N$
- If such $b^{-1}$ exists, it is unique modulo $N$
Define "division by $b$" = "multiplication by $b^{-1}$"
- Only defined when $b$ is invertible modulo $N$

Modular inverses :

Theorem: $b$ is invertible modulo $N$ if and only if $gcd(b,N)=1$ = "b和N互素"
Invertibility can be determined efficiently
- Because $gcd$ can be computed efficiently
If $p$ is prime, then $1,~2,~3,~...,~p-1$ are all invertible modulo $p$
If $N=pq$ for $p$, $q$ distinct primes, then the invertible elements are the integres from $1$ to $N-1$ that are not multiples of $p$ or $q$

So we get a group :

$Z^_N =$ invertible elements between $1$ and $N-1$ under multiplication modulo $N$ (简化剩余系)*
- Closure not obvious, but can be shown
- Identity is $1$
- Inverse of $a$ is $[a^{-1}~mod~N]$
- Associativity, commutativity obvious
- $a^m=a~··· a~mod~N$, can be computed efficiently

$\Phi(N)$ --欧拉函数 :

$\Phi(N)=$ the number of invertible elements modulo $N$
- = $|{ a∈ {1,...,N-1}~:~gcd(a,N)=1 }|$
- = the oreder of $Z^*_N$
If $N$ is prime, then $\Phi(N)=N-1$
If $N=pq$, $p$ and $q$ distinct primes, $\Phi(N)=(p-1)(q-1)$
- $=|{1,...,N-1}|-|multiples~of~p|-|multiples~of~q|$
- $=N-1-(p-1)-(q-1)=N-q-p+1=(p-1)(q-1)$

Fermat's little theorem :

Let $G$ be a finite group of order $m$. Then for any $g∈G$, it holds that $g^m=1$

Examples :

In $Z_N$:
- For all $a∈Z_N$, we have $N \cdot a=0~mod~N$
In $Z^*_N$:
- For all $a∈Z^*_N$, we have $a^{\Phi(N)}=1~mod~N$
- $N$ prime: for all $a∈Z^*_N$, we have $a^{N-1}=1~mod~N$

Corollary (推论):

Let $G$ be a finite group of order $m$. Then for $g∈G$ and integer $x$, it holds that $g^x = g^{[x~mod~m]}$
- Proof: Let $x=qm+r$. Then $g^x = g^{qm+r} = (g^m)^q g^r = g^r$
This can be used for efficient computation...
- reduce the exponent modulo the group order before computing the exponentiation
Let $G$ be a finite group of order $m$. For any integer $e$, define $f_e(g) = g^e$. If $gcd(e,m)=1$, then $f_e$ is a permutation.
Moreover, if $d=e^{-1} ~mod~m$ then $f_d$ is the inverse of $f_e$
- Proof: The first part of follows from the second. And $f_d(f_e(g)) = (g^e)^d = g^{ed} = g^{[ed~mod~m]}=g^1=g$

5.3 Hard problems & RSA

Hard problems :

So far, we have only discussed number-theoretic problems that are easy
Some problems are (conjectured to be) hard

Factoring (因数分解):

Multiplying two numbers is easy; factoring a number is hard
- Given $x,y$, easy to compute $x \cdot y$
- Given $x \cdot y$, hard to find $x$ and $y$
It's not hard to factor all numbers
- 50% of the time, random number is even (even=偶数)
- 1/3 of the time, random number is divisible by $3$
The hardest numbers to factor are those that are the product of two, equal-length primes

The RSA problem :

The factoring problem is not directly useful for cryptography

So we will not formalize it
Instead, introduce a problem related to factoring: the RSA problem

The RAS problem: setting

For the rest of this lecture, $N=pq$ with $p$ and $q$ distinct, odd primes
$Z_N^*=$ invertible elements under multiplication modulo $N$
- The order of $Z_N^*$ is $\Phi(N)=(p-1)(q-1)$
- $\Phi(N)$ is easy to compute if $p$, $q$ are known
- $\Phi(N)$ is hard to compute if $p$, $q$ are not known
- Equivalent to factoring $N$

Recall : Let $G$ be a finite group of order $m$. For any integer $e$, define $f_e(g) = g^e$. If $gcd(e,m)=1$, then $f_e$ is a permutation. Moreover, if $d=e^{-1} ~mod~m$ then $f_d$ is the inverse of $f_e$

The RSA problem :

Here: group $Z_N^*$, of order $\Phi(N)$
Fix $e$ with $gcd(e, \Phi(N))=1$
- Exponentiation to the $e$-th power is a permutation on $Z_N^*$
If $ed=1~mod~ \Phi(N)$, raising to the $d$-th power is the inverse of raising to the $e$-th power
- I.e., $(x^e)^d=x~mod~N$, $(x^d)^e=x~mod~N$
- $x^d$ is the $e$-th root of $x~mod~N$

The RSA problem ~ factoring:

If $p$, $q$ are known :

$\Rightarrow \Phi(N)$ can be computed

$\Rightarrow d=e^{-1}~mod~\Phi(N)$ can be computed

$\Rightarrow$ possible to compute $e$-th roots $mod~N$
If $p$, $q$ are not known :

$\Rightarrow$ computing $\Phi(N)$ is as hard as factoring $N$

$\Rightarrow$ computing $d$ is as hard as factoring $N$

The RSA assumption (informal):

Informally: given $N$ and $e$, hard to compute the $e$-th root of a uniform element $y∈Z_N^*$

The RSA assumption (formal):

Need to specify how $N$, $e$ are generated
GenRSA: on input $1^n$, outputs $(N,e,d)$ with $N=pq$ a product of two $n$-bit primes, and with $ed=1~mod~\Phi(N)$
Let GenRSA be an algorithm that, on input $1^n$, outputs $(N,e,d)$ with $ed=1~mod~\Phi(N)$
Experiment $RSA-inv_{A,GenRSA}(n)$ :
- Compute $(N,e,d) \gets GenRSA(1^n)$
- Choose uniform $y∈Z_N^*$
- Run $A(N,e,y)$ to get $x$
- Experiment evaluates to $1$ if $x^e=y~mod~N$
The RSA problem is hard relative to GenRSA if for all PPT algorithms A,

$$
Pr[RSA-inv_{A,GenRSA}(n)=1] \lt negl(n)
$$

Implementing GenRSA :

One way to implement GenRSA :
- Generate uniform $n$-bit primes $p$,$q$
- Set $N:=pq$
- Choose arbitrary $e$ with $gcd(e,\Phi(N))=1$
- Compute $d:=[e^{-1}~mod~\Phi(N)]$
- Output $(N,e,d)$
Choice of $e$?
- Does not seem to affect hardness of the RSA problem
- $e=3,~2^{16}+1$ for efficient exponentiation

RSA and factoring :

If factoring moduli output by GenRSA is easy, then the RSA problem must be easy relative to GenRSA
- Factoring is easy $\Rightarrow$ RSA problem is easy
Hardness of the RSA problem is not known to be implied by hardness of factoring
- Possible factoring is hard but RSA problem is easy
- Possible both are hard but RSA problem is "easier"
- Currently, RSA is as hard as factoring
也就是说 RSA problem 和 factoring problem 的因果关系并没有得到证明

5.4 Discrete-logarithm problem(离散对数问题)

Cyclic group (循环群):

Let $G$ be a finite group of order $m$ (written multiplicatively) (有限乘法群)
Let $g$ be an element of $G$
Consider the set ${ g^0,g^1,...}$
- We know $g^m=1=g^0$, so the set has $\le m$ elements
- If the set has $m$ elements, then it is all of $G$
- In this case, we say $g$ is a generator of $G$ (生成元)
- If $G$ has a generator, we say $G$ is cyclic

Examples :

$Z_N$
- Cyclic (for any $N$); $1$ is always a generator: ${0,1,2,~...,N-1 }$
$Z_8$
- 3 is a generator: ${ 0,13,6,1,4,7,2,5 }$
- 2 is not a generator: ${ 0,2,4,6 }$

Important examples :

Theorem: Any group of prime order is cyclic, and every non-identity element is a generator
Theorem: If $p$ is prime, then $Z_P^*$ is cyclic (of order $p-1$)

Uniform sampling :

Given group $G$ of order $m$ and generator $g$, easy to sample a uniform element $h∈G$

Choose uniform $x∈{0,~...,m-1}$; set $h:=g^x$

Discrete-logarithm problem :

Fix cyclic group $G$ of order $m$, and generator $g$
We know that ${ g^0,g^1,~...,g^{m-1}}=G$
- For every $h∈G$, there is a unique $x∈Z_m$ such that: $g^x=h$
- Define $log_gh$ to be this $x$ - the discrete logarithm of $h$ with respect to $g$ (in the group $G$)
Dlog problem in $G$: Given $g$, $h$, compute $log_gh$
Dlog assumption in $G$: Solving the discrete log problem in $G$ is hard

Discrete-logarithm problem (formal):

Let $g$ be a group-generation algorithm
- On input $1^n$, outputs a cyclic group $G$, its order $q$ (with $||q||=n$), and a generator $g$
For algorithm $A$, define exp't $Dlog_{A,g}(n)$ :
- Compute $(G,q,g) \leftarrow g(1^n)$
- Choose uniform $h∈G$
- Run $A(G,q,g,h)$ to get $x$
- Experiment evaluates to $1$ if $g^x=h$
The discrete-logarithm problem is hard relative to $g$ if for all PPT algorithms $A$ :
$Pr[Dlog_{A,g}(n)] \le negl(n)$

Diffie-Hellman problems :

Fix group $G$ with generator $g$
Define $DH_g(h_1,h_2)=DH(g^x,g^y)=g^{xy}$
Computational Difie-Hellman (CDH) problem :
- Given $g$, $h_1$, $h_2$, compute $DH_g(h_1,h_2)$
Decisional Difie-Hellman (DDH) problem :
- Given $g$, $h_1$, $h_2$, distinguish the correct $DH_g(h_1,h_2)$ from a uniform element of $G$

DDH problem :

Let $g$ be a group-generation algorithm
- On input $1^n$, outputs a cyclic group $G$, its order $q$ (with $||q||=n$), and a generator $g$
The DDH problem is hard relative to $g$ if for all PPT algorithms $A$ :
$| Pr[A(G,q,g,g^x,g^y,g^z)=1]- Pr[A(G,q,g,g^x,g^y,g^{xy})=1] | \le ε(n)$

Relating the Diffie-Hellman problems (Relative to $g$ ):

If the discrete-logarithm problem is easy, so is the CDH problem
- I.e., the CDH assumption is stronger than the dlog assumption
If the CDH problem is easy, so is the DDH problem
- I.e., the DDH assumption is stronger than the CDH assumption

Group selection :

For cryptographic applications, best to use prime-order groups
- The dlog problem is "easier" if the order of the group has small prime factors
Prime-order subgroup of $Z_p^*$, $p$ prime
- E.g., $p=tq+1$ for $q$ prime
- Take the subgroup of $t^{th}$ powers, i.e., $G={ x^t~mod~p~|~x∈Z_p^* }$
- This is a group
- It has order $(p-1)/t=q$
- Since $q$ is prime, the group is cyclic
- More generally, prime-order subgroup of the mult. group of a finite field (of large characteristic)
  在一个大特征的有限域中，我们可以找到一个乘法群，这个乘法群中存在一个元素数量很少（具体来说，是一个素数）的子群。

Elliptic curve :

Prime-order subgroup of an elliptic curve group (椭圆曲线群)
- Details omitted ...
For our puposes, we will usually describe algorithms in "abstract" groups
- So can ignore details of the underlying group
- So can instantiate with any (appropriate) group

5.5 Choice of parameters

Review :

We have discussed two classes of cryptographic assumptions
- Factoring-based (factoring, RSA assumptions)
- Dlog-based (dlog, CDH, and DDH assumptions)
- In two classes of groups
All these problems are believed to be "hard", i.e., to have no poly-time algorithms
- But, how hard are they?

Security :

For symmetric-key algorithms (对称密钥加密算法)
- Block cipher with $n$-bit key ≈ security against $2^n$-time attacks
- Hash function with $n$-bit output ≈ security against $2^{n/2}$-time attacks
Does factoring a modulus of length $n$ take $2^n$ time?
Does computing discrete logs in a group with $2^n$ elements take $2^n$ time?

Disclaimer :

The goal here is just to give an idea as to how parameters are calculated
The goal of this lecture is not to actually give you parameters that you can then use in practice
- In practice, other important considerations come into play

Algorithms for factoring :

There exist factoring algorithms that run in much less than $2^{||N||}$ time
Best known algorithm (asymptotically): general number field sieve
- Running time (heuristic): $2^{O(||N||^{1/3} log||N||^{2/3} )}$

Algorithms for dlog :

Two classes of algorithms :
- Ones that work for arbitrary ("generic") groups
- Ones that target specific groups
Best "generic" algorithm :
- Time $2^{n/2}$ in a group of order ≈$2^n$
- This is known to be optimal
Best known algorithm (asymptotically) for (subgroups of) $Z_p^$: number field sieve*
第三十七个知识点: The Number Field Sieve - WangZhuo2000 - 博客园 (cnblogs.com)
- Running time (heuristic): $2^{O(||p||^{1/3} log||p||^{2/3} )}$
For (appropriately chosen) elliptic-curve groups, nothing better than generic algorithms is known
- So this means that in some sense, elliptic curve groups are currently optimal with respect to what we could hope for as far as security is concerned.

Choosing parameters :

As recommended by NIST (112-bit security) :
- Factoring: 2048-bit modulus (被用于分解的大数为2048位)
- Dlog, order-$q$ subgroup of $Z_p^*$: $||q||=224,~||p||=2048$
- Dlog, elliptic-curve group of order $q$: $||q||=224$
Much longer than for symmetric-key algorithms :
- Explains in part why public-key crypto is less efficient than symmetric-key crypto

Module 6

6.1 Why we need Public-Key

Private-key cryptography :

Private-key cryptography allows two users who share a secret key to establish a "secure channel"
The need to share a secret key incurs several drawbacks

The key-distribution problem :

How do users share a key in the first place?
- Need to share the key using a secure channel
This problem can be solved in some settings
- E.g., physical proximity, trusted courier (物理上接近，信任的信使)
- (Note: this does not make private-key cryptography useless)
But not others (or at least not cheaply)

The key-management problem :

Imagine an organization with $N$ employees, where each pair of employees might need to communicate securely
Solution using private-key cryptography :
- Each user shares a key with all other users
- Each user must store/manage $N-1$ secret keys
- $O(N^2)$ keys overall !

Lack of support for "open systems"

Say two users who have no prior relationship want to communicate securely
- When would they ever have shared a key?
This is not at all far-fetched
- Customer sending credit-card data to merchant
- Sending an email to a colleague
"Classical" cryptography offers no solution to these problems

New directions (Key ideas):

Some problems exhibit asymmetry - easy to compute, but hard to invert (think factoring) (asymmetry-不对称)
Use this asymmetry to enable two parties to agree on a shared secret key using public discussion
- Key exchange

Formally :

Fix a key-exchange protocol $\Pi$ and an attacker (passive eavesdropper) $A$
Define the following experiment $KE_{A,\Pi}(n)$ :
- Honest parties run $\Pi$ using security parameter $n$, resulting in a transcript trans and (shared) key $k$ (trans-协议的交互记录)
- Choose uniform bit $b$. If $b=0$, then set $k'=k$;
  If $b=1$, then choose uniform $k'∈{0,1}^n$
- Give trans and $k'$ to $A$, which outputs a bit $b'$
- Exp't evaluates to $1$ ($A$ succeeds) if $b'=b$
Key-exchange protocol $\Pi$ is secure (against passive eavesdropping) if for all probabilistic, poly-time $A$ its holds that :

$$
Pr[KE_{A,\Pi}(n)=1] \le \frac12 + negl(n)
$$

Notes :

Being unable to compute the key given the transcript is a weak guarantee
Indistinguishability of the shared key from uniform is a much stronger guarantee
- and is what is needed if the shared key will subsequently be used for private-key crypto

Modern key-exchange protocols

Security against passive eavesdroppers is insufficient
Want authenticated key exchange, i.e., parties know each others' identities (and these cannot be spoofed)
Modern key-exchange protocols provide this, assuming some setup
- We will return to this at the end of the course

6.2 Diffie-Hellman key exchange

Recall (informal) :

Group-generation algorithm $g$ outputs cyclic group $G$ of prime order $q$ with generator $g$
- $|q|=n~bits$
Decisional Diffie-Hellman (DDH) problem :
- Given $g^x$, $g^y$, distinguish $g^{xy}$ from a uniform group element
Hardness of DDH implies hardness of the discrete-logarithm problem
- This alone is not enough for key exchange

Key exchange :

Security :

Eavesdropper sees transcript $G$, $q$, $g$, $g^x$, $g^y$
Shared key $k$ is $g^{xy}$
Computing $k$ from the transcript is exactly the computational Diffie-Hellman problem
Distinguishing $k$ from a uniform group element is exactly the decisional Diffie-Hellman problem
- If the DDH problem is hard relative to $g$, this is a secure key-exchange protocol

A subtlety :

We wanted our key-exchange protocol to give us a uniform(-looking) key $k∈{0,1}^n$
Instead we have a uniform(-looking) group element $k∈G$
- Not clear how to use this as, e.g., an AES key
Solution: key derivation (即通过某种方法将 $k$ 转换成适合AES等加密算法使用的密钥)
- Set $k'=H(k)$ for suitable hash function $H$
- Exact requirements on $H$ omitted here

6.3 The Public-Key Distribution

The public-key setting :

A party generates a pair of keys: a public key $pk$ and a private key $sk$
- Public key is widely disseminated (disseminated-传播)
- Private key is kept secret, and shared with no one
Private key used by this party; public key used by everyone else
Also called asymmetric cryptography

Two methods of public-key distribution :

Public-key distribution :

Previous figures (implicitly) assume parties are able to obtain correct copies of each others' public keys
- I.e., the attacker is passive, at least during key distribution
We will revisit this assumption later

Primitives :

Setting\ Target	Private-key setting	Public-key setting
Secrecy	Private-key encryption	Public-key encryption
Integrity	Message authentication codes	Digital signature schemes

Drawbacks of private-key crypto :

Key distribution
- Public keys can be distributed over public (but authenticated) channels
Key management in large systems of $N$ users
- Each user stores $1$ private key and $N-1$ public keys; only $N$ keys overall
- Public keys could be stored in a central directory
Applicability in "open systems"
- Even parties who have no prior relationship can find each others' public keys

Why study private-key crypto :

(Public key encryption is so good, why do we need to work on private key encryption)

Private-key cryptography is more suitable for certain applications
- E.g., disk encryption
Public-key crypto is roughly 2-3 orders of magnitude slower than private-key crypto
- If private-key crypto is an option, use it
- Indeed, private-key crypto used for efficiency even in public-key setting

6.4 Public-key Encryption

Public-key encryption :

A public-key encryption scheme is composed of three PPT algorithms :
- $Gen$ : key-generation algorithm that on input $1^n$ outputs $pk,sk$
- $Enc$ : encryption algorithm that on input $pk$ and a message $m$ outputs a ciphertext $c$
- $Dec$ : decryption algorithm that on input $sk$ and a ciphertext $c$ outputs a message $m$ or an error $\perp$
- For all $m$ and $pk,sk$ outputs by $Gen$ :
```
               $Dec_{sk}(Enc_{pk}(m))=m$
```

CPA-security :

Fix a public-key encryption scheme $\Pi$ and an adversary $A$
Define experiment $PubK \text{-} CPA_{A,\Pi}(n)$ :
- Run $Gen(1^n)$ to get keys $pk,sk$
- Given $pk$ to $A$, who outputs ${m_0,m_1}$ of same length
- Choose uniform $b∈{0,1}$ and compute the ciphertext $c \leftarrow Enc_{pk}(m_b)$; give $c$ to $A$
- $A$ outputs a guess $b'$, and the experiment evaluates to $1$ if $b'=b$
Public-key encryption scheme $\Pi$ is CPA-secure if for all PPT adversaries $A$ :

$$
Pr[PubK \text{-} CPA_{A,\Pi}(n)=1] \le \frac12 + negl(n)
$$

Notes on the definition :

No ecryption oracle?
- Encryption oracle redundant in public-key setting (everyone can get $pk$ and encryption algorithm)
No perfectly secret public-key encryption
- No deterministic public-key encryption scheme can be CPA-secure
- CPA-security implies security for encrypting multiple messages as in the private-key case

Chosen-ciphertext attacks :

Chosen-ciphertext attacks are arguably even a greater concern in the public-key setting
- Attacker might be a legitimate sender
- so Easier for attacker to obtain full decryptions of ciphertexts of its choice
Related concern: malleability
- I.e., given a ciphertext $c$ that is the encryption of an unknown message $m$, might be possible to produce ciphertext $c'$ that decrypts to a related message $m'$
- This is also undesirable in the public-key setting
Can define CCA-security for public-key encryption by analogy to the definition for private-key encryption (to skip this here for simplicity)

Hybrid encryption :

The functionality of public-key encryption at the (asymptotic) efficiency of private-key encryption
Use public-key encryption scheme encrypt the $k$ of private-key encryption scheme

And use the private-key encryption scheme to encrypt our bulk data $m$

Security of hybrid encryption :

Let $\Pi$ be the public-key component, and $\Pi'$ the private-key component

Let $\Pi_{hy}$ denote the combination
If $\Pi$ is a CPA-secure public-key scheme, and $\Pi'$ is a CPA-secure private-key scheme, then $\Pi_{hy}$ is a CPA-secure public-key scheme
- Similarly for CCA-security

6.5 El Gamal encryption

Recall Diffie-Hellman key exchange scheme :

‍

稍微修改一下，就是一个加密方案 (El Gamal encryption)：

$h_1^y$ 是 Diffie-Hellman 中生成的密钥。
$h_1^y ·m$ 是用生成的密钥进行 one-time pad 加密
$Gen(1^n)$
- Run $g(1^n)$ to obtain $G,q,g$, Choose uniform $x∈Z_q$. The public key is $(G,q,g,g^x)$ and the private key is $x$
$Enc_{pk}(m)$, where $pk=(G,q,g,h)$ and $m∈G$
- Choose uniform $y∈Z_q$. The ciphertext is $g^y,h^y·m$
$Dec_{sk}(c_1,c_2)$, where $c_1=g^y,c_2=h^y·m$
- Output $c_2 / c_1^x$

In practice :

Parameters $G,q,g$ are standardized and shared. And instead, key generation would simply consist of choosing a uniform exponent $x$ and count $g^x$
Inconvenient to treat messages as group element
- Use key derivation to derive a key $k$ instead, and use that key to encrypt the message
- I.e., ciphertext is $(g^y,Enc_k'(m))$, where $k=H(h^y)$
- Essentially hybrid encryption

CPA Secure?

If the DDH assumption is hard for $g$, then the El Gamal encryption scheme is CPA-secure
- Follows from security of Diffie-Hellman key exchange, or can be proved directly
Discrete-logarithm assumption alone is not enough for public-key encryption

How About Chosen-ciphertext attacks?

El Gamal encryption is not secure against chosen-ciphertext attacks
- Follows from the fact that it is malleable
Given ciphertext $c_1,c_2$, transform it to obtain the ciphertext $c_1,c_2'=c_1, \alpha ·c_2$ for arbitrary $\alpha$
- Since $c_1,c_2 = g^y,h^y·m$
We have $c_1,c_2' = g^y,h^y·(\alpha m)$
- I.e., encryption of $m$ becomes an encryption of $\alpha m$
e.g., in the online auction event, the second bidder was able not only to always out bid the first bidder but to do it always do by bidding exactly twice as much.

Achieve Chosen-ciphertext security :

Use key derivation coupled with CCA-secure private-key encryption scheme
- I.e., ciphertext is $g^y,Enc_k'(m)$
where $k=H(h^y)$ and $Enc'$ is a CCA-secure scheme
Can be proved CCA-secure under appropriate assumptions, if $H$ is modeled as a random oracle
DHIES/ECIES

6.6 RSA based scheme

Recall (informal) :

Choose random, equal-length primes $p,q$
Compute modulus $N=pq$
Choose $e,d$ such that $e·d=1~mod~ \Phi(N)$

The $e^{th}$ root of $x$ modulo $N$ is $[x^d~mod~N]$

              $(x^d)^e=x^{de}=x^{[ed~mod~\Phi(N)]}=x~mod~N$

RSA assumption: given $N,e$ onlym it is hard to compute the $e^{th}$ root of a uniform $c∈Z_N^*$

"Plain" PSA encryption :

"Plain" PSA encryption Not security :

This scheme is deterministic
- Cannot be CPA-secure
RSA assumption only refers to hardness of computing the $e^{th}$ root of uniform $c$
- $c$ is not uniform unless $m$ is uniform (Clearly $m$ is not uniform)
RSA assumption only refers to hardness of computing the $e^{th}$ root of $c$ in its entirety
- Partial information about the $e^{th}$ root may be leaked (In fact, this is the case)

Conclude! Plain RSA should never be used!

To improve this scheme?

PKCS #1 v1.5 :

Standard issued by RSA labs in 1993
Idea: add random padding
- To encrypt $m$, choose random $r$
- $c=[(r|m)^e~mod~N]$
Issues :
- No proof of CPA-security (unless $m$ is very short)
- Chosen-plaintext attacks known if $r$ is too short
- Chosen-ciphertext attacks known

PKCS #1 v2.0 :

Optimal asymmetric encryption padding (OAEP) applied to message first
This padding introduces redundancy, so that not every $c∈Z_N^*$ is a valid ciphertext (by 0...00)
- Need to check for proper format upon decryption
- Return error if not properly formatted
1. $r$是一个随机的$k_0$长的比特串
2. 消息后面填充$k_1$个$0$，长度变为$n-k_0$比特
3. $G$函数将$k_0$比特的$r$，映射成$n-k_0$比特长
4. $s=(m||0···0) \oplus G(r)$
5. $H$函数将$n-k_0$比特的$S$，映射成$k_0$比特长
6. $t=r \oplus H(s)$
7. 将$(s||t)$进行RSA加密

Security?

RSA-OARP can be proven CCA-secure under the RSA assumption, if $G$ and $H$ are modeled as random oracles
This scheme widely used in practice

Module 7

7.1 Digital signatures compare to MAC

Digital signatures :

Provide integrity in the public-key setting
Analogous to message authentication codes(MAC), but some key differences

Security target (informal):

Even after observing signatures on multiple messages, an attacker should be unable to forge a valid signature on a new message (same to the MAC)

Compare to MACs :

以发布补丁为例
signature:
MAC-1：当发布者和所有下载者共享一个密钥时，下载者可以伪造新的MAC签名
MAC-2：当发布者和不同下载者之间有不同密钥时，发布者负担过重

Comparison to MACs :

Public verifiability
- "Anyone" can verify a signature
- (Only a holder of the key can verify a MAC tag)
Transferability
- Can forward a signature to someone else
Non-repudiation (不可否认性)
- Signer cannot (easily) deny issuing a signature
- Crucial for legal applications.
- Judge can verify signature using public copy of pk
- MACs cannot provide this functionality
- Without access to the key, no way to verify a tag
- Even if receiver leaks key to judge, how can the judge verify that the key is correct (??)
- Even if key is correct, receiver could have generated the tag also

7.2 Formal definitions of security for digital signature

Signature schemes :

A signature scheme is defined by three PPT algorithms $(Gen, Sign,Vrfy)$ :
- $Gen$: takes as input $1^n$; outputs $pk,sk$
- $Sign$: takes as input a private key $sk$ and a message $m ∈ {0,1}^*$; outputs signature $\sigma$, $\sigma \leftarrow Sign_{sk}(m)$
- $Vrfy$: takes public key $pk$, message $m$, and signature $\sigma$; outputs $1$ or $0$. $1$ indicates acceptance or validity, and a $0$ indicates rejection or invalidity.
For all $m$ and all $pk,sk$ outputs by $Gen$, $Vrfy{pk}(m,Sign{sk}(m))=1$

Security?

Threat model
- "Adaptive chosen-message attack"
- Assume the attacker can induce the sender to sign messages of the attacker's choice
Security goal
- "Existential unforgeability"
- Attacker should be unable to forge valid signature on any message not signed by the sender
Attacker gets the public key (of course)

Formal definition of security :

Fix $A, \Pi$
Define randomized experiment $Forge_{A, \Pi}(n)$ :
1. $pk,sk \leftarrow Gen(1^n)$
2. $A$ given $pk$, and interacts with oracle $Sign_{sk}(\cdot)$; let $M$ be the set of messages sent to this oracle
3. $A$ outputs $(m, \sigma)$
4. $A$ succeeds, and the experiment evaluates to $1$, if $Vrfy_{pk}(m, \sigma)=1$ and $m ∉M$

$\Pi$ is secure if for all PPT attackers $A$, there is a negligible function $ε$ such that :

                                  $Pr[Forge_{A,\Pi}(n)=1]  \leq ε(n)$

Replay attacks need to be addressed just as in the symmetric-key setting

Hash-and-sigh paradigm scheme :

Given
- A signature scheme $\Pi = (Gen,Sign,Vrfy)$ for "short" messages of length $n$
- Hash function $H:{0,1}^* \rightarrow {0,1}^n$
Construct a signature scheme $\Pi' = (Gen,Sign',Vrfy')$ for arbitrary-length messages :
- $Sign{sk}'(m) = Sign{sk}(H(m))$
- $Vrfy{pk}'(m, \sigma) = Vrfy{pk}(H(m), \sigma)$
Theorem: if $\Pi$ is secure and $H$ is collision-resistant, then $\Pi'$ is secure
Proof: Say the sender authenticates $m_1,m_2,...$
- Let $h_i = H(m_i)$
- Attacker outputs forgery $(m, \sigma)$, $m \ne m_i$ for all $i$ . Two cases :
- $H(m) = h_i$ for some $i$: means Collision in H (negligible)
- $H(m) \ne h_i$ for all $i$: means forgery in the underlying signature scheme (negligible)

Hash-and-sigh paradigm :

Analogous to hybrid encryption
- The functionality of digital signatures at the asymptotic cost of a symmetric-key operation
Used extensively in practice

7.3 RSA-based signature

Recall... (informal) :

Choose random, equal-length primes $p,q$
Compute modulus $N=pq$
Choose $e,d$ such that $e \cdot d=1~mod~ \Phi(N)$

The $e^{th}$ root of $m$ modulo $N$ is $[m^d~mod~N]$

               $(m^d)^e=m^{de}=m^{[ed~mod~\Phi(N)]}=m~mod~N$

RSA assumption: given $N,e$ only, hard to compute the $e^{th}$ root of a uniform $m∈Z_N^*$

"Plain" RSA signatures :
$\sigma$ 表示签名：$\sigma = [m^d~mod~N]$, $d$ 是私钥

Security intuition :

Signature of $m$ is the $e^{th}$ root of $m$ - supposedly hard to compute
In fact, this intuition is really completely wrong in several respects.

Attack :

Can sign specific messages
- E.g., easy to compute the $e^{th}$ root of $m=1$
Can sign "random" messages
- Choose arbitrary $\sigma$; set $m=[\sigma^e~mod~N]$, 并假装 $\sigma$ 是 $m$ 的签名
- $Vrfy:m \stackrel{?}{=} [\sigma^e~mod~N]$，显然相等(compute $[\sigma^e~mod~N]$)
Can combine two signatures to obtain a third
- Say $\sigma_1,\sigma_2$ are valid signatures on $m_1,m_2$ with resepct to public key $N,e$
- Then $\sigma'= [\sigma_1,\sigma_2 ~mod~N]$ is a valid signature on the message $m'= [m_1,m_2 ~mod~N]$
```
                           $(\sigma_1 \cdot \sigma_2)^e=\sigma_1^e \cdot \sigma_2^e=m_1 \cdot m_2~mod~N$
```

new scheme RSA-FDH : Apply a "cryptographic transformation" to messages before signing

Public key: $(N,e)$ private key: $d$
$Sign_{sk}(m)=H(m)^d~mod~N$
$Vrfy_{pk}(m, \sigma)$: output $1$ if $\sigma^e = H(m)~mod~N$
(This also handles long messages)

Security intuition?

Look at the three previous attacks :
- Not easy to compute the $e^{th}$ root of $H(1)$
- Choose $\sigma$, but cannot find a $m$ such that $H(m) = \sigma^e~mod~N$ (computing inverses of $H$ should be hard)
- $H(m_1) \cdot H(m_2) = \sigma_1^e \cdot \sigma_2^e = (\sigma_1 \cdot \sigma_2)^e \ne H(m_1 \cdot m_2)$

Security of RSA-FDH :

If the RSA assumption holds, and $H$ is modeled as a random function mapping onto $Z_N^*$, then RSA-FDH is secure
- Proof omitted
In practice, $H$ is instantiated with a (modified) cryptographic hash function
- Must ensure that the range of $H$ is large enough

RSA-FDH in practice :

The RSA PKCS #1 v2.1 standard includes a signature scheme that can be viewed as inspired by RSA_FDH
- Essentially RSA-FDH is signer uses no randomness

7.4 Identification schemes

Identification schemes :

Theoretically interesting in their own right
- Even though limited applicability on their own
Extremely important as a building block for digital signatures
Will be somewhat informal in our discussion
The goal of an identification scheme is to allow the prover to convince the verifier that the prover is who he claims he is.

Security :

Security against passive attacks
- Even after an attacker eavedrops on multiple, honest executions, the attacker cannot impersonate the honest prover
- I.e., cannot fool the verifier into accepting
For technical reasons, we assume non-degenerate schemes
- I.e., the first message A has negligible probability of repeating

Prototypical application :

In-person identification
- E.g., door access
Not suitable for remote authentication
- Noting binds the execution of the ID protocol to any subsequent communication (c is transmitted in plaintext)

From identification to signatures :

To sign, the prover runs the scheme "by itself", generating the challenge using a hash function

The Fait-Shamir transform

Security :

Theorem: If the identification scheme is secure against passive attacks, and $H$ is modeled as a random function, then the derived signature scheme is secure

Recall discrete logarithm assumption (informal) :

Group-generation algorithm $g$ outputs cyclic group $G$ of prime order $q$ with generator $g$
- $|q|=n~bits$
Discrete logarithm assumption :
- Given uniform $h∈G$, hard to compute $log_g~h$

Schnorr identification scheme :

正确性：$g^sh^{-c}=g^{cx+r}h^{-c}=(g^x)^cg^rh^{-c}=g^r=A$

Security? (??)

$g^s h^{-c} = g^{s'} h^{-c'}$

$\Rightarrow g^{s-s'} = h^{c-c'}$

$\Rightarrow log_gh = [(s-s')/(c-c') ~mod~q]$

Security theorem&corollary :

Theorem: If the discrete logarithm assumption holds, the Schnorr identification scheme is secure against passive attacks
Corollary: If the discrete logarithm assumption holds, and $H$ is modeled as a random function, the Schnorr signature scheme is secure

DSS (standard):

NIST standard for digital signatures
- DSA, based on discrete-logarithm problem in subgroup of $Z_p^*$
- ECDSA, based on elliptic-curve groups
Can be viewed as being derived from an identification scheme using an unproven variant of the Fiat-Shamir transform

7.5 Public-key infrastructure (PKI)

Public-key distribution :

Use signatures for secure key distribution :

Assume a trusted party with a public key known to everyone
- CA = certificate authority
- Public key $pk_{CA}$
- Private key $sk_{CA}$
Alice asks the CA to sign the binding $(Alice,pk)$
- $cert{CA \rightarrow Alice} = Sign{sk_{CA}}(ALice,pk)$
(CA must verify Alice's identity out of band)
Bob obtains Alice, pk, and the certificate $cert_{CA \rightarrow Alice}$ ...
- verifies that $Vrfy{PK{CA}} ((Alice,pk),~cert_{CA \rightarrow Alice} ) =1$
Bob is then assured that pk is Alice's public key
- As long as the CA is trustworthy (CA honest and properly verifies Alice's identity)
- and the CA's private key has not been compromised

Is there a chicken-and-egg problem?

How does Bob get $pk_{CA}$ in the first place?
Several possibilities :
1. "Roots of trust"
  - Bob only need to securely obtain a small number of CA's public keys
  - Need to ensure secure distribution only for these few, initial public keys rather than having to worry about secure distribution of millions of public keys.
  - E.g., distribute as part of an operating system, or web browser
  - Firefox, or Windows
2. "Web of trust"
  - Obtain public keys from my friends in person
  - "Key-signing parties"
  - Obtain "certificates" on my public key from my friends
  - If A knows $pk_B$, and B issued a certificate for C, then C can send this cerificate to A
3. Public-key repository
  - Store certificates in a central repository
  - E.g., MIT PGP keyserver
  - To find Alice's public key
  - Get all public keys for "Alice", and certificates on those keys
  - Look for a certificate signed by someone you trust whose public key you already have

PKI in practice: does not work quite as well as in theory

Proliferation of root CAs (There are lots and lots of CAs that are trusted in your basic browser software. And if any of those are ever compromised, it would have disastrous con consequences for public key distribution.)
Revocation (Users want to revoke old public-key after private-key compromise)
Other issues

7.6 SSL/TLS

Objectives of this subsection :

Goals
- Understand (at a high level) a real-world crypto protocol
- Pull together everything learned in this course
Not goals
- Understanding low-level details or implementation
- Defining or proving security

SSL/TLS :

Consider this scenario: How can you securely send your credit card to Coursera?
SSL/TLS
- Secure Socket Layre (Netscape, mid-90s)
- Transport Layer Security
- TLS 1.0 (1999)
- TLS 1.2 (2008, current scheme)
- TLS 1.3 (draft)
- Used by every web browser for https connections
Includes Two phases :
- Handshake protocol (Establish a shared key between two entities)
- Record-layer protocol (Use the shared key for secure communication)

Module 1

1.1 以凯撒密码引入

1.2 Hex and ASCII

1.3 现代密码学原理的三个核心

1.4 完全保密(Perfect Secrecy)

1.5 实现 One-Time Pad

Module 2

2.1 One-Time Pad的缺陷

2.2 计算保密(Computational secrecy)

2.3 伪随机性(Pseudorandomness)

2.4 伪随机生成器(PRGs)

2.5 “Pseudo” one-time pad 的计算保密性证明

Module 3

3.1 CPA-security

3.2 PRFs (Pseudorandom functions，伪随机函数)

3.3 一个 CPA-secure encryption 的计算保密性证明

3.4 Modes of encryption

3.5 CCA-secure

3.6 Padding-Oracle Attacks

Module 4

4.1 Message Integrity（消息完整性）

4.2 A fixed-length MAC

4.3 CBC-MAC

4.4 Hash functions

4.5 HMAC

4.6 Authenticated Encryption

4.7 Secure Communication Sessions

Module 5

5.1 Algorithmic number theory

5.2 Groups

5.3 Hard problems & RSA

5.4 Discrete-logarithm problem(离散对数问题)

5.5 Choice of parameters

Module 6

6.1 Why we need Public-Key

6.2 Diffie-Hellman key exchange

6.3 The Public-Key Distribution

6.4 Public-key Encryption

6.5 El Gamal encryption

6.6 RSA based scheme

Module 7

7.1 Digital signatures compare to MAC

7.2 Formal definitions of security for digital signature

7.3 RSA-based signature

7.4 Identification schemes

7.5 Public-key infrastructure (PKI)

7.6 SSL/TLS

发送评论 编辑评论

发送评论编辑评论