Cryptanalysis — Complete Breaking Guide

⚔️

Types of Cryptanalytic Attacks

Classified by how much information the attacker has

🔍

Ciphertext-Only Attack
Attacker has only ciphertext. Hardest attack. Relies on statistics and guessing.
Hardest

📋

Known-Plaintext Attack
Attacker has plaintext + matching ciphertext pairs. Used to deduce the key.
Medium

✏️

Chosen-Plaintext Attack
Attacker picks the plaintext to be encrypted. More powerful than known-plaintext.
Easier

🔨

Brute-Force Attack
Try every possible key until plaintext is found. Impractical for large key spaces.
Always possible

✦ Practical Example — Brute-Force on Caesar Cipher

Caesar has only 26 possible keys. We simply try all of them.

Ciphertext KHOOR ZRUOG

Key k	Decrypted Result	Valid English?
1	JGNNQ YQTNF	✗
2	IFMMP XPSME	✗
3	HELLO WORLD	✓ Found!
4	GDKKN VNQKC	✗

Recovered Plaintext

HELLO WORLD | Key = 3
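
The brute-force table above can be reproduced with a short script. This is a minimal sketch, assuming an uppercase A–Z ciphertext; the function names `caesar_decrypt` and `brute_force` are illustrative and not part of the original material.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_decrypt(ciphertext: str, k: int) -> str:
    """Shift every letter back by k positions; leave other characters alone."""
    out = []
    for ch in ciphertext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext: str) -> list[tuple[int, str]]:
    """Return (key, candidate plaintext) for every key 0..25."""
    return [(k, caesar_decrypt(ciphertext, k)) for k in range(26)]


if __name__ == "__main__":
    for k, text in brute_force("KHOOR ZRUOG"):
        print(f"{k:2d}  {text}")
```

A human (or a dictionary check) still has to pick the readable row out of the 26 candidates.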

🔀

Breaking Transposition Ciphers

Columnar · Keyword Columnar · Double Transposition · Scytale

💡

Key insight: In a transposition cipher, the same letters are present as in the plaintext — only their positions change. Therefore the letter frequency distribution is identical to the original plaintext. This is how we detect transposition.

1. Breaking Simple Columnar Transposition

The ciphertext is formed by reading down each column of a rectangular grid. To break it, we test each divisor of the ciphertext length as a possible number of columns.

Ciphertext SHGEEHELTTIX (12 letters)

Possible column counts: divisors of 12 → 2, 3, 4, 6

1

Try 4 columns (12 ÷ 4 = 3 rows). Fill ciphertext down each column:

Try: 4 columns

	C0	C1	C2	C3
R0	S	E	E	T
R1	H	E	L	I
R2	G	H	T	X

→ READ ROWS:

S-E-E-T
H-E-L-I
G-H-T-X

2

Reading rows gives: SEETHELIGHTX → SEE THE LIGHT ✓ (meaningful English, followed by one padding letter)

3

Key = 4 columns. The trailing X was padding to complete the grid.

Recovered Plaintext

SEETHELIGHT | Key = 4 columns
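
The divisor test can be automated along the following lines. This is a sketch under the same assumption as the example (the grid is completely filled, so the column count divides the length); `fill_columns` and `candidates` are hypothetical helper names.

```python
def fill_columns(ciphertext: str, n_cols: int) -> list[str]:
    """Write the ciphertext down n_cols columns and return the grid rows."""
    n_rows = len(ciphertext) // n_cols
    columns = [ciphertext[c * n_rows:(c + 1) * n_rows] for c in range(n_cols)]
    return ["".join(col[r] for col in columns) for r in range(n_rows)]


def candidates(ciphertext: str) -> dict[int, str]:
    """Candidate plaintext for every non-trivial divisor of the length."""
    n = len(ciphertext)
    return {
        cols: "".join(fill_columns(ciphertext, cols))
        for cols in range(2, n)
        if n % cols == 0
    }


if __name__ == "__main__":
    for cols, text in candidates("SHGEEHELTTIX").items():
        print(cols, text)
```

Each candidate still has to be judged by eye (or scored against a word list) to decide which column count yields English.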

2. Breaking Keyword Columnar Transposition

Ciphertext (45 letters) VOESA IVENE MRTNL EANGE WTNIM HTMEE ADLTR NISHO DWOEH

1

45 letters → possible grid dimensions: 9×5, 5×9, 15×3, 3×15

2

Try a 9 rows × 5 columns grid. Fill ciphertext column by column:

Col 0	Col 1	Col 2	Col 3	Col 4
V	E	G	M	I
O	M	E	E	S
E	R	W	E	H
S	T	T	A	O
A	N	N	D	D
I	L	I	L	W
V	E	M	T	O
E	A	H	R	E
N	N	T	N	H

3

Permute columns until row 0 spells a recognizable word.
Try order 2, 4, 0, 1, 3: Row 0 → G · I · V · E · M → starts with GIVE ✓

4

Column order 24013 is the recovered key. Read rows to get the plaintext; the last three letters (NNN) are padding that fills the 45-cell grid.

Recovered Plaintext

GIVE ME SOMEWHERE TO STAND AND I WILL MOVE THE EARTH | Key = 24013
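
The column search can be sketched in code as follows. Assumptions: the grid is 9 rows × 5 columns as above, and the analyst supplies a guessed opening word (a crib, here `GIVE`); the helper names are illustrative only. With 5 columns there are just 5! = 120 orders to try.

```python
from itertools import permutations


def split_columns(ciphertext: str, n_cols: int) -> list[str]:
    """Cut the ciphertext into n_cols equal columns (column-by-column fill)."""
    n_rows = len(ciphertext) // n_cols
    return [ciphertext[c * n_rows:(c + 1) * n_rows] for c in range(n_cols)]


def read_rows(columns: list[str], order: tuple[int, ...]) -> str:
    """Read the grid row by row, taking the columns in the given order."""
    n_rows = len(columns[0])
    return "".join(columns[c][r] for r in range(n_rows) for c in order)


def search_orders(ciphertext: str, n_cols: int, crib: str):
    """Yield (order, text) for every column order whose text starts with crib."""
    columns = split_columns(ciphertext, n_cols)
    for order in permutations(range(n_cols)):
        text = read_rows(columns, order)
        if text.startswith(crib):
            yield order, text


if __name__ == "__main__":
    ct = "VOESAIVENEMRTNLEANGEWTNIMHTMEEADLTRNISHODWOEH"
    for order, text in search_orders(ct, 5, "GIVE"):
        print(order, text)
```

Without a crib, the same loop can rank orders by how many common digraphs each row contains; that scoring is left out here to keep the sketch short.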

3. Breaking Double Transposition — Divide & Conquer

⚡

Strategy: Instead of guessing the full key at once, split into two stages.
Step A: undo the column permutation (reorder the columns until the rows show fragments of words).
Step B: undo the row permutation (easy once the columns are fixed).

Ciphertext NADWTKCAATAT

1

Place in a 3×4 matrix (read across rows):

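Ciphertext in 3×4 grid

N	A	D	W
T	K	C	A
A	T	A	T

→ undo col perm (3,1,0,2): ciphertext column j moves back to position perm[j]

After undoing columns

D	A	W	N
C	K	A	T
A	T	T	A

→ undo row perm (2,1,0): swap the first and last rows

Final plaintext grid

A	T	T	A
C	K	A	T
D	A	W	N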

Recovered Plaintext

ATTACKATDAWN | Column perm: (3,1,0,2) · Row perm: (2,1,0)
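
A small sketch of the two undo steps, under the assumed convention that a permutation `perm` describes encryption: position j of the ciphertext grid holds original column (or row) `perm[j]`. The function names are hypothetical.

```python
def to_grid(text: str, n_cols: int) -> list[list[str]]:
    """Lay the text out row by row in a grid with n_cols columns."""
    return [list(text[i:i + n_cols]) for i in range(0, len(text), n_cols)]


def undo_columns(grid: list[list[str]], perm: tuple[int, ...]) -> list[list[str]]:
    """Move the letter found in column j back to its original column perm[j]."""
    restored = []
    for row in grid:
        new_row = [""] * len(perm)
        for j, original_col in enumerate(perm):
            new_row[original_col] = row[j]
        restored.append(new_row)
    return restored


def undo_rows(grid: list[list[str]], perm: tuple[int, ...]) -> list[list[str]]:
    """Move the row found at index i back to its original index perm[i]."""
    restored = [[] for _ in perm]
    for i, original_row in enumerate(perm):
        restored[original_row] = grid[i]
    return restored


if __name__ == "__main__":
    grid = to_grid("NADWTKCAATAT", 4)
    step_a = undo_columns(grid, (3, 1, 0, 2))
    step_b = undo_rows(step_a, (2, 1, 0))
    print(["".join(r) for r in step_a])
    print("".join("".join(r) for r in step_b))
```

In a real attack the permutations are unknown; the point of divide and conquer is that the 4! column orders can be searched first, and only then the 3! row orders.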

👑

Breaking the Caesar Cipher

Only 26 possible keys — brute-force or frequency analysis

📐

Encryption: c = (p + k) mod 26
Decryption: p = (c − k) mod 26
where A=0, B=1, … Z=25

🎯

Breaking strategy:
1. Count letter frequencies in ciphertext
2. Most frequent letter ≈ E (or T, A, O…)
3. Compute k, verify with brute-force if needed

✦ Worked Example

Ciphertext WKH TXLFN EURZQ IRA

1

Try k = 3: W→T, K→H, H→E → "THE" ✓ — common English word!

2

Apply p = (c − 3) mod 26 to every letter:

Cipher letter	Numeric value	−3 mod 26	Plaintext
W	22	22 − 3 = 19	T
K	10	10 − 3 = 7	H
H	7	7 − 3 = 4	E
T	19	19 − 3 = 16	Q
X	23	23 − 3 = 20	U
L	11	11 − 3 = 8	I
F	5	5 − 3 = 2	C
N	13	13 − 3 = 10	K

Recovered Plaintext

THE QUICK BROWN FOX | Key k = 3
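
The frequency-guided strategy can be sketched like this. Assumptions: uppercase A–Z input, the ranking `ETAOIN` for likely plaintext letters, and a crib word (`THE`) used as the verification step; all names are illustrative. On short texts the most frequent ciphertext letter is often not E, which is why several candidates are tried in turn.

```python
from collections import Counter

COMMON = "ETAOIN"  # assumed ranking of frequent English letters


def candidate_keys(ciphertext: str) -> list[int]:
    """Keys that map the most frequent ciphertext letter to E, T, A, O, I, N."""
    letters = [c for c in ciphertext if c.isalpha()]
    top = Counter(letters).most_common(1)[0][0]
    return [(ord(top) - ord(p)) % 26 for p in COMMON]


def decrypt(ciphertext: str, k: int) -> str:
    """Apply p = (c - k) mod 26 to every letter."""
    return "".join(
        chr((ord(c) - 65 - k) % 26 + 65) if c.isalpha() else c
        for c in ciphertext
    )


def break_caesar(ciphertext: str, crib: str = "THE"):
    """Return (key, plaintext) for the first candidate key that reveals the crib."""
    for k in candidate_keys(ciphertext):
        text = decrypt(ciphertext, k)
        if crib in text:
            return k, text
    return None


if __name__ == "__main__":
    print(break_caesar("WKH TXLFN EURZQ IRA"))
```

In this ciphertext the most frequent letter is R, which turns out to stand for O rather than E, so the correct key appears only as the fourth candidate.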

🧮

Breaking the Affine Cipher

e(x) = ax + b (mod 26) · only 312 valid key pairs

🔑

Encryption: y = ax + b (mod 26)
Decryption: x = a⁻¹(y − b) (mod 26)
Constraint: gcd(a, 26) = 1

🎯

Breaking strategy:
1. Find the 2 most frequent ciphertext letters
2. Assume they map to E and T (or E and A, etc.)
3. Solve 2 equations in 2 unknowns (a and b)
4. Verify gcd(a,26)=1, then decrypt

✦ Worked Example (from the lecture)

Ciphertext (57 characters) FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH

Step 1 — Frequency count

Letter	Count	Rank
R	8	most frequent
D	7	2nd
E	5	3rd (tie)
H	5	3rd (tie)
K	5	3rd (tie)
F	4
S	4
V	4

Step 2 — Form hypotheses

Most frequent ciphertext letter R (value 17) → assume it encrypts e (value 4)
Second most frequent D (value 3) → assume it encrypts t (value 19)

e_K(4) = 17 → 4a + b = 17
e_K(19) = 3 → 19a + b = 3

Hypothesis	R maps to	2nd letter maps to	Equations	a	b	gcd(a,26)	Valid?
#1	e (4)	D (3) → t (19)	4a+b=17, 19a+b=3	6	19	2	✗ illegal
#2	e (4)	E (4) → t (19)	4a+b=17, 19a+b=4	13	—	13	✗ illegal
#3	e (4)	H (7) → t (19)	4a+b=17, 19a+b=7	8	—	2	✗ illegal
#4	e (4)	K (10) → t (19)	4a+b=17, 19a+b=10	3	5	1	✓ Legal!

Step 3 — Compute decryption function

Need: a⁻¹ mod 26 where a = 3
3 × 9 = 27 ≡ 1 (mod 26) → a⁻¹ = 9

d_K(y) = 9(y − 5) mod 26 = 9y − 45 mod 26 = 9y − 19 (mod 26)

Step 4 — Decrypt a few letters to verify

Cipher letter	y value	9y − 19 mod 26	Plaintext
F	5	45 − 19 = 26 ≡ 0	a
M	12	108 − 19 = 89 ≡ 11	l
X	23	207 − 19 = 188 ≡ 6	g
V	21	189 − 19 = 170 ≡ 14	o

Recovered Plaintext | Key: a=3, b=5

ALGORITHMS ARE QUITE GENERAL DEFINITIONS OF ARITHMETIC PROCESSES
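
The hypothesis loop can be sketched as below. Assumptions: the most frequent ciphertext letter stands for e, the candidates for t are tried in the order given, and Python 3.8+ is available for the modular inverse `pow(x, -1, 26)`. The function names are illustrative.

```python
from math import gcd

E, T = 4, 19  # numeric values of plaintext e and t


def solve_key(x1: int, y1: int, x2: int, y2: int) -> tuple[int, int]:
    """Solve a*x1 + b = y1 and a*x2 + b = y2 (mod 26) for a and b.

    Subtracting the equations gives a*(x1 - x2) = y1 - y2 (mod 26).
    Raises ValueError if (x1 - x2) has no inverse mod 26.
    """
    a = ((y1 - y2) * pow((x1 - x2) % 26, -1, 26)) % 26
    b = (y1 - a * x1) % 26
    return a, b


def decrypt(ciphertext: str, a: int, b: int) -> str:
    """Apply x = a^-1 * (y - b) mod 26 to an uppercase A-Z ciphertext."""
    a_inv = pow(a, -1, 26)
    return "".join(chr((a_inv * (ord(c) - 65 - b)) % 26 + 65) for c in ciphertext)


def break_affine(ciphertext: str, top: str, second_candidates: str):
    """Try top -> e together with each candidate -> t; return the first legal key."""
    for second in second_candidates:
        a, b = solve_key(E, ord(top) - 65, T, ord(second) - 65)
        if gcd(a, 26) != 1:
            continue  # illegal key: a has no inverse, so decryption is impossible
        return a, b, decrypt(ciphertext, a, b)
    return None


if __name__ == "__main__":
    ct = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"
    print(break_affine(ct, "R", "DEHK"))
```

A legal key is necessary but not sufficient: the decrypted text still has to read as English before the hypothesis is accepted.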

🗝️

Breaking the Vigenère Cipher

Kasiski Test + Index of Coincidence → find key length → break each shift separately

🎯

Master strategy — three phases:
Phase 1 — Determine the key length (Kasiski or IC method)
Phase 2 — Split ciphertext into groups of size = key length
Phase 3 — Solve each group independently as a Caesar cipher

Phase 1A — Kasiski Test

If the same plaintext fragment happens to align with the same portion of the keyword more than once, it will produce identical ciphertext. The spacing between these repetitions is a multiple of the key length.

Example ciphertext — encrypted with keyword POETRY (length 6) IVIVYGARMLMYIVIKFDIVIFRL

1

Scan for repeated groups: IVI appears at positions 0, 12, 18

2

Compute gaps: 12 − 0 = 12, 18 − 12 = 6

3

GCD(12, 6) = 6 → key length is likely 6 (or a divisor: 2, 3)

4

In this case key length = 6 exactly ✓
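
The Kasiski steps translate into a few lines of code. This sketch assumes a letters-only ciphertext and looks at trigrams; `repeated_ngrams` and `kasiski` are hypothetical names.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_ngrams(ciphertext: str, n: int = 3) -> dict[str, list[int]]:
    """Map every n-gram that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}


def kasiski(ciphertext: str, n: int = 3):
    """GCD of the gaps between consecutive repeats, or None if nothing repeats."""
    gaps = []
    for pos in repeated_ngrams(ciphertext, n).values():
        gaps += [b - a for a, b in zip(pos, pos[1:])]
    return reduce(gcd, gaps) if gaps else None


if __name__ == "__main__":
    ct = "IVIVYGARMLMYIVIKFDIVIFRL"
    print(repeated_ngrams(ct))
    print(kasiski(ct))
```

On longer texts some repeats are coincidental, so in practice the divisors shared by most gaps are more informative than the strict GCD of all of them.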

Phase 1B → 3 — Complete Break (3-letter keyword)

Ciphertext (key length = 3, determined by Kasiski/IC) RLWRV MRLAQ EDUEQ QWGKI LFMFE XZYXA QXGJH FMXKM QWRLA LKLFE LGWCL SOLMX RLWPI OCVWL SKNIS IMFES JUVAR MFEXZ CVWUS MJHTC RGRVM RLSZS MREFW XZGRY RLWPI OMYDB SFJCT CAZYX AQ

1

Split into 3 sets by position modulo 3:
S₀ = letters at positions 0,3,6,9,… | S₁ = 1,4,7,10,… | S₂ = 2,5,8,11,…

S₀ — most frequent

Letter	Count
R	10
Q	4
M	3

S₁ — most frequent

Letter	Count
X	7
L	6
V	5

S₂ — most frequent

Letter	Count
W	6
A	5
M	4

2

For S₀: R (17) is most frequent → assume R = E (4):
k₀ = 17 − 4 = 13. Candidate k₀ values: {13, 24, 4, 3, 0, 9, 17, 25} (matching E,T,N,O,R,I,A,S)

3

For S₁: X (23) is most frequent → k₁ candidates map X to E,T,N,O,R,I,A,S:
k₁ ∈ {19, 4, 10, 9, 6, 15, 23, 5}

4

For S₂: W (22) is most frequent → k₂ candidates (same letter order):
k₂ ∈ {18, 3, 9, 8, 5, 14, 22, 4}

5

Test each combination (8³ = 512 candidates) against the first few ciphertext letters.
Winner: (k₀=24, k₁=4, k₂=18) → Keyword = Y · E · S ✓

Verification — decrypt with key YES

p_i = (c_i − k_{i mod 3}) mod 26

R=17 − Y=24 → (17−24+26) mod 26 = 19 = T ✓
L=11 − E=4 → 7 = H ✓
W=22 − S=18 → 4 = E ✓ → "THE…"

Recovered Plaintext | Key = YES

THE TRUTH IS ALWAYS SOMETHING THAT IS TOLD, NOT SOMETHING THAT IS KNOWN. IF THERE WERE NO SPEAKING OR WRITING, THERE WOULD BE NO TRUTH ABOUT ANYTHING. THERE WOULD ONLY BE WHAT IS.
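
Phases 2 and 3 can be sketched as follows. Two simplifications are assumed here and are not from the original: each group is scored on its own instead of testing all 512 combinations, and a candidate shift is scored by how many decrypted letters fall in the set ETAOINSHR. The input must be letters only (spaces removed); all names are illustrative.

```python
from collections import Counter

GUESS_ORDER = "ETNORIAS"   # plaintext letters tried for the top ciphertext letter
COMMON = set("ETAOINSHR")  # assumed set of common letters used for scoring


def cosets(ciphertext: str, key_len: int) -> list[str]:
    """Group i holds the letters at positions i, i + key_len, i + 2*key_len, ..."""
    return [ciphertext[i::key_len] for i in range(key_len)]


def shift_back(text: str, k: int) -> str:
    """Caesar-decrypt text with shift k."""
    return "".join(chr((ord(c) - 65 - k) % 26 + 65) for c in text)


def candidate_shifts(coset: str) -> list[int]:
    """Shifts that map the group's most frequent letter to E, T, N, O, R, I, A, S."""
    top = Counter(coset).most_common(1)[0][0]
    return [(ord(top) - ord(p)) % 26 for p in GUESS_ORDER]


def best_shift(coset: str) -> int:
    """Pick the candidate whose decryption contains the most common letters."""
    return max(
        candidate_shifts(coset),
        key=lambda k: sum(c in COMMON for c in shift_back(coset, k)),
    )


def break_vigenere(ciphertext: str, key_len: int) -> tuple[str, str]:
    """Return (keyword, plaintext) for a known key length."""
    key = [best_shift(group) for group in cosets(ciphertext, key_len)]
    plain = "".join(
        shift_back(c, key[i % key_len]) for i, c in enumerate(ciphertext)
    )
    return "".join(chr(k + 65) for k in key), plain
```

Scoring each group independently reduces the search from 8³ combinations to 3 × 8 evaluations, at the cost of ignoring word-level evidence that a combined test would use.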

📊

Statistical Analysis of Simple Substitution

Letter frequencies · digraphs · trigraphs · roughness

English Letter Frequencies

Letter	Frequency
E	12.7%
T	9.1%
A	8.2%
O	7.5%
I	7.0%
N	6.7%
S	6.3%
H	6.1%
R	6.0%
D	4.3%
L	4.0%

🧠

Memory aid: High-frequency letters in order → ETAOINSHRD
Or remember the word SENORITA which contains most top-8 letters.
Low-frequency letters: J, K, Q, X, Z (together only about 1% of English text)

Most Common Digraphs (pairs)

Rank	Digraph
1	TH
2	HE
3	IN
4	ER
5	AN
6	RE
7	ED
8	ON

Most Common Trigraphs (triples)

Rank	Trigraph
1	THE
2	ING
3	AND
4	HER
5	ERE
6	ENT
7	THA
8	NTH

Roughness — Identifying Cipher Type by Distribution Shape

📈

Rough distribution (high peaks & low troughs)
Indicates: plaintext, transposition, or simple substitution.
The same letter frequencies are present — just possibly relabeled.

📉

Flat distribution (no clear peaks)
Indicates: polyalphabetic substitution (Vigenère, Enigma).
Multiple alphabets suppress frequency patterns.

✦ Breaking Simple Substitution — Step by Step

Ciphertext (simple substitution) LWVOL QVWAT DOLOH HLDAW VWPTV FHWDW RSVWO DNTVA WRWDF HWHFO RLFWK LFJLF FLQOT DHFVW DMFBW DFWVO DMSTX VHWAF TVPKA QLVCW

1

Count letter frequencies. Find: W=16, F=11, V=11, D=9, L=9, H=7, O=7…
W is the most frequent → most likely represents E (most common in English)

2

F (tied with V for 2nd most frequent) → try T first. So substitute: W→E, F→T

3

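Look for digraphs that contain the letters already assigned and test each reading against those assignments. For example, if WH appears often it cannot be TH, because that would need W→T and contradict W→E; likewise H→T is ruled out because F already stands for T. With W→E and F→T fixed, a frequent pair starting with F is a candidate for TH, and the same second letter followed by W would then read HE.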

4

Once a few letters are correctly guessed, partial words start to reveal themselves (e.g., "_HE_" → "THE_"). Fill in letter by letter until the full message is revealed.

💡

Pro tip: Look for single-letter words in ciphertext → must be A or I.
Two-letter words most likely: OF, TO, IN, IT, IS, BE, AS, AT, SO, WE, HE, BY, OR, AN, DO.
Three-letter words most likely: THE, AND, FOR, ARE, BUT, NOT, YOU, ALL, CAN, HER.
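
The counting and partial-substitution steps can be supported by a few helpers. This is a sketch with hypothetical names; unknown letters are shown as `_`, and the five-letter groups are treated as an artifact of formatting, so n-grams are counted across the spaces.

```python
from collections import Counter


def letter_counts(ciphertext: str) -> Counter:
    """Count each letter, ignoring spaces."""
    return Counter(c for c in ciphertext if c.isalpha())


def ngram_counts(ciphertext: str, n: int) -> Counter:
    """Count n-letter sequences (digraphs for n=2, trigraphs for n=3)."""
    text = "".join(c for c in ciphertext if c.isalpha())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def apply_partial(ciphertext: str, mapping: dict[str, str]) -> str:
    """Replace known ciphertext letters by their guess; show the rest as '_'."""
    return "".join(
        mapping.get(c, "_") if c.isalpha() else c for c in ciphertext
    )


if __name__ == "__main__":
    ct = ("LWVOL QVWAT DOLOH HLDAW VWPTV FHWDW RSVWO DNTVA WRWDF HWHFO "
          "RLFWK LFJLF FLQOT DHFVW DMFBW DFWVO DMSTX VHWAF TVPKA QLVCW")
    print(letter_counts(ct).most_common(7))
    print(ngram_counts(ct, 2).most_common(5))
    print(apply_partial(ct, {"W": "E", "F": "T"}))
```

Extending the `mapping` dictionary one guess at a time and re-printing the partial text mirrors the manual fill-in process described above.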

📐

Index of Coincidence (IC)

Determine cipher type and estimate Vigenère key length

IC = Σ Fᵢ(Fᵢ − 1) / (n(n − 1))
Fᵢ = count of letter i in ciphertext · n = total ciphertext length

🔵

IC ≈ 0.065 — Mono-alphabetic substitution or transposition
Distribution is "rough" like natural English.

🟠

IC ≈ 0.038 — Polyalphabetic substitution (Vigenère, long key)
Distribution is nearly flat / uniform.

✅

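Decision rule:
IC ≈ 0.065 or higher → Mono-alphabetic substitution or transposition
To separate the two, look at which letters are frequent: transposition keeps E, T, A, O on top, substitution relabels them
IC well below 0.065, down toward 0.038 → Polyalphabetic (Vigenère etc.)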

✦ Full Worked Example (from the lecture)

Ciphertext (120 letters) EEAHR RFOWW TGDTE SCHES ROEST EMCNE AOOTL AKNEE TSSEO AVXNC STPOO OEOEA TASBI OAEER AXHEE RADNF PSINO ISEAU RPNED XEPSE PFCDL LZTER JAETY RETHE

Step 1 — Build the frequency table

Letter	F	F(F−1)	Letter	F	F(F−1)	Letter	F	F(F−1)
A	11	110	J	1	0	S	10	90
B	1	0	K	1	0	T	10	90
C	4	12	L	3	6	U	1	0
D	4	12	M	1	0	V	1	0
E	24	552	N	6	30	W	2	2
F	3	6	O	11	110	X	3	6
G	1	0	P	5	20	Y	1	0
H	4	12	Q	0	0	Z	1	0
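I	3	6	R	8	56	Σ = 1120

Step 2 — Compute IC

ΣF(F−1) = 110+0+12+12+552+6+0+12+6+0+0+6+0+30+110+20+0+56+90+90+0+0+2+6+0+0 = 1120

n(n−1) = 120 × 119 = 14,280

IC = 1120 / 14280 = 0.0784

Conclusion

IC ≈ 0.078 > 0.065 → the distribution is rough, so the cipher is not polyalphabetic: it is a mono-alphabetic substitution or a transposition. Because E, A, O, T, S are the most frequent letters, just as in English, a transposition is the likelier of the two ✓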
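
The IC computation is easy to script. A minimal sketch, assuming only letters are counted and case is ignored; the function name is illustrative.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IC = sum of F(F-1) over all letters, divided by n(n-1)."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    total = sum(f * (f - 1) for f in Counter(letters).values())
    return total / (n * (n - 1))


if __name__ == "__main__":
    ct = ("EEAHR RFOWW TGDTE SCHES ROEST EMCNE AOOTL AKNEE TSSEO AVXNC "
          "STPOO OEOEA TASBI OAEER AXHEE RADNF PSINO ISEAU RPNED XEPSE "
          "PFCDL LZTER JAETY RETHE")
    print(round(index_of_coincidence(ct), 4))
```

Short texts give noisy IC values, so a result somewhat above or below 0.065 should be read as "rough" rather than compared to the threshold to the last digit.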

Using IC to Estimate Vigenère Key Length

Once you know the cipher is polyalphabetic, estimate the key length k using:

k ≈ (0.0265 × N) / ((N−1) × IC − 0.0385N + 0.065)
N = total ciphertext length · IC = computed index of coincidence

🧪

Example: IC = 0.047, N = 200 letters
k ≈ (0.0265 × 200) / ((199 × 0.047) − (0.0385 × 200) + 0.065)
k ≈ 5.3 / (9.353 − 7.7 + 0.065) = 5.3 / 1.718 ≈ 3.1
The estimate is rough, so try k = 2, 3, 4: for each k, split the text and compute the IC of each sub-sequence. The k giving sub-IC ≈ 0.065 is the correct key length.
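
Both key-length checks can be sketched in code. The estimate below uses the standard Friedman form, in which 0.0385 × N is subtracted and 0.065 is added in the denominator; the constant and function names are assumptions made for this example.

```python
from collections import Counter

KAPPA_ENGLISH = 0.065   # expected IC of English text
KAPPA_RANDOM = 0.0385   # expected IC of uniformly random letters


def ic(text: str) -> float:
    """Index of coincidence of a letters-only string."""
    n = len(text)
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))


def friedman_estimate(n: int, ic_value: float) -> float:
    """Rough key-length estimate from the text length and its overall IC."""
    numerator = (KAPPA_ENGLISH - KAPPA_RANDOM) * n
    denominator = (n - 1) * ic_value - KAPPA_RANDOM * n + KAPPA_ENGLISH
    return numerator / denominator


def average_sub_ic(ciphertext: str, k: int) -> float:
    """Split into k interleaved sub-sequences and average their ICs."""
    parts = [ciphertext[i::k] for i in range(k)]
    return sum(ic(p) for p in parts) / k


if __name__ == "__main__":
    print(round(friedman_estimate(200, 0.047), 2))
```

A typical use is to call `average_sub_ic` for each key length near the estimate and keep the one whose average is closest to 0.065.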

Master Cryptanalysis Flowchart

1

Compute IC. Classify: Mono (IC≈0.065, frequent letters relabeled), Transposition (IC≈0.065, with E, T, A still the most frequent letters), or Poly (IC near 0.038)

2

If Transposition: Test array dimensions (divisors of ciphertext length). Try reading rows/columns. Look for English words in rows.

3

If Mono-alphabetic: Count letter frequencies. Map most frequent ciphertext letter → E (12.7%). Use digraph/trigraph patterns. Fill in partial words iteratively.

4

If Affine: Use top 2 frequent letters, set up 2-equation system. Solve for a,b. Check gcd(a,26)=1. Compute a⁻¹ mod 26. Decrypt and verify.

5

If Poly (Vigenère): Use Kasiski test or IC sub-sequence method to find key length k. Split into k groups. Solve each group as Caesar cipher using frequency analysis.

6

Verify: Decrypted text must be readable, grammatical English. If not, revisit your frequency assumptions and try the next candidate.