🔓 Cryptanalysis — Breaking Ciphers

Complete step-by-step guide to breaking every cipher type covered in the course

⚔️

Types of Cryptanalytic Attacks

Classified by how much information the attacker has

🔍 Ciphertext-Only Attack (difficulty: hardest)
The attacker has only the ciphertext and must rely on statistics and guessing.
📋 Known-Plaintext Attack (difficulty: medium)
The attacker has some plaintext together with its matching ciphertext and uses these pairs to deduce the key.
✏️ Chosen-Plaintext Attack (difficulty: easier)
The attacker chooses the plaintext to be encrypted and sees the result. More powerful than a known-plaintext attack.
🔨 Brute-Force Attack (always possible)
Try every possible key until a meaningful plaintext appears. Impractical for large key spaces.

✦ Practical Example — Brute-Force on Caesar Cipher

Caesar has only 26 possible keys. We simply try all of them.

Ciphertext KHOOR ZRUOG
Key k | Decrypted result | Valid English?
1     | JGNNQ YQTNF      | ✗
2     | IFMMP XPSME      | ✗
3     | HELLO WORLD      | ✓ Found!
4     | GDKKN VNQKC      | ✗
Recovered Plaintext
HELLO WORLD  |  Key = 3
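The table above can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `caesar_decrypt` and `brute_force` are illustrative, and the code assumes an uppercase A–Z alphabet with A=0 and leaves any other character unchanged.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def caesar_decrypt(ciphertext, k):
    """Shift every letter back by k; leave spaces and punctuation alone."""
    out = []
    for ch in ciphertext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext):
    """Return every candidate plaintext, keyed by the shift that produced it."""
    return {k: caesar_decrypt(ciphertext, k) for k in range(26)}


# Print all 26 candidates; a human (or a dictionary check) picks the readable one.
for k, guess in brute_force("KHOOR ZRUOG").items():
    print(k, guess)
```

Picking the readable line is left to the reader here; an automated version would need some scoring rule, such as a word list, which is outside this sketch.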
🔀

Breaking Transposition Ciphers

Columnar · Keyword Columnar · Double Transposition · Scytale

💡
Key insight: In a transposition cipher, the same letters are present as in the plaintext — only their positions change. Therefore the letter frequency distribution is identical to the original plaintext. This is how we detect transposition.

1. Breaking Simple Columnar Transposition

The ciphertext is formed by reading down each column of a rectangular grid. To break it, we test each divisor of the ciphertext length as a possible number of columns.

Ciphertext SHGEEHELTTIX    (12 letters)

Possible column counts: divisors of 12 → 2, 3, 4, 6

Step 1: Try 4 columns (12 ÷ 4 = 3 rows). Fill the ciphertext down each column:

Try: 4 columns
     C0  C1  C2  C3
R0   S   E   E   T
R1   H   E   L   I
R2   G   H   T   X

READ ROWS:

S-E-E-T
H-E-L-I
G-H-T-X

Step 2: Reading across the rows gives SEETHELIGHTX, that is, SEE THE LIGHT plus one extra letter ✓ (meaningful English).

Step 3: Key = 4 columns. The trailing X was padding to complete the grid.

Recovered Plaintext
SEETHELIGHT  |  Key = 4 columns
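The divisor test can be sketched in Python as follows. The helper names `read_rows` and `candidates` are illustrative, and the sketch assumes a completely filled rectangular grid (ciphertext length divisible by the column count), as in the example.

```python
def read_rows(ciphertext, n_cols):
    """Write the ciphertext down n_cols columns, then read it back row by row."""
    n_rows = len(ciphertext) // n_cols
    columns = [ciphertext[c * n_rows:(c + 1) * n_rows] for c in range(n_cols)]
    return "".join(columns[c][r] for r in range(n_rows) for c in range(n_cols))


def candidates(ciphertext):
    """One candidate plaintext per non-trivial divisor of the ciphertext length."""
    n = len(ciphertext)
    return {c: read_rows(ciphertext, c) for c in range(2, n) if n % c == 0}


for n_cols, text in candidates("SHGEEHELTTIX").items():
    print(n_cols, text)
```

Only one of the printed candidates should read as English; the others are discarded by inspection.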

2. Breaking Keyword Columnar Transposition

Ciphertext (45 letters) VOESA IVENE MRTNL EANGE WTNIM HTMEE ADLTR NISHO DWOEH
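Step 1: 45 letters → possible grid dimensions (rows × columns): 9×5, 5×9, 15×3, 3×15

Step 2: Try a grid of 9 rows × 5 columns. Fill the ciphertext column by column:

Col 0 | Col 1 | Col 2 | Col 3 | Col 4
V     | E     | G     | M     | I
O     | M     | E     | E     | S
E     | R     | W     | E     | H
S     | T     | T     | A     | O
A     | N     | N     | D     | D
I     | L     | I     | L     | W
V     | E     | M     | T     | O
E     | A     | H     | R     | E
N     | N     | T     | N     | H

Step 3: Permute the columns until row 0 spells a recognizable word.
Try order 2, 4, 0, 1, 3: Row 0 → G · I · V · E · M → starts with GIVE ✓

Step 4: Column order 24013 is the recovered key. Read the rows in that column order to get the plaintext; the three N's that end the last row are padding.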

Recovered Plaintext
GIVE ME SOMEWHERE TO STAND AND I WILL MOVE THE EARTH  |  Key = 24013
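The column search in steps 2 to 4 can be sketched as below. This is an illustration under stated assumptions: the grid is full (45 = 9 × 5), the crib "GIVE" is a guess supplied by the analyst, and the function names are hypothetical. Trying all 5! = 120 column orders is cheap; longer keys would need a smarter search.

```python
from itertools import permutations


def columns_of(ciphertext, n_cols):
    """Cut the ciphertext into n_cols equal-length columns."""
    n_rows = len(ciphertext) // n_cols
    return [ciphertext[c * n_rows:(c + 1) * n_rows] for c in range(n_cols)]


def read_with_order(ciphertext, order):
    """Read the grid row by row, visiting the columns in the given order."""
    cols = columns_of(ciphertext, len(order))
    n_rows = len(cols[0])
    return "".join(cols[c][r] for r in range(n_rows) for c in order)


def orders_starting_with(ciphertext, n_cols, crib):
    """All column orders whose first row begins with the guessed word."""
    cols = columns_of(ciphertext, n_cols)
    hits = []
    for order in permutations(range(n_cols)):
        row0 = "".join(cols[c][0] for c in order)
        if row0.startswith(crib):
            hits.append(order)
    return hits


ct = "VOESAIVENEMRTNLEANGEWTNIMHTMEEADLTRNISHODWOEH"
for order in orders_starting_with(ct, 5, "GIVE"):
    print(order, read_with_order(ct, order))
```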

3. Breaking Double Transposition — Divide & Conquer

Strategy: Instead of guessing the full key at once, split the work into two stages.
Step A: undo the column permutation (try column orders until the rows show word fragments).
Step B: undo the row permutation (now easy, since each row already reads correctly).
Ciphertext NADWTKCAATAT
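Step 1: Write the ciphertext into a 3×4 matrix, filling across the rows:

Ciphertext in 3×4 grid
N A D W
T K C A
A T A T

Step 2: Undo the column permutation (3,1,0,2). Ciphertext column j holds plaintext column perm[j], so the plaintext columns are ciphertext columns 2, 1, 3, 0 in that order:

After undoing columns
D A W N
C K A T
A T T A

Step 3: Undo the row permutation (2,1,0), which simply reverses the order of the rows:

Final plaintext grid
A T T A
C K A T
D A W N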
Recovered Plaintext
ATTACKATDAWN  |  Column perm: (3,1,0,2) · Row perm: (2,1,0)
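The two undo stages can be sketched in Python. The convention used here is an assumption of this sketch: `col_perm[j]` names the plaintext column stored in ciphertext column j, and `row_perm[i]` names the plaintext row stored in ciphertext row i. With that convention, the permutations quoted above recover the message.

```python
def to_grid(text, n_cols):
    """Fill a grid across the rows, n_cols letters per row."""
    return [list(text[i:i + n_cols]) for i in range(0, len(text), n_cols)]


def undo_columns(grid, col_perm):
    """Move ciphertext column j back to plaintext position col_perm[j]."""
    out = [[None] * len(col_perm) for _ in grid]
    for r, row in enumerate(grid):
        for j, src in enumerate(col_perm):
            out[r][src] = row[j]
    return out


def undo_rows(grid, row_perm):
    """Move ciphertext row i back to plaintext position row_perm[i]."""
    out = [None] * len(row_perm)
    for i, src in enumerate(row_perm):
        out[src] = grid[i]
    return out


def flatten(grid):
    return "".join("".join(row) for row in grid)


grid = to_grid("NADWTKCAATAT", 4)
after_columns = undo_columns(grid, (3, 1, 0, 2))
plaintext = flatten(undo_rows(after_columns, (2, 1, 0)))
print(flatten(after_columns), plaintext)
```

In a real attack the permutations are unknown; the point of the divide-and-conquer strategy is that the column order can be searched first, using word fragments in the rows, before the row order is touched.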
👑

Breaking the Caesar Cipher

Only 26 possible keys — brute-force or frequency analysis

📐
Encryption: c = (p + k) mod 26
Decryption: p = (c − k) mod 26
where A=0, B=1, … Z=25
🎯
Breaking strategy:
1. Count letter frequencies in ciphertext
2. Most frequent letter ≈ E (or T, A, O…)
3. Compute k, verify with brute-force if needed

✦ Worked Example

Ciphertext WKH TXLFN EURZQ IRA
Step 1: Try k = 3: W→T, K→H, H→E → "THE" ✓ (a common English word)

Step 2: Apply p = (c − 3) mod 26 to every letter:

Cipher letter | Numeric value | (c − 3) mod 26 | Plaintext
W             | 22            | 22 − 3 = 19    | T
K             | 10            | 10 − 3 = 7     | H
H             | 7             | 7 − 3 = 4      | E
T             | 19            | 19 − 3 = 16    | Q
X             | 23            | 23 − 3 = 20    | U
L             | 11            | 11 − 3 = 8     | I
F             | 5             | 5 − 3 = 2      | C
N             | 13            | 13 − 3 = 10    | K
Recovered Plaintext
THE QUICK BROWN FOX  |  Key k = 3
🧮

Breaking the Affine Cipher

e(x) = ax + b (mod 26) · only 312 valid key pairs

🔑
Encryption: y = ax + b (mod 26)   (x = plaintext letter, y = ciphertext letter)
Decryption: x = a⁻¹(y − b) (mod 26)
Constraint: gcd(a, 26) = 1
🎯
Breaking strategy:
1. Find the 2 most frequent ciphertext letters
2. Assume they map to E and T (or E and A, etc.)
3. Solve 2 equations in 2 unknowns (a and b)
4. Verify gcd(a,26)=1, then decrypt

✦ Worked Example (from the lecture)

Ciphertext (57 characters) FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH

Step 1 — Frequency count

Letter | Count | Rank
R      | 8     | most frequent
D      | 7     | 2nd
E      | 5     | 3rd (tie)
H      | 5     | 3rd (tie)
K      | 5     | 3rd (tie)
F      | 4     |
S      | 4     |
V      | 4     |

Step 2 — Form hypotheses

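Most frequent ciphertext letter R (value 17) → assume it encrypts e (value 4)
Second most frequent D (value 3) → assume it encrypts t (value 19)

e_K(4) = 17 → 4a + b = 17
e_K(19) = 3 → 19a + b = 3

Solving gives a = 6, which is illegal because gcd(6, 26) = 2. Keep R → e and try the next most frequent letters (E, H, K) as the encryption of t:

Hypothesis | R maps to | t (19) encrypts to | Equations           | a  | b  | gcd(a,26) | Valid?
#1         | e (4)     | D (3)              | 4a+b=17, 19a+b=3    | 6  | 19 | 2         | ✗ illegal
#2         | e (4)     | E (4)              | 4a+b=17, 19a+b=4    | 13 | 17 | 13        | ✗ illegal
#3         | e (4)     | H (7)              | 4a+b=17, 19a+b=7    | 8  | 11 | 2         | ✗ illegal
#4         | e (4)     | K (10)             | 4a+b=17, 19a+b=10   | 3  | 5  | 1         | ✓ Legal!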

Step 3 — Compute decryption function

Need: a⁻¹ mod 26 where a = 3
3 × 9 = 27 ≡ 1 (mod 26) → a⁻¹ = 9

d_K(y) = 9(y − 5) mod 26 = (9y − 45) mod 26 = (9y − 19) mod 26    (since −45 ≡ −19 mod 26)

Step 4 — Decrypt a few letters to verify

Cipher letter | y value | (9y − 19) mod 26     | Plaintext
F             | 5       | 45 − 19 = 26 ≡ 0     | a
M             | 12      | 108 − 19 = 89 ≡ 11   | l
X             | 23      | 207 − 19 = 188 ≡ 6   | g
V             | 21      | 189 − 19 = 170 ≡ 14  | o
Recovered Plaintext  |  Key: a=3, b=5
ALGORITHMS ARE QUITE GENERAL DEFINITIONS OF ARITHMETIC PROCESSES
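The hypothesis test and the final decryption can be checked with a short Python sketch. The function names are illustrative. Rather than solving the two congruences algebraically, `solve_affine` simply tries all 312 legal keys, which is an implementation shortcut chosen for this sketch. `pow(a, -1, 26)` computes the modular inverse and requires Python 3.8 or later.

```python
from math import gcd


def solve_affine(pairs):
    """Every legal key (a, b) with a*x + b ≡ y (mod 26) for all (x, y) pairs."""
    return [(a, b)
            for a in range(26) if gcd(a, 26) == 1
            for b in range(26)
            if all((a * x + b) % 26 == y for x, y in pairs)]


def affine_decrypt(ciphertext, a, b):
    """Apply x = a^-1 * (y - b) mod 26 to every letter; output is lowercase."""
    a_inv = pow(a, -1, 26)
    return "".join(chr((a_inv * (ord(ch) - 65 - b)) % 26 + 97)
                   for ch in ciphertext.upper() if ch.isalpha())


ct = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"

# Hypothesis: e (4) -> R (17) and t (19) -> D (3): no legal key satisfies both.
print(solve_affine([(4, 17), (19, 3)]))

# Hypothesis: e (4) -> R (17) and t (19) -> K (10).
for a, b in solve_affine([(4, 17), (19, 10)]):
    print(a, b, affine_decrypt(ct, a, b))
```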
🗝️

Breaking the Vigenère Cipher

Kasiski Test + Index of Coincidence → find key length → break each shift separately

🎯
Master strategy — three phases:
Phase 1 — Determine the key length (Kasiski or IC method)
Phase 2 — Split the ciphertext into k groups by position modulo the key length k
Phase 3 — Solve each group independently as a Caesar cipher

Phase 1A — Kasiski Test

If the same plaintext fragment happens to align with the same portion of the keyword more than once, it will produce identical ciphertext. The spacing between these repetitions is a multiple of the key length.

Example ciphertext (encrypted with keyword POETRY, length 6): IVIVYGARMLMYIVIKFDIVIFRL

Step 1: Scan for repeated groups: IVI appears at positions 0, 12, 18

Step 2: Compute the gaps: 12 − 0 = 12,   18 − 12 = 6

Step 3: GCD(12, 6) = 6 → the key length is likely 6 (or a divisor of 6: 2 or 3)

Step 4: In this case the key length is exactly 6 ✓
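The Kasiski scan can be sketched in a few lines of Python. The names `repeated_ngrams` and `kasiski_gcd` are illustrative, and the sketch looks only at repeated trigrams by default. On real ciphertexts accidental repeats occur, so the overall GCD may be too small; in practice one looks at the most common factor of the gaps rather than a strict GCD.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_ngrams(ciphertext, n=3):
    """Map each n-gram that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}


def kasiski_gcd(ciphertext, n=3):
    """GCD of the gaps between consecutive occurrences of repeated n-grams."""
    gaps = [b - a
            for pos in repeated_ngrams(ciphertext, n).values()
            for a, b in zip(pos, pos[1:])]
    return reduce(gcd, gaps) if gaps else None


ct = "IVIVYGARMLMYIVIKFDIVIFRL"
print(repeated_ngrams(ct))
print(kasiski_gcd(ct))
```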

Phase 1B → 3 — Complete Break (3-letter keyword)

Ciphertext (key length = 3, determined by Kasiski/IC) RLWRV MRLAQ EDUEQ QWGKI LFMFE XZYXA QXGJH FMXKM QWRLA LKLFE LGWCL SOLMX RLWPI OCVWL SKNIS IMFES JUVAR MFEXZ CVWUS MJHTC RGRVM RLSZS MREFW XZGRY RLWPI OMYDB SFJCT CAZYX AQ
Step 1: Split into 3 sets by position modulo 3:
S₀ = letters at positions 0,3,6,9,…  |  S₁ = 1,4,7,10,…  |  S₂ = 2,5,8,11,…

Most frequent letters in each set:

Set | Letter (count)
S₀  | R (10), Q (4), M (3)
S₁  | X (7), L (6), V (5)
S₂  | W (6), A (5), M (4)

Step 2: For S₀, R (17) is most frequent. If R = E (4), then k₀ = 17 − 4 = 13. Letting R stand for each of E, T, N, O, R, I, A, S in turn gives the candidates
k₀ ∈ {13, 24, 4, 3, 0, 9, 17, 25}

Step 3: For S₁, X (23) is most frequent. Mapping X to E, T, N, O, R, I, A, S gives
k₁ ∈ {19, 4, 10, 9, 6, 15, 23, 5}

Step 4: For S₂, W (22) is most frequent. The same eight guesses give
k₂ ∈ {18, 3, 9, 8, 5, 14, 22, 4}

Step 5: Test each combination (8³ = 512 candidates) against the first few ciphertext letters.
Winner: (k₀=24, k₁=4, k₂=18) → Keyword = Y · E · S

Verification — decrypt with key YES

p_i = (c_i − k_{i mod 3}) mod 26

R=17 − Y=24 → (17−24+26) mod 26 = 19 = T ✓
L=11 − E=4 → 7 = H ✓
W=22 − S=18 → 4 = E ✓ → "THE…"
Recovered Plaintext  |  Key = YES
THE TRUTH IS ALWAYS SOMETHING THAT IS TOLD, NOT SOMETHING THAT IS KNOWN. IF THERE WERE NO SPEAKING OR WRITING, THERE WOULD BE NO TRUTH ABOUT ANYTHING. THERE WOULD ONLY BE WHAT IS.
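The three phases can be sketched in Python as follows. The function names are illustrative. The list of likely plaintext letters, "ETNORIAS", is taken from the candidate ordering used above, and the sketch leaves the final choice among the 512 key combinations to the analyst rather than scoring them automatically.

```python
def cosets(ciphertext, key_len):
    """Group the letters by position modulo the key length."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return ["".join(letters[i::key_len]) for i in range(key_len)]


def candidate_shifts(cipher_letter, likely_plain="ETNORIAS"):
    """Shifts that would map each likely plaintext letter onto cipher_letter."""
    c = ord(cipher_letter) - 65
    return [(c - (ord(p) - 65)) % 26 for p in likely_plain]


def vigenere_decrypt(ciphertext, key):
    """p_i = (c_i - k_(i mod len(key))) mod 26, skipping non-letters."""
    out, i = [], 0
    for ch in ciphertext.upper():
        if ch.isalpha():
            k = ord(key[i % len(key)]) - 65
            out.append(chr((ord(ch) - 65 - k) % 26 + 65))
            i += 1
    return "".join(out)


print(cosets("RLWRV MRLAQ EDUEQ", 3))
print(candidate_shifts("R"))
print(vigenere_decrypt("RLWRV MRLAQ EDUEQ", "YES"))
```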
📊

Statistical Analysis of Simple Substitution

Letter frequencies · digraphs · trigraphs · roughness

English Letter Frequencies

Letter | Frequency
E      | 12.7%
T      | 9.1%
A      | 8.2%
O      | 7.5%
I      | 7.0%
N      | 6.7%
S      | 6.3%
H      | 6.1%
R      | 6.0%
D      | 4.3%
L      | 4.0%
🧠
Memory aid: High-frequency letters in order → ETAOINSHRD
Or remember the word SENORITA, which contains seven of the top eight letters (all but H).
Low-frequency letters: J, K, Q, X, Z (together only about 1% of English text)

Most Common Digraphs (pairs)

Rank | Digraph
1    | TH
2    | HE
3    | IN
4    | ER
5    | AN
6    | RE
7    | ED
8    | ON

Most Common Trigraphs (triples)

Rank | Trigraph
1    | THE
2    | ING
3    | AND
4    | HER
5    | ERE
6    | ENT
7    | THA
8    | NTH

Roughness — Identifying Cipher Type by Distribution Shape

📈
Rough distribution (high peaks & low troughs)
Indicates: plaintext, transposition, or simple substitution.
The same letter frequencies are present — just possibly relabeled.
📉
Flat distribution (no clear peaks)
Indicates: polyalphabetic substitution (Vigenère, Enigma).
Multiple alphabets suppress frequency patterns.

✦ Breaking Simple Substitution — Step by Step

Ciphertext (simple substitution) LWVOL QVWAT DOLOH HLDAW VWPTV FHWDW RSVWO DNTVA WRWDF HWHFO RLFWK LFJLF FLQOT DHFVW DMFBW DFWVO DMSTX VHWAF TVPKA QLVCW
Step 1: Count letter frequencies. Find: W=16, F=11, V=11, D=9, L=9, H=7, O=7…
W is the most frequent → most likely represents E (most common in English)

Step 2: F and V tie for second place → one of them is likely T. Start with W→E, F→T, and switch to V→T if contradictions appear.

Step 3: Look at the frequent digraphs in the ciphertext. With W = E fixed, a frequent pair that ends in W is a candidate for HE, and a frequent pair that begins with W is a candidate for ER, ED, or EN. Reject any reading that contradicts an earlier assignment: for example, reading WH as TH would force W = T, which conflicts with W = E.

Step 4: Once a few letters are correctly guessed, partial words start to reveal themselves (e.g., "_HE_" → "THE_"). Fill in letter by letter until the full message is revealed.

💡
Pro tip: Look for single-letter words in ciphertext → must be A or I.
Two-letter words most likely: OF, TO, IN, IT, IS, BE, AS, AT, SO, WE, HE, BY, OR, AN, DO.
Three-letter words most likely: THE, AND, FOR, ARE, BUT, NOT, YOU, ALL, CAN, HER.
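The counting and the partial fill-in can be sketched with Python's standard `collections.Counter`. The function names are illustrative. Because this ciphertext is printed in five-letter groups, the sketch strips the spaces before counting digraphs, which is an assumption about how the groups should be treated.

```python
from collections import Counter


def letter_counts(ciphertext):
    """Frequency of each ciphertext letter, ignoring spaces."""
    return Counter(ch for ch in ciphertext.upper() if ch.isalpha())


def digraph_counts(ciphertext):
    """Frequency of each adjacent letter pair, with spaces removed first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))


def apply_partial_key(ciphertext, mapping):
    """Show guessed letters in lowercase and unknown letters as underscores."""
    return "".join(mapping.get(ch, "_") if ch.isalpha() else ch
                   for ch in ciphertext.upper())


ct = ("LWVOL QVWAT DOLOH HLDAW VWPTV FHWDW RSVWO DNTVA WRWDF HWHFO "
      "RLFWK LFJLF FLQOT DHFVW DMFBW DFWVO DMSTX VHWAF TVPKA QLVCW")

print(letter_counts(ct).most_common(7))
print(digraph_counts(ct).most_common(5))
print(apply_partial_key(ct, {"W": "e"}))
```

Each new guess is added to the mapping and the text is printed again; a guess that produces impossible letter patterns is withdrawn.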
📐

Index of Coincidence (IC)

Determine cipher type and estimate Vigenère key length

IC = [ Σ Fᵢ(Fᵢ − 1) ] / [ n(n − 1) ]     (sum over the 26 letters i = A…Z)
Fᵢ = count of letter i in ciphertext  ·  n = total ciphertext length
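🔵
IC ≈ 0.065: monoalphabetic substitution or transposition
Distribution is "rough" like natural English.
🟠
IC ≈ 0.038: polyalphabetic substitution (Vigenère with a long key)
Distribution is nearly flat / uniform.
Decision rule:
IC ≈ 0.065 or higher → monoalphabetic substitution or transposition. IC alone cannot separate the two, because both preserve the roughness of English.
  · If the most frequent ciphertext letters are E, T, A, O themselves → transposition (same letters, shuffled)
  · If other letters take the top spots → simple substitution (same profile, relabeled)
IC clearly below 0.065, down to about 0.038 → polyalphabetic (Vigenère etc.); the closer to 0.038, the longer the key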

✦ Full Worked Example (from the lecture)

Ciphertext (120 letters) EEAHR RFOWW TGDTE SCHES ROEST EMCNE AOOTL AKNEE TSSEO AVXNC STPOO OEOEA TASBI OAEER AXHEE RADNF PSINO ISEAU RPNED XEPSE PFCDL LZTER JAETY RETHE

Step 1 — Build the frequency table

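Letter | F  | F(F−1) | Letter | F  | F(F−1) | Letter | F  | F(F−1)
A      | 11 | 110    | J      | 1  | 0      | S      | 10 | 90
B      | 1  | 0      | K      | 1  | 0      | T      | 10 | 90
C      | 4  | 12     | L      | 3  | 6      | U      | 1  | 0
D      | 4  | 12     | M      | 1  | 0      | V      | 1  | 0
E      | 24 | 552    | N      | 6  | 30     | W      | 2  | 2
F      | 3  | 6      | O      | 11 | 110    | X      | 3  | 6
G      | 1  | 0      | P      | 5  | 20     | Y      | 1  | 0
H      | 4  | 12     | Q      | 0  | 0      | Z      | 1  | 0
I      | 3  | 6      | R      | 8  | 56     | Σ F(F−1) = 1120

Step 2 — Compute IC

ΣF(F−1) = 110+0+12+12+552+6+0+12+6+0+0+6+0+30+110+20+0+56+90+90+0+0+2+6+0+0 = 1120

n(n−1) = 120 × 119 = 14,280

IC = 1120 / 14280 ≈ 0.0784
Conclusion
IC ≈ 0.078 > 0.065 → not polyalphabetic: a monoalphabetic substitution or a transposition ✓
Because the most frequent ciphertext letters (E, A, O, S, T) are also the most frequent letters of English itself, a transposition is the likelier of the two.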
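The IC formula translates directly into Python. This is a minimal sketch: the function name `index_of_coincidence` is illustrative, and non-letters are ignored so that grouped ciphertexts can be pasted in unchanged.

```python
from collections import Counter


def index_of_coincidence(text):
    """IC = sum of F(F-1) over all letters, divided by n(n-1)."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


sample = ("EEAHR RFOWW TGDTE SCHES ROEST EMCNEAOOTL AKNEE TSSEO AVXNC STPOO "
          "OEOEATASBI OAEER AXHEE RADNF PSINO ISEAURPNED XEPSE PFCDL LZTER "
          "JAETY RETHE")
print(round(index_of_coincidence(sample), 4))
```

Running it on a ciphertext and comparing the result with 0.065 and 0.038 is usually the first step before choosing an attack.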

Using IC to Estimate Vigenère Key Length

Once you know the cipher is polyalphabetic, estimate the key length k using:

k ≈ (0.0265 × N) / ((N−1) × IC − 0.0385N + 0.065)
N = total ciphertext length  ·  IC = computed index of coincidence
🧪
Example: IC = 0.047, N = 200 letters
k ≈ (0.0265 × 200) / ((199 × 0.047) − (0.0385 × 200) + 0.065)
k ≈ 5.3 / (9.353 − 7.7 + 0.065) ≈ 5.3 / 1.718 ≈ 3.1
The estimate is rough, so try k = 2, 3, 4: for each k, split the text into k sub-sequences and compute the IC of each. The k whose sub-sequences have IC ≈ 0.065 is the correct key length.
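Both the closed-form estimate and the sub-sequence check can be sketched in Python. The estimate below uses the standard Friedman form, k ≈ 0.0265·N / ((N−1)·IC − 0.0385·N + 0.065), with 0.065 for English and 0.0385 for uniformly random letters. The function names are illustrative, and the estimate is only meaningful when IC lies between those two constants.

```python
from collections import Counter


def index_of_coincidence(text):
    """IC of the letters in text; 0.0 if there are fewer than two letters."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def friedman_key_length(ic, n, kappa_english=0.065, kappa_random=0.0385):
    """Friedman estimate of the key length from the overall IC and length n."""
    denominator = (n - 1) * ic - kappa_random * n + kappa_english
    return (kappa_english - kappa_random) * n / denominator


def average_coset_ic(ciphertext, key_len):
    """Mean IC of the key_len sub-sequences taken at positions i mod key_len."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    groups = ["".join(letters[i::key_len]) for i in range(key_len)]
    return sum(index_of_coincidence(g) for g in groups) / key_len


print(round(friedman_key_length(0.047, 200), 2))
```

For a real ciphertext, `average_coset_ic` would be evaluated for each candidate k near the estimate, keeping the k whose average is closest to 0.065.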

Master Cryptanalysis Flowchart

1. Compute IC. Classify: monoalphabetic substitution or transposition (IC ≈ 0.065) versus polyalphabetic (IC closer to 0.038). Within the first group, a transposition keeps the plaintext's own letters, so E, T, A remain the most frequent.

2. If Transposition: Test array dimensions (divisors of ciphertext length). Try reading rows/columns. Look for English words in rows.

3. If Mono-alphabetic: Count letter frequencies. Map most frequent ciphertext letter → E (12.7%). Use digraph/trigraph patterns. Fill in partial words iteratively.

4. If Affine: Use top 2 frequent letters, set up 2-equation system. Solve for a,b. Check gcd(a,26)=1. Compute a⁻¹ mod 26. Decrypt and verify.

5. If Poly (Vigenère): Use Kasiski test or IC sub-sequence method to find key length k. Split into k groups. Solve each group as Caesar cipher using frequency analysis.

6. Verify: Decrypted text must be readable, grammatical English. If not, revisit your frequency assumptions and try the next candidate.