heroCTF2025 crypto/Tarsnap
Why do encrypted ZIPs exist but not encrypted TARs ? Anyway, I made a 100% super secure online backup service because I’m truely paranoid.
This challenge pretends to offer a “super secure online encrypted tar archive service”. You can:
1. Add flag to encrypted archive
2. Add your own file
3. Export the encrypted archive
The server claims your archive is encrypted, so how could we ever recover the flag ?
The trick is that the encryption (ChaCha20) does not hide the size of the compressed data.
And the compression algorithm leaks incredible information about the flag.
compression + encryption + attacker-controlled input = catastrophic leakage
1 - How the server works internally
The server likely does something equivalent to this:
tar_data = build_tar(files)
compressed = zlib.compress(tar_data)
encrypted = chacha20_encrypt(KEY, NONCE, compressed)
return encrypted
Important details:
- The flag and our file are both stored inside the same TAR.
- The TAR is then compressed.
- Then the compressed result is encrypted.
- ChaCha20 does NOT change the length of the data.
- The challenge shows us the hex length of the encrypted bytes.
This means: The output size == the size of the compressed TAR.
So:
If compression behaves differently, the encrypted output size changes.
This is the root vulnerability.
2 - Compression 101
Compression algorithms like DEFLATE (used by zlib) work by finding repeated patterns and replacing them with shorter references.
Example:
Input: ABCABCABC
The compressor first writes:
ABC
Then it notices the pattern “ABC” repeats multiple times, so instead of writing the actual bytes again, it outputs:
(distance=3, length=3)
(distance=3, length=3)
This drastically reduces the output size.
Key principle: The more the input contains repeated sequences, the more compression shrinks it.
3 - Where the vulnerability comes from
The server compresses both the secret flag and our controlled data together in the same stream.
Something like:
tar = [flag.txt | guess.txt]
compressed = zlib.compress(tar)
encrypted = chacha20(compressed)
return encrypted_size
This is a fundamental mistake. Because now compression sees:
- SECRET:
Hero{5h0uld...} - USER INPUT:
Hero{5h0uld...???
If our input starts with the correct prefix of the flag:
Hero{5h0uld_h4v3
then compression will say:
“Hey, I’ve seen this before in the flag!”
→ big compression gain
→ final size shrinks.
If our guess is wrong:
Hero{XXXXXXXXXXXX
then compression finds fewer matches
→ size grows.
Thus: By measuring the size of the encrypted output, we can tell if our guess matches the flag.
This is called a compression side-channel attack. This class of bugs is real and dangerous :
- CRIME
- BREACH
- TIME
- HEIST
All based on the exact same principle.
4 - Turning the leak into an oracle
We want to guess the next character of the flag. We know the flag begins with Hero{.
Let’s try guessing the next character.
We test input = “Hero{a” We send a file whose content contains many repetitions of:
salt1 + "Hero{a"
salt2 + "Hero{a"
...
saltN + "Hero{a"
We add the flag (option 1), add our file (option 2), then export (option 3). We note the encrypted length. Then test input = “Hero{b” Same process.
Then “Hero{c”, “Hero{0”, “Hero{_”, … etc.
For each candidate character c, we calculate: length(compress(flag + our_payload “prefix+c”))
The candidate producing the smallest size is the correct next character. We append it to our known prefix and repeat.
5 - Amplifying the leakage
The difference in size between correct and incorrect guesses can be small (sometimes 1–3 bytes). To make the detection reliable, we amplify the effect:
- Many repetitions (“CHUNKS”)
Instead of writing:
prefix + c
we send:
salt1 || prefix+c
salt2 || prefix+c
...
salt800 || prefix+c
The repetition of “prefix+c” strongly boosts compression if c is correct.
- Random salts
We prepend every repetition with random bytes:
salt + prefix + c
Why? To avoid unintended compression on other parts of the TAR structure. This ensures only the prefix+c region is the repeated pattern the compressor sees.
- Multiple measurements (SAMPLES)
Each guess is tested multiple times because:
- server may introduce noise
- network jitter may reorder packets
- zlib occasionally has small non-deterministic effects depending on input alignment
We take the minimum or median to smooth randomness.
- Narrowing down TOP_K
We take only the 5 best candidates (smallest size) and re-test them to select the true best.
- Final convergence check
If two characters are extremely close in score, we re-test until a clear winner appears.
6 - Exploitation script
A simplified version of the final exploit:
payload = b"".join(s + (known_prefix + candidate)
for s in salts)
ct_len = query_length(payload.hex().encode())
This is repeated for all characters in the charset:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_{}-
Once the character that minimizes size is found, we append it and move to the next one. The prefix grows like:
Hero{5
Hero{5h
Hero{5h0
Hero{5h0u
...
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0m
...
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0mpr355_1n5734d}
Exactly like solving a crossword puzzle where the server reveals “hotter/colder” based on compression.
The entire script is :
import os
import random
import string
from pwn import context, remote
# Configuration
HOST = "crypto.heroctf.fr"
PORT = 9002
PREFIX = b"Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_"
CHARSET = string.ascii_letters + string.digits + "_{}-"
CHUNKS = 800
SALT_LEN = 8
SAMPLES = 5
TOP_K = 5
context.log_level = "error"
def build_payload(prefix, salts):
return b"".join(s + prefix for s in salts)
class Oracle:
def __init__(self):
self.host = HOST
self.port = PORT
self.conn = self._connect()
def _connect(self):
try:
if getattr(self, "conn", None):
self.conn.close()
except Exception:
pass
return remote(self.host, self.port, timeout=10)
def query_length(self, hexdata):
for attempt in (1, 2):
try:
r = self.conn
r.sendlineafter(b"> ", b"1")
r.sendlineafter(b"> ", b"2")
r.sendlineafter(b"Filename: ", b"a")
r.sendlineafter(b"Content: ", hexdata)
r.sendlineafter(b"> ", b"3")
line = r.recvline(timeout=5)
while line and b"Encrypted content:" not in line:
line = r.recvline(timeout=5)
if not line:
raise EOFError
cipher_hex = line.split(b":", 1)[1].strip()
return len(cipher_hex) // 2
except Exception:
if attempt == 2:
raise
self.conn = self._connect()
def main():
salts = [os.urandom(SALT_LEN) for _ in range(CHUNKS)]
oracle = Oracle()
current_prefix = PREFIX
charset_bytes = [c.encode() for c in CHARSET]
print(f"[+] Starting: {current_prefix.decode()}")
while True:
base_scores = []
for ch in charset_bytes:
guess = current_prefix + ch
payload = build_payload(guess, salts)
hexdata = payload.hex().encode()
length = oracle.query_length(hexdata)
base_scores.append((length, ch, hexdata))
base_scores.sort(key=lambda x: x[0])
best = None
for base_len, ch, hexdata in base_scores[:TOP_K]:
lengths = [base_len]
for _ in range(SAMPLES - 1):
lengths.append(oracle.query_length(hexdata))
score = min(lengths)
if best is None or score < best[0]:
best = (score, ch)
score, best_char = best
current_prefix += best_char
print(f"[+] Found: {best_char.decode()} -> {current_prefix.decode()}")
if best_char == b"}":
break
if __name__ == "__main__":
main()
The final recovered secret was:
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0mpr355_1n5734d}
By Roockbye