Why do encrypted ZIPs exist but not encrypted TARs ? Anyway, I made a 100% super secure online backup service because I’m truely paranoid.

This challenge pretends to offer a “super secure online encrypted tar archive service”. You can:

1. Add flag to encrypted archive
2. Add your own file
3. Export the encrypted archive

The server claims your archive is encrypted, so how could we ever recover the flag ?

The trick is that the encryption (ChaCha20) does not hide the size of the compressed data.
And the compression algorithm leaks incredible information about the flag.

compression + encryption + attacker-controlled input = catastrophic leakage

1 - How the server works internally

The server likely does something equivalent to this:

tar_data = build_tar(files)
compressed = zlib.compress(tar_data)
encrypted = chacha20_encrypt(KEY, NONCE, compressed)
return encrypted

Important details:

  • The flag and our file are both stored inside the same TAR.
  • The TAR is then compressed.
  • Then the compressed result is encrypted.
  • ChaCha20 does NOT change the length of the data.
  • The challenge shows us the hex length of the encrypted bytes.

This means: The output size == the size of the compressed TAR.

So:
If compression behaves differently, the encrypted output size changes. This is the root vulnerability.

2 - Compression 101

Compression algorithms like DEFLATE (used by zlib) work by finding repeated patterns and replacing them with shorter references.

Example:

Input:  ABCABCABC

The compressor first writes:

ABC

Then it notices the pattern “ABC” repeats multiple times, so instead of writing the actual bytes again, it outputs:

(distance=3, length=3)
(distance=3, length=3)

This drastically reduces the output size.

Key principle: The more the input contains repeated sequences, the more compression shrinks it.

3 - Where the vulnerability comes from

The server compresses both the secret flag and our controlled data together in the same stream.

Something like:

tar = [flag.txt | guess.txt]
compressed = zlib.compress(tar)
encrypted = chacha20(compressed)
return encrypted_size

This is a fundamental mistake. Because now compression sees:

  • SECRET: Hero{5h0uld...}
  • USER INPUT: Hero{5h0uld...???

If our input starts with the correct prefix of the flag:

Hero{5h0uld_h4v3

then compression will say: “Hey, I’ve seen this before in the flag!”
→ big compression gain
→ final size shrinks.

If our guess is wrong:

Hero{XXXXXXXXXXXX

then compression finds fewer matches
→ size grows.

Thus: By measuring the size of the encrypted output, we can tell if our guess matches the flag.

This is called a compression side-channel attack. This class of bugs is real and dangerous :

  • CRIME
  • BREACH
  • TIME
  • HEIST

All based on the exact same principle.

4 - Turning the leak into an oracle

We want to guess the next character of the flag. We know the flag begins with Hero{. Let’s try guessing the next character.

We test input = “Hero{a” We send a file whose content contains many repetitions of:

salt1 + "Hero{a"
salt2 + "Hero{a"
...
saltN + "Hero{a"

We add the flag (option 1), add our file (option 2), then export (option 3). We note the encrypted length. Then test input = “Hero{b” Same process.

Then “Hero{c”, “Hero{0”, “Hero{_”, … etc. For each candidate character c, we calculate: length(compress(flag + our_payload “prefix+c”))

The candidate producing the smallest size is the correct next character. We append it to our known prefix and repeat.

5 - Amplifying the leakage

The difference in size between correct and incorrect guesses can be small (sometimes 1–3 bytes). To make the detection reliable, we amplify the effect:

  • Many repetitions (“CHUNKS”)

Instead of writing:

prefix + c

we send:

salt1 || prefix+c
salt2 || prefix+c
...
salt800 || prefix+c

The repetition of “prefix+c” strongly boosts compression if c is correct.

  • Random salts

We prepend every repetition with random bytes:

salt + prefix + c

Why? To avoid unintended compression on other parts of the TAR structure. This ensures only the prefix+c region is the repeated pattern the compressor sees.

  • Multiple measurements (SAMPLES)

Each guess is tested multiple times because:

  • server may introduce noise
  • network jitter may reorder packets
  • zlib occasionally has small non-deterministic effects depending on input alignment

We take the minimum or median to smooth randomness.

  • Narrowing down TOP_K

We take only the 5 best candidates (smallest size) and re-test them to select the true best.

  • Final convergence check

If two characters are extremely close in score, we re-test until a clear winner appears.

6 - Exploitation script

A simplified version of the final exploit:

payload = b"".join(s + (known_prefix + candidate)
                   for s in salts)

ct_len = query_length(payload.hex().encode())

This is repeated for all characters in the charset:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_{}-

Once the character that minimizes size is found, we append it and move to the next one. The prefix grows like:

Hero{5
Hero{5h
Hero{5h0
Hero{5h0u
...
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0m
...
Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0mpr355_1n5734d}

Exactly like solving a crossword puzzle where the server reveals “hotter/colder” based on compression.

The entire script is :

import os
import random
import string
from pwn import context, remote

# Configuration
HOST = "crypto.heroctf.fr"
PORT = 9002
PREFIX = b"Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_"
CHARSET = string.ascii_letters + string.digits + "_{}-"
CHUNKS = 800
SALT_LEN = 8
SAMPLES = 5
TOP_K = 5

context.log_level = "error"

def build_payload(prefix, salts):
    return b"".join(s + prefix for s in salts)

class Oracle:
    def __init__(self):
        self.host = HOST
        self.port = PORT
        self.conn = self._connect()

    def _connect(self):
        try:
            if getattr(self, "conn", None):
                self.conn.close()
        except Exception:
            pass
        return remote(self.host, self.port, timeout=10)

    def query_length(self, hexdata):
        for attempt in (1, 2):
            try:
                r = self.conn
                r.sendlineafter(b"> ", b"1")
                r.sendlineafter(b"> ", b"2")
                r.sendlineafter(b"Filename: ", b"a")
                r.sendlineafter(b"Content: ", hexdata)
                r.sendlineafter(b"> ", b"3")

                line = r.recvline(timeout=5)
                while line and b"Encrypted content:" not in line:
                    line = r.recvline(timeout=5)
                if not line:
                    raise EOFError

                cipher_hex = line.split(b":", 1)[1].strip()
                return len(cipher_hex) // 2
            except Exception:
                if attempt == 2:
                    raise
                self.conn = self._connect()

def main():
    salts = [os.urandom(SALT_LEN) for _ in range(CHUNKS)]
    oracle = Oracle()
    current_prefix = PREFIX
    charset_bytes = [c.encode() for c in CHARSET]

    print(f"[+] Starting: {current_prefix.decode()}")

    while True:
        base_scores = []
        for ch in charset_bytes:
            guess = current_prefix + ch
            payload = build_payload(guess, salts)
            hexdata = payload.hex().encode()
            length = oracle.query_length(hexdata)
            base_scores.append((length, ch, hexdata))

        base_scores.sort(key=lambda x: x[0])

        best = None
        for base_len, ch, hexdata in base_scores[:TOP_K]:
            lengths = [base_len]
            for _ in range(SAMPLES - 1):
                lengths.append(oracle.query_length(hexdata))

            score = min(lengths)
            if best is None or score < best[0]:
                best = (score, ch)

        score, best_char = best
        current_prefix += best_char
        print(f"[+] Found: {best_char.decode()} -> {current_prefix.decode()}")

        if best_char == b"}":
            break

if __name__ == "__main__":
    main()

The final recovered secret was:

Hero{5h0uld_h4v3_u53d_3ncryp7_7h3n_c0mpr355_1n5734d}

By Roockbye