Encryption Guide
=========================

Why Encryption Matters
-------------------------------------------------------------------------------

Encryption is the process of transforming readable data (plaintext) into an
unreadable form (ciphertext) that can only be reversed by someone who holds
the correct key. It is the foundation of modern data security, protecting
information at two distinct points in its lifecycle:

* **Data at rest**: files on disk, database records, backups. If a storage
  medium is stolen or a database is breached, encrypted data is useless
  without the key.
* **Data in transit**: messages, API payloads, tokens sent over a network.
  Without encryption, an eavesdropper on the same network can read every byte.

Beyond confidentiality, authenticated cipher modes (see below) also provide
**integrity** (detecting accidental corruption) and **authenticity** (proving
the data was not tampered with by an attacker). These three properties,
confidentiality, integrity, and authenticity, are the goals of any secure
encryption scheme.

Choosing the Right AES Mode
-------------------------------------------------------------------------------

AES is the most widely used standard symmetric cipher. The *mode* controls how
AES processes data beyond a single 16-byte block. Choosing the wrong mode is
one of the most common sources of cryptographic mistakes.

.. list-table::
    :header-rows: 1
    :widths: 12 12 12 64

    * - Mode
      - Auth tag
      - IV / Nonce
      - When to use
    * - **GCM**
      - Yes
      - Nonce
      - **Default choice.** Authenticated encryption with high throughput.
        Parallelisable, widely supported. Use for almost everything.
    * - **EAX**
      - Yes
      - Nonce
      - Good alternative to GCM when simplicity matters more than raw speed.
        Easier to implement correctly from scratch.
    * - **CCM**
      - Yes
      - Nonce
      - Constrained environments (embedded, IoT). Requires the message length
        to be known upfront.
    * - **SIV**
      - Yes
      - None (self-generated)
      - **Nonce-misuse resistant.** Safe even if the same key is used twice
        with the same input, because the tag itself acts as the IV. Requires a
        double-length key (32, 48, or 64 bytes). Slightly slower.
    * - **OCB**
      - Yes
      - Nonce
      - Typically among the fastest authenticated modes. Use when throughput
        is critical.
    * - **CBC**
      - No
      - IV
      - Legacy interoperability only. Provides confidentiality but
        **no integrity**. Vulnerable to padding-oracle attacks. Avoid in new
        code.
    * - **CTR**
      - No
      - Nonce
      - Turns AES into a stream cipher. Parallelisable reads. Use when you
        need streaming decryption and handle integrity separately (e.g. with
        an HMAC).
    * - **CFB / OFB**
      - No
      - IV
      - Legacy stream-like modes. Prefer CTR for new designs.
    * - **ECB**
      - No
      - None
      - **Never use for real data.** Identical plaintext blocks produce
        identical ciphertext, so patterns in the plaintext are visible in the
        ciphertext (the famous "ECB penguin" problem).

Choosing a Key Size
-------------------------------------------------------------------------------

AES supports three key lengths. All are considered secure; the difference is
the margin of safety against future advances in cryptanalysis.

* **128-bit (16 bytes)**: sufficient for virtually all applications today.
  Used as the default by ``AESCipher`` when no key is provided.
* **192-bit (24 bytes)**: rarely needed; provides extra headroom.
* **256-bit (32 bytes)**: maximum security; use for long-lived secrets or
  highly sensitive data (medical records, financial data, government use).

.. note::

    ``MODE_SIV`` requires a *double-length* key because it uses two
    independent AES keys internally: 32 bytes (AES-128-SIV), 48 bytes
    (AES-192-SIV), or 64 bytes (AES-256-SIV). ``AESCipher`` auto-generates a
    32-byte key when ``MODE_SIV`` is used without an explicit key.

Security Rules
-------------------------------------------------------------------------------

Following these rules prevents the most common real-world mistakes:

1. **Prefer authenticated modes (GCM, EAX, CCM, SIV, OCB).** Without an
   authentication tag, an attacker can silently modify the ciphertext and your
   application will decrypt garbage, or worse, a crafted payload.
2. **Never reuse a nonce with the same key.** Each encryption call with GCM,
   EAX, CTR, etc. must use a unique nonce. Reusing a nonce with the same key
   completely breaks confidentiality in stream-like modes, and additionally
   breaks authenticity in GCM. ``AESCipher`` auto-generates a fresh random
   nonce on every ``encrypt()`` call.
3. **Store the full encrypted dict, not just the ciphertext.** The nonce/IV
   and authentication tag are required for decryption and verification.
   ``AESCipher.encrypt()`` returns them together for this reason.
4. **Keep keys secret and separate from ciphertext.** Storing a key next to
   the data it encrypts is equivalent to locking a door and leaving the key in
   the lock. Load keys from a secrets manager or environment variable, or
   derive them from passwords with a key-derivation function (KDF) such as
   PBKDF2 or Argon2.
5. **Verify before trusting decrypted output.** Authenticated modes raise
   ``ValueError`` on tag mismatch. Always let the exception propagate rather
   than catching and ignoring it.
6. **Avoid ECB for anything other than single-block operations or protocol
   compatibility.** If you must use a non-authenticated mode, pair it with an
   HMAC (Encrypt-then-MAC pattern).

Common Use Cases
-------------------------------------------------------------------------------

**Encrypting a token or session payload stored in a database:**

.. code-block:: python

    from core_ciphers.aes_cipher import AESCipher

    # Key must be loaded from a secure source (env var, secrets manager, etc.)
    cipher = AESCipher(key=SECRET_KEY)

    # Encrypt before storing
    record = cipher.encrypt(payload)
    db.save(record)  # stores {'ciphertext': '...', 'tag': '...', 'nonce': '...'}

    # Decrypt on read; raises ValueError if the record was tampered with
    payload = cipher.decrypt(db.load())

**Protecting data that must survive nonce reuse (e.g. deterministic IDs):**

.. code-block:: python

    from Crypto.Cipher import AES
    from core_ciphers.aes_cipher import AESCipher

    # SIV produces the same ciphertext for the same plaintext+key, safe by design
    cipher = AESCipher(key=SECRET_KEY_32, mode=AES.MODE_SIV)
    encrypted = cipher.encrypt(user_id)

**High-throughput streaming data with manual integrity (Encrypt-then-MAC):**

.. code-block:: python

    import hashlib
    import hmac
    import json

    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes
    from core_ciphers.aes_cipher import AESCipher

    # Use independent keys for encryption and for the MAC
    enc_key = get_random_bytes(16)
    mac_key = get_random_bytes(32)
    large_payload = "sensitive data ..."

    # Encrypt
    cipher = AESCipher(key=enc_key, mode=AES.MODE_CTR)
    encrypted = cipher.encrypt(large_payload)

    # Compute MAC over the serialised ciphertext dict (Encrypt-then-MAC);
    # sort_keys keeps the serialisation deterministic
    serialised = json.dumps(encrypted, sort_keys=True).encode()
    mac = hmac.new(mac_key, serialised, hashlib.sha256).hexdigest()

    # --- on the receiving side ---
    # 1. Verify MAC (recomputed over the received bytes) before touching the ciphertext
    expected = hmac.new(mac_key, serialised, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("MAC verification failed: data was tampered with.")

    # 2. Each AESCipher instance is stateful; use a fresh one to decrypt
    decryptor = AESCipher(key=enc_key, mode=AES.MODE_CTR)
    plaintext = decryptor.decrypt(encrypted)  # 'sensitive data ...'
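
**Deriving a key from a password (illustrative sketch):**

Security rule 4 mentions deriving keys from passwords with a KDF such as
PBKDF2. The following is a minimal sketch of that step using only the Python
standard library. The helper name ``derive_key``, the default iteration count,
and the 16-byte salt are assumptions made for illustration; they are not part
of ``AESCipher``, and the iteration count should be tuned to your own hardware
and policy.

.. code-block:: python

    import hashlib
    import secrets
    from typing import Optional, Tuple

    # Assumption: an illustrative work factor, not a recommendation from this library
    DEFAULT_ITERATIONS = 600_000


    def derive_key(
        password: str,
        salt: Optional[bytes] = None,
        length: int = 32,
        iterations: int = DEFAULT_ITERATIONS,
    ) -> Tuple[bytes, bytes]:
        """Hypothetical helper: derive a key with PBKDF2-HMAC-SHA256.

        Returns ``(key, salt)``. The salt is not secret, but it must be stored
        so the same key can be derived again later.
        """
        # 16/24/32 bytes for regular AES modes, 32/48/64 bytes for MODE_SIV
        if length not in (16, 24, 32, 48, 64):
            raise ValueError("unsupported key length for AES")
        if salt is None:
            salt = secrets.token_bytes(16)  # fresh random salt per password
        key = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, iterations, dklen=length
        )
        return key, salt


    # First use: derive a key and keep the salt
    key, salt = derive_key("correct horse battery staple")

    # Later: the same password and salt give the same key back
    same_key, _ = derive_key("correct horse battery staple", salt=salt)

The derived ``key`` could then be passed as ``AESCipher(key=key)``. The salt
can be stored alongside the encrypted record, since only the password needs to
stay secret.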