From 63f05187d93d78e6a2349f1eeaee64b4afad0c7a Mon Sep 17 00:00:00 2001
From: Kent Overstreet
Date: Wed, 21 Sep 2016 21:06:24 -0800
Subject: Bcachefs, encryption updates

---
 Bcachefs.mdwn   |  5 ++++-
 Encryption.mdwn | 56 +++++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/Bcachefs.mdwn b/Bcachefs.mdwn
index 71e301c..469b98b 100644
--- a/Bcachefs.mdwn
+++ b/Bcachefs.mdwn
@@ -141,7 +141,7 @@ possible.
 awhile in favor of making the core functionality production quality -
 replication is not currently suitable for outside testing.

- - Encryption
+ - [[Encryption]]

 Implementation is finished, and passes all the tests. The blocker on rolling
 it out is finishing the design doc and getting outside review (as feedback
@@ -186,6 +186,9 @@ Please ask questions and ask for them to be added here!
 * a feature bits field
 * bring some structure to the variable length portion, so we can add more crap
 later - do it like inode optional fields
+ * on clean shutdown, write current journal sequence number to superblock -
+ help guard against corruption or an encrypted filesystem being tampered
+ with
 * More bits (once we have feature bits) for "has this feature ever been used",
 e.g.
 * encryption - if we don't have encrypted data, we don't need to load cyphers
diff --git a/Encryption.mdwn b/Encryption.mdwn
index ec9d85f..6852d10 100644
--- a/Encryption.mdwn
+++ b/Encryption.mdwn
@@ -36,6 +36,10 @@ security and robustness, and is meant to defend against a wider variety of
 adversarial models than is typical in existing filesystem level or block level
 encryption.

+In particular, the goal is to be secure even when the attacker controls the
+storage device itself, and can see reads and writes as they happen and return
+arbitrary data from read requests.
+
 ## Filesystem vs. directory encryption

 We do not currently offer per directory encryption; instead, we take an "encrypt
@@ -61,14 +65,8 @@ everything after the header for that particular metadata write - will not leak.
 By virtue of working within a copy on write filesystem with provisions for ZFS
 style checksums (that is, checksums with the pointers, not the data), we’re able
 to use a modern AEAD style construction. We use ChaCha20 and Poly1305. We
-use the cyphers directly instead of using the kernel AEAD library (and thus
-means there's a bit more in the design that needs auditing).
-
-The current design uses the same key for both ChaCha20 and Poly1305, but my
-recent rereading of the Poly1305-AES paper seems to imply that the Poly1305 key
-shouldn't be used for anything else. Guidance from actual cryptographers would
-be appreciated here; the ChaCha20/Poly1305 AEAD RFC appears to be silent on the
-matter.
+use the cyphers directly instead of using the kernel AEAD library. However, we
+do follow pretty closely the approach of [[RFC 7539|https://tools.ietf.org/html/rfc7539]].

 Note that ChaCha20 is a stream cypher. This means that it’s critical that we use
 a cryptographic MAC (which would be highly desirable anyways), and also avoiding
@@ -96,6 +94,10 @@ key, which is stored in the superblock - also with ChaCha20.
 The master key is encrypted with an 8 byte header, so that we can tell if the
 correct key was supplied.

+TODO: Add a field to the superblock specifying the key derivation function, so
+that we can transition to newer KDFs later (e.g. Argon2) or specify cost
+parameters.
+
 ### Metadata

 Except for the superblock, no metadata in bcache/bcachefs is updated in place -
@@ -166,15 +168,35 @@ sized chunks of data, and we store one checksum/MAC per extent, not per block: a
 checksum or MAC might cover up to 64k (extents that aren't checksummed or
 compressed may be larger). Nonces are thus also per extent, not per block.

-Currently, the Poly1305 MAC is truncated to 64 bits - due to a desire not to
-inflate our metadata any more than necessary. Guidance from cryptographers is
-requested as to whether this is a reasonable option; do note that the MAC is not
-stored with the data, but is itself stored encrypted elsewhere in the btree. We
-do already have different fields for storing 4 byte checksums and 8 byte
-checksums; it will be a trivial matter to add a field allowing 16 byte checksums
-to be stored, and we will add that anyways - so this isn't a pressing design
-issue, this is just a matter of what the defaults should be and what we should
-tell users.
+By default, for data extents the Poly1305 MAC is truncated to 80 bits, for space
+efficiency reasons. Optionally the full 128 bit macs may be stored, at the cost of
+increasing the size of extents by 8 bytes (with 80 bit macs, an extent with a
+single replica will typically be 32 bytes, or 40 bytes with 128 bit macs).
+
+This should be completely safe for the vast majority of use cases. Most uses of
+cryptographic MACs are in networked applications, where an attacker may be able
+to send an unlimited number of forged messages: in that environment, a 64 bit
+mac is clearly insufficient - if an attacker is able to send 2^32 forgery
+attempts (not a huge number these days), the probability of success is 1 / 2^32 -
+which is not considered a remotely safe margin by cryptographers.
+
+However, with a filesystem, even in the case of a completely compromised device
+(say an attacker has compromised the firmware on the disk, and is able to return
+whatever they want when we read a sector) - if the MAC doesn't match (because
+the attacker is attempting to forge data), we consider the device to be failing
+and very shortly we're going to stop using it - we won't attempt to reread data
+that appears to be corrupt indefinitely. So an attacker gets only a small number
+(on the order of 10) of attempts to forge a particular extent. In the very worst
+case, if we're trying very hard to migrate data off a device that appears to be
+bad, the attacker might get ~10 attempts multiplied by the number of extents on
+the device - but the number of forgery attempts should be clearly bounded.
+
+If the user is in an environment where transient failures/corruption are
+expected and should be tolerated, instead of assuming the device is bad (e.g. the
+disks are accessed over the network, and the network path is known to corrupt
+data) - in that situation 128 bit macs should be used (and in the future we may
+enforce that if the maximum number of read retries is set to more than a small
+number, 128 bit macs must be used if encryption is in use).

 #### Extent nonces

-- 
cgit v1.2.3
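
Reviewer's note on the MAC truncation argument in the last hunk: the reasoning can be checked with a few lines of arithmetic. The sketch below is not part of the patch and is not bcachefs code. It assumes an idealized MAC where each forgery attempt against a `tag_bits` tag succeeds with probability 2^-tag_bits, and applies a union bound over attempts. `forgery_bound` and the extent count of 2^30 are hypothetical, chosen only for illustration; the real Poly1305 bound also depends on message length.

```python
from fractions import Fraction


def forgery_bound(tag_bits: int, attempts: int) -> Fraction:
    """Union bound on at least one successful forgery.

    Assumes an idealized MAC: each attempt succeeds with probability
    2**-tag_bits. This is an illustrative model, not the exact Poly1305 bound.
    """
    return min(Fraction(1), Fraction(attempts, 2 ** tag_bits))


# Networked case from the text: 64 bit tag, 2^32 forgery attempts.
network = forgery_bound(64, 2 ** 32)

# Filesystem case: about 10 read retries against one extent, 80 bit tag.
per_extent = forgery_bound(80, 10)

# Worst case: about 10 attempts per extent across the whole device.
# The extent count (2^30) is a made-up figure for illustration.
whole_device = forgery_bound(80, 10 * 2 ** 30)

# Same worst case with full 128 bit tags.
whole_device_128 = forgery_bound(128, 10 * 2 ** 30)

for name, p in [
    ("network, 64 bit", network),
    ("one extent, 80 bit", per_extent),
    ("whole device, 80 bit", whole_device),
    ("whole device, 128 bit", whole_device_128),
]:
    print(f"{name}: {float(p):.3e}")
```

Under these assumptions the bounded retry count is what keeps the 80 bit default comfortable: the attempt budget is fixed by the filesystem rather than chosen by the attacker.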