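Encryption

Last time we discussed how to use git and Dropbox for reliable, economical version control. This time we look at how to add data encryption on top of that, so that data stored with a third party stays secure.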

##############################################################
# GIT transparent encryption
#############################################################

##############################################################
Description
##############################################################
When working with a remote git repository which is hosted on a
third-party storage server, data confidentiality sometimes becomes
a concern. This article walks you through the procedures of setting
up git repositories for which your local working directories are as
normal (un-encrypted) but the committed content is encrypted.

##############################################################
The Story
##############################################################
I use ‘git’ [1] and ‘dropbox’ [2] as a reliable, highly available,
cost-saving and distributed version control solution (see [3]), and have
found it convenient and effective. One thing this solution does not
address is data privacy/confidentiality. As ‘dropbox’ is a third-party
data storage service with Amazon S3 as its backend data store, a paranoid
user like myself will always be concerned about the dropbox-hosted data
being disclosed to others, accidentally or deliberately. After all, placing
unconditional trust in a third-party provider is never a complete
safeguard.

User-controlled end-to-end encryption solves the problem: before data is
pushed to the remote repository for storage, it is encrypted with a key
known only to the data owner. Managing the encryption key(s) and the
encryption/decryption steps by hand, however, is tedious and easy to get
wrong. In the following, we demonstrate how to use ‘git’ with encryption
in a way that is transparent to the end user.

Before we start the demonstration, the following software packages need
to be installed: git (version 1.7.1 for the demonstration), openssl [4]
(version 0.9.8o 01 Jun 2010 for the demonstration). The operating system
for the demonstration is Linux (Ubuntu 10.10).

The idea is to leverage git’s smudge/clean filters, as hinted by the
discussion at [5]. Unlike [5], in which GPG is proposed as the encryption
method, we use OpenSSL’s symmetric key cipher, as it is better suited to
this purpose.

The procedures are as follows.

1. Before the git repository is created, create a directory for the filter
scripts in your home directory

~$ mkdir .gitencrypt
~$ cd !$

# Create three files, accessible to the owner only
# (they will hold your passphrase)
~/.gitencrypt$ touch clean_filter_openssl smudge_filter_openssl diff_filter_openssl
~/.gitencrypt$ chmod 700 *

These files will be the clean/smudge/diff handlers for the git
repository which we are going to work with. The contents of the three
files are:

— File: clean_filter_openssl —
#!/bin/bash

SALT_FIXED=<your-salt>
PASS_FIXED=<your-passphrase>

# Encrypt stdin to stdout. The fixed salt makes the output deterministic.
openssl enc -base64 -aes-256-ecb -S "$SALT_FIXED" -k "$PASS_FIXED"

— end of clean_filter_openssl —

Here, replace <your-salt> with a random hexadecimal string and replace
<your-passphrase> with a passphrase you will use as the master secret for
the symmetric key encryption/decryption. We use AES-256 in ECB mode
as the encryption algorithm because deterministic encryption turns out
to work best with ‘git’ (we will explain later).
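To see what the fixed salt buys us, the following minimal sketch encrypts the same input twice, once with `-S` and once without. The salt and passphrase are made-up demo values, and the function names are only for illustration. The ciphertext itself depends on your OpenSSL version; only the equal/unequal comparison matters here.

```shell
#!/bin/bash
# Demo values only (assumptions for this sketch); use your own in practice.
SALT=0123456789abcdef
PASS=demo-passphrase

# Fixed salt: the same input always produces the same output.
encrypt_fixed() {
    openssl enc -base64 -aes-256-ecb -S "$SALT" -k "$PASS" 2> /dev/null
}

# No -S: openssl picks a fresh random salt on every run.
encrypt_random() {
    openssl enc -base64 -aes-256-ecb -k "$PASS" 2> /dev/null
}

a=$(printf 'hello\n' | encrypt_fixed)
b=$(printf 'hello\n' | encrypt_fixed)
c=$(printf 'hello\n' | encrypt_random)
d=$(printf 'hello\n' | encrypt_random)

[ "$a" = "$b" ] && echo "fixed salt: identical ciphertext"
[ "$c" != "$d" ] && echo "random salt: different ciphertext"
```

The first comparison is the property the clean filter relies on: re-encrypting an unchanged file gives git exactly the bytes it already has.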

— File: smudge_filter_openssl —
#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

# If decryption fails, fall back to `cat`.
# Error messages are redirected to /dev/null.
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" 2> /dev/null || cat

— end of smudge_filter_openssl —
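One caveat with a `|| cat` fallback on a pipe: a failed `openssl` may already have consumed some or all of stdin, so `cat` may have little left to pass through. A possible variant, sketched below, buffers the input in a temporary file first. The function name `smudge_buffered` and the passphrase are hypothetical; this is a sketch under those assumptions, not the filter used in the demonstration.

```shell
#!/bin/bash
PASS_FIXED=demo-passphrase   # placeholder value for this sketch

smudge_buffered() {
    local in out status=0
    in=$(mktemp) || return 1
    out=$(mktemp) || { rm -f "$in"; return 1; }
    cat > "$in"      # keep a full copy of what git sent on stdin
    if openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" \
            -in "$in" -out "$out" 2> /dev/null; then
        cat "$out"   # decryption succeeded: emit the plaintext
    else
        cat "$in"    # decryption failed: pass the input through unchanged
    fi
    status=$?
    rm -f "$in" "$out"
    return $status
}

# Feed it something that is not ciphertext to exercise the fallback path.
printf 'not encrypted\n' | smudge_buffered
```

Used as a filter script, the last line would simply be `smudge_buffered`, reading whatever git pipes in.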

— File: diff_filter_openssl —
#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

# Error messages are redirected to /dev/null.
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" -in "$1" 2> \
/dev/null || cat "$1"

— end of diff_filter_openssl —

Files in the .gitencrypt directory should be kept locally and never shared
with anyone who should not have access to your data, as they contain
your decryption passphrase.

2. Change to the project directory where the ‘git’ repository is to be
created. Suppose this directory is ~/myproj/.

~/myproj$ git init

# Create .gitattributes file
~/myproj$ touch .gitattributes

Add the following content to .gitattributes:

— File: .gitattributes —
* filter=openssl diff=openssl

— end of .gitattributes —

In this file, the ‘filter’ and ‘diff’ attributes are assigned to drivers
named ‘openssl’, which should be defined in .git/config as follows. The
[merge] section enables merge.renormalize, a git configuration variable
that tells git to run a virtual check-out and check-in of each side of a
three-way merge through the filters.

— Snippet: .git/config —

[filter "openssl"]
smudge = ~/.gitencrypt/smudge_filter_openssl
clean = ~/.gitencrypt/clean_filter_openssl

[diff "openssl"]
textconv = ~/.gitencrypt/diff_filter_openssl

[merge]
renormalize = true

— end of snippet —
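If you prefer not to edit the config file by hand, the same entries can be written with `git config`. The sketch below wraps the calls in a function, `register_filters`, which is a hypothetical helper name. Its two arguments (the config file and the directory holding the filter scripts) are assumptions chosen for illustration.

```shell
#!/bin/bash
# register_filters <config-file> <filter-dir>
# Writes the filter/diff driver definitions and merge.renormalize
# into the given git config file.
register_filters() {
    local config_file=$1 filter_dir=$2
    git config --file "$config_file" filter.openssl.smudge \
        "$filter_dir/smudge_filter_openssl" &&
    git config --file "$config_file" filter.openssl.clean \
        "$filter_dir/clean_filter_openssl" &&
    git config --file "$config_file" diff.openssl.textconv \
        "$filter_dir/diff_filter_openssl" &&
    git config --file "$config_file" merge.renormalize true
}
```

From inside the project directory, a call such as `register_filters .git/config "$HOME/.gitencrypt"` would set things up. Since git runs these values as shell commands, a filter directory containing spaces would need extra quoting.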

3. Now ‘git add’ the relevant files to the staging area. When you do this,
the ‘clean’ filter is applied to the files in your working directory, i.e.,
it encrypts the files before they enter the staging area. Note that, as a
best practice, ‘.gitattributes’ should not be added. At this point you can
use ‘git diff’ as usual: the ‘diff’ driver decrypts content where needed,
so only plain text is compared.

4. Run ‘git commit’ to commit the changes to the repository.
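It is worth checking that what git stores really is ciphertext. The sketch below builds a throwaway repository in a temporary directory, stages one file through a clean filter, and reads the staged blob back with `git cat-file`. All paths, the file name `secret.txt`, the salt and the passphrase are demo assumptions, and only the clean side is exercised.

```shell
#!/bin/bash
# Throwaway demo; every name and secret below is a placeholder.
demo=$(mktemp -d)

# A clean filter like the one above, with demo salt and passphrase.
cat > "$demo/clean_filter" <<'EOF'
#!/bin/bash
openssl enc -base64 -aes-256-ecb -S 0123456789abcdef -k demo-passphrase 2> /dev/null
EOF
chmod 700 "$demo/clean_filter"

git init -q "$demo/repo"
cd "$demo/repo" || exit 1
git config filter.openssl.clean "$demo/clean_filter"
echo '* filter=openssl' > .gitattributes

echo 'top secret' > secret.txt
git add secret.txt

# The staged blob, exactly as git stores it (no smudge applied).
stored=$(git cat-file -p :secret.txt)
# What the clean filter produces for the working-directory file.
expected=$("$demo/clean_filter" < secret.txt)

[ "$stored" = "$expected" ] && echo "staged blob is the clean filter output"
[ "$stored" != "$(cat secret.txt)" ] && echo "staged blob differs from the plaintext"
```

In a real repository, `git cat-file -p :path/to/file` gives the same view of a staged file, and `git cat-file -p HEAD:path/to/file` of a committed one.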

5. Now suppose the repository ‘myproj’ is connected to a remote repository
named ‘dropbox’ at ‘file://$HOME/myproj-remote.git’ (a file:// URL needs an
absolute path), and you have pushed all the committed changes to it.
Suppose you want to create another git repository in the directory
‘~/myproj-1’. First clone the remote repository without checking out
the HEAD.

~$ git clone -o dropbox -n "file://$HOME/myproj-remote.git" myproj-1
~$ cd myproj-1

Now create under ‘myproj-1’ a file .gitattributes with the same content
as shown in Step 2. Then add/append the code snippet in ‘.git/config’ in
Step 2 to ‘myproj-1/.git/config’. Then reset the HEAD.

~/myproj-1$ git reset HEAD

You can now list the missing files with ‘git ls-files -d’ and check
them out:

~/myproj-1$ git ls-files -d | xargs git checkout --
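The clone, reset and checkout sequence can be tried safely in a scratch directory. The sketch below leaves the encryption filters out to stay short, so it only shows the git mechanics. The directory layout, the file name `notes.txt` and the commit identity are made up for the demo.

```shell
#!/bin/bash
# Scratch area; all names here are placeholders for illustration.
work=$(mktemp -d)

# A small source repository standing in for the remote.
git init -q "$work/origin"
(
    cd "$work/origin" || exit 1
    echo 'hello' > notes.txt
    git add notes.txt
    git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial'
)

# Clone without checking out HEAD, as in Step 5.
git clone -q -o dropbox -n "$work/origin" "$work/myproj-1"
cd "$work/myproj-1" || exit 1

# (In the real setup, .gitattributes and the .git/config entries go here.)

git reset -q HEAD              # fill the index from HEAD, leave the work tree empty
missing=$(git ls-files -d)     # files in the index but absent from the work tree
echo "missing before checkout: $missing"

git ls-files -d | xargs git checkout --
```

Because the filters are configured before the final checkout, that is the moment the ‘smudge’ filter gets to run on every file.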

When the files are checked out, the ‘smudge’ filter is automatically
applied, decrypting the files in the repository and putting the decrypted
files into your working directory.

Non-deterministic encryption (what GPG does) does not work well here
because the same file is transformed into a different ciphertext each time
it is encrypted. As a result, ‘git status’ always shows the pulled files as
‘modified’, even though ‘git diff’ shows no difference. Checking in such
‘modified’ files only replaces the old ciphertext with a new one that
decrypts to the same file. If you work in two different local repositories
synced to the same remote, the push/pull cycle never ends, even if nothing
has changed in your working directories. Using AES in ECB mode with a fixed
salt resolves this problem. It is not semantically secure (identical files,
and identical aligned 16-byte blocks within a file, encrypt to identical
ciphertext), but it provides reasonable confidentiality.
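The limitation of ECB mode can be observed directly. The minimal sketch below encrypts two identical 16-byte blocks and compares the two ciphertext blocks. To keep the output free of any salt header, it passes a raw key with `-K` instead of a passphrase, which differs from the filters above. The key is a made-up demo value.

```shell
#!/bin/bash
# Hypothetical 256-bit demo key (64 hex digits); not for real data.
KEY=000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
block='0123456789abcdef'   # exactly 16 bytes, i.e. one AES block

# Encrypt two identical blocks. -nopad keeps the output at 32 bytes;
# od -v prints every byte as hex without collapsing repeated lines.
ct=$(printf '%s%s' "$block" "$block" \
        | openssl enc -aes-256-ecb -K "$KEY" -nopad \
        | od -An -v -tx1 | tr -d ' \n')

first=${ct:0:32}     # hex of ciphertext block 1
second=${ct:32:32}   # hex of ciphertext block 2

[ "$first" = "$second" ] && echo "identical plaintext blocks give identical ciphertext blocks"
```

An observer of the remote repository can therefore tell where aligned blocks repeat, within a file or across files, without learning their contents.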

From now on, you can work in the local repositories, push to or pull from
the remote repository as usual, without noticing the encryption/decryption
in the background.

I think this is cool.

##############################################################
References
##############################################################

[1] “Git: the fast version control system”. http://git-scm.com/

[2] “Dropbox homepage”. http://www.dropbox.com/

[3] “Version control with Git and Dropbox”.
https://syncom.wordpress.com/2010/05/06/%E7%89%88%E6%8E%A7/

[4] “OpenSSL: Cryptography and SSL/TLS Toolkit”. http://www.openssl.org/

[5] “Web discussion: Transparently encrypt repository contents with GPG”.
http://git.661346.n2.nabble.com/Transparently-encrypt-repository-contents-with-GPG-td2470145.html



