March 27, 2018

Simply scalable Pritunl VPN deployments

Being secure

Every security-minded organization knows it needs a secure way to access its private networks, but even in this modern “Infrastructure as a Service” world, VPNs often have to be built manually. When they work well, no one knows they’re there. When there’s even the slightest issue, though, everyone notices - accessing internal portals takes an appreciable amount of time due to latency spikes, teams have difficulty collaborating on private resources due to flaky connections... it’s not a pretty world. To ensure these issues never arise, VPNs either need to be oversized or they need to be able to autoscale - either way, they must be highly available (HA). Today we’re going to talk about autoscaling Pritunl, our preferred VPN solution at Mixmax.

Pritunl

We love Pritunl at Mixmax - it’s relatively simple to set up and it’s built to be highly available. It also has single sign-on, which makes getting users set up with their credentials much easier than with OpenVPN. Pritunl’s approach is also more secure, because it creates temporary, authorized download links for users to retrieve their personal credentials, whereas in typical OpenVPN deployments credentials have to be shared in some other manner (via USB, email, etc.). Pritunl also has built-in auditing of user activity, as well as visualization of the load on your deployment.

All of this sounds great, so what’s the problem?

The problem

While deploying an HA Pritunl configuration is much easier than with other systems, it’s still a manual process. And because adding new nodes to the cluster is manual, Pritunl can’t autoscale out of the box. While this is fine for most users, we wanted a VPN solution that was as hands-off as possible. To get there, we had a few problems to solve:

  • We need to know the correct Mongo URI for the Pritunl node to start up with, otherwise it won’t be able to identify other nodes to coordinate with.
  • We need to register the new host as part of our server set that defines the Pritunl nodes, otherwise any new nodes won’t be able to register themselves to accept user traffic.
  • We need to disable the Source/Destination check on the EC2 instances, otherwise they won’t proxy network traffic.

Good thing we like solving problems!

Getting creative

Let’s walk through how we solved these problems in the user-data template file that every new Pritunl node boots with.

Bootstrapping the necessary data

Bootstrapping data is a difficult problem - or rather, it’s a difficult problem if you don’t use a secret management system. Here at Mixmax, we use Vault for storing secrets and auditing access to them, so each node can retrieve the three sensitive credentials it needs during its initial boot sequence (which we run as the instance’s user-data).

# Retrieve the Vault binary for our platform.
wget https://releases.hashicorp.com/vault/0.9.3/vault_0.9.3_linux_amd64.zip

# Unzip the downloaded zip file to access the `vault` binary.
unzip vault_0.9.3_linux_amd64.zip

# Move the binary into a location on our $PATH.
sudo cp vault /usr/local/bin/

# Get the instance's PKCS7 signed document.
pkcs7=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 | tr -d '\n')

# Make sure we know where the correct Vault is.
# Note that these variables are passed in via our Terraform template file provider.
export VAULT_ADDR=${vault_addr}

# Authenticate to Vault.
result=$(./vault write -field=token auth/aws/login role=${pritunl_node_role} pkcs7=$pkcs7)

# Now we can login with the token.
./vault login $result

# Let's next grab the Mongo connection URI to join the cluster.
mongo_uri=$(./vault read -field=value secret/${mongo_uri_location})

# Once connected to the cluster, we'll need to register this server, so
# grab the API token and secret that the registration call requires.
apiToken=$(./vault read -field=value secret/${api_token_location})
apiSecret=$(./vault read -field=value secret/${api_secret_location})

Once we have the necessary credentials, we need to tell our local Pritunl service about the Mongo URI.

# We need to stop the service before we modify the Mongo connection URI.
sudo stop pritunl

# Time to join the cluster!
sudo pritunl set-mongodb $mongo_uri

# Now that we've modified the Mongo connection URI, let's restart the server.
sudo start pritunl

Awesome! Now our node knows how to communicate and learn about all other nodes in our deployment.
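
If you want to sanity-check that the node actually came back up with the new URI, a quick optional check might look like this (assuming Pritunl’s default log location and the same Upstart-style service commands used above):

# Confirm the service restarted cleanly.
sudo status pritunl

# Peek at the log to make sure the node connected to Mongo without errors.
# (Assumes Pritunl's default log file location.)
sudo tail -n 20 /var/log/pritunl.log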

Registering the host for work

Now that our node is connected to the rest of our deployment, we need to register it as able to accept network traffic. Thankfully Pritunl has an API that will allow us to do this.

# First, install the Mongo 3.4 client tools. We use a script from a gist
# rather than inlining it here since it doesn't need any inline changes.
bash <(curl -s https://gist.githubusercontent.com/ttacon/98da5515e1662441c7093d83386cd610/raw/dc100a899dd0bbbaa7af6c47a3c3cad96d2afd8c/install-mongo-tools.sh)

# Next, figure out the ID of the host that we're on so we can register it.
cat <<EOF > host.js
db = db.getSiblingDB('pritunl');
hostname = '$(hostname)';
host = db.hosts.findOne({ hostname }, { _id: 1 })
print(host._id)
EOF

# Run the script to get the `hostId`
hostId=$(mongo --quiet $mongo_uri host.js | tail -1)

# Create a python script that we'll use to add the host to the known server
# block.
cat <<EOF > setup.py
import sys, argparse, requests, time, uuid, hmac, hashlib, base64
BASE_URL = '${vpn_base_url}'
API_TOKEN = '$apiToken'
API_SECRET = '$apiSecret'

# Setup known arguments.
parser = argparse.ArgumentParser(prog='pritunl-host-modification (phm)')
parser.add_argument('--host', help='The host to either remove or add to the server block')
parser.add_argument('--action', help='Either to add or remove the host from the server block')

# Parse the arguments from the command line.
args = parser.parse_args()


def auth_request(method, path, headers=None, data=None):
    """ Makes an authorized HTTP API request to our Pritunl server for the given path and method. """
 
    # Create the auth params that we'll need in order to sign the request.
    auth_timestamp = str(int(time.time()))
    auth_nonce = uuid.uuid4().hex
    auth_string = '&'.join([API_TOKEN, auth_timestamp, auth_nonce,
        method.upper(), path])
    auth_signature = base64.b64encode(hmac.new(
        API_SECRET, auth_string, hashlib.sha256).digest())
    auth_headers = {
        'Auth-Token': API_TOKEN,
        'Auth-Timestamp': auth_timestamp,
        'Auth-Nonce': auth_nonce,
        'Auth-Signature': auth_signature,
    }

    # Add any extra headers that were passed in.
    if headers:
        auth_headers.update(headers)

    # Make the request.
    return getattr(requests, method.lower())(
        BASE_URL + path,
        verify=True,
        headers=auth_headers,
        data=data,
    )

# Seatbelts for script usage.
if not args.host:
    print 'Must provide a host identifier'
    sys.exit(1)

# Allow both the ability to add and the ability to remove hosts to and from
# server blocks.
if args.action == 'add':
    print 'Adding host "{}" to the server block'.format(args.host)
    response = auth_request(
        'PUT',
        '/server/${server_id}/host/{}'.format(args.host),
        data={
            'id': args.host,
            'server': '${server_id}'
        }
    )
    print response.status_code
elif args.action == 'remove':
    print 'Removing host "{}" from the server block'.format(args.host)
    response = auth_request(
        'DELETE',
        '/server/${server_id}/host/{}'.format(args.host)
    )
    print response.status_code
else:
    print 'Must provide an action of either add or remove'
    sys.exit(1)

EOF


# HACK: occasionally the servers take a few seconds to propagate the changes
# via Mongo :(
sleep 10

# Add the node to the server block.
python setup.py --host $hostId --action add
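
As an aside, that fixed sleep could be replaced with a retry loop. The sketch below is illustrative rather than what we actually run: it simply reruns the registration until the API reports success, relying on the script printing the response status code as its last line of output and assuming the endpoint returns a 200 on success.

# Illustrative alternative to the fixed sleep above: retry the registration
# until the API reports success. The script prints the HTTP status code as
# its last line of output, so we can key off of that.
for attempt in $(seq 1 10); do
  status=$(python setup.py --host $hostId --action add | tail -1)
  if [ "$status" = "200" ]; then
    break
  fi
  sleep 3
done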

Perfect, now our host can accept traffic as part of our VPN.
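
Note that the script also supports --action remove. We don’t show the wiring here, but one natural use (e.g. from a shutdown script or an autoscaling lifecycle hook) is deregistering a node before it terminates, so the cluster stops routing traffic to it:

# Deregister this node from the server block before the instance terminates.
python setup.py --host $hostId --action remove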

Disabling the source/destination check

Lastly, we need to disable the source/destination check that all EC2 instances in AWS start with by default; without this, the instance won’t forward traffic that isn’t addressed to it. Note that this call requires the instance’s IAM role to allow the ec2:ModifyInstanceAttribute action.

# Lastly, we need to disable the source/dest check for this instance.
# This is required for any node that needs to proxy network traffic that
# isn't specifically addressed to it (i.e. VPN nodes and NAT nodes).
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 modify-instance-attribute --no-source-dest-check --instance-id=$instance_id --region=us-east-1
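
One small, optional tweak: rather than hardcoding the region, you can derive it from the instance’s metadata, since the availability zone is just the region name with a trailing letter:

# Derive the region from the instance's availability zone instead of
# hardcoding it (the AZ is the region name plus a trailing letter).
az=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
region=$(echo $az | sed 's/[a-z]$//')
aws ec2 modify-instance-attribute --no-source-dest-check --instance-id=$instance_id --region=$region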

Et voilà!

With those three steps, you’ve got everything you need to set up an autoscaling group in AWS that registers new nodes as it scales up and down! In the near future, we’re also hoping to open source the Terraform module we use for this at Mixmax so others can use it as well!
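
Until then, here’s a rough sketch of that wiring using the AWS CLI - every name, the AMI ID, and the subnet IDs below are placeholders, so treat it as a starting point rather than a drop-in config:

# Launch configuration that runs the user-data script above on first boot.
# The IAM instance profile must allow the Vault and EC2 calls the script makes.
aws autoscaling create-launch-configuration \
  --launch-configuration-name pritunl-node \
  --image-id ami-12345678 \
  --instance-type t2.medium \
  --iam-instance-profile pritunl-node-profile \
  --user-data file://pritunl-user-data.sh

# Autoscaling group that keeps at least two Pritunl nodes running.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name pritunl-vpn \
  --launch-configuration-name pritunl-node \
  --min-size 2 --max-size 4 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"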

Enjoy working on problems that you can't copy-paste a solution for? Drop us a line.
