Terraform, Docker Swarm, and AWS

This is a guide to using Terraform to create docker swarm clusters (swarm mode, not the standalone swarm engine) in AWS. The goal I started out with was a single Terraform configuration set that would automatically bring up a docker swarm cluster. I've also added some example configuration for lighting up services within the cluster once it is created.

Requirements

Before you start, you'll need both packer and terraform installed locally.

$ terraform --version
Terraform v0.7.10
$ packer --version
0.12.0

You'll also need AWS credentials either set as environment variables for both terraform and packer to read or configured with the AWS command line tool.
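
If you go the environment variable route, both tools read the standard AWS variables (the values below are placeholders, not real credentials):

$ export AWS_ACCESS_KEY_ID="AKIA..."
$ export AWS_SECRET_ACCESS_KEY="..."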

Lastly, this guide assumes you have an EC2 key pair in the region you are creating the cluster in. Please check for references to the foo SSH key in the configuration in this guide and replace them with your own.
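
If you don't have a key pair yet, Terraform can manage one for you as well; here is a minimal sketch (the key name and public key path are placeholders):

resource "aws_key_pair" "foo" {
  key_name   = "foo"
  public_key = "${file("~/.ssh/foo.pub")}"
}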

Terraform Variables

To start, create a file called variables.tf and put the following configuration inside of it. In this guide, I'm using us-west-2, so be aware of where that region is referenced.

variable "aws_region" {
  description = "AWS region to launch servers."
  default     = "us-west-2"
}

variable "vpc_key" {
  description = "A unique identifier for the VPC."
  default     = "nickg"
}

variable "cluster_manager_count" {
    description = "Number of manager instances for the cluster."
    default = 1
}

variable "cluster_node_count" {
    description = "Number of node instances for the cluster."
    default = 3
}

VPC Configuration

I like to use VPCs as a way of isolating environments and resources. The first thing I do is create some configuration for all of the vpc/environment infrastructure. VPCs also provide some added security and force you to think through access controls. Using Terraform, creating VPC configuration is relatively painless.

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_internet_gateway" "main" {
  vpc_id = "${aws_vpc.vpc.id}"

  tags {
    Name = "${var.vpc_key}-ig"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_network_acl" "network" {
  vpc_id = "${aws_vpc.vpc.id}"
  subnet_ids = [
    "${aws_subnet.a.id}",
    "${aws_subnet.b.id}",
    "${aws_subnet.c.id}"
  ]

  ingress {
    from_port = 0
    to_port = 0
    rule_no = 100
    action = "allow"
    protocol = "-1"
    cidr_block = "0.0.0.0/0"
  }

  egress {
    from_port = 0
    to_port = 0
    rule_no = 100
    action = "allow"
    protocol = "-1"
    cidr_block = "0.0.0.0/0"
  }

  tags {
    Name = "${var.vpc_key}-network"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_route_table" "main" {
  vpc_id = "${aws_vpc.vpc.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.main.id}"
  }

  tags {
    Name = "${var.vpc_key}-route"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_route_table_association" "a" {
  route_table_id = "${aws_route_table.main.id}"
  subnet_id = "${aws_subnet.a.id}"
}

resource "aws_route_table_association" "b" {
  route_table_id = "${aws_route_table.main.id}"
  subnet_id = "${aws_subnet.b.id}"
}

resource "aws_route_table_association" "c" {
  route_table_id = "${aws_route_table.main.id}"
  subnet_id = "${aws_subnet.c.id}"
}

resource "aws_subnet" "a" {
  vpc_id = "${aws_vpc.vpc.id}"
  cidr_block = "${cidrsubnet(aws_vpc.vpc.cidr_block,8,1)}"
  availability_zone = "${var.aws_region}a"
  map_public_ip_on_launch = false

  tags {
    Name = "${var.vpc_key}-a"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_subnet" "b" {
  vpc_id = "${aws_vpc.vpc.id}"
  cidr_block = "${cidrsubnet(aws_vpc.vpc.cidr_block,8,2)}"
  availability_zone = "${var.aws_region}b"
  map_public_ip_on_launch = false

  tags {
    Name = "${var.vpc_key}-b"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_subnet" "c" {
  vpc_id = "${aws_vpc.vpc.id}"
  cidr_block = "${cidrsubnet(aws_vpc.vpc.cidr_block,8,3)}"
  availability_zone = "${var.aws_region}c"
  map_public_ip_on_launch = false

  tags {
    Name = "${var.vpc_key}-c"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

resource "aws_vpc" "vpc" {
  cidr_block           = "10.25.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  instance_tenancy     = "default"

  tags {
    VPC = "${var.vpc_key}"
    Name = "${var.vpc_key}-vpc"
    Terraform = "Terraform"
  }
}

output "vpc_id" {
  value = "${aws_vpc.vpc.id}"
}

output "vcp_cidr_1" {
  value = "${cidrhost(aws_vpc.vpc.cidr_block,1)}"
}
output "vcp_cidr_sub_1" {
  value = "${cidrsubnet(aws_vpc.vpc.cidr_block,8,1)}"
}

output "vpc_subnet_a" {
  value = "${aws_subnet.a.id}"
}
output "vpc_subnet_b" {
  value = "${aws_subnet.b.id}"
}
output "vpc_subnet_c" {
  value = "${aws_subnet.c.id}"
}

The above configuration belongs in a file named provider.tf. With it, the VPC, along with its subnets, routes, and gateway, is created, and a few values are exposed as Terraform outputs.
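
Once terraform apply has been run (shown a little further down), any of these outputs can be read back individually, which is handy for scripting; for example:

$ terraform output vpc_id
vpc-6d8b4b0a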

Next, we'll create a security group for our cluster. Normally, I'd recommend creating multiple port-specific security groups (e.g. sg.http.tf would be a security group that exposes ports 80, 8080, and 443), but because swarm clusters may run a variety of services, you can easily hit the five-security-group limit on EC2 instances. Instead, put the following configuration into a file named swarm-sg.tf.

resource "aws_security_group" "swarm" {
  name        = "${var.vpc_key}-sg-swarm"
  description = "Security group for swarm cluster instances"
  vpc_id      = "${aws_vpc.vpc.id}"

  ingress {
      from_port   = 2375
      to_port     = 2377
      protocol    = "tcp"
      cidr_blocks = [
        "${aws_vpc.vpc.cidr_block}"
      ]
  }

  ingress {
      from_port   = 7946
      to_port     = 7946
      protocol    = "tcp"
      cidr_blocks = [
        "${aws_vpc.vpc.cidr_block}"
      ]
  }

  ingress {
      from_port   = 7946
      to_port     = 7946
      protocol    = "udp"
      cidr_blocks = [
        "${aws_vpc.vpc.cidr_block}"
      ]
  }

  ingress {
      from_port   = 4789
      to_port     = 4789
      protocol    = "tcp"
      cidr_blocks = [
        "${aws_vpc.vpc.cidr_block}"
      ]
  }

  ingress {
      from_port   = 4789
      to_port     = 4789
      protocol    = "udp"
      cidr_blocks = [
        "${aws_vpc.vpc.cidr_block}"
      ]
  }

  ingress {
      from_port   = 80
      to_port     = 80
      protocol    = "tcp"
      cidr_blocks = [
        "0.0.0.0/0"
      ]
  }

  ingress {
      from_port   = 443
      to_port     = 443
      protocol    = "tcp"
      cidr_blocks = [
        "0.0.0.0/0"
      ]
  }

  ingress {
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      cidr_blocks = [
        "0.0.0.0/0"
      ]
  }

  egress {
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = [
        "0.0.0.0/0"
      ]
  }

  tags {
    Name = "${var.vpc_key}-sg-swarm"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

output "sg_swarm" {
  value = "${aws_security_group.swarm.id}"
}

In the above configuration, there are a handful of rules defined.

  • The egress (outbound) rule allows for instances in the security group to make outgoing connections to any IP addresses on any port.
  • TCP connections to ports 2375 through 2377 can be made from any instance within the VPC. These ports are used by docker and are insecure, so we want to prevent outside access to them. The reason the insecure docker port is used is explained later in this guide.
  • TCP and UDP connections to ports 7946 and 4789 are also possible from instances within the VPC. These ports are used by docker for cluster management and overlay networking.
  • TCP connections to ports 22, 80, and 443 can be made from any IP address.

I'm a fan of using bastion servers, so I'm going to add a few steps to create one. The security group for that EC2 instance allows outbound connections and inbound SSH connections. The following configuration should be put into a file named bastion-sg.tf.

resource "aws_security_group" "bastion" {
  name        = "${var.vpc_key}-sg-bastion"
  description = "Security group for bastion instances"
  vpc_id      = "${aws_vpc.vpc.id}"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [
      "0.0.0.0/0"
    ]
  }

  egress {
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = [
        "0.0.0.0/0"
      ]
  }

  tags {
    Name = "${var.vpc_key}-sg-bastion"
    VPC = "${var.vpc_key}"
    Terraform = "Terraform"
  }
}

output "sg_bastion" {
  value = "${aws_security_group.bastion.id}"
}

A quick run of terraform plan should show 12 resources to be added.

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.


The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ aws_internet_gateway.main
...
+ aws_network_acl.network
...
+ aws_route_table.main
...
+ aws_route_table_association.a
...
+ aws_route_table_association.b
...
+ aws_route_table_association.c
...
+ aws_security_group.bastion
...
+ aws_security_group.swarm
...
+ aws_subnet.a
...
+ aws_subnet.b
...
+ aws_subnet.c
...
+ aws_vpc.vpc
...
Plan: 12 to add, 0 to change, 0 to destroy.

With the plan showing our 12 resources, the next step is to run terraform apply.

$ terraform apply
...
aws_route_table_association.a: Creation complete
aws_route_table_association.b: Creation complete
aws_route_table_association.c: Creation complete
aws_network_acl.network: Creation complete

Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

Outputs:

sg_bastion = sg-11a68368
sg_swarm = sg-1ea68367
vcp_cidr_1 = 10.25.0.1
vcp_cidr_sub_1 = 10.25.1.0/24
vpc_id = vpc-6d8b4b0a
vpc_subnet_a = subnet-ff0def98
vpc_subnet_b = subnet-86dcd8f0
vpc_subnet_c = subnet-f3d3a5ab

Docker AMI

Now, we are going to switch gears a little and turn to packer to make the rest of the process a little easier. The following packer configuration describes the build and setup process for an AMI, based on Ubuntu 16.10, that has docker installed, configured, and ready to use.

{
  "builders": [
    {
      "ami_name": "docker-swarm ",
      "ami_virtualization_type": "hvm",
      "associate_public_ip_address": "true",
      "instance_type": "t2.small",
      "region": "us-west-2",
      "source_ami_filter": {
        "filters": {
          "name": "*ubuntu-yakkety-16.10-amd64-server-*",
          "root-device-type": "ebs",
          "virtualization-type": "hvm"
        },
        "most_recent": true
      },
      "ssh_username": "ubuntu",
      "subnet_id": "subnet-ff0def98",
      "tags": {
        "OS_Version": "Ubuntu",
        "Release": "16.10"
      },
      "type": "amazon-ebs",
      "security_group_ids": ["sg-11a68368"]
    }
  ],
  "post-processors": null,
  "provisioners": [
    {
      "destination": "/tmp/docker.options",
      "source": "docker.options",
      "type": "file"
    },
    {
      "execute_command": "{{ .Vars }} sudo -E sh '{{ .Path }}'",
      "inline": [
        "apt-get install -y aptitude",
        "aptitude -y update",
        "aptitude install -y docker docker-compose unzip",
        "mv /tmp/docker.options /etc/default/docker",
        "systemctl enable docker.service",
        "usermod -aG docker ubuntu"
      ],
      "type": "shell"
    }
  ]
}

In the above file, docker.json, a new AMI is defined that uploads a docker options file and installs some packages. I do this extra step because performing package installation for each instance with terraform can be slow and sometimes inconsistent. When using this for your own images, be sure to change the subnet_id and security_group_ids attributes to the vpc_subnet_a and sg_bastion values from the terraform output. The reason we use the bastion security group is that packer must be able to SSH in and then fetch packages from package repositories.

DOCKER_OPTS="-H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375"

The above block is the contents of the docker.options file. This file is read by the docker daemon on startup and ensures that the daemon is accessible both on the insecure TCP port 2375 and through the /var/run/docker.sock Unix socket.

Create the AMI using the packer executable:

$ packer build docker.json
amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name...
    amazon-ebs: Found Image ID: ami-cbd276ab
==> amazon-ebs: Creating temporary keypair: packer_5831ec0d-07df-fdcb-a57c-cd4629838918
==> amazon-ebs: Launching a source AWS instance...
    amazon-ebs: Instance ID: i-e884067d
==> amazon-ebs: Waiting for instance (i-e884067d) to become ready...
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Uploading docker.options => /tmp/docker.options
==> amazon-ebs: Provisioning with shell script: /var/folders/w8/pm2dt6252rs86wc8zsn_70sh0000gn/T/packer-shell852678079
    amazon-ebs: Reading package lists...
    ...
==> amazon-ebs: Stopping the source instance...
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating the AMI: docker-swarm 1479666701
    amazon-ebs: AMI: ami-1a6cc07a
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Adding tags to AMI (ami-1a6cc07a)...
    amazon-ebs: Adding tag: "OS_Version": "Ubuntu"
    amazon-ebs: Adding tag: "Release": "16.10"
==> amazon-ebs: Tagging snapshot: snap-f90080ae
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:

us-west-2: ami-1a6cc07a

The process of creating the AMI can take a few minutes; your mileage may vary.

To make this process even better, create a throw-away VPC with a subnet and security group ahead of time. The t2 instance type used to build the AMI requires that the builder instance run inside a VPC, but it can be any VPC and does not have to be the one we plan on deploying to.

If it isn't obvious, once you create the AMI for one VPC, you can use the same AMI in other VPC configurations provided they are in the same region.

Bastion Setup

Next we can go back to creating Terraform configuration for the bastion and docker swarm instances.

resource "aws_instance" "bastion" {
    ami = "ami-1a6cc07a"
    instance_type = "t2.small"
    count = "1"
    associate_public_ip_address = "true"
    key_name = "foo"
    subnet_id = "${aws_subnet.a.id}"
    vpc_security_group_ids = [
      "${aws_security_group.bastion.id}"
    ]

    root_block_device = {
      volume_size = 10
    }

    connection {
      user = "ubuntu"
      private_key = "${file("~/.ssh/foo")}"
      agent = false
    }

    tags {
      Name = "${var.vpc_key}-bastion"
      VPC = "${var.vpc_key}"
      Terraform = "Terraform"
    }
}

output "bastion_host" {
  value = "${aws_instance.bastion.public_dns}"
}

Put the contents of the above configuration block into the file bastion.tf. The bastion host is a t2.small because it is only used as a gateway to run commands within the VPC. The bastion host is also a good candidate for installing VPN software (OpenVPN).

$ terraform plan
...
Plan: 1 to add, 0 to change, 0 to destroy.

The output of terraform plan should confirm that there is one resource being added.

$ terraform apply
aws_vpc.vpc: Refreshing state... (ID: vpc-6d8b4b0a)
aws_subnet.c: Refreshing state... (ID: subnet-f3d3a5ab)
...
aws_instance.bastion: Creating...
...
aws_instance.bastion: Still creating... (10s elapsed)
aws_instance.bastion: Still creating... (20s elapsed)
aws_instance.bastion: Still creating... (30s elapsed)
aws_instance.bastion: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

Outputs:

bastion_host = ec2-35-162-132-222.us-west-2.compute.amazonaws.com
sg_bastion = sg-11a68368
sg_swarm = sg-1ea68367
vcp_cidr_1 = 10.25.0.1
vcp_cidr_sub_1 = 10.25.1.0/24
vpc_id = vpc-6d8b4b0a
vpc_subnet_a = subnet-ff0def98
vpc_subnet_b = subnet-86dcd8f0
vpc_subnet_c = subnet-f3d3a5ab

The bastion_host variable should now be available in the output. Next we create our docker swarm cluster.

Swarm Setup

The next and final step is to create the configuration for our swarm managers and nodes. Be sure to update the ami attributes with the id of your own AMI.

resource "aws_instance" "swarm-manager" {
    ami = "ami-1a6cc07a"
    instance_type = "t2.small"
    count = "${var.cluster_manager_count}"
    associate_public_ip_address = "true"
    key_name = "foo"
    subnet_id = "${aws_subnet.a.id}"
    vpc_security_group_ids      = [
      "${aws_security_group.swarm.id}"
    ]

    root_block_device = {
      volume_size = 100
    }

    connection {
      user = "ubuntu"
      private_key = "${file("~/.ssh/foo")}"
      agent = false
    }

    tags {
      Name = "${var.vpc_key}-manager-${count.index}"
      VPC = "${var.vpc_key}"
      Terraform = "Terraform"
    }

    provisioner "remote-exec" {
      inline = [
        "sudo docker swarm init"
      ]
    }

    depends_on = [
      "aws_instance.bastion"
    ]
}

resource "aws_instance" "swarm-node" {
    ami = "ami-1a6cc07a"
    instance_type = "t2.small"
    count = "${var.cluster_node_count}"
    associate_public_ip_address = "true"
    key_name = "foo"
    subnet_id = "${aws_subnet.a.id}"
    vpc_security_group_ids = [
      "${aws_security_group.swarm.id}"
    ]

    root_block_device = {
      volume_size = 100
    }

    connection {
      user = "ubuntu"
      private_key = "${file("~/.ssh/foo")}"
      agent = false
    }

    tags {
      Name = "${var.vpc_key}-node-${count.index}"
      VPC = "${var.vpc_key}"
      Terraform = "Terraform"
    }

    provisioner "remote-exec" {
      inline = [
        "docker swarm join ${aws_instance.swarm-manager.0.private_ip}:2377 --token $(docker -H ${aws_instance.swarm-manager.0.private_ip} swarm join-token -q worker)"
      ]
    }

    depends_on = [
      "aws_instance.swarm-manager"
    ]
}

resource "null_resource" "cluster" {
  triggers {
    cluster_instance_ids = "${join(",", aws_instance.swarm-node.*.id)}"
  }

  connection {
    host = "${aws_instance.bastion.public_dns}"
    user = "ubuntu"
    private_key = "${file("~/.ssh/foo")}"
    agent = false
  }

  provisioner "remote-exec" {
    inline = [
      "docker -H ${element(aws_instance.swarm-manager.*.private_ip, 0)}:2375 network create --driver overlay appnet",
      "docker -H ${element(aws_instance.swarm-manager.*.private_ip, 0)}:2375 service create --name nginx --mode global --publish 80:80 --network appnet nginx"
    ]
  }
}

output "swarm_managers" {
  value = "${concat(aws_instance.swarm-manager.*.public_dns)}"
}

output "swarm_nodes" {
  value = "${concat(aws_instance.swarm-node.*.public_dns)}"
}

The above configuration is put into the file swarm.tf. There are a few tricks here that are worth pointing out.

  • The count attribute is used for both the manager and node instance resources. With it, we can quickly scale up or down the swarm cluster size.
  • The swarm manager is created first by having a depends_on block in the swarm node instance resource. This instructs Terraform to ensure all of the managers are created before attempting to create the node instances. The swarm manager also has a depends_on block that references the bastion instance. To see a graph of what the configuration dependencies look like, you can run terraform graph | dot -Tpng > graph.png, provided you have the dot executable installed.
  • When the manager instance is brought up, it will run docker swarm init to initialize a docker swarm and set itself as the manager. With the above configuration, each manager will attempt to initialize its own swarm, which probably isn't what you want. Instead, the initialization process for managers that are not manager 0 should be to join the first manager; see the sketch after this list. Pull requests are welcome.
  • When a node is brought up, it will attempt to join the cluster as a node. This is done through a little bit of trickery to first get the worker token from the docker manager using the insecure port 2375 and then join that manager on port 2377.
  • Finally, a null resource is used to initialize the services in the newly created docker swarm cluster. In this example, a network is created and the nginx container deployed as a global service. Instead of ssh'ing into the manager or one of the nodes, we use the bastion host previously created.
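
Should you want more than one manager, here is a minimal, untested sketch of one approach: set the swarm-manager resource's count to 1, then add a second resource (the name swarm-manager-replica is my own) that covers the remaining managers and joins manager 0 with a manager token instead of running docker swarm init, mirroring the worker-node trick.

resource "aws_instance" "swarm-manager-replica" {
    ami = "ami-1a6cc07a"
    instance_type = "t2.small"
    count = "${var.cluster_manager_count - 1}"
    associate_public_ip_address = "true"
    key_name = "foo"
    subnet_id = "${aws_subnet.a.id}"
    vpc_security_group_ids = [
      "${aws_security_group.swarm.id}"
    ]

    root_block_device = {
      volume_size = 100
    }

    connection {
      user = "ubuntu"
      private_key = "${file("~/.ssh/foo")}"
      agent = false
    }

    tags {
      Name = "${var.vpc_key}-manager-replica-${count.index}"
      VPC = "${var.vpc_key}"
      Terraform = "Terraform"
    }

    # Join manager 0 as an additional manager instead of initializing a new swarm.
    provisioner "remote-exec" {
      inline = [
        "docker swarm join ${aws_instance.swarm-manager.0.private_ip}:2377 --token $(docker -H ${aws_instance.swarm-manager.0.private_ip} swarm join-token -q manager)"
      ]
    }

    depends_on = [
      "aws_instance.swarm-manager"
    ]
}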

A quick run of terraform plan should show 5 resources being added.

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.

aws_vpc.vpc: Refreshing state... (ID: vpc-6d8b4b0a)
...

+ aws_instance.swarm-manager
...
+ aws_instance.swarm-node.0
...
+ aws_instance.swarm-node.1
...
+ aws_instance.swarm-node.2
...
+ null_resource.cluster
...

Plan: 5 to add, 0 to change, 0 to destroy.

If everything looks good, wrap up by running terraform apply.

$ terraform apply
aws_vpc.vpc: Refreshing state... (ID: vpc-6d8b4b0a)
...
aws_instance.swarm-manager: Creating...
aws_instance.swarm-manager: Still creating... (30s elapsed)
aws_instance.swarm-manager: Provisioning with 'remote-exec'...
...
aws_instance.swarm-manager: Still creating... (1m20s elapsed)
aws_instance.swarm-manager (remote-exec): Connecting to remote host via SSH...
aws_instance.swarm-manager (remote-exec):   Host: 35.161.74.23
aws_instance.swarm-manager (remote-exec):   User: ubuntu
aws_instance.swarm-manager (remote-exec):   Password: false
aws_instance.swarm-manager (remote-exec):   Private key: true
aws_instance.swarm-manager (remote-exec):   SSH Agent: false
aws_instance.swarm-manager (remote-exec): Connected!
aws_instance.swarm-manager: Still creating... (1m30s elapsed)
aws_instance.swarm-manager: Still creating... (1m40s elapsed)
aws_instance.swarm-manager (remote-exec): Swarm initialized: current node (48zhf6irfztflzs8hayvh00de) is now a manager.
aws_instance.swarm-manager (remote-exec): To add a worker to this swarm, run the following command:
aws_instance.swarm-manager (remote-exec):     docker swarm join \
aws_instance.swarm-manager (remote-exec):     --token SWMTKN-1-1bv8qd5uhbbpsnvsldtdzrey9i1zjicdt7tl0vc19rd2gjatua-b4sj9z04bcoxma6y2mqa895b6 \
aws_instance.swarm-manager (remote-exec):     10.25.1.183:2377
aws_instance.swarm-manager (remote-exec): To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
aws_instance.swarm-manager: Creation complete
aws_instance.swarm-node.0: Creating...
...
aws_instance.swarm-node.2: Creating...
...
aws_instance.swarm-node.1: Creating...
...
aws_instance.swarm-node.2: Still creating... (1m20s elapsed)
aws_instance.swarm-node.2 (remote-exec): Connected!
aws_instance.swarm-node.1 (remote-exec): This node joined a swarm as a worker.
aws_instance.swarm-node.1: Creation complete
aws_instance.swarm-node.0 (remote-exec): This node joined a swarm as a worker.
aws_instance.swarm-node.0: Creation complete
aws_instance.swarm-node.2: Creation complete
null_resource.cluster: Creating...
...
null_resource.cluster: Provisioning with 'remote-exec'...
null_resource.cluster (remote-exec): Connecting to remote host via SSH...
null_resource.cluster (remote-exec):   Host: ec2-35-162-132-222.us-west-2.compute.amazonaws.com
null_resource.cluster (remote-exec):   User: ubuntu
null_resource.cluster (remote-exec):   Password: false
null_resource.cluster (remote-exec):   Private key: true
null_resource.cluster (remote-exec):   SSH Agent: false
null_resource.cluster (remote-exec): Connected!
null_resource.cluster (remote-exec): b9wf0tp83vffrvr75e1wyosck
null_resource.cluster (remote-exec): 9r8w1l62ndmed6ezn4j7dfzgg
null_resource.cluster: Creation complete

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

Outputs:

bastion_host = ec2-35-162-132-222.us-west-2.compute.amazonaws.com
sg_bastion = sg-11a68368
sg_swarm = sg-1ea68367
swarm_managers = [
    ec2-35-161-74-23.us-west-2.compute.amazonaws.com
]
swarm_nodes = [
    ec2-35-163-94-198.us-west-2.compute.amazonaws.com,
    ec2-35-160-137-242.us-west-2.compute.amazonaws.com,
    ec2-35-163-36-83.us-west-2.compute.amazonaws.com
]
vcp_cidr_1 = 10.25.0.1
vcp_cidr_sub_1 = 10.25.1.0/24
vpc_id = vpc-6d8b4b0a
vpc_subnet_a = subnet-ff0def98
vpc_subnet_b = subnet-86dcd8f0
vpc_subnet_c = subnet-f3d3a5ab

KABOOM BABY

And with that, the cluster is up and the nginx service is running. We can verify that the nginx container has started and is running by making an HTTP request to it on port 80.

$ curl -vvs ec2-35-163-94-198.us-west-2.compute.amazonaws.com | head -n 10
* Rebuilt URL to: ec2-35-163-94-198.us-west-2.compute.amazonaws.com/
*   Trying 35.163.94.198...
* Connected to ec2-35-163-94-198.us-west-2.compute.amazonaws.com (35.163.94.198) port 80 (#0)
> GET / HTTP/1.1
> Host: ec2-35-163-94-198.us-west-2.compute.amazonaws.com
> User-Agent: curl/7.49.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.11.5
< Date: Sun, 20 Nov 2016 19:11:27 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 11 Oct 2016 15:03:01 GMT
< Connection: keep-alive
< ETag: "57fcff25-264"
< Accept-Ranges: bytes
<
{ [612 bytes data]
* Connection #0 to host ec2-35-163-94-198.us-west-2.compute.amazonaws.com left intact
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
...

Misc

I organize my configuration by having folders for VPCs as well as folders for clusters:

vpcs/nickg/
clusters/nickg-consul/
clusters/nickg-stuff/

Each cluster has its own tfstate, but the cluster configuration still needs to reference values like the ID of the VPC it is deployed to. To resolve this, I run the following commands to create a tf.json file that can be put into the folder of my cluster configuration.

$ terraform output -json > output.json
$ cd path/to/cluster/config
$ cp ../path/to/output.json vpc-variables.json
$ jo variable="$(cat vpc-variables.json | jq 'with_entries(.value |= {type: .type, default: .value})')" > vpc-variables.tf.json

This uses the jo and jq tools to transform the output json into a format that terraform can use.
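
For the outputs shown earlier, the generated vpc-variables.tf.json ends up looking roughly like this (abridged to three variables):

{
  "variable": {
    "sg_swarm": {
      "type": "string",
      "default": "sg-1ea68367"
    },
    "vpc_id": {
      "type": "string",
      "default": "vpc-6d8b4b0a"
    },
    "vpc_subnet_a": {
      "type": "string",
      "default": "subnet-ff0def98"
    }
  }
}

The cluster configuration can then reference values like ${var.vpc_id} as if they were its own variables.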

If you have any questions or feedback, or see a bug, then please email me or message me on Twitter.

Dumb Phone

This is a reminder to myself: I originally wrote that you are chained to that smart phone and shouldn't forget it. The real reminder is the opposite: you aren't chained to a smart phone and shouldn't forget it.

I've been giving some long and hard thought to giving up my iPhone in favor of a dumb phone and simplifying my day-to-day communication tech. The conclusion I first came to was that I'm stuck with it: there were things I couldn't see myself separating from. I started writing this post to convince myself, and serve as a reminder, that I don't want to give up certain conveniences in exchange for simplicity. With some time and thought, I realized that I was wrong, and now this article's purpose is the opposite. I started creating a list of apps and rituals that I don't want to change or give up, but the list split into two: things I can't do without and things that would be costly to change. The first list is now empty.

These are the things I thought I couldn't live without:

  • Google Authenticator -- You may not know this, but the Google Authenticator application can be used for non-Google sites and services like Amazon (AWS) and GitHub. Sure, some two-factor auth implementations support SMS, but having the app is really convenient.
  • Audible -- I listen to audio books regularly, especially on road trips and when working out. If I got a dumb phone, I'd have to change a reading habit that is important to me.
  • Google Music / Pandora -- I listen to a lot of music, including when I drive and am on road trips.

This second list is of things that I could change, but doing so would be difficult or wouldn't be cheap.

  • SmartThings -- Without a smart phone, I'd have to carry around a key fob. Also, using the routines that I've programmed would require a separate device. Having that wouldn't be the end of the world, but it would be a change. If I had a tablet, I could get away with having a key fob with me (leaving it in my car, most likely). I primarily use SmartThings routines when I'm at home, so it wouldn't be that big of a deal.
  • Scannable -- I scan important documents and store them in Evernote. Scannable is a really great app that makes that a lot easier. I don't have a separate device for scanning, so I'd either have to get one or change my filing system. If I had a tablet, like an iPad Mini, this wouldn't be an issue. The change would be small because everything gets filed and organized at home anyway.
  • Starbucks -- I go to Starbucks a lot and I use my phone to pay for stuff. This wouldn't be a deal breaker because I can always carry the card with me, but it would be one more thing to keep in my wallet.
  • Slack -- I work remotely so staying connected is important. We use Slack and it is one of the ways that people get in touch with me. After I thought about it, not being 100% available via Slack is what I'm actually going for.
  • Google Music / Pandora -- I listen to a lot of music, including when I drive and am on road trips. Not having the level of access that I currently do would be an adjustment, but I think that I could make do. I do use XM too and bringing an iPad on road trips is reasonable. Most of the music I listen to is on my computer anyway. For day to day driving, maybe not being distracted with selecting music isn't a bad thing.
  • Audible -- I listen to audio books regularly, especially on road trips and when working out. If I got a dumb phone, I'd have to change a reading habit that is important to me. The biggest change would be while working out, but I could get used to it.
  • Google Authenticator -- You may not know this, but the Google Authenticator application can be used for non-Google sites and services like Amazon (AWS) and GitHub. Sure, some two-factor auth implementations support SMS, but having the app is really convenient. After a little investigating, all of the services I use support SMS.

Web applications with TypeScript and Sequelize

I've been using TypeScript professionally for a short while now and enjoy working with it. I think it really smooths out some of the rough parts of developing server software in JavaScript. At Colibri, I've been working on a NodeJS project that is starting to move to TypeScript with some success. One part of that project, the storage subsystem, hasn't been ported yet, and I've done some research to understand how best to tackle it.

With that, I created a small proof of concept application that demonstrates how Express web applications that use Sequelize (ORM) on top of Postgres can be written in TypeScript.

https://github.com/ngerakines/express-typescript-sequelize

I think there is a time and a place for tools like gulp and grunt, but most of the time they aren't necessary. For this project, I'm using npm's scripts block to create a chain of actions. The lint target uses the tslint tool to verify that standards are enforced. That target is a dependency of the build target, which runs the tsc command that compiles the TypeScript code into JavaScript. The build target is a dependency of the test target, which executes the unit tests through mocha and istanbul. After tests are run, that same tool verifies that coverage requirements are met.
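
I haven't reproduced the repository's package.json here, but the chain described above looks something like this sketch using npm's pre-hooks (the paths and coverage threshold are placeholders, not the project's actual values):

{
  "scripts": {
    "lint": "tslint -c tslint.json 'src/**/*.ts'",
    "prebuild": "npm run lint",
    "build": "tsc",
    "pretest": "npm run build",
    "test": "istanbul cover _mocha -- build/test && istanbul check-coverage --lines 80",
    "start": "node build/index.js"
  }
}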

The end result is that I use two commands to build and run this application:

$ npm run build && npm start

This application has a handful of dependencies but the main ones are express, sequelize, and TypeScript. I've also gotten pretty used to including bluebird, moment, lodash, and node-uuid by default in everything that I do. For template rendering, I'm using dustjs, which I had never used before and find appealing.

The application code is split into three areas:

  • src/index.ts is where the express application is constructed and configured.
  • src/routes.ts contains the definition of the ApplicationController that does all of the request handling work.
  • src/storage.ts contains the storage manager, including the sequelize implementation.

The storage manager is a fairly simple interface that is used to define and interact with the Account and Address objects. For this small example application, I just have the two. An account is a registered site user (register, login, logout, etc) and they have one or more addresses as managed on the settings page. The relationship between accounts and addresses is one to many.

When using the Sequelize library in a TypeScript application, you need to know how the object interfaces are defined and relate to each other. The important thing to know is that for each application object (account, address, etc) there are 3 interfaces that need to be defined: An attribute definition interface, an instance interface, and a model interface.

export interface AccountAttribute {
    id?:string;
    name?:string;
    email?:string;
    password?:string;
}

export interface AccountInstance extends Sequelize.Instance<AccountAttribute>, AccountAttribute {
}

export interface AccountModel extends Sequelize.Model<AccountInstance, AccountAttribute> { }

In the above code block, the AccountAttribute interface is defined. That object has 4 managed fields: id (a generated UUID), name, email, and password. That interface is then referenced by the AccountInstance and AccountModel interfaces. The instance interface describes what an instantiated account object looks like and how it behaves. It includes all of the attributes of the attribute interface as well as some Sequelize-specific methods like updateAttributes and save. The model interface describes how, through sequelize, instances of objects are managed. The model interface includes methods like find and create.

When the sequelize implementation of the storage manager is created, the define method is called within the constructor; it binds the schema definition and model interface to an instance of the model so that it can be used.

this.Account = this.sequelize.define<AccountInstance, AccountAttribute>("Account", {
        "id": {
            "type": Sequelize.UUID,
            "allowNull": false,
            "primaryKey": true
        },
        "name": {
            "type": Sequelize.STRING(128),
            "allowNull": false
        },
        "email": {
            "type": Sequelize.STRING(128),
            "allowNull": false,
            "unique": true,
            "validate": {
                "isEmail": true
            }
        },
        "password": {
            "type": Sequelize.STRING(128),
            "allowNull": false
        }
    },
    {
        "tableName": "accounts",
        "timestamps": true,
        "createdAt": "created_at",
        "updatedAt": "updated_at",
    });

In the above code block, the private Account variable is set and the schema defined. In the third parameter I'm instructing sequelize to manage the created at and updated at fields, but I'm specifying what the column names should be. Later in that same storage manager implementation, I reference those model instances like this:

register(name:string, email:string, rawPassword:string):Promise<any> {
    return this.sequelize.transaction((transaction:Sequelize.Transaction) => {
        let accountId = uuid.v4();
        return this.hashPassword(rawPassword)
            .then((password) => {
                return this.Account
                    .create({
                        id: accountId,
                        name: name,
                        email: email,
                        password: password
                    }, {transaction: transaction})
            });
    });
}

The rest of the application is pretty standard. The storage manager is created early and used both in creating the express application and in the application handler. Open an issue on GitHub or message me on Twitter @ngerakines if you've got any questions or comments.

Warrant Canaries

Wikipedia defines a warrant canary as:

... a method by which a communications service provider informs its users that the provider has not been served with a secret United States government subpoena.

Practically, this ends up being a file or web location that states something to the effect of, "As of date, we have not received a subpoena." The notice usually includes a disclosure stating that no warrants have been served to the entity or its employees and no searches or seizures have been performed on the assets of the entity or its employees. It will also include the date the notice was last updated and may include links to external websites with time-relevant information such as news articles, major headlines, tweets, etc.

The most important parts of the warrant canary are the signature and the signed content. All of the above information is cryptographically signed, and the public key is made available to verify the signature. Signing the notice makes a warrant canary harder to forge.

There are many warrant canaries in use by both commercial and non-commercial entities; one of the oldest and best-known is the rsync.net warrant canary.

There is, however, speculation that warrant canaries stand on questionable legal ground, or that they could be seen as an indirect way of communicating legal action taken by a government agency or court. At this time, there have been no cases in which a warrant canary has been upheld. For more information, see the EFF Warrant Canary FAQ.

Creating A Warrant Canary

Creating a warrant canary is a fairly simple process, and it takes only a small amount of time to become familiar with tools like GPG. Once the notice is created, anyone with access to your website can publish it.

Before You Begin

To begin, you will need to install GPG and create a signing key. Create the signing key by following the official GPG Getting Started guide. A key is only created once and will be used to update your canary in the future; it is crucial that the same key be used for subsequent canary updates.
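
The short version: generating a key is one interactive command (GPG walks you through the key parameters and your identity), and listing your keys afterwards shows the ID you'll sign with.

$ gpg --gen-key
$ gpg --list-keys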

Creating The Notice

As with the previous examples, the notice should contain the disclosures that are most relevant to your needs, as well as information that can readily be verified as both accurate and time-relevant. This often includes the current date, sports scores, weather information, etc. For example:

It is Friday, December 26th, 2014 at 4:50 pm EST.

To this date no warrants, searches or seizures of any kind have ever been performed on my assets or any assets belonging to members of my household.

Headlines from http://www.npr.org/sections/news/archive?date=12-31-2014
Body Of Catholic Priest Found In Southern Mexico
Businesses Buzz With Anticipation In Wake Of U.S.-Cuba Thaw
Military Policy Impedes Research On Traumatic Brain Injuries
In The Nation's Capital, A Signature Soup Stays On The Menu
Already Bleak Conditions Under ISIS Deteriorating Rapidly

Week 16 NFL Scores
Giants 37 Rams 27
Cols 7 Cowboys 42
Bills 24 Raiders 26
Seahawks 35 Cardinals 6

You can verify this document using the public key 953023D848C35059A2E2488833D43D854F96B2E4.

With your notice saved as warrant_canary.txt, sign it with your GPG key.

$ gpg --clearsign warrant_canary.txt

Running this command will create a file named warrant_canary.txt.asc.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

It is Friday, December 26th, 2014 at 4:50 pm EST.

To this date no warrants, searches or seizures of any kind have ever been performed on my assets or any assets belonging to members of my household.

Headlines from http://www.npr.org/sections/news/archive?date=12-31-2014
Body Of Catholic Priest Found In Southern Mexico
Businesses Buzz With Anticipation In Wake Of U.S.-Cuba Thaw
Military Policy Impedes Research On Traumatic Brain Injuries
In The Nation's Capital, A Signature Soup Stays On The Menu
Already Bleak Conditions Under ISIS Deteriorating Rapidly

Week 16 NFL Scores
Giants 37 Rams 27
Cols 7 Cowboys 42
Bills 24 Raiders 26
Seahawks 35 Cardinals 6

You can verify this document using the public key 953023D848C35059A2E2488833D43D854F96B2E4.
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCAAGBQJUndq1AAoJEDPUPYVPlrLk2fYP/jGFb1vxR2sXEu5DzHJU9urd
Q8ia1srhm4UogchTuN6nGv39zlBgpT1H75xwLYYSyiEbjpV7CYPqwYOgZvv8xF5D
hMRGoHu2WE7RCllQr49cKyzro0m9TEWHUt8HLxlaV/Go58Q2i3TbiKo5z0QdlB7B
XXyQSA5ZDFSKqrdMl6oVqHI1dJhM3TRGpxmkrF/mD7RRpdqw0yJKMefqxGRFLavI
Vg8su3XlYgl6xmlL+BAcd0Pc0SiSCH/IIiLbpBrNaWeOFeEnaAbeC4apYn45np5G
jXPQ7+xdfcxmyt+VUSJ9aSw6WxHSYYBR2YhOvnunssCI6dev06Ot3p5+zOkgsFZt
2rqvNFKjp92J/vB8cKCoFi8UwizftcyvrwZHHtzFcLPEg4mhqWQp4DE3ToMOp37o
wieVqWbYhqRDMlFgQGr9Zdx0xPipnz5JwcSeaJuUZTOYUbN2L4w5s25yvCtuyT4p
yac0D+mxoFhG96UuSXsQjtwbiot7Kddt0TeaXzfbR7nk7n9Cv5thEEQlgtoV4Htv
f8jXua2/L3+Cl8j+WM+C9S5lXXR3t3RGy555lYcssDXAAcWsSY4UJasHaVU0vRTu
CqDPfOJmCnqI9Pv7tlP4iBWMkkAVV9ToqyRoM4fIQ41jTDn+ncc52du4M1+LZNJq
2tQPWQHVW8/oQtwo2W7W
=HQN5
-----END PGP SIGNATURE-----

That file is your current warrant canary and should be made available as you see fit. The most common URL used to present a canary is /canary. In this case, the canary is available at http://ngerakines.me/canary.
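
Anyone with your public key can then check the signature of the published file; GPG will report whether the signature is good and which key made it:

$ gpg --verify warrant_canary.txt.asc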

Next Steps

With your canary online and available, you'll need to be sure that the signing key used to sign the notice is also available. Please refer to the Exchanging Keys and Distributing Keys documentation to export your key, share it with others, and make it available through a GPG key server.
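
As a quick sketch, exporting and publishing the example key from the notice above might look like this (substitute your own key ID and preferred key server):

$ gpg --armor --export 953023D848C35059A2E2488833D43D854F96B2E4 > canary-public-key.asc
$ gpg --keyserver hkp://pgp.mit.edu --send-keys 953023D848C35059A2E2488833D43D854F96B2E4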

It may also be in your interest to have third parties verify your key and identity. This allows other key owners to demonstrate trust. More information can be found on the GPG documentation: Validating other keys on your public keyring.

Securing Your Canary

When a canary is not updated or is removed, it may mean that one of several things has happened.

The first is simply human error. For safety and security purposes, the act of signing a warrant canary is a manual process, which means a human has to be at a computer and run the commands to create and sign the notice. There are plenty of reasons a canary might not get updated, from sickness to changing companies to simple forgetfulness.

It could also mean that the entity no longer wants to include the notice. A change in management or ownership could result in the canary being neglected or removed.

Lastly, it could mean that harm, detention or a lack of control is preventing the canary from being updated.

A watchdog or dead man's switch can be used to mitigate damage or loss of data or reputation.

Dead Man's Switch: Revocation

Using a revocation certificate, a trusted third party can publicly revoke the key used to cryptographically sign canaries.

A core component of the warrant canary is the signature. The signature is used to determine that the contents of the canary have not been tampered with and provide a way to identify the owner of the signing key through the web of trust. When the GPG key used to sign the canary is created, a revocation certificate should be created along with it.

If you forget your passphrase or if your private key is compromised or lost, this revocation certificate may be published to notify others that the public key should no longer be used. -- The GNU Privacy Handbook
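
Generating the certificate itself is another one-line GPG command (again using the example key ID from above); store the resulting file somewhere safe and offline:

$ gpg --output revoke.asc --gen-revoke 953023D848C35059A2E2488833D43D854F96B2E4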

A revocation certificate can be securely given to a trusted third party responsible for publishing the revocation certificate under certain conditions.

Conditions could range from:

  • The warrant canary not being updated after a certain period of time.
  • Unusual behavior or contact with the company.
  • A cue or hint that it should be done so through information contained in a dead drop or press release.

Multiple Signers

As a way to reduce the risk of human error raising false concern, multiple signers can sign a canary, or multiple canaries can be published. The most common way to do this is to have two or more members of an organization create signatures of the canary and append them to the notice.

This can be done by creating one or more detached signatures along with the canary.

$ gpg --output canary.sig1 --detach-sig canary

When the above command is run, a file named canary.sig1 is created that contains a signature of the canary file. You can publish these additional signatures alongside the canary or append them to the bottom of the canary file.
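
Each detached signature can then be checked against the canary independently:

$ gpg --verify canary.sig1 canary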

IOT and Home automation, 10 months later

In December of 2013 I was given a SmartThings kit and that kicked off a home automation project. I didn't go all-out and try to automate all the things, but instead tackled a single area, specifically my home office. Nearly 10 months later, I hardly think about it but use it every single day.

  • If the office lights are on and I leave the house, the lights are automatically turned off.
  • When I enter the house from being away, turn on the lights in the office.
  • When I'm at home, but there isn't any movement in my office after 15 minutes, turn the lights off automatically.
  • When I'm home, there is movement in my office and the lights are off then turn the lights on.

This started with some z-wave light switches and the SmartThings kit. A battery-powered motion/presence detector detects movement in the office, the z-wave light switches put the lights under software control, and my phone is connected to the SmartThings system to determine when I'm nearby.

That isn't the only system in place, but it is the one that I use the most. Additionally, I've got motion sensors in other parts of the house, including on the garage door, and presence fobs in the cars. I also have a Nest thermostat installed as well as door locks that support z-wave.

So what is next? I'd like to get the rest of the light switches in the house replaced with z-wave switches and find a way to automate the garage door. Having an "away mode" that turns off all of the lights, locks the doors, and sets the Nest temperature would be nice too. Maybe someday.

Parts: