Monday 7 November 2011

CEPH in Ubuntu 12.04 Precise Pangolin

While at UDS (the Ubuntu Developer Summit), I attended a very interesting session on CEPH. I hadn't even heard of CEPH a week ago but I have to say that I'm impressed.

If you haven't heard of CEPH, let me give you some background:

CEPH is a distributed network storage system originally created by the folks at DreamHost. At its most basic, CEPH stores small blocks (default 4MB) across a cluster of unreliable nodes. The design will be familiar from many modern NoSQL stores: no "master node" or single point of failure, easy addition of nodes, and replication of data across different nodes.
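
To make the idea concrete, here is a minimal Python sketch of splitting data into fixed-size blocks and assigning each block to several nodes without consulting a master. It is only an illustration: the node names and the `stripe` and `place` helpers are made up for this example, and the simple hash ranking stands in for CEPH's real placement algorithm (CRUSH), which is considerably more sophisticated.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # the 4MB default mentioned above


def stripe(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place(block_name, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a block by ranking per-node hashes.

    Any client can compute the same answer from the block name alone, so
    no central lookup is needed. (Hypothetical stand-in for CRUSH.)
    """
    def score(node):
        return hashlib.sha1(("%s/%s" % (block_name, node)).encode()).hexdigest()
    return sorted(nodes, key=score)[:replicas]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
    for index, block in enumerate(stripe(b"x" * (10 * 1024 * 1024))):
        name = "volume1.%08d" % index
        print(name, len(block), place(name, nodes))
```

Because the placement depends only on the block name and the node list, adding a node changes the answer for some blocks but not all of them, which is the property that makes growing the cluster cheap.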

On top of this basic capability, CEPH adds an S3-compatible object store and an EBS-style block device store.

The distributed block storage support has great potential.  Coming up in Ubuntu 12.04 Precise Pangolin, the block storage driver will be integrated into the kernel and also integrated into KVM.  This provides an impressive capability to run virtual machines with CEPH blockstore root devices, offering resilient storage and snapshot support for home-grown clouds. OpenStack has supported CEPH block stores since the Cactus release.  Ubuntu 12.04 Precise Pangolin will ship with OpenStack Essex which will have the latest CEPH block device storage.

For more information on CEPH, take a look at their website:


Thursday 3 November 2011

What's New for ARM

David Brash, Architecture Program Manager at ARM, gave a presentation at the Ubuntu Developer Summit for Ubuntu 12.04 Precise Pangolin on the future of the ARM processor core.

The big highlight is the ARM v8 core, announced 27-Oct-2011. This is a new core from ARM adding 64-bit capability (AArch64) that is compatible with the existing 32-bit cores (AArch32). The focus of the new core is "power, performance, area, partnership". The core maintains the low-power heritage of ARM and builds on the momentum of the ARM v7. The ARM v8 core is too new to be supported by Ubuntu 12.04 Precise Pangolin but expect full support very soon in the Linux kernel.

Another impressive update is the release of the ARM A7 and A15 processors. These are 32-bit v7 cores with 64-bit memory addressing. This allows 1TB of physical address space. The A7 is the most exciting to me. It is touted as "probably the most efficient core from ARM". It halves the performance of the A8 but reduces the power consumption by 6 times. This core should run the kernel in Precise Pangolin but without the advantage of the higher memory addressing.
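
The 1TB figure follows directly from the width of the physical address. As a quick check in Python, assuming the 40-bit physical addresses that the Large Physical Address Extension on these cores provides (the bit width is my own addition, not a number quoted in the talk):

```python
GIB = 2 ** 30
TIB = 2 ** 40


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB with 32-bit physical addresses")
    print(addressable_bytes(40) // TIB, "TiB with 40-bit physical addresses")
```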

ARM predicts that a $500 phone today will be available for $100 in a few years. The A7 is targeted at these low end smartphones.

The A7 and A15 release also introduces a new concept called big.LITTLE. To meet demands for both high performance and low power consumption, big.LITTLE puts an A7 and an A15 onto the same die. When the phone is idle, code runs on the A7. When more performance is required, the running software seamlessly migrates to the A15. This combination gives the best of both worlds and ARM claims "up to 70% energy savings on common workloads."

It will be very interesting to see what devices are enabled by these new advancements.

Tuesday 4 October 2011

AWS CloudFront Secure Streaming

This is a followup to an answer I wrote on stackoverflow a few months ago about how to set up signed URLs with AWS CloudFront private streaming.

At the time, the python boto library had limited support for signed URLs and some of the steps were fairly hacky.  Since then, I've submitted some code to boto which will make secure signed URLs much easier. I've rewritten my answer here, making use of the new code. This code requires the 2.1 version of boto. Once version 2.1 of boto is commonly released I will update the stackoverflow answer as well.

To set up secure private CloudFront streaming with signed URLs you need to perform the following steps which I will detail below:
  1. Connect, create your s3 bucket, and upload some objects
  2. Create a Cloudfront "Origin Access Identity" (basically an AWS account to allow cloudfront to access your s3 bucket)
  3. Modify the ACLs on your private objects so that only your Cloudfront Origin Access Identity is allowed to read them (this prevents people from bypassing Cloudfront and going direct to s3)
  4. Create a cloudfront distribution that requires signed URLs
  5. Test that you can't download private object urls from s3 or the signed cloudfront distribution
  6. Create a key pair for signing private URLs
  7. Generate some private URLs using Python
  8. Test that the signed URLs work
Each step shows a code snippet to perform that step.  All the snippets are combined into a single script for reference at the end.
1 - Connect, Create Bucket, and upload object
The easiest way to upload private objects is through the AWS Console but for completeness I'll show how using boto.  Boto code is shown here:
import boto

#credentials stored in environment AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
s3 = boto.connect_s3()
cf = boto.connect_cloudfront()

#bucket name MUST follow dns guidelines
new_bucket_name = ""
bucket = s3.create_bucket(new_bucket_name)

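object_name = "video.mp4"
key = bucket.new_key(object_name)
# upload the local file of the same name into the new key
key.set_contents_from_filename(object_name)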

2 - Create a Cloudfront "Origin Access Identity"
This identity can be reused for many different distributions and keypairs. It is only used to allow cloudfront to access your private S3 objects without allowing everyone. As of now, this step can only be performed using the API. Boto code is here:
# Create a new Origin Access Identity
oai = cf.create_origin_access_identity(comment='New identity for secure videos')

print("Origin Access Identity ID: %s" % oai.id)
print("Origin Access Identity S3CanonicalUserId: %s" % oai.s3_user_id)

3 - Modify the ACLs on your objects
Now that we've got our special S3 user account (the S3CanonicalUserId we created above) we need to give it access to our private s3 objects. We can do this easily using the AWS Console by opening the object's (not the bucket's!) Permissions tab, clicking the "Add more permissions" button, and pasting the very long S3CanonicalUserId we got above into the "Grantee" field of a new permission.  Make sure you give the new permission "Open/Download" rights.

You can also do this in code using the following boto script:
# Add read permission to our new s3 account
key.add_user_grant("READ", oai.s3_user_id)

4 - Create a cloudfront distribution
Note that support for custom origins and private distributions was only added in boto version 2.1. To use these instructions you must get the latest release.

There are two important points here:
First, we are specifying an origin with an Origin Access Identity. This allows CloudFront to access our private S3 objects without making the S3 bucket public. Users must use CloudFront to access the content.

Second, we are specifying a "trusted_signers" parameter of "Self" for the distribution. This is what tells CloudFront that we want to require signed URLs. "Self" means that we will accept signatures from any CloudFront keypair in our own account. You can also give signing rights to other accounts if you want to allow others to create signed URLs for the content.
# Create an Origin object for boto
from boto.cloudfront.origin import S3Origin
# the origin is the bucket's standard S3 DNS name
origin = S3Origin("%s.s3.amazonaws.com" % new_bucket_name, oai)

# Create the signed distribution
dist = cf.create_distribution(origin=origin, enabled=True,
                              trusted_signers=["Self"],
                              comment="New distribution with signed URLs")

# Or, create a signed streaming distribution
stream_dist = cf.create_streaming_distribution(origin=origin, enabled=True,
                              trusted_signers=["Self"],
                              comment="New streaming distribution with signed URLs")

5 - Test that you can't download unsigned urls from cloudfront or s3
You should now be able to verify:

- the object's direct S3 URL - should give AccessDenied
- the object's unsigned URL on the CloudFront distribution - should give MissingKey (because the URL is not signed)
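
If you would rather check this from a script than a browser, something like the following Python 3 sketch should do. The helper names are mine, and it assumes the error comes back as the usual small XML document with a `<Code>` element, which is what S3 and CloudFront return in my experience; you supply your own URLs.

```python
import sys
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def error_code_from_body(body):
    """Return the text of the <Code> element of an XML error document, or None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    code = root.find("Code")
    return code.text if code is not None else None


def check_url(url):
    """Return None if the URL downloads, else the error code from the response."""
    try:
        urllib.request.urlopen(url).close()
        return None
    except urllib.error.HTTPError as err:
        return error_code_from_body(err.read())


if __name__ == "__main__":
    # Pass the S3 and CloudFront URLs of your object on the command line.
    for url in sys.argv[1:]:
        print(url, "->", check_url(url) or "downloaded OK")
```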

6 - Create a keypair for CloudFront
I think the only way to do this is through Amazon's web site. Go into your AWS "Account" page and click on the "Security Credentials" link.  Click on the "Key Pairs" tab then click "Create a New Key Pair". This will generate a new key pair for you and automatically download a private key file (pk-xxxxxxxxx.pem). Keep the key file safe and private. Also note down the "Key Pair ID" from Amazon as we will need it in the next step.

7 - Generate some URLs in Python
In order to generate signed CloudFront URLs with boto, you must have the M2Crypto python library installed.  If it is not installed the following commands will raise a NotImplementedError.

For a non-streaming distribution, you must use the full CloudFront URL as the resource; for streaming, use only the object name of the video file.

import time

#Set parameters for URL
key_pair_id = "APKAIAZCZRKVIO4BQ" #from the AWS accounts page
priv_key_file = "cloudfront-pk.pem" #your private keypair file
expires = int(time.time()) + 300 #5 min

# For a download (normal http) distribution use the full URL
http_resource = 'http://%s/video.mp4' % dist.domain_name # your resource
# Create the signed URL
http_signed_url = dist.create_signed_url(http_resource, key_pair_id, expires, private_key_file=priv_key_file)

# For a streaming (rtmp) distribution use just the base filename
stream_resource = "video"
# Create the signed URL
stream_signed_url = stream_dist.create_signed_url(stream_resource, key_pair_id, expires, private_key_file=priv_key_file)

# Some flash players don't like query params so we have to escape them
def encode_query_param(resource):
    enc = resource
    enc = enc.replace('?', '%3F')
    enc = enc.replace('=', '%3D')
    enc = enc.replace('&', '%26')
    return enc

stream_signed_url = encode_query_param(stream_signed_url)

print("Download URL: %s" % http_signed_url)
print("Streaming URL: %s" % stream_signed_url)
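
For the curious, here is roughly what goes into a signed URL in the simple ("canned policy") case, as I understand the CloudFront documentation. This is a Python 3 sketch and not boto's actual code: the function names are my own, and the RSA-SHA1 signing of the policy with your private key is left out because it needs a crypto library. The `signature` argument stands for the raw bytes that signing step produces.

```python
import base64
import json


def canned_policy(resource, expires):
    """Build the policy document that gets signed for a simple expiring URL."""
    policy = {
        "Statement": [{
            "Resource": resource,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
        }]
    }
    # The policy must be serialised without any whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(raw_bytes):
    """Base64-encode, then swap the characters that are awkward in URLs."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def assemble_signed_url(resource, expires, signature, key_pair_id):
    """Append the three query parameters CloudFront looks for."""
    separator = "&" if "?" in resource else "?"
    return "%s%sExpires=%d&Signature=%s&Key-Pair-Id=%s" % (
        resource, separator, expires, cloudfront_b64(signature), key_pair_id)
```

The policy itself never appears in a canned-policy URL. CloudFront rebuilds it from the requested resource and the Expires parameter and checks the signature against that, which is why the resource string you sign has to match the request exactly.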

8 - Try out the URLs

Hopefully your streaming url should look something like this:


Put this into your js and you should have something which looks like this:
var so_canned = new SWFObject('','mpl','640','360','9');

Post a comment if you have any trouble.


Sunday 25 September 2011

Playing with (an) Orchestra

I have recently been working on a project using Orchestra.  Orchestra is a great provisioning server for automatically deploying Ubuntu machines on hundreds of servers.  I wanted to play with it a bit before diving in but I didn't have any "bare metal" handy.  This is my virtualized test setup for experiments using my laptop.

Internet (via the vm host)  <-->  Orchestra Server (vm guest)  <-->  Client machines (vm guests)

Build the Orchestra server
First get the ubuntu oneiric iso for our Orchestra server:
$ wget

Next, install the packages we need on the laptop:
$ sudo apt-get install qemu-kvm kvm-pxe

Now build a virtual disk image:
$ qemu-img create -f qcow2 orchestra.img 10G

Launch a virtual machine to install the orchestra server:
$ qemu -m 2047 -hda orchestra.img -net nic,vlan=0 -net user,vlan=0 -redir tcp:5022::22 -redir tcp:5080::80 -net nic,vlan=1 -net socket,vlan=1,mcast= -net dump,vlan=1,file=capture.pcap -cdrom ubuntu-11.10-beta2-server-i386.iso

This boots up a machine with two network cards.  The first (eth0) will use your laptop's network connection to give you a connection to the internet.  The second (eth1) is connected to the virtual switch and will be used to talk to fresh machines that need to be provisioned.

Note that we are also using QEMU's "-redir" option to forward ports from our host machine into the vm instance. With the configuration above, host ports 5022 and 5080 are redirected to the orchestra server vm ports 22 and 80 respectively. This will allow us to use ssh and http from our host.
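
Once the server is installed and running, a quick way to confirm the forwarding works is to try a TCP connection to each forwarded port from the host. A small Python 3 sketch (the `port_is_open` helper is just something I made up for this check):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 5022 and 5080 are the forwarded host ports from the qemu command above.
    for name, port in (("ssh", 5022), ("http", 5080)):
        state = "open" if port_is_open("localhost", port) else "closed"
        print("%s (host port %d): %s" % (name, port, state))
```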

Install the ubuntu server as normal.  Select eth0 as your primary network card.

We're going to set up the orchestra server to also act as our internet gateway for newly provisioned machines.  Please note that the network described below is not secure and should not be used in a production deployment.  Once the server has booted, set up the networking as follows:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
Uncomment the following line in /etc/ufw/sysctl.conf
Change the default FORWARD and INPUT firewall rule to ACCEPT in /etc/default/ufw
And add the following lines to the TOP of /etc/ufw/before.rules
# nat Table rules

# Forward traffic from eth1 through eth0.


# don't delete the 'COMMIT' line or these nat table rules won't be processed
Now enable the firewall:
$ sudo ufw disable && sudo ufw enable
Reboot the VM to make sure the networking configuration takes effect
$ sudo shutdown -r now
Installing Orchestra
Once the networking is configured, update the server and install Orchestra:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install ubuntu-orchestra-server

Use the following settings:
Password for cobbler user:cobbler (or any other password, just don't forget it!)
Boot and pxe server IP address:
Enable Orchestra managed DNS/DHCP:yes
Network range for DHCP clients:,
Default gateway for dhcp clients:
Domain name for DHCP clients:<leave blank>

Now sync the cobbler server to activate the changes:
$ sudo cobbler sync

First PXE Boot
Now that Orchestra is up and running, let's get right to the good stuff and PXE boot a new VM.

We need a new disk image:
$ qemu-img create -f qcow2 client.img 10G

Now we just need to start it up:
$ qemu -hda client.img -net nic -net socket,mcast= -boot once=nc

The "-boot once=nc" tells qemu to try booting off the network first ("n"), then off the hard disk ("c").

You should get a nice menu on your client VM.  Scroll down to "oneiric-i386-juju" and hit enter.  Watch as your new machine is automatically installed!

While you wait for that, have a poke around the web interface for cobbler:

Web Interface
On your host machine, point your browser to:
The username is cobbler and the password is cobbler (unless you set a different password during the install)

Here's a little snippet of python to provision a server via the API (See for full docs)
import xmlrpclib

server = xmlrpclib.Server("http://localhost:5080/cobbler_api")
token = server.login("cobbler","cobbler")
system_id = server.new_system(token)

server.modify_system(system_id, "name","new-machine",token)
server.modify_system(system_id, "hostname","",token)
server.modify_system(system_id, "modify_interface", {
    "macaddress-eth0"   : "10:20:30:40:50:60",
    "ipaddress-eth0"    : "",
    "dnsname-eth0"      : "",
    }, token)

server.save_system(system_id, token)

Now start up a new instance with the MAC address we used above:
$ qemu-img create -f qcow2 client-lucid.img 10G
$ qemu -hda client-lucid.img -net nic,macaddr=10:20:30:40:50:60 -net socket,mcast= -boot once=nc


Saturday 24 September 2011

QEMU Networking

I've been playing with PXE booting and I needed a test setup that I could run from my laptop.  I wanted to create a small isolated network between a few virtual machines.  I generally use VirtualBox but for this project I decided to use QEMU.

I had just a few requirements:
  • I wanted to record the traffic to pcap files for wireshark analysis.  This is essential for diagnosing PXE failures and for understanding exactly what's going on.
  • I needed to add and remove machines easily without changing the configuration on the other virtual machines.
  • I wanted it to be easy.  No root access, no messing with bridging or tun/tap interfaces.
QEMU Networking
QEMU has a few handy features which make this really easy.  For the full story check out the QEMU Networking page that Mark McLoughlin put together.

Multicast networks allow multiple VMs to communicate with each other just as if they were all connected to a single hub.  Any ethernet frame sent to the network interface on one machine gets sent to all other machines.
-net socket,mcast=
The second feature is the "dump" network type which will dump any packet to a file.
-net dump,file=log.pcap
Because the multicast "hub" sends all frames to all virtual machines you only need to use the "-net dump" on one of the machines and you will capture all packets.
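
Wireshark is the right tool for real analysis, but for a quick sanity check that the capture contains anything at all, the file is easy to read by hand. Here is a Python 3 sketch that counts the packets in a capture, assuming the classic libpcap file layout (a 24-byte file header followed by a 16-byte header per packet); the function name is my own and the nanosecond and pcapng variants are not handled.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4


def count_packets(stream):
    """Count packet records in a classic libpcap capture read from `stream`."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("not a pcap file: header too short")
    # The magic number tells us which byte order the writer used.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a pcap file: unexpected magic number")

    count = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file, or a record that is still being written
        _sec, _usec, captured_len, _original_len = struct.unpack(
            endian + "IIII", record)
        stream.read(captured_len)  # skip over the captured frame itself
        count += 1
    return count


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as capture:
        print(count_packets(capture), "packets")
```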

Launching the VMs
Launch the first machine like this:
$ qemu -hda one.img -net nic -net socket,mcast= -net dump,file=log.pcap
And all other machines like this:
$ qemu -hda two.img -net nic -net socket,mcast=
Multiple NICs
If you have more than one network interface on a machine, you must use the "vlan" option to make sure the options are applied to the correct interface.

If I wanted a gateway VM with eth0 connected through the host to the internet, and eth1 connected to all other virtual machines, it would be launched like this:
$ qemu -hda one.img -net nic,vlan=0 -net user,vlan=0 -net nic,vlan=1 -net socket,vlan=1,mcast= -net dump,vlan=1,file=log.pcap
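
These command lines get long, so when launching several machines I find it easier to generate them. A small Python sketch that reproduces the commands above; the `hub_vm_command` helper is my own invention, and the multicast value is passed in as a placeholder that you replace with the address and port of your own "hub":

```python
def hub_vm_command(image, mcast, dump_file=None, gateway=False):
    """Build a qemu command line for a VM attached to the multicast "hub".

    `mcast` is whatever you put after "-net socket,mcast=".
    """
    args = ["qemu", "-hda", image]
    hub = ""
    if gateway:
        # eth0 goes out through the host on vlan 0; the hub moves to vlan 1.
        args += ["-net", "nic,vlan=0", "-net", "user,vlan=0"]
        hub = ",vlan=1"
    args += ["-net", "nic" + hub, "-net", "socket%s,mcast=%s" % (hub, mcast)]
    if dump_file:
        # Only one machine on the hub needs to record the traffic.
        args += ["-net", "dump%s,file=%s" % (hub, dump_file)]
    return args


if __name__ == "__main__":
    MCAST = "ADDRESS:PORT"  # placeholder, not a real value
    print(" ".join(hub_vm_command("one.img", MCAST, "log.pcap", gateway=True)))
    print(" ".join(hub_vm_command("two.img", MCAST)))
```

The returned list can be handed straight to subprocess.Popen if you want the script to start the machines as well.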
Live capture in Wireshark
If you'd like to see the packet capture in realtime you can use mkfifo to create a FIFO to stream the packets into wireshark's live capture display.  Set it up like this:
$ mkfifo live.pcap
$ wireshark -k -i live.pcap &
$ qemu -hda one.img -net nic -net socket,mcast= -net dump,file=live.pcap
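
If you script this setup, the FIFO can be created from Python as well. A small sketch (Unix only; the `ensure_fifo` helper name is mine) that creates the FIFO if it is missing and refuses to reuse a regular file of the same name:

```python
import os
import stat


def ensure_fifo(path):
    """Create a FIFO at `path` unless one is already there."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError("%s exists and is not a FIFO" % path)
        return path
    os.mkfifo(path)
    return path
```

Calling ensure_fifo("live.pcap") before starting wireshark and qemu has the same effect as the mkfifo command above.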