# System Circuit Breakers

A problem every online system needs to contend with is bringing a failed resource, such as a database or a payment server, back online without stopping the entire system. Queued requests hitting a server as it comes up can knock it down again. This is sometimes known as the “fire hose effect”. Network admins are often forced to redirect all site requests to a “Server offline for scheduled maintenance” page or similar, frustrating users unnecessarily.

A single failed server can bring the entire system to a crawl, as requests hang waiting for network timeouts while more and more service requests queue up behind them.

Most online systems run in a typical 90/10 mode, whereby 90% of all requests are served from caches or memoized functions, and about 10% are requests that invoke back-end servers.

The temporary loss of a back-end server should not result in an entire system being unavailable.

One solution to this issue is the Circuit Breaker pattern from Michael T. Nygard’s book, “Release It!”. The pattern is implemented in two parts: the circuit breaker, and the mechanism for tripping it.

The Circuit Breaker is a function that you call before invoking the server that the breaker protects. The call simply returns a boolean value indicating whether the server is available.

The trick here is that the function does not actually call the server, but rather reads a value from a cache. Any well-designed system is going to use a distributed cache such as Memcached, Ehcache, or Coherence, so this is the ideal place to store this information. My own systems normally have an internal system cache where I store various counters, so I also store the breaker statuses there. This cache is distributed across all my nodes, so the information is available locally to all my code.

By checking before we call, we achieve two things: we fail fast, allowing us to react in a predictable way, and we avoid queuing requests to a downed server. The failures are recorded via JMX counters, and are monitored via a custom MC4J console.

Implementing the circuit breaker check using an Aspect allows you to add this functionality to any class or group of classes, without changing your code. I prefer to avoid littering my code with system checks like these, and tend to use Aspects instead. The code is in a single file, making it easier to reason about and debug. If, in the future, I decide not to use it anymore, I simply remove the Aspect from the build.

This Aspect will perform the necessary check for all my DAO classes.

After this aspect is woven into my DAOs’ byte-code at compile time, any calls that my servers make to their DAO layers will result in an immediate exception if the database server is offline. If the server is online, the calls pass through normally. This check does add a slight amount of overhead to each DAO call, but in a production system the benefits will outweigh any performance hit.

So how do we trip the circuit in the first place? This part must be handled separately from the main code. I have a very basic bean that opens a connection to the database; if the connection fails, the bean updates the cache entry to indicate that the server is offline.

I use the excellent Quartz Scheduler to call this bean once every 3 seconds. Once the circuit has tripped, the only code that will call the database is this bean. When it detects that the server is back online, it will update the system cache entry, thus allowing the woven aspect to permit future DAO calls to proceed.

Manually tripping the circuit can be useful for testing the system’s response to downed servers, and testing the recovery protocols when the breakers are reset.

The Circuit Breaker pattern works well with many types of servers, and is an elegant solution to handling offline servers.

# Storing Passwords

Most developers know that end-user passwords should never be stored by the system. Put any three random developers into a room to discuss this issue, and the conversation will invariably revolve around the best hash algorithm to use, and around salting the passwords. Everyone agrees that passwords that are hashed but not salted prior to hashing are practically useless. All that remains is convincing people to use a random salt for each password, rather than the same salt for all passwords.

This is the conventional wisdom among most developers, and it’s wrong. Perhaps it wasn’t always wrong, and using a random salt for each password, of the same length as the password, will still give a measure of security, but there is a better way.

Let’s look at the problem. Wikipedia defines a hash function as “a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value.” So far, so good. There is a wide range of hash algorithms available, including the SHA-0, SHA-1, SHA-2, and Whirlpool families. They are used to generate one-way hashes of emails, downloadable files, database records, and documents. Even Git uses them to tag code commits. All of these applications use one-way hashes to ensure the integrity of their data, and this is the prime reason for using hashes. Because of their widespread use, their implementations tend to be fast. And therein lies the problem.

As Moore’s Law and multi-core architectures continue to provide faster and faster hardware, cryptography becomes more secure, faster, and more widespread, and it can stay ahead of potential attackers in every area but one: user-generated passwords.

Most users still use a six-to-eight-character password with low entropy, and this practice is often enforced by large e-commerce sites, and even by some financial establishments. Further compounding the problem is the fact that most users use the same password for all sites. As most sites use an email address as a user ID, a single hacked database can wreak havoc on your customers’ lives.

The attack vector to most databases goes through the application server, so using a single salt for all your passwords is not secure. With the enormous amount of cheap computing power available from cloud providers, even generating random salts and storing them with each password, is not future proof.

When it comes to securing passwords, speed is your enemy. You need an algorithm that is slow, and secure. Slow seems counterintuitive, but the time it takes to hash the password is proportional to the time it takes to attack it. Niels Provos and David Mazières describe one such method in their paper, “A Future-Adaptable Password Scheme”.

bcrypt is built on a modified version of the Blowfish block cipher, and uses Blowfish’s expensive key schedule for hashing.

bcrypt begins by generating a random 128-bit salt. Using this salt, the password, and the passed-in cost parameter, it runs an expensive key schedule to set up the cipher state, which is then used to encrypt a well-known string. The result stored is a string containing the bcrypt version number, the cost parameter (the base-2 logarithm of the iteration count), the salt, and the ciphertext.

To verify a password, bcrypt parses the cost and salt back out of the stored string, repeats the process with the plaintext password, and compares the new ciphertext to the stored one. If they match, the passwords match.

Using the Blowfish algorithm as the base block cipher is an inspired choice: the algorithm is well studied, and any fundamental flaw in bcrypt would likely imply a flaw in Blowfish itself.

By specifying the cost parameter, the developer can match the speed of the algorithm to the speed of the latest processors, thereby future proofing it.

bcrypt is the default password hash on OpenBSD.

The full details of bcrypt, including the cost function and comparisons to other algorithms, can be found in Provos and Mazières’ paper, but as the implementor of a login system that collects passwords, all you need to do is swap out your current hash function for bcrypt.

In my Smalltalk code, the change was a matter of swapping the call to the old hash function for a call to bcrypt.

If your application data may be exposed to other internal or customer applications, then additional measures may be taken. Generate and store, within your application, a large random nonce. Use this nonce as a ‘pepper’ for the user’s password and generate an HMAC from the pair. Then use bcrypt to hash the resulting digest. Now, even if the database is compromised, the hashed passwords will be of little use.
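A sketch of the peppering step using OpenSSL’s command-line tools; the pepper value and password below are placeholders:

```shell
# Derive a peppered digest to feed to bcrypt. PEPPER is a placeholder;
# in a real system it lives in application config, never in the database.
PEPPER="replace-with-a-long-random-value"
printf '%s' "the-users-password" \
  | openssl dgst -sha256 -hmac "$PEPPER" -binary \
  | openssl base64
```

The base64 digest, not the raw password, is what you would then hash with bcrypt.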

All developers who collect user passwords and email addresses have a responsibility to secure this data. The issue is not what an attacker can do to your site with the data, but what he can do elsewhere.

# Fallacies of Distributed Computing

Again and again, I forget one or more of these. Many thanks to Peter Deutsch and Bill Joy.

“Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.”

• The network is reliable
• Latency is zero
• Bandwidth is infinite
• The network is secure
• Topology doesn’t change
• There is one administrator
• Transport cost is zero
• The network is homogeneous

# Building and Distributing Development Virtual Machines With Vagrant

By far the most frustrating part of developing a medium-to-large application is deployment to test and production servers that are not configured exactly like your development machine. It’s even more frustrating when working in a team. Everyone is familiar with the “it works on my machine” line. The causes of these problems are usually out-of-sync libraries and drivers, different config files, or even different patch levels on the OS. Not all team members may be knowledgeable enough to configure the various Web, Proxy, Routing, Application, and Messaging servers that your system requires.

I have been using VMware images for many years as part of my development process. Typically I bring up an Ubuntu image, and as I install drivers and additional servers, I snapshot the images, and begin working on my own servers. This has two major downsides: one, I cannot share my configuration with other team members, only a large image file; and two, when I need to deploy to staging servers, I must re-install and build everything anew.

I recently discovered Vagrant, Chef, and VirtualBox. VirtualBox is an OSS hypervisor originally released by Sun Microsystems, and now supported by Oracle. It behaves like VMware on a laptop, and allows you to create multiple VMs with different guest OSs. Supported guest systems are Windows, Linux, OpenBSD, OS/2, Solaris, and OpenSolaris. A limitation of VirtualBox is a 4GB limit per VM. Vagrant ignores this limitation, but you should not.

Vagrant describes itself as “a tool for building and distributing virtualized development environments.” It provides a Ruby-based DSL for configuring and building your custom VMs. Full SSH access is available to all the VMs, and port forwarding can be configured to allow local apps to access your VMs and to allow multiple VMs to talk to each other. Shared folders between the host and the guest OSs allow you to compile and deploy directly to the VM.

Chef is an OSS systems integration framework which automates tasks through programmable cookbooks. Each cookbook comprises the recipes, resources, templates, and additional files that are needed to install, configure, and start a server. Opscode, the company behind Chef, hosts a repo of Chef cookbooks for major open source projects. Cloning the git repo of cookbooks is the place to start, before writing your own cookbooks. Chef can be used to build EC2 instances, VMware images, and of course, VirtualBox images. Chef can be deployed in server mode, where it centrally manages company-wide cookbooks, but for development teams, Chef Solo, which runs locally, is enough.

To get started, install VirtualBox
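```shell
# On Ubuntu; installers for Mac and Windows are on virtualbox.org
sudo apt-get install virtualbox
```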

Install Vagrant
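```shell
# Vagrant was distributed as a Ruby gem at the time;
# these days vagrantup.com ships native installers
gem install vagrant
```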

Vagrant uses the term ‘box’ to describe a packaged Vagrant environment. This is a file that will build a base VM for VirtualBox. A nice repository of available boxes can be found at Vagrantbox.es, but for now we will use the 32-bit Ubuntu-based box available from the Vagrant site.
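```shell
# name the box "lucid32" and fetch it from the Vagrant site
# (the URL may have moved since this was written)
vagrant box add lucid32 http://files.vagrantup.com/lucid32.box
```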

This installs the box to ~/.vagrant/boxes. Additional boxes can also be installed here, and used as bases for your VMs. To test what we have so far we do the following :-
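```shell
mkdir test-vm && cd test-vm
vagrant init lucid32   # writes a Vagrantfile using the lucid32 box
vagrant up             # builds and boots the VM
```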

After a few minutes, the VM should be built and booted. We now have a fully functioning Ubuntu VM running in the background. We can see what VMs are loaded by
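```shell
vagrant status
```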

and then ssh into the running VM with
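```shell
vagrant ssh
```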

The current VM is not very interesting, but will now serve as a base to create some development VMs. Before creating our own recipes, it’s a good idea to clone or fork the opscode repo on GitHub,
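```shell
# the opscode/cookbooks repo has since been split up; individual
# cookbooks now live in their own repositories
git clone git://github.com/opscode/cookbooks.git
```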

This will give us cookbooks for most of the OSS projects that we will need. (A number of other collections are available on GitHub, including one from 37signals.) In our test-vm directory, we create a sub-directory called cookbooks, and copy the following cookbooks into it:

• build-essential
• apt
• java
• activemq
• daemon-tools
• postgresql

When we did the original vagrant init, Vagrant created a file called Vagrantfile in our root directory. This file describes the VMs that we are to create, tells Vagrant where our cookbooks are located, and loads the cookbooks via Chef. For this test project, I want to create two VMs: one for a Postgres database, and one running ActiveMQ and my own servers. I also want to assign each VM its own IP, so we can configure our servers to talk to the DB etc. The initial file looks like this :-
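A sketch of such a Vagrantfile, written against the modern Vagrant API (the original predates it); the box name, IP addresses, and cookbook names are illustrative:

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "lucid32"

  config.vm.define "db" do |db|
    db.vm.hostname = "db"
    db.vm.network "private_network", ip: "33.33.33.10"
    db.vm.synced_folder ".", "/vagrant"
    db.vm.provider "virtualbox" do |vb|
      vb.memory = 512
    end
    db.vm.provision "chef_solo" do |chef|
      chef.cookbooks_path = "cookbooks"
      chef.add_recipe "postgresql"
    end
  end

  config.vm.define "mq" do |mq|
    mq.vm.hostname = "mq"
    mq.vm.network "private_network", ip: "33.33.33.11"
    mq.vm.network "forwarded_port", guest: 61616, host: 61616
    mq.vm.provider "virtualbox" do |vb|
      vb.memory = 1024
    end
    mq.vm.provision "chef_solo" do |chef|
      chef.cookbooks_path = "cookbooks"
      chef.add_recipe "prospera-mq"   # hypothetical custom cookbook
    end
  end
end
```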

We begin by naming the base box that we want to use, and then configure two VMs based on this box. Each VM is assigned a shared directory with the host OS, its own IP address, a name, and its memory requirements. Port forwarding is also enabled. In the chef_solo block, we tell Chef where to locate our cookbooks, and which cookbooks to load. In this case, we only load one for each VM. These cookbooks in turn will load all the other cookbooks required to configure the VMs.

For creating and provisioning cookbooks to a Chef server, Chef provides a tool called knife. Knife is used when managing cookbooks on a Chef server, and is not used when using Chef-Solo. However we can use it to create an initial template for one of our custom cookbooks.

Create a cookbook with knife
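```shell
# scaffold a new cookbook under ./cookbooks (knife's older
# "cookbook create" sub-command; newer Chef uses "chef generate")
knife cookbook create prospera-core -o cookbooks
```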

This will create the necessary directory structure, and include some default files. The first one to look at is the default.rb file in the recipes directory. In this file I include the standard recipes that I use, my custom recipes for my servers, and instructions to install some standard packages that I like to have on all my development servers. So a basic recipe for my prospera core server would look like this
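A sketch; the prospera-* recipe names are my placeholders for the custom cookbooks:

```ruby
# cookbooks/prospera-core/recipes/default.rb
include_recipe "apt"             # apt package manager, apt-get update
include_recipe "build-essential" # GNU build tools
include_recipe "java"            # latest JDK
include_recipe "prospera-mq"     # my own MQ server (custom cookbook)
include_recipe "prospera-core::server" # my Core server

# standard packages I like on every development server
%w{zsh wget curl}.each do |pkg|
  package pkg
end
```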

This will install the apt package manager, the GNU build essentials, the latest Java, and my own MQ and Core servers, and finally install standard packages such as zsh, wget, and curl.

My MQ server is ActiveMQ with custom config files, and startup scripts. I normally keep my current development builds stored as tarballs on an EC2 instance that my team can access. So my recipe to install the MQ server will first create a user on the VM, download and extract the tarball, and finally install it as a daemon using the excellent daemontools package. Rather than hard-coding the version numbers etc. in a recipe, you can use a Chef attributes file for all configurable data, and Chef can access it when evaluating each recipe. The complete script is here :-
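A sketch of such a recipe; the user name, attribute keys, and tarball location are assumptions:

```ruby
# cookbooks/prospera-mq/recipes/default.rb
version = node["prospera_mq"]["version"]
tarball = "activemq-#{version}.tar.gz"

# run the broker under its own account
user "activemq" do
  system true
  shell "/bin/false"
  home "/opt/activemq"
end

# fetch the current development build from our EC2 host
remote_file "/tmp/#{tarball}" do
  source "#{node['prospera_mq']['tarball_url']}/#{tarball}"
end

# unpack it into /opt
execute "extract-activemq" do
  command "tar xzf /tmp/#{tarball} -C /opt"
  creates "/opt/activemq-#{version}"
end

include_recipe "daemontools"
# ... then register /opt/activemq-#{version} as a daemontools service
```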

Now we can do a vagrant up and Vagrant will rebuild and load both VMs. We can then access each one with vagrant ssh. As you move from project to project, a vagrant halt will stop the VMs, and a vagrant destroy will delete the images; of course, a vagrant up will recreate them. When re-visiting old projects, it’s nice to be able to recreate the exact environment you were working in when you left it last.

The combination of Vagrant, Chef, and VirtualBox makes creating custom development VMs an easy task and, more importantly, gives you a repeatable and consistent way to distribute your VMs and to recreate custom VMs for each of your projects. As usual, all the files are stored in a git repo, and can be forked and tracked by team members as need be.

# Dot Underscore

I recently helped a friend with the horrible dot-underscore issue that occurs when you use the OSX Finder to copy files, and then later put those files into a tarball and copy them to a file system that does not support the HFS+ attributes. When Finder moves or copies files, it preserves all the extended file attributes (resource fork, flags, dates, etc.) for those files. However, if the destination volume does not support these, it creates a dot-underscore file using the original file’s name, and places it there, e.g. start.sh will have a ._start.sh companion.

As support for this kind of system meta-data is implemented in the BSD layer, it also affects programs like tar and cp. Normally this isn’t a problem, and the extended attributes are very useful, but when copying a tarball to a Linux box, you will end up with each file in the tarball duplicated by a dot-underscore file. Apart from looking ugly and wasting space, these extra files can play havoc with the programs that try to read them.

More details are available from this Ars Technica article.

The solution is to export COPYFILE_DISABLE=true in your shell startup script. Versions earlier than 10.5 should use COPY_EXTENDED_ATTRIBUTES_DISABLE=true instead.
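```shell
# in ~/.zshrc (OSX 10.5 and later)
export COPYFILE_DISABLE=true
# on 10.4 and earlier:
# export COPY_EXTENDED_ATTRIBUTES_DISABLE=true
```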

# My Daily Git (Part 1)

I track all my projects using Git, and I have, at a minimum two branches, master and dev, in each project. The master branch contains the latest release code, or latest stable beta code. Daily development is done on the dev branch, with other branches created from dev, for new features etc.

Once I’m happy with a new feature or piece of code, I’ll merge that branch back with the dev branch. I’m generally a lot slower merging to master, so in order to make my latest code available to others, I push remote branches to the shared repo also.

A remote branch is created by
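```shell
# push the local branch "dev" to a remote branch of the same name,
# and track it (assuming the shared remote is named origin)
git push -u origin dev:dev
```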

Note: if you omit one of the names, git assumes both are the same. The -u flag sets up upstream tracking, so that subsequent pulls do the right thing. My normal setup is then to have two remote branches, master and dev, and any number of local branches.

Often when I am starting work on a new feature, I realize that another piece of code needs to be created or modified, before that new feature can properly work. Committing the unfinished changes creates noise in my logs, and based on experience, I know that I may never even need that code, once I start on the other code.
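Rather than committing, I put the work aside with the stash:

```shell
# save the uncommitted changes (staged and unstaged) and reset
# the working tree back to HEAD
git stash
```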

Now I have a clean dev branch again, and can do the prep work that my new feature will need. At any time I can
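```shell
git stash list
```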

and see all my stashes. Once I am ready to work with that code again, I commit any changes to the dev branch and
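```shell
# re-apply the most recent stash and drop it from the stash list
git stash pop
```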

I find myself using git stash quite a lot. The man page gives a description of additional actions that can be performed with git stash.

Working from the command line, I often want to run a quick diff to see what changes I’ve made. In my ~/.gitconfig I have
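```ini
[core]
	# the exact pager invocation is a reconstruction: seed less's
	# search pattern with the "---" that precedes each file's diff
	pager = less -p '^---'
```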

This pipes all the diffs through less, and allows me to move quickly between files using n and p. (less will have pattern-matched the --- at the beginning of each file’s diff.)

I find less slightly annoying when there is less than a page of data to view: I am forced to press q to exit, and it clears the screen when it does. In my ~/.zshrc file I have
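```shell
# one way to get this behaviour:
# -F: quit immediately if the output fits on one screen
# -X: don't clear the screen on exit
export LESS="-FX"
```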

These options tell less to not clear the screen, and to exit if the data to be displayed is less than a screenful.

I often find that after I make a commit, I immediately make some minor changes to the code, sometimes only a formatting change. Of course, this creates a new set of un-staged changes for git. Committing these changes requires a new log entry, which I don’t want or need. Git allows you to amend the previous commit with the new changes, resulting in a single commit message and a single SHA1 ref for all of them.
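Amending looks like this:

```shell
# stage everything and fold it into the previous commit, reusing
# that commit's message (-C HEAD avoids opening the editor)
git add -A
git commit --amend -C HEAD
```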

I use this often enough to have an alias, ca, in my ~/.gitconfig.
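A definition along these lines (the exact flags are my guess):

```ini
[alias]
	# -a stages modified files, -C HEAD reuses the previous message
	ca = commit -a --amend -C HEAD
```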

# OSX Pasteboards

Two useful utilities that I recently discovered in OSX are pbcopy and pbpaste. They allow you to copy from stdin, on the command line, to the system pasteboard (clipboard), and back. By default the text goes to the general pasteboard, but you can select which of general, find, font, or ruler is used. Ruler and font are used for copying formatting information.

Find is an interesting one, as any text pasted there will not disturb the main pasteboard, but will appear by default in any Find/Search dialog that you open.

On any given day, I find myself copying output in the shell to an email, or into Emacs. The text could be a log tail or a stack trace, either on my laptop or a remote server. To redirect stdout to the pasteboard is just
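```shell
# copy the last hundred lines of a log to the pasteboard
# (macOS only; the log path is illustrative)
tail -n 100 /var/log/system.log | pbcopy
```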

Sending the pasteboard to stdout is

```shell
∴ [dev‣2] ~/Dev/repos/git/prospora-core
λ pbpaste
# On branch dev
# Changed but not updated:
#  (use "git add <file>..." to update what will be committed)
#  (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   docs/description.html
#       modified:   docs/description.org
#
no changes added to commit (use "git add" and/or "git commit -a")
```


Even sweeter,
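```shell
# pull output from a remote server straight onto the local pasteboard
# (the hostname and path are illustrative)
ssh staging 'tail -n 100 logs/server.log' | pbcopy
```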

The man page is at pbcopy(1), and covers both commands.

# Joan Mitchell

My paintings are titled after they are finished. I paint from
remembered landscapes that I carry with me - and remembered feelings
of them, which of course become transformed. I could certainly never
mirror nature. I would more like to paint what it leaves with me.

# Abstractions, Numbers and Floating Point

Numbers are an abstract concept invented, or discovered by humans. They are the purest of all abstractions. We represent these abstractions by lines in sand, scratches on walls, and marks on paper. We say that these symbols represent these numbers, or rather we should say that. Instead we say that these symbols are these numbers, and therein lies much confusion and mis-representation in both the teaching and use of mathematics.

Algebra is better taught as the manipulation of symbols according to a set of rules, yet students constantly try to assign some meaning to the symbols, and are confounded by the results. Unfortunately, this mindset is all too common among developers. In our case, we conflate the actual numbers, their representation in the programming language and the CPU’s registers, and finally, their display representations.

The classic example,
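```shell
# awk, like most languages, does its arithmetic in IEEE doubles;
# printing 17 significant digits exposes the representation error
awk 'BEGIN { printf "%.17g\n", 0.1 + 0.2 }'
```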

All language primitives (int, double, float, etc.) have maximum and minimum values they can contain. In Java, the maximum value a signed 32-bit int can hold is 2,147,483,647. Interestingly, a float cannot represent every integer in that range exactly, as its significand is only 24 bits wide. (Prefer doubles over floats.) The numbers that we are trying to represent in these containers come from infinite sets, and each of these sets has properties associated with it.

The set of real numbers is dense (there is an infinity of numbers between 0 and 1), ordered, infinite and uncountable, and it is a field. Being a field means that it forms a group under addition and, excluding zero, a group under multiplication, with multiplication distributing over addition. The first axiom of addition is that the number system must be closed: the result of adding any two numbers must also be a number in that system. The first axiom of multiplication is likewise closure: the product of any two numbers must be a number from that same system.

Thinking about it that way, it’s obvious that trade-offs are being made. What is not so obvious is that the rules for manipulating these symbols are not consistent or correct across these containers. Floating-point operations are not associative or distributive, and floating-point numbers are not truly closed over addition or multiplication: the exact sum or product is often not representable, and must be rounded.

Associative,

 (3.1415 + 1000.1) - 1000.0
= 1003.2 - 1000.0
= 3.2000


but,

3.1415 + (1000.1 - 1000.0)
= 3.1415 + .10000
= 3.2415.


Small errors cascade up through each successive operation.

 (3.1415 + 1.0000 * 10^40) - 1.0000 * 10^40 = 0.0000


and,

 3.1415 + (1.0000 * 10^40 - 1.0000 * 10^40) = 3.1415


Distributive. No floating-point system is distributive.

 .99995 * (1.0001 + 1.0001)
= .99995 * 2.0002
= 2.0001


but

 .99995 * 1.0001 + .99995 * 1.0001
= 1.0000 + 1.0000
= 2.0000.


During the first Gulf War, an American Patriot missile battery stationed in Saudi Arabia failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing twenty-eight American soldiers. The cause of the failure was traced to a rounding error in a 24-bit register of the Patriot’s on-board computer.

The system clock counted time in tenths of a second, and the software multiplied the count by 1/10 to get seconds. 1/10 has an infinite binary expansion, and truncated to the 24-bit register, it gave an error of about 0.000000095 each time it was used. At the time of the failure, the system had been running for 100 hours, leading to a cascading series of these errors, culminating in a total error of 0.34 seconds.
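The drift is easy to reproduce; a back-of-the-envelope check with awk, which computes in doubles:

```shell
# 209715/2097152 is 1/10 truncated to the 24-bit register, as in the
# Patriot's clock software; over 100 hours the per-tick error
# accumulates to roughly a third of a second
awk 'BEGIN {
  approx = 209715 / 2097152   # 24-bit approximation of 1/10
  ticks  = 100 * 3600 * 10    # tenth-of-a-second ticks in 100 hours
  printf "%.4f\n", ticks * (0.1 - approx)
}'
```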

The Joy of Clojure illustrates this error with a similar calculation, contrasting doubles with Clojure’s rational numbers: the ratio 1/10 is represented exactly, so the true drift can be computed without rounding.

This is a deep and complex subject, but it behoves every developer to at least change his or her mental model of numbers to something closer to how a computer stores and manipulates them. In particular, remember that computers store numbers in binary, not decimal. This is why 1/10, which terminates in decimal, cannot be stored exactly: only fractions whose denominator is a power of 2 are represented exactly.

The classic paper on this is What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg.

# Renaming a Group of Files With Zsh

Every now and then, we all need to rename a group of files, often by removing a common part of the name, or by adding a new extension. These ‘simple’ tasks can be a pain: the shell normally expands glob patterns before passing them to a program, so the program only ever sees the final list of names, when what we want is to transform each matched name as it is expanded.

I bonded with zsh many moons ago, and no offence to Bash, it’s the only shell I would ever use. It shares the characteristics of a classic novel, each time you return to it, a new window of understanding opens.

I had a group of files that began with juliet-* and I wanted to rename them all to prospera-*.

Consider the elegance of my solution in zsh,
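```shell
# everything matched by (*) is available in the replacement as $1
zmv 'juliet-(*)' 'prospera-$1'
```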

Note: Depending on your .zshrc, you may need to
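```shell
# zmv ships with zsh but is not loaded by default
autoload -U zmv
```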

zmv takes two parameters, a matching pattern, and a string to replace the pattern. So to remove the .sh extensions from a group of shell scripts,
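```shell
zmv '(*).sh' '$1'
```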

In this example, $1 is the file name with the .sh removed, as $1 holds the first matched glob pattern.

Passing -n to zmv will show you what zmv would do, without doing anything. This is handy when testing a new pattern.