Cloud, Software Development, Startups

What I’m Reading – Week of Oct 6th 2013

It’s been a while since I’ve posted to this blog.  Like a lot of people, Twitter has become my tool of choice.  I use it to share technical links that I think are high quality and relevant to the areas I work in and follow.  Given this, I’ve been thinking about what role my blog should have.  While Twitter is great for quickly sharing, a blog is a great place to curate the various tweets into a summary.  As a result, each week I’ll be posting a summary of my favourite technical web links.  The focus will mainly be on cloud computing, with a variety of other interesting things mixed in.

So here we go. This week’s reading list is heavily focused on OpenStack.

  • Creating reproducible dev and test environments using OpenStack & CloudEnvy  Talks about CloudEnvy, a simple tool that lets you define your application in a simple YAML file and then deploys it to OpenStack.  It’s like Heat but much, much simpler.
  • Using Nested Resources in Heat for OpenStack (Havana edition)  Most of the effort in the cloud space has been focused on just building a cloud.  What most of us really need is to deploy or move our applications. Heat is the part of OpenStack that deals with this.   The link talks about some of Heat’s new features.
  • Refactoring Monolithic Rails apps.  Most of us using Rails have applications that have grown over time.  What you end up with are apps that take a long time for the automated tests to run.  This video talks about how to refactor your app into multiple smaller apps.
  • Using Docker with RedHat’s Openshift  Details how dotCloud and RedHat are going to work together to allow Docker to work on RedHat/Fedora and how Openshift will support Docker.  As far as I know, this is the first PaaS that is formally supporting Docker.
  • Startup Launching 28 Micro-Satellites Sometimes you hear about a startup doing something so ambitious, you just go wow!  With just $13M, PlanetLabs will launch 28 micro-satellites to blanket the Earth and provide 7×24 imaging.  Each satellite will cost just $300K.  This is a tiny, tiny fraction of what it normally costs to build a satellite.


What Amazon’s Elastic Beanstalk Can & Can’t Do

What is it?

Today Amazon announced a new service, Elastic Beanstalk.  It is a combination of a complete Java stack (other languages supposedly coming later) and a management wrapper around some of the existing EC2 tools such as load balancing, auto-scaling and monitoring.  This definitely touches a pain point that developers face when they deploy their apps to the cloud.  In fact, it’s part of the reason that Heroku and Engine Yard have done so well.  While it’s pretty easy to set up a server in the cloud and load up your software, a setup that basic normally won’t scale and will not be easy to manage over time.  As a result, developers either spend a lot of time rolling their own solutions or they pay for a Platform-as-a-Service (PaaS) like the ones mentioned earlier.  What Elastic Beanstalk does is give developers some of the advantages of a PaaS without the costs and without giving up total control of the server.

So let’s dig into Elastic Beanstalk a bit more.  There are several pieces to the service.

  • Java stack that Amazon has put together, comprising Linux, Apache, Tomcat and Java.  There is even an Eclipse plug-in to integrate Elastic Beanstalk into the IDE.  The stack is pretty standard; it does not include Spring or any other popular variations of the Java stack.  Note, Amazon has indicated there will be other stacks in the future.  In fact, Engine Yard is supposed to be helping to create a Rails stack.
  • Management GUI built into the AWS console that allows developers to set up and configure their application.
  • Use of some of the AWS services, including the load-balancer, auto-scaling and CloudWatch.  S3 is also used to store versions of your app and for log consolidation.  Normally, you would need to set these up yourself if you wanted to use them; Elastic Beanstalk automatically sets up all these services when it deploys your app.
  • Support for application versioning.  Elastic Beanstalk uses S3 to store each version of your application, which it can use either to deploy the latest release or to roll back to the previous version.
  • An API to allow developers to build Elastic Beanstalk into their development process.  I can see teams using Hudson or another continuous integration server to push new releases/versions to Elastic Beanstalk through the API.

Using Elastic Beanstalk is pretty straightforward.  First you load your application on to S3.  Then you use the AWS Console to create a new application.  The configuration options let you specify a variety of settings, including the S3 bucket where your application can be found and the rules for when Amazon should scale your application.  At that point, you are able to launch the application.  Elastic Beanstalk will set up the load-balancer, create an EC2 instance, install your application, and set up auto-scaling and CloudWatch.  The final thing you need to do is point the domain for your app to the CNAME that Elastic Beanstalk provides.  At this point, your app is live and set up in a configuration that should be pretty easy to monitor and manage.
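As a rough sketch of how that workflow maps onto the service, the snippet below builds the data for the three main API calls.  The action names come from the Elastic Beanstalk API; the app, bucket and environment names are placeholders, and nothing here actually talks to AWS:

```python
# Sketch only: build the (action, parameters) sequence that mirrors the
# console workflow -- create the app, register a version from S3, launch.
def deploy_plan(app_name, bucket, key, env_name):
    version = key.rsplit("/", 1)[-1]  # e.g. "releases/v1.war" -> "v1.war"
    return [
        ("CreateApplication", {"ApplicationName": app_name}),
        ("CreateApplicationVersion", {
            "ApplicationName": app_name,
            "VersionLabel": version,
            "SourceBundle": {"S3Bucket": bucket, "S3Key": key},
        }),
        ("CreateEnvironment", {
            "ApplicationName": app_name,
            "EnvironmentName": env_name,
            "VersionLabel": version,
        }),
    ]

for action, params in deploy_plan("my-app", "my-bucket",
                                  "releases/v1.war", "my-app-env"):
    print(action)
```

An AWS SDK (or the raw REST API) would consume each of these parameter sets; the point is that the whole console workflow boils down to three calls, which is what makes wiring it into a build server practical.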

What’s Missing?

Clearly there is a lot to like about Elastic Beanstalk.   Is there anything that got missed?

First, as previously mentioned, right now the service only supports a Java stack.  For a lot of people using some of the other popular languages (like Ruby, Python and PHP), that is a non-starter.  Hopefully, it won’t be long until Amazon has stacks for the key popular languages.

But the other important area where Beanstalk falls short is that it is oblivious to databases.  I can’t imagine there are very many applications that don’t depend on some sort of database.  That means you need to take care of setting up your database yourself.  And since Elastic Beanstalk does not know about your DB, it is not able to monitor it or scale it either.  This means that even though Elastic Beanstalk takes care of your application, you need to find another solution for the database.  To be fair to Amazon, there are a lot of different databases that people are using today (like MongoDB, CouchDB, Cassandra, etc.).  But I think they should have been able to support MySQL, especially since they already have the Relational Database Service (RDS).

You should also think about whether your application needs anything other than the basic stack.  Does it use Memcached, Redis, or some other piece of software that is not part of the standard stack?  If so, you still need to handle that yourself.  Either you need to put it on another server or figure out some way to automate the installation when your app comes up on a server.  If you look at Heroku, they now have an add-on program that is quickly building a robust list of options.  Amazon needs to consider something similar.

And finally, if you move to using Elastic Beanstalk, you are pretty well committing yourself to keeping your app on Amazon.  I don’t see that as a big issue, but it’s something that you need to keep in mind.

So is it Worth Using?

In short, this is still an important step forward for Amazon.  For developers that have been building their own software stacks and manually deploying them to EC2, this has the potential to take several things off their plate.  Clearly, Elastic Beanstalk will not be 100% of the solution, but having the things it handles taken off a developer’s plate will make life simpler, even if they still need to handle some things themselves.  One thing I forgot to mention is that Elastic Beanstalk is free (although you still pay for the underlying services).  That in itself should get a lot of EC2 developers to give it a look.

Cloud, Software Development, System Administration

Using CloudInit to automate Linux EC2 setup

Ubuntu has had a great way to automate the building of your Amazon EC2 server through something called CloudInit. What this allows is that during the creation of a new instance, you can also pass in a variety of data needed to set up your server & application. Normally what is passed is a script, but there are several other options. The script can be written in bash, perl, python or awk. The script normally installs any packages needed by your app, configures the various services, loads any startup data and finally installs your application. By scripting the setup, you are ensured that your server is built 100% the same way each time you create a new server. As of yesterday, CloudInit can be used by those of you more comfortable with RedHat/CentOS, as Amazon has announced their own CentOS-based Linux AMI that includes CloudInit. So now there is a standard way to automate the building of your server, no matter what flavour of Linux you use.

I’ll talk more about CloudInit in a minute, but first I wanted to review some of the other options people have for setting up their server instance and why automating with CloudInit is most likely the right approach.

  • manual setup. This involves SSH’ing into the instance once it is up and running and manually entering the commands to install your application and its requirements. While this is acceptable as a starting point while you are in development, no application should be deployed into production on a server built this way. If your server ever goes down, you are in for a lot of pain (and stress) when you have to recreate it at a moment’s notice during an outage. I’ve seen a fair number of startups using servers built this way. They start with a ‘dev’ server that was hand built and somehow that ends up being the production box. It’s really important that teams take the time to rebuild the production server cleanly before launching.
  • using a pre-built AMI. This is where you manually set up your server and then create a new AMI image from it. Or, more likely, you are using someone else’s AMI; Ec2onrails is a perfect example of this. The advantage of using pre-built AMIs is that the server always comes up in a known (good) state. This is a big step forward from manually setting up the server. The downside is that if you want to make any changes to the setup, you need to save a new AMI, which is a slow process. And if you are using someone else’s AMI, you may not be able to do so at all. In this age of agile development, this can be a handicap.
  • capistrano. This is a build tool from the Ruby world. It can be used to deploy non-Ruby apps as well, but the scripts must be in Ruby (this may or may not be an issue for you). Overall, there is a lot to recommend about capistrano in that it is also a scripted solution. Normally, everything is done through your Capfile, which is where you script the setup of your application.
    The way that capistrano works is by SSH’ing into the server instance and running commands. This happens after the server is up and running. Normally, you start the server instance manually and then plug the IP address or server hostname into your Capfile. The only downside to capistrano is that it runs from the developer’s desktop, which may be fine for smaller teams. The minute you have a NetOps team, you probably want something that is not tied to a single developer’s workstation.

Instead of the above, you should take a look at using CloudInit. What that lets you do is pass a script to the server instance that is run during the boot process. So how does CloudInit work, and what are the key options? CloudInit allows you to use the ‘user-data’ field of the ec2-run-instances command to pass in a variety of information that CloudInit will use to set up your new server instance. The options include:

  • Cloud Config Data. This is the simplest use and covers common tasks like updating Linux, installing packages, setting the hostname and loading SSH keys. The full set of options is described here. Using config data will handle some core items but will not be enough to bring up your app.
  • Run a script. If you can’t do what you need with the config data, you can create a script that will handle the extra items. The script can be in bash, python or any other installed language. For my servers, I use this to set up the database, load the application from git and register the server with my dynamic DNS provider. In fact, if you prefer, you don’t need to use the config data at all and can put everything you need in a script. Note, the script is run late in the boot process. This is normally a good thing, but if you need to run something earlier, take a look at Boothooks.
  • upstart job. If you need to, you can provide a single upstart script that will be installed to /etc/init.
  • a combination of the above. Finally, it is possible to combine all of the above. CloudInit supports creating a MIME multi-part file that is a combination of any of the above items. There is a helper tool called write-mime-multipart that will take a set of inputs and generate the MIME-encoded data to pass to user-data. Note, the maximum total size that can be passed to user-data is 16K. If you are above that limit, you can gzip the data, or you can move the items to files accessible through a URL and pass the URLs in user-data.
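To make the multi-part option concrete, here is a minimal Python sketch of the kind of payload write-mime-multipart produces, built by hand with the standard library’s email module.  The cloud-config body, package names and script contents are illustrative placeholders:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Illustrative cloud-config part: update packages and set the hostname.
cloud_config = """\
#cloud-config
package_update: true
packages:
  - nginx
  - git
hostname: web-01
"""

# Illustrative shell-script part: runs late in the boot process.
setup_script = """\
#!/bin/bash
git clone https://example.com/myapp.git /opt/myapp
/opt/myapp/bin/start
"""

msg = MIMEMultipart()
# CloudInit dispatches each part to a handler based on its MIME subtype.
for body, subtype in [(cloud_config, "cloud-config"),
                      (setup_script, "x-shellscript")]:
    msg.attach(MIMEText(body, _subtype=subtype))

user_data = msg.as_string()
# Remember the 16K cap on user-data; gzip it (or switch to URLs) if over.
print(len(user_data))
```

The result is a single blob you can hand to the user-data field, with each part labelled (text/cloud-config, text/x-shellscript) so CloudInit knows how to process it.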

As you can see, CloudInit is very flexible and should allow you to fully automate the building of your servers. Finally, I’ll note that CloudInit is not limited to EC2 but will also work with Ubuntu Enterprise Cloud (UEC).
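Putting it together, the launch itself is a single command. If memory serves on the classic EC2 API tools flags (the AMI ID, keypair and file name below are all placeholders):

```shell
# -f reads user-data from a local file: a script, a cloud-config
# document, or a mime-multipart bundle as described above.
ec2-run-instances ami-12345678 -t m1.small -k my-keypair -f my-userdata.txt
```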

Happy scripting!!