Cloud, Software Development, Startups

What I’m Reading – Week of Oct 6th 2013

It’s been a while since I’ve posted to this blog. Like a lot of people, I’ve made Twitter my tool of choice. I use it to share technical links that I think are high quality and relevant to the areas that I work in and follow. Given this, I’ve been thinking about what role my blog should have. While Twitter is great for quick sharing, a blog is a great place to curate the various tweets and create a summary. As a result, each week I’ll be posting a summary of my favourite technical web links. The focus will be mainly on cloud computing, with a variety of other interesting things mixed in.

So here we go. This week’s reading list is heavily focused on OpenStack.

  • Creating reproducible dev and test environments using OpenStack & CloudEnvy  Talks about CloudEnvy, a simple tool that lets you define your application in a simple YAML file and then deploys it to OpenStack. It’s like Heat but much, much simpler.
  • Using Nested Resources in Heat for OpenStack (Havana edition)  Most of the effort in the cloud space has been focused on just building a cloud.  What most of us really need is to deploy or move our applications. Heat is the part of OpenStack that deals with this.   The link talks about some of Heat’s new features.
  • Refactoring Monolithic Rails apps.  Most of us using Rails have applications that have grown over time. What you end up with is apps whose automated tests take a long time to run. This video talks about how to refactor your app into multiple smaller apps.
  • Using Docker with Red Hat’s OpenShift  Details how dotCloud and Red Hat are going to work together to get Docker running on Red Hat/Fedora, and how OpenShift will support Docker. As far as I know, this is the first PaaS to formally support Docker.
  • Startup Launching 28 Micro-Satellites  Sometimes you hear about a startup doing something so ambitious that you just go wow! With just $13M, PlanetLabs will launch 28 micro-satellites to blanket the Earth and provide 7×24 imaging. Each satellite will cost just $300K. This is a tiny, tiny fraction of what it normally costs to build a satellite.

Mobile, Software Development

Javascript Reverse Geo options (that work outside the US)

An HTML5 mobile app that I was working on needed to get the city and province (i.e. state, for you non-Canadians) that the user was in. Using the HTML5 geolocation method made it easy to get the latitude and longitude. But the missing piece was how to translate that into city, county (if applicable) and province/state.

Google Maps is the de facto solution in the geolocation space. But their API includes a term that requires you to display their map as one of the conditions of using the API (see their terms of service). This works for some apps but not always (and not in my case).

SimpleGeo is another popular option. What it has going for it is that there are very few restrictions and the API is very easy to implement. I was able to have code working in less than 30 minutes. The problem is that the data is really only good for locations in the US. If that works for your app, definitely take a look at it. When I tried it with some Canadian locations, there were gaps in the results and some were just wrong. Now, to be fair, SimpleGeo is still in beta, and maybe somewhere down the line they will have better support for data outside the USA. Note: when using SimpleGeo, remember to allow either ‘*’ or localhost in the admin panel if you will be testing from your own machine or a mobile phone. The default is just ‘’ which will cause your API calls to be rejected.

The only other reasonable option left was Yahoo’s PlaceFinder API. At first I didn’t expect Yahoo to have anything useful; they have been taking a thrashing in the press over the last year. But as I started to look at the API, I realized that it’s actually pretty powerful, and it seems to have good support for locations outside the United States. The Canadian data I’ve tried has certainly been very good. One thing to note is that Yahoo only allows 10K requests per day per app. If you need more than this, you have to contact them and work out an agreement.
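Because the quota is per app, a client that calls the API on behalf of many users may want to count its own requests. Here is a minimal Python sketch of a client-side daily counter. It assumes the quota resets on the local calendar day. Yahoo’s actual reset time may differ, so it’s wise to leave some headroom below the limit.

```python
import datetime


class DailyQuota:
    """Client-side counter that refuses calls once a per-day limit is hit.

    Assumption: the provider's "day" is the local calendar day.
    """

    def __init__(self, limit=10_000, today=datetime.date.today):
        self.limit = limit
        self._today = today      # injectable clock, useful for testing
        self._day = None
        self._count = 0

    def try_acquire(self):
        """Return True and count the call if under the limit, else False."""
        day = self._today()
        if day != self._day:     # first call of a new day: reset the counter
            self._day = day
            self._count = 0
        if self._count >= self.limit:
            return False
        self._count += 1
        return True
```

Call `try_acquire()` before each request and fall back (or queue the lookup) when it returns False.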

The API is pretty straightforward to implement, as it’s a simple HTTP GET. You can give it the latitude and longitude that you got from the HTML5 geolocation method, and Yahoo will return all sorts of useful data about the location.
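The following minimal Python sketch shows what such a GET looks like. It turns a latitude/longitude pair into a request URL and pulls the fields this post cares about out of a decoded result. Several parts are assumptions that you should check against Yahoo’s PlaceFinder documentation:

- The endpoint is a placeholder.
- The parameter names (`location`, `gflags`, `appid`) are assumptions.
- The result keys (`city`, `county`, `state`, `country`) are assumptions.

The HTTP request itself is left out.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the real PlaceFinder URL from Yahoo's docs.
PLACEFINDER_URL = "https://example.invalid/geocode"


def build_reverse_geocode_url(lat, lon, app_id, base_url=PLACEFINDER_URL):
    """Build a reverse-geocoding GET URL from an HTML5 lat/lon pair.

    Parameter names are assumptions for illustration only.
    """
    params = {
        "location": f"{lat},{lon}",
        "gflags": "R",      # assumed flag requesting a reverse lookup
        "appid": app_id,
    }
    return f"{base_url}?{urlencode(params)}"


def extract_place(result):
    """Pick city, county, province/state and country out of one decoded result.

    The keys are assumed; missing or empty fields come back as None.
    """
    return {key: result.get(key) or None
            for key in ("city", "county", "state", "country")}
```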

If you want to give the API a quick try, take a look at the GeoPlanet Explorer web interface, created by one of their former Developer Evangelists. In ‘drilldown’ mode, locations are organized as a tree, and the UI lets you see things like a location’s parents (i.e. province and country) and its neighbours.
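The drilldown view is essentially a tree in which every place points to its parent. Here is a small, illustrative Python model of that idea. The `Place` class and its fields are hypothetical and do not reflect GeoPlanet’s data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Place:
    """One node in a drill-down place hierarchy (illustrative model only)."""
    name: str
    kind: str                                   # e.g. "Town", "State", "Country"
    parent: Optional["Place"] = None
    neighbours: List["Place"] = field(default_factory=list)

    def ancestors(self) -> List["Place"]:
        """Walk up the tree: town -> province -> country."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


canada = Place("Canada", "Country")
ontario = Place("Ontario", "State", parent=canada)
quebec = Place("Quebec", "State", parent=canada)
ontario.neighbours.append(quebec)
toronto = Place("Toronto", "Town", parent=ontario)

print([p.name for p in toronto.ancestors()])  # ['Ontario', 'Canada']
```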

Happy geocoding!

Cloud, Software Development, System Administration

Using CloudInit to automate Linux EC2 setup

Ubuntu has had a great way to automate the building of your Amazon EC2 server through something called CloudInit. When you create a new instance, CloudInit lets you pass in a variety of data needed to set up your server and application. Normally what is passed is a script, but there are several other options. The script can be written in bash, perl, python or awk. It normally installs any packages needed by your app, configures the various services, loads any startup data and finally installs your application. By scripting the setup, you can be sure that your server is built exactly the same way every time you create a new one. As of yesterday, CloudInit can also be used by those of you more comfortable with Red Hat/CentOS, as Amazon has announced its own CentOS-based Linux AMI that includes CloudInit. So now there is a standard way to automate the building of your server, no matter what flavour of Linux you use.

I’ll talk more about CloudInit in a minute. First, though, I want to review some of the other options people have for setting up their server instances, and explain why CloudInit is most likely the right tool for automating this.

  • manual setup. This involves SSH’ing into the instance once it is up and running and manually entering the commands to install your application and its requirements. This is acceptable as a starting point during development, but no application should be deployed into production on a server built this way. If your server ever goes down, you are in for a lot of pain (and stress) when you have to recreate it at a moment’s notice during an outage. I’ve seen a fair number of startups using servers built this way. They start with a ‘dev’ server that was hand built, and somehow that ends up being the production box. It’s really important that teams take the time to rebuild the production server cleanly before launching.
  • using a pre-built AMI. This is where you manually set up your server and then create a new AMI image from it. More likely, you are using someone else’s AMI; Ec2onrails is a perfect example of this. The advantage of using pre-built AMIs is that the server always comes up in a known (good) state. This is a big step forward from manually setting up the server. The downside is that if you want to make any changes to the setup, you need to save a new AMI, which is a slow process. And if you are using someone else’s AMI, you may not be able to do so. In this age of agile development, this can be a handicap.
  • capistrano. This is a build tool from the Ruby world. It can be used to deploy non-Ruby apps as well, but the scripts must be in Ruby (this may or may not be an issue for you). Overall, there is a lot to recommend about Capistrano in that it is also a scripted solution. Everything is done through your Capfile, which is where you script the setup of your application.
    Capistrano works by SSH’ing into the server instance and running commands. This happens after the server is up and running. Normally, you start the server instance manually and then plug the IP address or server hostname into your Capfile. The only downside to Capistrano is that it runs from the developer’s desktop, which may be fine for smaller teams. The minute you have a NetOps team, you probably want something that is not tied to a single developer’s workstation.

Instead of the above, you should take a look at CloudInit. It lets you pass a script to the server instance that is run during the boot process. So how does CloudInit work, and what are the key options? CloudInit uses the ‘user-data’ field of the ec2-run-instances command to pass in a variety of information that it will use to set up your new server instance. The options include:

  • Cloud Config Data. This is the simplest option and handles common tasks like updating Linux, installing packages, setting the hostname and loading SSH keys. The full set of options is described here. Config data will take care of the core items but will not be enough on its own to bring up your app.
  • Run a script. If you can’t do what you need with the config data, you can create a script that handles the extra items. The script can be in bash, python or any other installed language. For my servers, I use this to set up the database, load the application from git and register the server with my dynamic DNS provider. In fact, if you prefer, you can skip the config data entirely and put everything you need in a script. Note that the script is run late in the boot process. This is normally a good thing, but if you need to run something earlier, take a look at Boothooks.
  • Upstart job. If you need to, you can provide a single Upstart script that will be installed to /etc/init.
  • A combination of the above. Finally, it is possible to combine all of the above. CloudInit supports a MIME multi-part file that combines any of the above items. A helper tool called write-mime-multipart takes a set of inputs and generates the MIME-encoded data to pass to user-data. Note that the maximum total size that can be passed to user-data is 16K. If you are above that limit, you can gzip the data, or you can move the items to files accessible through a URL and pass the URLs to user-data.
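Here is a rough Python equivalent of what write-mime-multipart does, built on the standard `email` package. Each part becomes a `text/<subtype>` MIME part, such as `text/cloud-config`, `text/x-shellscript` or `text/upstart-job`. The result is gzipped only if it would exceed the 16K limit. This is a sketch, not the helper tool itself:

- The hostname, package names and git URL are hypothetical.
- The cloud-config keys follow current cloud-init documentation. Older releases may use different names (for example `apt_update`).

```python
import gzip
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

USER_DATA_LIMIT = 16 * 1024  # the 16K user-data cap

# Cloud config part: update packages, install a few, set the hostname.
# Hostname and package names are hypothetical.
CLOUD_CONFIG = """#cloud-config
package_update: true
packages:
  - git
  - postgresql
hostname: app01
"""

# Script part for the app-specific steps (repository URL is hypothetical).
SETUP_SCRIPT = """#!/bin/bash
set -e
git clone https://example.invalid/myapp.git /srv/myapp
"""


def build_user_data(parts):
    """Combine (mime_subtype, content) pairs into multipart user-data bytes.

    Gzips the result only if it would otherwise exceed the user-data limit.
    """
    message = MIMEMultipart()
    for subtype, content in parts:
        message.attach(MIMEText(content, subtype))  # -> text/<subtype>
    data = message.as_string().encode("utf-8")
    if len(data) > USER_DATA_LIMIT:
        data = gzip.compress(data)
    if len(data) > USER_DATA_LIMIT:
        raise ValueError("user-data too large; move parts behind URLs instead")
    return data


user_data = build_user_data([
    ("cloud-config", CLOUD_CONFIG),
    ("x-shellscript", SETUP_SCRIPT),
])
```

The resulting bytes are what you would save to a file and pass as the instance’s user-data.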

As you can see, CloudInit is very flexible and should allow you to fully automate the building of your servers. Finally, I’ll note that CloudInit is not limited to EC2; it will also work with Ubuntu Enterprise Cloud (UEC).

Happy scripting!!

Software Development, System Administration

Finally, Amazon adds a micro instance. No more need for Rackspace / Slicehost

Amazon is clearly the right answer for most people’s cloud services. But when you are developing software and just need a small server to do some testing, their smallest instance was about 6 times more expensive than competing offerings. As a result, a lot of developers also had a Rackspace or Slicehost account. Now that AWS has announced its new ‘micro’ instance, most of us can get back to the simplicity of using a single cloud.

The new ‘micro’ instance is actually not that small. It has a decent amount of RAM at 613MB, much more than most small VPSs from other vendors, which only have 128MB or 256MB. Also, while it has one ECU (compute unit), it can burst up to two ECUs. Again, not bad compared to other vendors’ basic offerings. Finally, one difference is that the ‘micro’ does not include much hard disk space, so you will need to add an EBS volume. If you can live with a small 10GB EBS volume, that will only add another $1 per month (remember we are talking about test / development servers).
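To sanity-check the numbers for your own test servers, add the instance’s hourly cost to the cost of the EBS volume. Here is a minimal Python sketch. The $0.10 per GB-month default is simply the rate implied by “10GB for $1 per month” above. The hourly rate is a parameter to fill in from Amazon’s current price list, not a quoted figure.

```python
HOURS_PER_MONTH = 8760 / 12  # average hours in a month (730)


def monthly_cost(hourly_rate, ebs_gb=0.0, ebs_rate_per_gb_month=0.10,
                 hours=HOURS_PER_MONTH):
    """Rough monthly cost of one always-on instance plus an EBS volume.

    ebs_rate_per_gb_month defaults to the rate implied by 10GB for $1/month;
    hourly_rate must come from the current price list.
    """
    return hourly_rate * hours + ebs_gb * ebs_rate_per_gb_month


# The EBS part on its own: 10GB comes to about a dollar a month.
print(f"${monthly_cost(0.0, ebs_gb=10):.2f}")  # $1.00
```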

So for now, I’m back to using AWS for my test servers. It will be interesting to see if Rackspace and others respond to Amazon’s aggressive move.

Social Networking, Software Development

LinkedIn API vs Facebook API

Today I was investigating the LinkedIn API. Most developers who want to create a social app have tended to use the Facebook API, but I was looking at a business-focused idea, so LinkedIn would be a better fit. While the API is fairly full featured, there are some big differences compared to what Facebook offers. Most of those differences come down to how users discover the app and how they use it.

On Facebook, apps are tightly integrated into the Facebook UI. Apps appear right in Facebook pages, you can discover new apps in the global directory, and apps can post updates to the news stream. All this means that if you have a good app, you can get away with a fraction of the marketing you would normally have to do. This opportunity has driven a lot of developers to create Facebook apps. By Facebook’s own stats, over 500K apps have been created.

With LinkedIn, your app does not live inside the LinkedIn site. In fact, the LinkedIn API is more like Facebook Connect, which is geared for companies that already have their own website. Facebook Connect saves users from having to create a new account on your site and lets you access the user’s Facebook data. But as mentioned, all this happens on your own site, and it is up to you to find ways to drive new traffic to your app. While there is an app directory on LinkedIn, it only has 13 apps on it. The actual API is fairly robust. You can get at all of a user’s profile information, retrieve their connections and update their status.

So as long as you already have an installed base, or feel comfortable building your visitor/customer base in the traditional way, the LinkedIn API does allow you to add social features.

Software Development

New version of jQuery released – even faster

On the weekend, a new version of jQuery was released (1.4.2). What was already a good library has been optimized and runs even faster. Those who make heavy use of the library might want to try upgrading. And obviously, any new projects should start with the latest version.

If you are currently using the 1.3.x version and want to know what’s changed in 1.4.x, the official jQuery 1.4 release notes are the place to start.

Software Development

Zend launches Simple Cloud API

A lot of people are excited about cloud computing these days. But as with most new technologies, there aren’t a lot of standards defined yet, so vendors are adding new features using their own proprietary approaches. This means that once you have moved a site to a given cloud, you are partially locked into that cloud. By locked in, I don’t mean that you can’t move your site, but that there will be a certain amount of pain. The amount of pain will depend on how many of your current cloud vendor’s proprietary features you have used.

Zend last week launched something called the Simple Cloud API. It’s only in the early stages but has the goal of creating a set of APIs that let PHP developers use features of the cloud in a standard way. This means that PHP code written to work on one vendor’s cloud will work on another vendor’s cloud (as long as both are part of the group). Along with Zend, Simple Cloud has the backing of several players in the cloud space, including Rackspace, GoGrid, Microsoft and IBM. Interestingly, the Simple Cloud website is currently hosted on EC2, so maybe the group will be supporting Amazon’s cloud as well.

This partnership is definitely something that is needed. The last thing developers need is half a dozen (or more) clouds, each with a different API. At this point, the group is targeting 3 APIs: file storage, document storage and simple queues. The Simple Cloud website has draft APIs for each of these areas. If you look at Amazon’s cloud, a number of services would not be covered by these APIs (CloudFront is one example). So the group needs to move beyond the initial 3 APIs to really reach its stated goal of making PHP code portable from one cloud to another. Also, the group seems to be focusing only on PHP (which makes sense since Zend is driving things). Really, though, the APIs should be available for all the key programming languages.
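To illustrate the portability idea, here is a minimal Python sketch of the adapter pattern such a standard relies on. This is not Zend’s actual PHP API. Application code talks to a vendor-neutral `FileStorage` interface, and each backend supplies its own adapter. The class and method names are hypothetical, and a real adapter would wrap a vendor’s storage service rather than memory or local disk.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class FileStorage(ABC):
    """Vendor-neutral file storage interface (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under key."""


class InMemoryStorage(FileStorage):
    """Stand-in backend that keeps blobs in a dict (handy for tests)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class LocalDiskStorage(FileStorage):
    """Second backend: stores each key as a file under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        (self.root / key).write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


def save_report(storage: FileStorage, name: str, text: str) -> str:
    """Application code: depends only on the interface, not on a vendor."""
    storage.put(name, text.encode("utf-8"))
    return storage.get(name).decode("utf-8")
```

Switching vendors then means writing (or picking) a new adapter, while `save_report` stays unchanged.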

It still seems to be a step in the right direction. Let’s hope that the group is able to create some momentum and keep pushing beyond the initial 3 APIs.

Software Development

Mac OS X becoming my favorite OS (to my great surprise)

The more I play with Mac OS X, the more I love it. This is a surprising new adventure for me, as I’d never really given Mac OS much thought until recently. I’ve long been a Windows / Linux guy. I’ve been creating solutions for Windows for over 20 years, and once open source and the Internet took off, I switched to Linux for the server work that I was involved in.

Windows has a rich API, but as more and more of the world moves to Unix-like systems (including Linux, BSD, Solaris and Mac OS X), it’s becoming more and more of an island. On the Linux side, there is a lot to like, but it’s always been a little rough around the edges. The IDEs available are no match for Visual Studio, and some utilities (like Xen) are a pain to get working. Still, for server work, Linux has a lot going for it, first and foremost the large selection of open source libraries. Every now and then I lust after some of the very cool features in Solaris (like DTrace and ZFS), but the community was too small and many libraries did not support it.

Given the small installed base of Macs, especially on the server side, I never seriously considered Mac OS X. Then a funny thing happened. I was investigating an idea for the iPhone and was forced to get a Mac to do development. And the more I used the Mac and learned about Mac OS X, the more I liked it. This was a real surprise to me.

This morning, I spent some time learning about what’s new in Snow Leopard (the latest version of OS X), and I’m impressed. Apple is doing a lot of cool things. First of all, OS X is close enough to Linux that I could use it as my development environment. And recently, Apple added DTrace (so there’s no need to consider Solaris anymore). In Snow Leopard, Apple is laying the foundation for easier multi-threading with two new technologies: Grand Central Dispatch and Blocks. While all OSs support multi-threading, coding for it is a fairly advanced topic. The best part is that Apple has released Grand Central Dispatch as open source. Let’s hope that it gets ported to Linux in short order. Another cool technology is OpenCL, which allows you to write code that runs on the system GPU. Traditionally, harnessing the power of today’s GPUs required chip-specific coding. OpenCL is generic and can even run on the CPU if that makes sense.
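Grand Central Dispatch itself is a C-level API. The idea behind it can be illustrated in other languages: hand independent units of work to a system-managed queue instead of creating and joining threads yourself. As a rough analogy only (this is not GCD), Python’s standard `concurrent.futures` does something similar:

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(work, items, max_workers=4):
    """Submit independent units of work to a managed pool of threads.

    The pool decides which thread runs what; map() returns results
    in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, items))


print(run_concurrently(lambda n: n * n, range(5)))  # [0, 1, 4, 9, 16]
```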

Now, there are some things that I’m not sure I’m crazy about. Objective-C, for one. No other OS has much support for it, so anything created in Objective-C is OS X only. I’m a cross-platform kind of guy, so this is not a path that makes sense for me. And some of the new features in Snow Leopard are OS X specific right now, though I like that Apple is willing to open source some of the key items (like Grand Central Dispatch). While my focus is on software APIs, Mac OS X is also a surprisingly nice client OS. I love how little extra software I’ve had to add. On a PC, the first thing I do on a new machine is load up a large number of extra software packages, which is easily several hours’ worth of work. Other than adding Xcode (the Mac IDE), an out-of-the-box Mac is pretty much good to go. And Time Machine is very cool. Even though I’ve had several hard disk crashes over the years, I don’t really do enough backups. Time Machine is something every OS should have. I’m surprised that Apple is the first company to get it right.

So today, I still use a PC as my main development environment. But I’m at the point where I could easily see myself switching to the Mac as my primary computer. Hats off to Apple for creating an OS that even a hard-core PC coder could love.

Software Development

virtual hosting evolved

For the last couple of months, I’ve been playing with a new hosting company called Slicehost. For those of us who are constantly in need of a server for just a few hours or a few days, Slicehost is a godsend. While I have been using various virtual hosting companies for things like our website, most of our development has been done on a server here in our office. You don’t want to know how many times I’ve installed CentOS or Windows over the years. Well, Slicehost makes that a thing of the past. With Slicehost, you have a web control panel, and you can create and delete servers as required. And since you pay a daily rate, it’s not a problem to create a server, use it for a few hours and then delete it. Slicehost also supports a fair number of Linux distributions, so if you want to test an app you wrote on a number of distros, it’s easy to do.

Another nice thing about Slicehost is that you can upsize or downsize your slice at will. So if you start with a small slice and find that the load is too heavy, it’s trivial to upgrade the slice to a larger size.

Also, I should note that I find Slicehost a good complement to Amazon EC2. I’m using EC2 for a service that we are working on. EC2 is good for scalability and robustness, but it also has a starting price of about $75/month. Slicehost starts at $20/month. So our serious stuff ends up on EC2, and we use Slicehost for various research projects or ‘whenever you need a quick server’. The two really are complementary.

Having eliminated the time we wasted installing the OS (again and again), there is still another area for improvement: server configuration. Once you have your server, you still need to install the software that you want to use. For those of you used to cPanel (or related solutions), this might be a bit of a surprise. Both EC2 and Slicehost give you a plain vanilla server: no Apache or MySQL. For those of us in the software development area, this is perfect. I do a lot of work in the area of VoIP, and I don’t want Apache on most of my servers. Also, our Subversion server doesn’t need MySQL or any other software (except Bugzilla).

But sometimes getting everything installed can take a long time.  We had one server configuration that took several hours to build.  That was because several items had to be compiled, patched and built.   The solution to this time waster is obviously scripting.

As we move to these ‘cloud computing’ solutions, we are turning to more and more server scripting. There has been some debate as to what we should use, but for now we are starting with the basics: all of our servers are set up using bash scripts. This fits nicely with a 3rd-party monitoring solution that we use. Bash scripting is not pretty, but it’s amazing how much of what you can do in a normal programming language you can also do in bash. As a guy who has traditionally done everything in C/C++, I was quite surprised by how powerful bash is.

So in the end, we are now able to create servers on demand and, using a script, have them set up and configured the way we want in minutes. Life is good!

Software Development

subversion house cleaning

We’ve been hosting our svn repository in the Toronto office. We did it because we thought it was the safest place to keep our important source code. Yet over the years, our office DSL connection has gone down at least once every couple of months. That ends up being a real hassle for the team in India. Also, the upload speed is pretty poor (768 kbps), so big checkouts in India can take quite some time.

Yesterday, we had another outage and decided it was time to move svn to a hosted solution. I had tried once before, but the svn dump is over 15GB, and the upload was going to take several days and interfere with checkins/checkouts during business hours.
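The arithmetic behind “several days” is easy to check. Here is a small Python sketch. It assumes decimal gigabytes and an upload link that stays saturated around the clock; real throughput will be lower because of protocol overhead and daytime traffic.

```python
def upload_hours(size_gb, uplink_kbps):
    """Best-case hours to push size_gb (decimal GB) over an uplink in kbit/s.

    Ignores protocol overhead and other traffic sharing the line.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (uplink_kbps * 1e3)
    return seconds / 3600


print(f"{upload_hours(15, 768):.1f} hours")  # 43.4 hours
```

That is about 43 hours of continuous upload, nearly two full days, before any overhead. Restricting the upload to off-hours stretches it further.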

So here I am again, figuring out what to do. This time I’m going to create separate dumps for every 1000 revisions. Then I’ll upload those to the hosted server slowly, when the Indian office is closed. Once all the dumps are on the hosted server, I can load them all and finally move the team over to the new server.
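The chunked approach can be scripted. This Python sketch only prints `svnadmin dump` commands for consecutive 1000-revision ranges; it doesn’t run anything. Chunks after the first use `--incremental`, so each file holds only the changes in its own range. The repository path and file names are hypothetical.

```python
import shlex


def dump_plan(repo_path, head_rev, chunk=1000):
    """Return shell commands that dump a repository in chunks of `chunk` revisions.

    Chunks after the first use --incremental so each file only holds the
    changes for its own revision range and they can be loaded in order.
    """
    commands = []
    start = 0
    while start <= head_rev:
        end = min(start + chunk - 1, head_rev)
        args = ["svnadmin", "dump", repo_path, "-r", f"{start}:{end}"]
        if start > 0:
            args.append("--incremental")
        outfile = f"dump-{start:06d}-{end:06d}.svn"
        commands.append(f"{shlex.join(args)} > {outfile}")
        start = end + 1
    return commands


for cmd in dump_plan("/var/svn/repos", 2500):
    print(cmd)
```

On the hosted server, the files are then loaded into a fresh repository with `svnadmin load`, one file at a time and in revision order.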

Since I was already doing admin work, I decided to also clean up the tree. I’m going to do that by splitting the repository into several, one for each team/product. You need to use a dump and svndumpfilter to do that. Overall, it’s not a hard process, but during the planning I ran into a couple of issues. First, the dump cannot be made with --deltas. For small repositories, that’s not a problem, but for larger ones a full backup without deltas can be huge. Also, the documentation for svndumpfilter only mentions how to include or exclude one folder. It took a bit of searching on the net to find out that you can list more than one directory after include or exclude.
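Building on that tip, here is a Python sketch that generates one svndumpfilter command per new repository, listing several paths after a single `include`. `--drop-empty-revs` and `--renumber-revs` are real svndumpfilter options: the first drops revisions left empty by the filter, and the second renumbers the remaining ones. The team names and paths are hypothetical.

```python
import shlex

# Hypothetical mapping of new repositories to the top-level paths they keep.
TEAM_PATHS = {
    "voip": ["voip", "shared/sip-stack"],
    "web": ["website"],
}


def filter_commands(full_dump, team_paths):
    """One svndumpfilter command per team.

    Every path to keep goes after a single `include`, since the
    subcommand accepts several path prefixes.
    """
    commands = {}
    for team, paths in team_paths.items():
        args = ["svndumpfilter", "include",
                "--drop-empty-revs", "--renumber-revs", *paths]
        commands[team] = f"{shlex.join(args)} < {full_dump} > {team}.dump"
    return commands


for team, cmd in filter_commands("full.dump", TEAM_PATHS).items():
    print(cmd)
```

A kept path may sit under a parent directory that is filtered out (like `shared` above). In that case, the parent may need to be created in the new repository before the filtered dump will load.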

So now we are finally ready to start the move to a new hosted server. Let the fun begin!!