Chapter 1.1: Philosophy

Understanding how and why I came up with Tlaloc might help some readers better understand its purpose, and eventually help them extend the vision and reach of the software.

The idea of writing this very complex piece of software came to me after many years of failed experiments, life experiences and business cases. What scared me wasn't really the broadness of the software, but rather its not-so-modest intention of wanting to control everything.

Its main purpose is to become (as it is not really, yet) a better cloud management solution than the likes of Amazon. I've used and still use Amazon AWS services for particular client projects, but every time I'm faced with facepalming situations, which remind me too much of my corporate days debunking Microsoft documentation and code.

You see, I'm an ex-Microsoft coder. I exiled myself from the corporate world, intentionally, by making PHP my language of choice and refusing any project involving MS programming. I simply followed my heart; even if I regret the move sometimes, no matter: I'm living in a real paradise now.

So, lately, I was confronted with the Elastic Load Balancer, which for some reason ditches the client's remote IP in favor of the not-so-wise X-Forwarded-For header. A real nightmare that couldn't be handled at the last minute, so a decision was made to switch back to plain old EC2 instances and be done with the novelty.

You see (besides hoping some Amazon staff will read this), I wished I hadn't coded so much PHP relying on REMOTE_ADDR.

The proper way of handling this particular nitty problem is to use the X-Forwarded-For header instead, which may or may not contain a LIST of IPs, one per proxy hop. Obviously that meant changing a LOT of code, something which couldn't be done on the eve of a website launch.
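To make the problem concrete, here is a minimal sketch of that "proper way", using only PHP's standard $_SERVER keys; the function name is mine, not part of any existing codebase. Behind a load balancer, REMOTE_ADDR holds the balancer's address, and the original client sits at the left of the X-Forwarded-For list:

```php
<?php
// Sketch: recover the real client IP behind a proxy or load balancer.
// client_ip() is a hypothetical helper; the header names are the standard ones.
function client_ip(array $server) {
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        // X-Forwarded-For may hold a comma-separated LIST of IPs,
        // one per proxy hop; the left-most is the original client.
        $hops = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
        return $hops[0];
    }
    // No proxy involved: fall back to the plain remote address.
    return isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : null;
}

// Example: one proxy hop between the client and the web server.
echo client_ip(array(
    'HTTP_X_FORWARDED_FOR' => '203.0.113.7, 10.0.0.1',
    'REMOTE_ADDR'          => '10.0.0.1',
)); // prints 203.0.113.7
```

One caveat: the left-most entry is client-supplied and spoofable, so code that trusts it for security decisions should only accept it when the request actually came through a known proxy.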

All in all, pretty irritating experiences. Cloud computing is not yet what marketers would have us believe.

So, this is an effort to look beyond each vendor's proposed methodologies and integrate them into a working philosophy, instead of working the other way around, adapting business procedures to fit new core technologies.

I chose to derive most of this work from the Puppet and Chef models, because I know them well enough, having had to hack my way around portability issues on OpenBSD. But mostly, I like the way they operate, despite a couple of non-obvious design flaws that just don't make them viable for cloud computing.

Before I delve in those details, let me explain a little bit more.

Not too long ago, a project fell into my lap, and I was told: see what you can do to make this work on a long-term basis. It was a big multi-node, multi-cluster system, the leftover of a defunct company gone crazy with its risk investments. A couple of million dollars later, with a $10,000 monthly bill with Amazon and no more employees, I had to manage it. And I was very strong-minded about it: I did manage it.

The hiccup lay somewhere between the lines I had to administer. It was a very complex soup of clusters (just storing the daily data required a cluster of 7 beefy Elasticsearch nodes), with the analysing, crunching and gathering of the data constituting the underside of the iceberg. I had a lot of learning to do, and I didn't sleep much the first 2 months...

Anyhoo, the problem was managing so many systems, left over by someone else who didn't want to be bothered with said systems. Puppet couldn't cut it, as integrating each node required development, debugging and so on. The other issue was the fact they had spent their millions on crazy-stupid ideas, thinking they would revolutionize the world. Their way of tackling this was to write a complex multi-server application system with a lot of single points of failure, and to rewrite a good part of Ruby, which simply couldn't handle their loads. Hurrah! Money well spent.

So, I got to rethinking my cloud management idea, and I worked on it really hard to give it something I would consider groundbreaking. I decided to work on the idea of removing the masters from the equation, completely, for good, forever. At first the idea seemed a bit precarious, and honestly, a bit arrogant. But the more I thought about it, the more groundbreaking it looked. So I started hypothesizing some prototypes and fiddling with some module ideas, and now I'm even more convinced it's the right way to go.

Think about it: when you're running a cluster of 100 servers, 1 master node just doesn't cut it. You start delegating, and politicizing the command hierarchy. Regular business thinking nowadays favors the master-slave relationship, with a bias I'd say. If we have a pair of Domain Controllers for each department, or a DB pair for every 50 webservers, and so on, why not have 1 master per 100 servers? Right?

So, not to get into the spiritual side of things: the idea came up as I was empathizing with a pack of wolves. In a pack of wolves, one leader stands out. If he dies, another takes his place to lead the pack. But then, what is a pack? In my terms, it would be a group of nodes doing a similar type of work.

And then, the idea of communication: put the cloud to work, like it should, and let the nodes communicate among themselves. Elective processes are a necessity (sorry Microsoft, you didn't invent much), so why not do it with rock-paper-scissors, when we don't want to be bothered with who's beefier than whom.
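That election idea can be sketched in a few lines of PHP. This is a hypothetical, single-process simulation of the pack, not Tlaloc's actual protocol: in reality each node would announce its throw over the channel, but the tie-breaking logic is the same. Every node draws at random, the highest draw leads, and ties are simply thrown again, so nobody needs to compare hardware specs:

```php
<?php
// Masterless "rock-paper-scissors" election, simulated for one pack.
// elect() is a hypothetical helper, not an existing API.
function elect(array $pack) {
    $throws = array();
    foreach ($pack as $node) {
        $throws[$node] = mt_rand();   // every node draws a random throw
    }
    // The highest throw leads; on a tie, the tied nodes throw again,
    // so the outcome never depends on who's beefier than whom.
    $winners = array_keys($throws, max($throws));
    return count($winners) > 1 ? elect($winners) : $winners[0];
}

$leader = elect(array('app-01', 'app-02', 'app-03'));
// $leader is one of the three; if it dies, re-run elect() on the
// survivors and the pack has a new leader, no master required.
```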

So, to sum it all up, what follows is what I envision the Cluster-Runner doing:

  • a server management methodology focused on self-sufficiency for individual nodes
  • built-in redundancy
  • idempotent operations throughout (the real whys deserve another document, which I will eventually write)
  • server scripts that bootstrap from Amazon, Puppet, Chef, DHCP, through rc, or manually
  • a (current) focus on Linux and BSD environments
  • faster integration of my existing systems than with Puppet (it already delivers this)
  • inter-node communication based on channels (a bit like IRC, one might note)
  • a safer approach to security, sources and backups
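The idempotency point deserves a quick illustration, since it shapes every recipe shown later. The helper below is hypothetical (not the actual Cluster-Runner API): an idempotent step first checks whether the system is already in the desired state, and only acts when it isn't, so running a recipe once or a hundred times converges to the same result:

```php
<?php
// Sketch of an idempotent recipe step: ensure a line exists in a file.
// ensure_line() and the demo path are illustrative, not real API.
function ensure_line($path, $line) {
    $lines = file_exists($path)
        ? file($path, FILE_IGNORE_NEW_LINES)
        : array();
    if (in_array($line, $lines, true)) {
        return false;                 // already converged: do nothing
    }
    file_put_contents($path, $line . "\n", FILE_APPEND);
    return true;                      // we had to change something
}

$f = '/tmp/ntpd.conf.demo';
@unlink($f);
ensure_line($f, 'Server ntp.dmz.domain.com');   // first run: appends the line
ensure_line($f, 'Server ntp.dmz.domain.com');   // second run: no-op
```

Non-idempotent scripts (blind appends, unconditional restarts) are what make re-running someone else's Puppet manifests so nerve-wracking; checks like this are what let a node reconverge safely on every boot.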

This is as far as the Cluster-Runner is concerned. The Tlaloc cloud management system is a bit more involved, the Cluster-Runner being only a fraction of its intended volume.

Basically, Cluster Runner is meant to be run ON a cluster, more specifically on each node, to make them participate in a cluster.

Tlaloc itself is a higher level where the programming runs on TOP of the clusters. It's meant to be a coding interface to put the clusters to work.

A bit of sample code:

$Node_Name=trim(`hostname`);        // this node's hostname
$Node_OS=PHP_OS;                    // e.g. "Linux", "OpenBSD"
$Node_OSVersion=trim(`uname -r`);   // kernel release

// ... some basic initialization stuff omitted here ...
if (preg_match("/^.*\.servers\.domain\.ext$/",$Node_Name)) {
   // we are part of the *.servers.* group
   require 'recipes/base_servers.php';
}
if (preg_match("/ns\-[0-9]*\.dmz\..*/",$Node_Name) ) {
   // We are a name server in the DMZ, authoritative for DNS & NTP service.
   // ie: ns-[01].dmz.*
   require 'recipes/dns_authoritative.php';
}
if (preg_match("/app\-[0-9]*\.dmz\..*/",$Node_Name) ) {
   // We are an app web server in the DMZ
   // ie: app-[01].dmz.*
   $File->Replace_Line("/etc/ntpd.conf","^Server ","Server ntp.dmz.domain.com");
   $Service->Start("ntpd");
   $Package->Install("mini_sendmail");
   $Package->Install("php");
   // ...
}