Software & DevOps


LAMP Stack and Linux System Administration

The majority of my programming experience is in PHP, which I've been using since the late '90s. PHP was at version 3 at that time, and had just become capable of object-oriented programming: it supported classes, rather than being restricted to functions only. I have always done this using Linux, Apache, and MySQL.

Running a LAMP stack environment obviously requires learning how to use and administer a Linux machine, but I don't just run Apache and MySQL. I also run DNS (bind), mail (Sendmail and Exim), file (NFS and Samba), and other system services such as firewalls (iptables). My desktop environment is usually a mix of Windows and Linux, with files hosted on Linux and shared with Samba. Thus, my Linux sysadmin skills have been in development since the late '90s. That includes administering Linux for employers, as well as for my own companies. I have used the original Red Hat Linux, Red Hat Enterprise Linux (RHEL) 7, CentOS, Debian, and Ubuntu.

The debate over which of these OS families is "best" usually boils down to this: Which one is already used in your infrastructure? I'm comfortable using whatever you already have. If you're getting ready to launch a new operation, and you need help selecting the OS to base your IT infrastructure on, I'll be happy to advise you, whether you're considering Linux or any other OS.


This Portfolio

[Images: This site; Sublime Text IDE; Recipe from the Chef cookbook used to provision this site]

This portfolio site is served from Laravel 5 on PHP 7. There is no CMS. Every page is coded by hand, using Laravel's Blade templating system. The code and all content were written in a few days in Sublime Text, a great cross-platform IDE. The site uses a slightly customized version of Bootstrap, which makes it very easy to get a site up and running without having to do a lot of custom front-end work. I've written web app frameworks with their own custom CSS in the past, but this represents a significant duplication of effort. In my opinion, it's usually more efficient to spend time puzzling over problems that haven't already been solved.

Laravel 5 ships with version 3 of Bootstrap. I decided to upgrade to version 4. Overhauling the site code to use the new framework properly took a few hours (mostly research), and was not a big deal. I'm using Bootstrap's CDN to source the JavaScript and CSS files because the versions fetched by npm (even after configuring package.json and webpack.mix.js) are broken in a way that disables the navbar menu on mobile devices. (I assume this is a side effect of the code still being in beta.)

The server is running on Amazon EC2. All source code is tracked in a git repo. Configuration management and deployment of the site are both done through Chef.


Internal Tools

I developed internal tools (websites accessible only via corporate intranet) at several of the companies I worked for. For obvious reasons, I can't share any screenshots.


Game Server Instance Management and Remote Procedure Calls

[Images: Front End (Joomla); Instance Management Process - Console]

When I ran a Minecraft gaming service provider, it was necessary to create much of the infrastructure from scratch myself, including instance configuration management and remote control. Configuration management systems such as Chef, Puppet, and Ansible are designed for system administrators who need to control entire machines. My application required that, but at a much finer granularity. In addition to handling configuration for the "outer" game server (the physical machine that contained many instances), it had to allow end users to directly control the configuration and operation of their instances through a control panel. I achieved this with PHP, an encrypted REST RPC stack, a series of shell scripts, and a cron job.

Running a game service provider (GSP) has some things in common with running a Web hosting company, but there are some significant differences. With web hosting, you have two main pieces (HTTP and database service) plus some ancillary services (DNS, load balancers, etc.). There will also be a control panel of some kind. Most of the websites hosted on a machine are low-traffic, and thus idle almost all the time, so it's reasonable to have hundreds of sites on a single server.

With a game service provider, architectural requirements are much more demanding. Each customer is renting resources (primarily RAM) on a physical machine. In the case of Minecraft, the application is being served from Java, and each server gets its own dedicated RAM. (One Minecraft server cannot cede unused RAM to another, so it's not like running Apache or nginx.) The game constantly simulates a small amount of land, even if no one's logged in, so there is always some load. Each game server has to be controllable via a control panel, and that requires a secure RPC stack. There must also be support for backups and rollbacks, and the control panel has to be able to send commands to the server to carry out those and other operations. Game servers have to be controlled by a watchdog process (an Instance Runner) that will restart them if they hang, and each server's watchdog process must itself be launched and periodically monitored by an Instance Management Process (IMP) running on each machine. Additionally, users have to be able to change the password to their SFTP server. In our case, they also had the ability to set up a Mumble server for voice chat.

Front End

The main web server had a registration page and a user control panel. The control panel allowed logged-in users to adjust the configurations of their game servers, and to roll back their maps to previous snapshots. RPC payloads were created as PHP objects and then serialized (turned into a flat text format) for transmission. The serialized payloads were then encrypted using GNU Privacy Guard (GPG), the open-source alternative to PGP. Each server in the system had its own key. When a payload was received, it was decrypted with the recipient's private key and checked against its originator's public key, and then answered with another payload encrypted in the same manner.
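The serialize-then-encrypt step can be sketched roughly as follows. This is an illustration in Python rather than the original PHP: JSON stands in for PHP's serialize(), the key IDs are hypothetical, and signing in addition to encrypting is an assumption. It shells out to the gpg command line, so encrypt_payload only works on a machine where gpg and both keys are installed.

```python
import json
import subprocess


def serialize_payload(command: str, params: dict) -> bytes:
    """Flatten an RPC payload to text. JSON stands in for PHP's serialize()."""
    body = {"command": command, "params": params}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def gpg_encrypt_command(recipient: str, sender: str) -> list:
    """Build a gpg command that signs as `sender` and encrypts to `recipient`."""
    return [
        "gpg", "--batch", "--armor",
        "--local-user", sender,     # key used for the signature
        "--recipient", recipient,   # public key the payload is encrypted to
        "--sign", "--encrypt",
    ]


def encrypt_payload(payload: bytes, recipient: str, sender: str) -> bytes:
    """Pipe the serialized payload through gpg and return the armored result."""
    result = subprocess.run(
        gpg_encrypt_command(recipient, sender),
        input=payload,
        capture_output=True,
        check=True,  # raise if gpg reports an error
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical command name and key IDs, for illustration only.
    payload = serialize_payload("rollback_map", {"instance": 7, "snapshot": 3})
    print(payload.decode("utf-8"))
    print(" ".join(gpg_encrypt_command("gameserver01", "controlpanel")))
```

Keeping the command builder separate from the subprocess call makes the pieces easy to check without a keyring present.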

The website presented a significant attack surface, so I signed up to receive security updates from a Joomla! mailing list. I applied any security patches as soon as I became aware of them. Input from web forms was also carefully sanitized. If a field was supposed to return an integer or a float, the code made sure that the input was strictly numeric, and cast to the correct type. If a field contained text, that text was checked for any suspicious characters, and escaped to avoid any SQL injection attacks.
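As a rough illustration of that validation (in Python, not the original PHP), the sketch below accepts a numeric field only if it is strictly numeric, and then casts it. For text it uses bound query parameters, a swapped-in technique that replaces manual escaping. The instance_settings table and the function names are hypothetical, and sqlite3 stands in for MySQL.

```python
import re
import sqlite3

# Explicit [0-9] classes so only plain ASCII digits are accepted.
INT_PATTERN = re.compile(r"[+-]?[0-9]+")
FLOAT_PATTERN = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def parse_int(raw: str) -> int:
    """Accept the field only if it is strictly an integer, then cast it."""
    text = raw.strip()
    if not INT_PATTERN.fullmatch(text):
        raise ValueError(f"not an integer: {raw!r}")
    return int(text)


def parse_float(raw: str) -> float:
    """Accept the field only if it is a plain decimal number, then cast it."""
    text = raw.strip()
    if not FLOAT_PATTERN.fullmatch(text):
        raise ValueError(f"not a number: {raw!r}")
    return float(text)


def save_motd(conn: sqlite3.Connection, instance_id_raw: str, motd: str) -> None:
    """Store a text field using bound parameters instead of string building."""
    instance_id = parse_int(instance_id_raw)
    conn.execute(
        "INSERT INTO instance_settings (instance_id, motd) VALUES (?, ?)",
        (instance_id, motd),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instance_settings (instance_id INTEGER, motd TEXT)")
    save_motd(conn, "42", "Welcome'); DROP TABLE instance_settings;--")
    print(conn.execute("SELECT * FROM instance_settings").fetchall())
```

With bound parameters the hostile string is stored as ordinary text, so the table survives and the row comes back unchanged.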

Application Servers

Each game server had a master Instance Management Process that ensured that all the server instances that were supposed to be up and running, were. Each server instance had its own watchdog process, called an Instance Runner, that would launch the server in a container (a chroot jail), so as to keep each instance isolated from every other instance, as well as from the control software. People could upload custom plugins, and there was nothing to stop a plugin from walking the file system, so it was necessary to make sure plugins could see nothing more than absolutely necessary. The Instance Runner process was also responsible for re-launching its game server if it crashed, which could happen from time to time.
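The relaunch-on-crash part of an Instance Runner can be sketched as below. This is a simplified Python illustration, not the original code: it treats any non-zero exit as a crash, caps the number of restarts, and leaves out the chroot jail and hang detection entirely. The function name and parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_restarts(argv, max_restarts=3, backoff_seconds=1.0):
    """Launch argv and relaunch it after each non-zero exit.

    Returns the exit codes seen, one per launch. Stops after a clean
    exit (code 0) or once max_restarts relaunches have been used.
    """
    exit_codes = []
    while True:
        proc = subprocess.Popen(argv)
        exit_codes.append(proc.wait())  # block until the child exits
        if exit_codes[-1] == 0:
            break  # clean shutdown: do not relaunch
        if len(exit_codes) > max_restarts:
            break  # initial launch plus max_restarts relaunches all failed
        time.sleep(backoff_seconds)  # brief pause before relaunching
    return exit_codes


if __name__ == "__main__":
    # A stand-in "server" that always crashes with exit code 3.
    crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_restarts(crashing, max_restarts=2, backoff_seconds=0.1))
```

The restart cap is a design choice for the sketch. A production watchdog would more likely keep trying with a growing delay and report repeated failures upstream.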

The instance runners allowed the main control process (IMP) to connect to them over a socket. To prevent rogue plugins from discovering this and exploiting it, each process communicating over those sockets used identd, a Linux service that reports the user ID that owns the connecting process. Since the control process and each watchdog had their own user IDs, any connection that wasn't coming from the right user would be dropped, and the violation immediately reported.
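The ident check can be sketched along these lines, in Python rather than the original PHP. It follows the reply format of the Identification Protocol (RFC 1413), where a successful reply looks like "6193, 23 : USERID : UNIX : stjohns". The function names and the example user name are hypothetical, and stripping whitespace around the user ID is a simplification.

```python
import socket


def ident_query(host, peer_port, local_port, timeout=2.0):
    """Ask the ident service on `host` who owns the connection.

    peer_port is the port on the machine being queried (the far end of
    the connection); local_port is the port on the machine asking.
    """
    with socket.create_connection((host, 113), timeout=timeout) as sock:
        sock.sendall(f"{peer_port}, {local_port}\r\n".encode("ascii"))
        reader = sock.makefile("r", encoding="ascii", errors="replace")
        return reader.readline()


def parse_ident_reply(line):
    """Return the user ID from a USERID reply, or None for anything else."""
    # Split into at most four fields so a user ID containing ':' stays whole.
    parts = [part.strip() for part in line.strip().split(":", 3)]
    if len(parts) == 4 and parts[1] == "USERID":
        return parts[3]
    return None  # ERROR replies and malformed lines are treated as failures


def is_authorized(reply_line, expected_user):
    """Accept the connection only if ident names the expected user."""
    return parse_ident_reply(reply_line) == expected_user


if __name__ == "__main__":
    print(is_authorized("6193, 23 : USERID : UNIX : imp", "imp"))
    print(is_authorized("6193, 23 : ERROR : NO-USER", "imp"))
```

Anything other than a well-formed USERID reply naming the expected user is rejected, which matches the drop-and-report behavior described above.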

Configuration Management

CM was critical to this operation for two reasons. One, multiple application servers had to all be running the same version of the same operating system, with the same applications and system settings (such as firewall rules). Two, customers had to be able to edit their application server settings from our control panel.
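The second requirement amounts to layering a small set of customer-editable settings over a baseline the customer cannot touch. A minimal Python sketch of that idea follows. The keys are real Minecraft server.properties settings, but the baseline values, the choice of which keys are editable, and the function names are all assumptions for illustration.

```python
# Baseline applied to every instance (illustrative values).
BASELINE = {
    "server-port": "25565",
    "online-mode": "true",
    "max-players": "20",
    "difficulty": "easy",
    "motd": "A Minecraft Server",
}

# Settings a customer may change from the control panel (assumed list).
CUSTOMER_EDITABLE = frozenset({"difficulty", "max-players", "motd"})


def merge_settings(baseline, overrides, editable=CUSTOMER_EDITABLE):
    """Apply customer overrides on top of the baseline, rejecting the rest."""
    merged = dict(baseline)
    for key, value in overrides.items():
        if key not in editable:
            raise KeyError(f"setting is not customer-editable: {key}")
        if "\n" in value or "\r" in value:
            # A line break would let one value inject extra settings.
            raise ValueError(f"line break in value for {key}")
        merged[key] = value
    return merged


def render_properties(settings):
    """Render the settings as key=value lines, sorted for stable output."""
    return "".join(f"{key}={settings[key]}\n" for key in sorted(settings))


if __name__ == "__main__":
    merged = merge_settings(BASELINE, {"motd": "Welcome!", "difficulty": "hard"})
    print(render_properties(merged), end="")
```

Rejecting unknown keys outright, rather than silently ignoring them, makes a misbehaving control panel request visible instead of hiding it.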

Botnet Defense

My biggest customer's game server was attacked by a botnet. Hundreds of different accounts logged in at random from hundreds of IPs. If you blocked an account, many more were ready to take its place; and if you blocked an IP, you had the same problem. The weakness of the attack was that the accounts and IPs were used interchangeably. If a botnet user connected from an IP, it was possible to look up all other users who connected from that IP, and then block them, and all the other IPs that they connected from, and all the other users who connected from those IPs, and so on. Six levels of this were all that were required to find all botnet users and IPs. I wrote a PHP script that took only one user or IP as an argument, and then walked the logs to find all associated users and IPs. Once the script was complete, the botnet attack was completely isolated in less than a minute. None of the game servers were ever botnetted after that. If they had been, the attack would've been very short-lived.
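The walk described above is a breadth-first search over a graph whose nodes are users and IPs, with an edge wherever a user logged in from an IP. The sketch below is in Python rather than the original PHP, and it assumes the logs have already been reduced to (user, ip) pairs. The function name and sample data are hypothetical.

```python
from collections import defaultdict, deque


def find_cluster(log, seed, max_depth=6):
    """Return (users, ips) reachable from seed within max_depth hops.

    log is an iterable of (user, ip) pairs. seed is ("user", name) or
    ("ip", address).
    """
    ips_by_user = defaultdict(set)
    users_by_ip = defaultdict(set)
    for user, ip in log:
        ips_by_user[user].add(ip)
        users_by_ip[ip].add(user)

    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        (kind, value), depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        if kind == "user":
            neighbours = {("ip", ip) for ip in ips_by_user.get(value, ())}
        else:
            neighbours = {("user", user) for user in users_by_ip.get(value, ())}
        for node in neighbours - seen:
            seen.add(node)
            queue.append((node, depth + 1))

    users = {value for kind, value in seen if kind == "user"}
    ips = {value for kind, value in seen if kind == "ip"}
    return users, ips


if __name__ == "__main__":
    # Sample data using documentation-reserved IP addresses.
    log = [
        ("bot1", "192.0.2.1"), ("bot2", "192.0.2.1"),
        ("bot2", "192.0.2.2"), ("bot3", "192.0.2.2"),
        ("alice", "198.51.100.7"),
    ]
    users, ips = find_cluster(log, ("user", "bot1"))
    print(sorted(users), sorted(ips))
```

Starting from a single account, the search picks up every account and address linked to it while leaving unrelated users such as alice alone.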

Would I do anything differently today?

Yes and no. When you're running a service provider, you have to trust hundreds of moving parts. Anything that's more complicated than it needs to be places the business at added risk of decreased customer satisfaction, reduced signups and retention, and increased workload. Every decision that influences this must be made after carefully analyzing the problem from multiple angles.


Software Development Principles


Putting Out Fires

After I've been working somewhere for a while, I usually earn a reputation as someone who can be counted on to fix things that need to be back online ASAP. Let me tell you the story of how I rescued a very important site.

At a company I used to work for, someone else was given responsibility for a very high-traffic website, one used by customers who liked everything to be just so. The company was compelled to change the site's software for what I think are valid business reasons. It was legacy software, we didn't use it on any other site, and we didn't want to spend resources on keeping it updated when we already specialized in the software we wanted to replace it with. Nevertheless, it was a sensitive issue.

The software change was given to someone with no technical background, and it didn't go smoothly. It had been launched on a Sunday night, which I would never have done. (Monday through Wednesday? Yes. Thursday? Maybe. Friday through Sunday? Nope! Too hard to get a hold of people if anything goes wrong.) By the time I arrived on Monday, the site had been down for about twelve hours, and the users were beyond furious. They were not at all shy about communicating their true feelings to us. Some of them emailed the CEO! The site's Alexa ranking could be seen to dip on that day, and to stay there for months afterwards.

The people who had done this were huddled around one of their computers, unsure of what to do. I immediately suspected what was wrong, and I asked if I could fix it for them. They said yes. I flagged down a trusted DBA, and to the surprise of neither of us, the database server was thrashing badly. I took the site down, replacing it with a temporary page apologizing for the outage, and we set to work. I had the DBA modify the InnoDB settings, optimizing the amount of RAM set aside for caching and so on, according to a best-practices white paper published by NetApp. I also had him convert several tables from MyISAM to InnoDB, and add a few indexes as well. I knew from experience that this would ease the load on the DB server.
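For the engine conversion, the statements involved look like the output of the sketch below. It is a Python illustration only: the table names are hypothetical, and the actual work was done by hand by the DBA. ALTER TABLE ... ENGINE=InnoDB is the standard MySQL statement for changing a table's storage engine. The RAM set aside for InnoDB caching is presumably the innodb_buffer_pool_size setting, though the source does not name it.

```python
def quote_identifier(name):
    """Quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def engine_conversion_sql(table_engines):
    """Build ALTER TABLE statements for every table still on MyISAM.

    table_engines maps table name to its current storage engine.
    """
    return [
        f"ALTER TABLE {quote_identifier(name)} ENGINE=InnoDB;"
        for name, engine in sorted(table_engines.items())
        if engine.lower() == "myisam"
    ]


if __name__ == "__main__":
    # Hypothetical tables, for illustration only.
    tables = {"posts": "MyISAM", "sessions": "MyISAM", "users": "InnoDB"}
    for statement in engine_conversion_sql(tables):
        print(statement)
```

Each conversion rebuilds the table, which is why the site had to stay offline while the statements ran.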

It took an hour or so to convert the tables from one storage engine to another, and to rebuild the indexes. When this was done, I brought the site back online, and it loaded normally. The users recognized this immediately, and began to log in en masse — thousands of them every minute. The site took the traffic without a hitch. Pages were loading nice and fast. The users hated the new software, but there was nothing I could do about that. My responsibility was to make it work, and that's just what I did.


Dev Lab

I develop websites with a headless Ubuntu server, and either an Ubuntu workstation or a Windows 10 laptop. To make everything work smoothly, there are a few things I have to do for each one.

