Replacing your user agent with a Twisted proxy

A couple of days ago, a friend sent me a link to Google’s “killer-robots.txt” easter egg. I was curious to see if there was more to it than just the file (a tribute to both the robots.txt file and to the Terminator movies), so I started messing around with the user agent I was sending to google.com. I started out with Burp Suite, which lets you intercept and alter your requests, but it was a drag having to repeatedly intercept each request and rewrite the User-Agent string by hand; consequently, I ended up writing the Python code below.

It’s the first time I’ve used Twisted for anything – I have to say, I like its simplicity.

As the header comment says, I did end up using a Firefox plugin to change the user agent; but I thought the code would be useful to have around for those of you who might wish to intercept and alter a whole bunch of requests. I use a regex to identify the User-Agent string and replace it with whatever I want – this approach gives you quite a bit of flexibility in finding and replacing elements of your request.

Enjoy!


# Written by redparanoid (@Inf0Junki3), Friday 11.07.2014

# ReplacingProxy: A proxy that takes incoming requests, substitutes the user agent with another
# value, and passes the request on. I wrote this script because I was curious about the
# killer-robots.txt easter egg on google, and wanted to see if fiddling with the user-agent would
# change the content of the pages. Sadly, it doesn't.
# Note that I ended up using Firefox's User Agent Switcher to force the agent, because all google
# traffic is over SSL and I didn't want to mess around with HTTPS. I shall someday fix this, when
# I am bored again. In the meantime, this is still a neat proof of concept 🙂

# References:
#   https://wiki.python.org/moin/Twisted-Examples
#   http://stackoverflow.com/questions/9063583/python-twisted-proxy-how-to-intercept-packets

from   twisted.web         import proxy, http
from   twisted.internet    import reactor
from   twisted.python      import log
import sys
import re
from   termcolor import colored

#log to the standard output.
log.startLogging(sys.stdout)

class ReplacingProxy(proxy.Proxy):
  """
  The class used to intercept the data received, rewrite the User-Agent header
  and print the result to the standard output.
  """
  def dataReceived(self, data):

    pattern = re.compile(r"User-Agent: .*$", re.MULTILINE)

    new_user_agent = "User-Agent: T-1000"

    if "User-Agent: " in data:
      old_user_agent = pattern.search(data).group(0)
      print colored(
          "Substituting %s with %s" % (old_user_agent, new_user_agent),
          "blue")
      new_data = pattern.sub(new_user_agent, data)
      print colored(
          "OUTPUT: %s" % new_data,
          "green")

      return proxy.Proxy.dataReceived(self, new_data)

    # No User-Agent header in this chunk of data: pass it through untouched.
    return proxy.Proxy.dataReceived(self, data)
  #end def dataReceived
#end class

class ProxyFactory(http.HTTPFactory):
  """
  The factory class for the proxy.
  """
  protocol = ReplacingProxy
#end class

reactor.listenTCP(8080, ProxyFactory())
reactor.run()

Automating your cloud infrastructure, part one: automating server deployment with pyrax, unittest and mock

Automating server deployments with Python – because even sysadmins need their sleep...

Image CC license from openclipart user hector-gomez

I’ve been tinkering with cloud infrastructure a lot in the past couple of years. I mostly administer my servers by hand, but recently I’ve tasked myself with migrating a dev environment of a half-dozen servers – I figured it would be a good opportunity to roll up my sleeves and do some writing on the topic of cloud infrastructure automation. In this first post on the topic, I provide a possible approach to automating server deployment (as well as unit-testing your scripts) using Rackspace Cloud’s API. I’ll eventually get round to other topics such as automating configuration with Puppet and setting up monitoring and intrusion prevention, but for now my focus is on automating server deployments on the cloud.

Your mission, should you choose to accept it…

Imagine that you are a sysadmin tasked with deploying several servers on cloud infrastructure. There was a time when you had to painstakingly configure and deploy each server manually – a boring, highly repetitive, and error-prone task. Nowadays these annoyances are easily overcome: advances in virtualization and cloud computing have made automating server deployments a breeze.

Before we begin, though, here are a couple of assumptions: I assume that you are working with Rackspace Cloud and are familiar with its services. I also assume that you’re familiar with the concept of ghosting, i.e. creating a pre-configured base template of a typical server that would be deployed in your infrastructure. Configuring the base template is not in the scope of this post; I might cover some basics in the near future when broaching the subject of configuration management with Puppet, though.

I also assume that you are familiar with software development concepts such as test-driven development, code versioning, and software design patterns. I’ve tried to provide links where it makes sense; if something’s not easy to follow, feel free to comment and I’ll answer/amend accordingly.

Introducing the tools

Infrastructure automation is a subject mainly discussed in sysadmin circles; however, the tools that I use in my approach come largely from my programming / testing toolkit. I’m an advocate of Test-Driven Development, and I see no reason why the same principles can’t be applied to systems administration.

My entire approach is based on the assumption that you are comfortable with Python. I’ve set myself up with PyDev for this task; PyDev is a Python editor built on the excellent open-source Eclipse IDE. The benefits of using PyDev over notepad, notepad++ or gedit are that 1) you get syntax highlighting and code completion, 2) the refactoring plug-ins are sweet, and 3) you can manage and run your unit tests from the IDE. I realize and respect that there are a lot of vi / nano / emacs purists out there – I used to be one. If you’re happier using a nice, clean editor like that, cool! It doesn’t change my approach.

But I digress. Rackspace Cloud has a REST API that allows you to perform (almost) all the tasks you can perform from the admin dashboard. You can do things like create servers and isolated networks, list server images… The panoply of functionality is documented at http://docs.rackspace.com/. If you’re familiar with Python’s urllib library, you can implement your own client with a little work; however, I would recommend using pyrax instead. The library is easy to use, well documented and only a pip install away (it’s on PyPI). I’ll be using this library in my sample source code.
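
To give you an idea of how little code pyrax needs, here’s a minimal sketch of authenticating and listing a few resources. The username and API token below are placeholders, and I’m using pyrax’s set_credentials() call here rather than the keyring-based authentication I use further down – adapt as needed:

# A quick pyrax sketch: authenticate and list a few resources.
# "my_username" and "my_api_token" are placeholders for your own credentials.
import pyrax

pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("my_username", "my_api_token")

# Aliases to the services used later in this post.
cs  = pyrax.cloudservers
cnw = pyrax.cloud_networks

# List the available flavors, images and isolated networks.
for flavor in cs.flavors.list():
    print flavor.name

for image in cs.images.list():
    print image.name

for network in cnw.list():
    print network.label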

As mentioned before, I’m keen on TDD; when developing my deployment script, I begin by writing tests that are bound to fail when they are first run, then implement the code that will make them succeed. This way I can catch a lot of silly errors before I launch the script in my production environment and I can make sure that changes I make as I go along don’t break the scripts. I use the unittest and mock libraries to achieve this purpose. I don’t go as far as to check code coverage, though I may do so eventually for larger scripts.

Setting up the project

I recommend setting up a basic environment so that you can comfortably write scripts for your infrastructure. If you administer several infrastructures, I urge you to have one environment per infrastructure so as to avoid any accidental deployments (or deletions!).

Your entire environment should be contained in a single folder as a package. I’d recommend setting up version control with a tool like git, both to track code changes and to take advantage of branches – for instance, you could easily maintain deployment scripts for several infrastructures that way.

Here’s what your environment should look like; I’ve called the root directory of the environment my_rackspace_toolkit – I provide explanations for each component below:

my_rackspace_toolkit [dir]
|
+--> rackspace_context.py
|
+--> rackspace_shell.py
|
+--> category [dir]
     |
     +--> deployment_script.py
     |
     +--> tests [dir]
          |
          +--> deployment_script_tests.py

rackspace_context.py

This file contains a single class, RackspaceContext, which supplies your scripts with the contextual variables they need to call pyrax objects. Here’s an example implementation:

import pyrax as pyrax_lib
import keyring as keyring_lib

class RackspaceContext(object):

    # Aliases to the pyrax and keyring libraries, kept as class attributes
    # so that tests can replace them with mocks.
    pyrax = pyrax_lib
    keyring = keyring_lib

    def __init__(self):
        # Set up authentication (replace the username and API token with your own).
        self.pyrax.set_setting("identity_type", "rackspace")
        self.keyring.set_password("pyrax", "my_username", "my_api_token")
        self.pyrax.keyring_auth("my_username")

        # Set up aliases to the services used by the deployment scripts.
        self.cs     = self.pyrax.cloudservers
        self.cnw    = self.pyrax.cloud_networks

As the name indicates, RackspaceContext is a typical implementation of the Context design pattern. There are several benefits to this:

  1. With a Context class, you can consistently set up authentication throughout all your deployment scripts. If your API token changes, you only have one file to worry about.
  2. If you want to re-deploy your environment for multiple rackspace accounts, you need only change the context and you’re good to go.
  3. If done right, your deployment scripts don’t need to worry about authentication – they just need to consume the context class.
  4. This makes testing your scripts insanely simple. We’ll see why in a moment.

rackspace_shell.py

The rackspace shell is a command-line interface that pre-loads the context and any scripts that you’ve written so that you can execute them easily. Here’s an example:

#!/usr/bin/env python

from rackspace_context import RackspaceContext
context = RackspaceContext()

# Import the CreateDevEnvironment script so it can easily be called from the shell.
from dev.create_dev_environment import CreateDevEnvironment

print """
Pyrax Interactive Shell - preloaded with the rackspace context.

When running your scripts, please make calls using the context object.

For instance:

script = CreateDevEnvironment()
result = script.actuate(context)

print result
"""

# Drop to the shell:
import code
code.interact("Rackspace shell", local=locals())

Note that if you’re writing deployment scripts for use in a CI environment like Jenkins, you may wish to adapt this file to make it either interactive or non-interactive, perhaps by using a flag. I’ve found that it’s a useful thing to have in any case.
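
Here’s a minimal sketch of what such a flag could look like; the --batch flag and the single hard-coded call to CreateDevEnvironment are illustrative assumptions rather than part of the toolkit itself:

#!/usr/bin/env python
# Sketch: a variant of rackspace_shell.py usable both interactively and from a CI job.
# The --batch flag and the hard-coded script call are illustrative assumptions.
import argparse
import code

from rackspace_context import RackspaceContext
from dev.create_dev_environment import CreateDevEnvironment

parser = argparse.ArgumentParser(description = "Rackspace deployment shell")
parser.add_argument("--batch", action = "store_true",
                    help = "run the deployment script without dropping to a shell")
args = parser.parse_args()

context = RackspaceContext()

if args.batch:
    # Non-interactive mode, e.g. for a Jenkins job: run the script and print the result.
    print CreateDevEnvironment().actuate(context)
else:
    # Interactive mode: drop to a shell with the context preloaded.
    code.interact("Rackspace shell", local = locals())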

category directory

You are likely to have several types of deployment scripts; I recommend that you divide them by category using packages. For instance, why not have a dev package for the development servers? Why not separate creation scripts from deletion scripts? How you separate your scripts, whether by functionality or by server type, is up to you; I’ve found that some categorization is essential, particularly because you may find yourself executing many of these scripts at a time and you need a way to do so in a logical manner. Make sure that each of your directories has an __init__.py file, making it a package.

deployment_script.py

Each deployment script should be a file containing a class that will be called from your shell. For instance:

class CreateDevEnvironment(object):
    """
    Script object used to perform the necessary initialization tasks.
    To use, instantiate the class and call the actuate() method.
    """
    # Set up list of machines
    MACHINES = ["machine-1",
                "machine-2",
               ]

    def actuate(self, pyrax_context):
        """
        Actually performs the task of creating the machines.
        """
        try:
            # Get the flavors of distribution
            flavors = [flavor
                       for flavor in pyrax_context.cs.flavors.list()
                       if flavor.name == u"512MB Standard Instance"]
            flavor_exists = len(flavors) == 1

            # Get the networks
            networks = [network
                        for network in pyrax_context.cnw.list()
                        if network.label == "MyPrivateNetwork"]
            network_exists = len(networks) == 1
            network_list = []
            for network in networks:
                network_list += network.get_server_networks()

            # Get the images
            images = [image
                      for image in pyrax_context.cs.images.list()
                      if image.name == "my_image"]
            image_exists = len(images) == 1

            if not (flavor_exists and network_exists and image_exists):
                return "Creation aborted: flavor, network or image not found."

            for machine_name in self.MACHINES:
                pyrax_context.cs.servers.create(machine_name,
                                                images[0].id,
                                                flavors[0].id,
                                                nics = network_list)
            return "Creation of machines is successful."
        except Exception as e:
            return "An exception has occurred! Details: %s" % e

The class contains a single method, actuate, which carries out your infrastructure deployment tasks – in this case, it is the creation of two machines based on a previously created image, using the standard 512 MB flavor of server.

deployment_script_tests.py

This is the file containing your unit tests. You can write your tests using unittest, pyunit or nose; I’ve written mine with unittest, and I use mocks to provide my tests with fake versions of pyrax objects. The goal is to verify that the script calls the correct function with the appropriate parameters, not to actually carry out the call. Once again, here’s an example of how this can be done:

import unittest
from rackspace_context import RackspaceContext as RackspaceContextClass
from mock import Mock
from dev.create_dev_environment import CreateDevEnvironment
from collections import namedtuple

Flavor = namedtuple("Flavor", "id name")
Network = namedtuple("Network", "id label")
Network.get_server_networks = Mock(return_value = [{'net-id': u'some-guid'}])
Image = namedtuple("Image", "id name")

class CreateDevEnvironmentTests(unittest.TestCase):

    RackspaceContext = RackspaceContextClass

    def setUp(self):
        self.RackspaceContext.pyrax = Mock()

        self.RackspaceContext.pyrax.cloudservers.flavors.list = Mock(return_value = [
                                                                                 Flavor(id = u'2', name = u'512MB Standard Instance')
                                                                                 ])

        self.RackspaceContext.pyrax.cloud_networks.list = Mock(return_value = [
                                                                           Network(id = u'1', label = u'MyPrivateNetwork')
                                                                           ])

        self.RackspaceContext.pyrax.cloudservers.images.list = Mock(return_value = [
                                                                                Image(id = u'3', name = u'my_image')
                                                                                ])

        self.context = self.RackspaceContext() 

    def tearDown(self):
        pass

    def testActuate(self):
        create_script = CreateDevEnvironment()
        create_script.actuate(self.context)
        # The script should first check that the 512 standard server flavor exists.
        self.assertTrue(self.context.pyrax.cloudservers.flavors.list.called, "cloudservers flavors list method was not called!")

        # The script should then check that the DevNet isolated network exists.
        self.assertTrue(self.context.pyrax.cloud_networks.list.called, "cloudservers networks list method was not called!")

        # The script should also check that the image it is going to use exists
        self.assertTrue(self.context.pyrax.cloudservers.images.list.called)

        # Finally, the script should call the create method for each of the machines in the script:
        for args in self.context.pyrax.cloudservers.servers.create.call_args_list:
            machine_name, image_id, flavor_id = args[0]
            nic_list = args[1]["nics"]
            self.assertTrue (machine_name in CreateDevEnvironment.MACHINES)
            self.assertTrue(image_id == u'3')
            self.assertTrue(flavor_id == u'2')
            self.assertTrue(nic_list == [{'net-id': u'some-guid'}])

if __name__ == "__main__":
    unittest.main()

Notice how I’m setting up mocks for each method whose return value the script depends on – this lets me do back-to-back testing on my scripts, so that I know exactly how the script will be calling the pyrax libraries. While this doesn’t prevent mistakes that stem from a misunderstanding of how pyrax is used, it does prevent you from doing things like accidentally swapping an image ID with a flavor ID!

Conclusions

Using this methodology, you should be able to easily develop and test scripts that you can use to mass-deploy and configure Rackspace Cloud servers. Initial setup of your environment using this approach should take no more than half an hour; once your environment is set up, you should be able to turn out scripts easily and, more importantly, make use of this nifty little test harness so that you avoid costly accidents!

Suggestions and constructive criticism are welcome; I’m particularly interested if you have seen better approaches to automation, or if you know of any other nifty tools. I’d also be interested in finding out whether anyone out there has real-world experience using pyrax with Jenkins or Bamboo, and/or has integrated this type of scripting with WebDriver scripts.

In my next post, I’ll be discussing Puppet. Now that automating server deployments is no longer a secret to you, how do you get your machines to automatically download packages, set up software, properly configure the firewall et cetera? I’ll attempt to address this and more shortly.

Sharepoint auditing – a few thoughts

I have a colleague who’s been updating a SharePoint permissions matrix lately. It’s a good practice (I’d go as far as to say that it’s a must) to maintain such a matrix, in a format that is understandable to non-technical folk. It’s good for IT departments, who need to periodically check that people have access to the right information. It’s good for auditors, who want to show that their clients are exercising due diligence in controlling their resources. And it’s good for staff, who need to know which of their peers have access to the company’s knowledge, information, and tools.

However, while prepping for her cross-checking work, she’d been led to believe that there are no tools for collecting all of a user’s permissions on sites and lists. Since I’ve heard this claim before, I thought I should debunk the myth and write about a tool that the SharePoint integrator can add to his or her arsenal: Sushi.

It’s a great little utility, which you can obtain – and customize to your heart’s content, if you know how to write code – here: http://sushi.codeplex.com/

Basically, you download the binary (no need to even build the project from source! I understand that this is sometimes daunting to people) and, in a few clicks, you can get a report on what groups your users are a part of and what specific permissions were granted to the user. You can also list which sites or lists in a site collection do not inherit permissions, which helps you identify what you need to specifically audit.

The catch: if you’re looking for a tool that does all your work for you, prepare to be disappointed. Sushi does the footwork – it saves you from having to repeatedly go through every site and list in your instance and hit “list permissions” and “site permissions” – but it’s still up to you to produce a matrix that is comprehensible and readable by your non-technical audience.

Previously, I’d been messing with scripts to extract the data right from the source: the SQL database. I started massaging them into a few SSRS reports, but ran out of time and motivation. I still have the script around, somewhere. Frankly, though, with a tool like Sushi out there, I’d be inclined to think that one is better off hacking a bit of code to allow admins to select multiple users and export the results as an XML file, a JSON file, or even to an SQL database. Once that’s done, the raw information can easily be formatted with a tool like Qlikview.


CISPA – do you know what it is?

This one’s a short one, but it’s important. It concerns not just our family, friends and colleagues in the United States, even though they will be the ones most affected. This is about CISPA, the latest in a series of bills to strip people of their right to privacy.

CISPA will affect you. No matter where & who you are. Do you know why it’s important, and how it will change your world?

Regardless of which side you’re on, you owe it to yourself to know what it’s about.

Let’s consider this quote by Edmund Burke: “The only thing necessary for the triumph of evil is for good men to do nothing.”
I love and hate this quote. Love it because it is a powerful, simple sentence that conveys a strong message. Hate it because it’s so damn accurate.

Better netflow visualization, part II

I’ve been looking into netflow visualization since my little experiment last week and have come across a couple of interesting tools. I checked out an article on IT World about visualizing netflow, which pointed to AfterGlow and GraphViz; I also looked at Plixer Labs’ blog – they offer commercial netflow visualization tools. Finally, I’ve been browsing through resources mentioned on Raffael Marty’s secviz.org site. Though these sites offer up impressive information on visualization tools (both commercial and free), I was unable to find anything about time-stepped visualization of traffic. This may in fact be a testament to my laziness and/or lack of trying, in which case I do apologize.

I figured that I should give time-stepped traffic visualization a shot on my own, so I dusted off my old college book on OpenGL and went to work figuring out how I might code a tool similar to codeswarm, but intended for large volumes of traffic.

A quick shout-out to my wife (KRED on Research Salad) is in order, I believe. Kay, if you’re reading this: happy birthday, babe. Thanks for the ten years of laughs, sending me my favorite comic books from across the pond, and regularly assaulting me with a gazillion infosec links – don’t know how you can read all of them, I certainly can’t! Finally, thanks for putting up with the late nights and early mornings working, supporting this crabby ol’ geek through thick and thin!

Better netflow visualization with code_swarm coolness!

Howdy all,

In my last post, I may have mentioned codeswarm, a nifty tool for visualizing how frequently a software project gets updated over time. Since it’s an open-source project, I figured it was worth having a look at the code and seeing if there are other uses for it.

If you check out the Google Code page, you’ll notice that the project isn’t terribly active – the last upload dates back to May 2009. But hey, it does what it’s supposed to do and it’s pretty straightforward.

Reading through the source files, I found that using the tool is in fact super simple: you set up an XML file that contains the data to be used, you run Ant, and you let the program do the rest. The format of the sample data is just as simple: a file name, a date, and an author.

So let’s see what other uses we could come up with. Here are a few ideas I thought might be cool:

  • What about adapting it to track your social media messages? First, if you’re following a lot of people, it would look wicked cool. Second, if you’re trying to prune your Follow list, that could be really practical for figuring out who’s the noisiest out there.
  • Sometimes when you’re trying to figure out bottlenecks in your traffic, it’s useful to have a decent visualization tool. Maybe this could be helpful!
  • Finally, you sometimes need a good way to track employee activities. Would this not be a kickass way to see who’s active on your network?

I decided to work on the second idea. I’m not looking to rework the code at this point, just to reuse it with a different purpose.

Prerequisites

To pull this off, you’re going to need the following:

  • The codeswarm source code and Java, so that you can run the code on your system
  • Some netflow log files to test out
  • flow-tools, so that you can process said netflow log files
  • A scripting language so that you can process and parse the netflow traffic into XML. My language of choice was ruby, but it could be as simple as bash.

The netflow filter

Before we can parse the netflow statistics into the appropriate format, we need to know what we’ll be using and how to extract it. Here’s what I used: each IP endpoint should have its own line; the IP address maps to the “author” field (because that’s what is visible). The protocol and port will map to the filename field, and the octets in the flow will map to the weight field.

The following is the netflow report config file. You should save this in the codeswarm directory as netflow_report.config:

stat-report t1
 type ip-source/destination-address/ip-protocol/ip-tos/ip-source/destination-p$
 scale 100
 output
  format ascii
  fields +first
stat-definition swarm
 report t1
 time-series 600

If you save some netflow data in data/input, you can test out your report by running this line:

flow-merge data/input/* | flow-report -s netflow_report.config -S swarm

Parsing the netflow

If the report worked out correctly for you, the next logical step is to write the code that creates the XML file to be parsed by codeswarm. You’ll want to set your input directory (which we’d said would be data/input) and your output file (for instance, data/output.xml).

Here’s the source code for my grabData.rb file:

#!/usr/local/bin/ruby
# Prepare netflow data for codeswarm.
$outputFilePath = "data/output.xml"
$outputFile = File.new($outputFilePath, "w")
$outputFile << "<?xml version=\"1.0\"?>\n"
$outputFile << "<file_events>\n"
# Grab the netflow information using flow-tools
$inputDirectory = "data/input"
$input = `flow-merge data/input/* | flow-report -s netflow_report.config -S swarm`
# This is the part that gets a bit dicey. I believe that in order to properly visualize
# the traffic, we should add an entry for each party of the flow. That's exactly what we're
# going to do. The "author" in this case is going to be the IP address. The "filename" will
# be the protocol and port. The weight will be the octets.
$input_array = $input.split("\n")
$input_array.grep(/recn/).each do |deleteme|
 $input_array.delete(deleteme)
end
$input_array.each do |line|
 fields = line.split(",")
 last = fields[0]
 source = fields[1]
 dest = fields[2]
 srcport = fields[3]
 dstport = fields[4]
 proto = fields[5]
 octets = fields[8].to_i / 1000
 $outputFile << " <event filename=\"#{proto}_#{srcport}\" date=\"#{last}\" author=\"#{source}\" weight=\"#{octets}\"/>\n"
 $outputFile << " <event filename=\"#{proto}_#{dstport}\" date=\"#{last}\" author=\"#{dest}\" weight=\"#{octets}\"/>\n"
end
$outputFile << "</file_events>"
$outputFile.flush
$outputFile.close

And we’re done! This should generate a file called data/output.xml, which you can then use in your code swarm. You can either edit your data/sample.config file or copy it to a new file, then run ./run.sh.

Reality Check

I was really excited when running my first doctored code swarm; unfortunately, though the code did work as expected, the performance was terrible. This was because the sample file I used was rather large (over 10K entries) – probably considerably more than what the authors had expected for code repository check-ins. Also, I suspect that my somewhat flimsy graphics card is unable to handle realtime rendering of the animation, so I set up the config file to save each frame to an image so that I could reconstitute the animation later. The syntax for this is:

ffmpeg -r 10 -b 1800 -i %03d.jpg test1800.mp4

Moreover, I believe my scale was off; I changed the number of milliseconds per frame to 1000 (1 frame, 1 second).

The second rendering was much more interesting, but it did yield a heck of a lot of noise; let’s not forget that we’re working with hundreds, if not thousands, of IP addresses. However, if we do a little filtering we can probably make the animation significantly more readable.

All in all, this was a rather fun experience but a bit of a letdown. Codeswarm wasn’t meant to handle this high a volume of data, which makes things tricky and less readable than I expected; if you play with your filters, you will definitely be able to see some interesting things, but if you’re looking for a means to visually suss out what’s happening on your entire network, you are bound to be disappointed. Next time, I hope to talk a bit about more appropriate real-time visualization tools for netflow and pcap files, and maybe even cut some code of my own.

Ironkey settings stick, even in read-only mode

I am writing this post as a bit of a sanity check; perhaps someone out there can help me by comparing notes or providing explanations 🙂

Yesterday, I was using my IK to perform a memory dump for forensic analysis on a system infected with a trojan. I’ve used a CD for this in the past, but figured “why not just use my IK in read-only mode?” – so I popped my IK in, making sure I ticked the read-only mode checkbox. No problems there, of course. I performed a memory dump, which I wrote to a throw-away USB stick, then ejected my IK.

You know how your settings stick from one session to another? I figured this was recorded when the IK checked into the management console. However, when I popped my IK into another machine this morning, I noticed that the settings from yesterday’s read-only, offline session had stuck anyway.

I do my forensic analyses in a different location from client sites – this is why I am 100% certain that the machine was not connected to the Internet. Wifi was off in any case (though the wifi switch on laptops is sometimes software-managed), but even if it had been on, the machine wouldn’t have had any AP to connect to. No ethernet or bluetooth connection either, of course.

My theory, therefore, is that the settings are stored on some RW volume on the IK. Can anyone tell me more about this? Is there some part of the manual that I’ve overlooked? What gets written to that volume? What FS does it have, and can it be infected with malware? This would be disconcerting.

Any insight would be very much appreciated 🙂

”My VMWare log partition is full!” – problem, cause, mitigation

Hello folks 🙂 It’s been a while since I last posted. I keep making vows that I will post regularly, and do so for about a month — and then things get hectic again and I forget this site’s very existence. My solution is to quit whinging about how irregularly I post and simply continue to post relevant shite. No use posting for the purpose of posting, methinks. Fair enough?

Anyway, I finally got something off my plate today. It’s something that I’ve been meaning to write about, mainly because the reason for its occurrence is unintuitive, it’s a silly problem to encounter in a production environment, and it’s relatively easy to resolve:

The problem

I first encountered this issue a few months ago; we’d been knee-deep in virtualizing a dozen servers for a client when, suddenly, the ESX machines stopped being able to start VMs. We thought “OK, that’s weird” and poked around the vSphere Center logs. Cue a puzzling message: “No space left on device”. That couldn’t be right: the SAN we were using was brand new and practically empty. Since nothing else was working, we restarted the servers.

You can probably guess what happened next: the physical servers come back up, and now none of the VMs will start. Luv’ly.

Fortunately, we did finally decide to open an SSH session to check out the logs there for any additional clues… and discovered that the /var/log directory (which has its own partition) was chock-full of logs.

The cause

VMWare’s KB article explains this problem in detail, and actually provides a decent resolution… But here’s why I think this is unintuitive: although these ESX (and ESXi) boxes are *nix servers, absolutely everything is administered via the vSphere client, so a full log partition on the underlying OS is about the last thing most admins would think to check.

The offensive security perks

Want to mess with the sysadmin? Flood his/her ESX box’s syslog file! That’s right, folks — flood the syslog file until the log partition fills up, and the admin won’t be able to start a VM, use vMotion, etc etc…

A solution

One possible way to prevent this kind of issue is to rotate your logs; there’s a good explanation of how this is done here. Setup is rather simple; as a matter of fact, you’ll find that many distros have log rotation implemented out-of-the-box… So why hasn’t VMWare? I’m speculating, but I would imagine that since the only purpose of ESX is to run other machines, VMWare decided that 1) the volume of logs was low enough that they could do away with it, 2) they actually wanted to keep logs from being overwritten for debugging purposes and 3) they figured that in the worst case scenario it would be a way for administrators to be tipped off that something was wrong in the first place. Since this is pure speculation, I won’t go into how bad an idea this was or how a more elegant solution could have been found.

Nevertheless, if you are not ecstatic about losing valuable log information due to rotation, you could possibly set up your ESX boxes to log to a centralized rsyslog server over TLS. This is something that you should consider doing anyway – log consolidation’s a pretty hot topic nowadays.

On my side, I’ve written a very simple bash script which you can set to run as a cron job. It checks how much disk space is used on the log partition and sends a message to syslog if usage reaches 97% or more – you can then configure syslog to log to another server, or set up swatch to e-mail you if the message ever shows up in your syslog:

#!/bin/bash
export diskcheck=`df -h | grep /var/log | grep 9[789]%`
test -n "$diskcheck" && logger "Log disk is getting low on space: $diskcheck"

Silly, innit? But it works. Note, however, that if your log fills up really really fast, you might not get the message before it’s too late.

Well, that’s me for now. Back to work!

ADDENDUM: I’ve modded my script so that it can run as a service. The script below should be saved as /bin/vmwareDiskCheck.sh …

#!/bin/bash


doservice () {
  while true; do
   export diskcheck=`df -h | grep /var/log | grep 9[789]%`
   test -n "$diskcheck" && logger "Log disk is getting low on space: $diskcheck"
   sleep 10
  done
}


doservice &

… and this script should be saved as /etc/init.d/diskCheck:

#!/bin/bash
#
# Init file for VMWare Log partition check
#
# chkconfig: 2345 55 25
# description: Routinely checks that /var/log isn’t too full.
#
# processname: diskcheck

# source function library
. /etc/rc.d/init.d/functions

path=/bin/vmwareDiskCheck.sh

RETVAL=0

start() {
  $path &
}

stop() {
  # use pgrep to determine the forked process
  # kill that process
  proc=`pgrep vmwareDiskCheck`
  kill $proc
  RETVAL=1
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                start
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart}"
                RETVAL=1
esac
exit $RETVAL

Comments or improvements welcome!

ADDENDUM 2: If you prefer a cron job, you can drop a script in your /etc/cron.hourly/ directory with the following code (don’t forget to make your script executable!)

#!/bin/bash
  export diskcheck=`df -h | grep /var/log | grep 9[789]%`
  test -n "$diskcheck" && logger "Log disk is getting low on space: $diskcheck"

What does the frontend of an online hacker store look like? Courtesy of Boing Boing.

I found this post both frightening and strangely entertaining. It has such a ‘Hollywood’ feel to it — perhaps this is why it’s so dangerous.

You’d think that this is an unsustainable business; I mean, don’t admins change their passwords at least from time to time? Don’t vulnerabilities get fixed, making it impossible to find the password in the long run?

Yeah, right. Site admins are probably as conscientious as they can be given their time and budget constraints. Also, it’s increasingly common for organizations to have ‘site admins’ that have more of an editing / web design background than a sysadmin / web dev / infosec background — an unfortunate consequence of increased outsourcing of web development and increased usability of CMS systems.

What did you expect to see on a webmaster’s CV 5-10 years ago? Fluency in HTML, CSS, and javascript, intermediate to advanced knowledge in a scripting language such as PHP perhaps, maybe some working knowledge of Flash, and definitely some experience with some web design package (like Dreamweaver) or IDE (such as Visual Studio .Net, Eclipse — or hell, even WebMatrix). The site admin was expected to liaise with the Comms team or something in order to put the content on the web, and had little to no experience in the field of editing or journalism.

Nowadays, the opposite is true: with easy-to-use tools such as Drupal, Joomla, DotNetNuke, or Sharepoint, you don’t need nearly as many hard skills to administer and maintain a website. I’d go as far as to say that recruiting an admin with a strong technical background would only lead to the person’s frustration and eventual resignation. However, it does mean that this new generation of site administrators is less likely to exercise proper caution — reading access logs, using secure passwords, performing routine security tests and code reviews, and following security feeds in order to reduce the chances of the site getting pwned.

Okay wise-ass, I can hear you say, thanks for stating the problem — now what’s the solution?


Sadly, there is no easy solution for this. Ideally, in a small to medium organization, you want the web team to have at least one person managing the content, layout and editing of your website — let’s face it, we techies are generally allergic to such things (anyone who’s worked with me knows not to mention colors in my presence – I get hives). That person is the main ‘business’ liaison and project champion — let’s call him/her the ‘web editor’. Then, on the technical side, you’d have one web development liaison and one sysadmin liaison. You don’t want the person who’s writing the code to also be the one reviewing the code or checking the logs — each person has a set of responsibilities that complements the others’. Nobody’s stuck with a laundry list of responsibilities, routine checks are more likely to be performed and, provided that there’s adequate communication between the parties, one generally avoids getting listed on sites like the one mentioned above.

Brucon 2010: a recap

I was at Brucon 2010 last week, and it was a blast!

The ambiance at the con was very much reminiscent of Defcon’s: people talking passionately about security in a relaxed, geek-and-caffeine-rich environment.

In the past, when attending infosec cons, I tended to go to all the talks — this time, I decided to go to as many workshops as possible. I must say, I was not disappointed at all — while talks are often absolutely fascinating and wildly entertaining, workshops provide a chance to understand something at a much deeper level and allow you to test your knowledge of the topic; they also allow the speaker to tune the content to the audience in a much more interactive manner, providing more or less background information according to the crowd’s grasp of the subject. For instance, during the malicious PDF analysis workshop, Didier Stevens provided an overview of the PDF structure and started working through his samples, but quickly started skipping through examples he thought were obvious and allotting more time to the ‘juicy bits’.

The best part of a workshop, I’ve found, is that it provides you with an environment in which it’s OK to try something new — and it’s alright to mess up. I walked out of the hardware hacking village with a profound sense of accomplishment, having learned how to solder with Mitch Altman and how to program Arduinos with Fish. I’ve always been a fan of all things electronic, but up to the day I actually learned how to solder, my grasp of what was truly involved was somewhat fuzzy — you look at things very differently once you know what goes into making them.

I’m not going to cover the talks in detail; Peter did a fantastic job of that, so here’s his post. You should definitely read about the following talks:

  • Mikko Hypponen’s retrospective on the last 25 years of malware — which was just amazing
  • Joe McCray’s “You spent all that money and you still got 0wned” presentation; better yet, wait until the video’s out — the guy’s hilarious
  • Stephan Chenette’s presentation of Fireshark was really good, because he not only went over what his tool does but also covered the concept behind what he calls “malicious ecosystems”
  • Dale Pearson’s head hacking presentation gave me a fresh perspective on just how far social engineering could go — spooky, really. Check out his site, it’s extremely cool!