Better netflow visualization, part II

I’ve been looking into netflow visualization since my little experiment last week and have come across a couple of interesting tools. I checked out an article on IT World about visualizing netflow, which pointed to AfterGlow and GraphViz; I also looked at Plixer Labs’ blog (they offer commercial netflow visualization tools). Finally, I’ve been browsing through the resources mentioned on Raffael Marty’s secviz.org site. Though these sites offer impressive information on visualization tools (both commercial and free), I was unable to find anything about time-stepped visualization of traffic. This may in fact be a testament to my laziness and/or lack of trying, in which case I do apologize.

I figured that I should give time-stepped traffic visualization a shot on my own, so I dusted off my old college book on OpenGL and went to work figuring out how I might code a tool similar to codeswarm, but intended for large volumes of traffic.

A quick shout-out to my wife (KRED on Research Salad) is in order, I believe. Kay, if you’re reading this: happy birthday, babe. Thanks for the ten years of laughs, sending me my favorite comic books from across the pond, and regularly assaulting me with a gazillion infosec links – don’t know how you can read all of them, I certainly can’t! Finally, thanks for putting up with the late nights and early mornings working, supporting this crabby ol’ geek through thick and thin!

Better netflow visualization with code_swarm coolness!

Howdy all,

In my last post, I may have mentioned codeswarm, a nifty tool for visualizing how frequently a software project gets updated over time. Since it’s an open-source project, I figured it was worth having a look at the code and seeing if there are other uses for it.

If you check out the Google Code page, you’ll notice that the project isn’t terribly active – the last upload dates back to May 2009. But hey, it does what it’s supposed to do and it’s pretty straightforward.

Reading through the source files, in fact, shows that use of the tool is super simple: you set up an XML file that contains the data to be used, you run Ant, and you let the program do the rest. The format of the sample data is very simple, frankly: a file name, a date, and an author.
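
In other words, something like this (the file names and authors below are made up for illustration, and if memory serves the dates are Unix epoch timestamps in milliseconds):

<?xml version="1.0"?>
<file_events>
 <event filename="src/core/Engine.java" date="1243814400000" author="alice"/>
 <event filename="src/core/Engine.java" date="1243900800000" author="bob"/>
</file_events>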

So let’s see what other uses we could come up with. Here are a few ideas I thought might be cool:

  • What about adapting it to track your social media messages? First, if you’re following a lot of people, it would look wicked cool. Second, if you’re trying to prune your Follow list, that could be really practical for figuring out who’s the noisiest out there.
  • Sometimes when you’re trying to figure out bottlenecks in your traffic, it’s useful to have a decent visualization tool. Maybe this could be helpful!
  • Finally, you sometimes need a good way to track employee activities. Would this not be a kickass way to see who’s active on your network?

I decided to work on the second idea. I’m not looking to rework the code at this point, just to reuse it with a different purpose.

Prerequisites

To pull this off, you’re going to need the following:

  • The codeswarm source code and Java, so that you can run the code on your system
  • Some netflow log files to test out
  • flow-tools, so that you can process said netflow log files
  • A scripting language so that you can process and parse the netflow traffic into XML. My language of choice was ruby, but it could be as simple as bash.

The netflow filter

Before we can parse the netflow statistics into the appropriate format, we need to know what we’ll be using and how to extract it. Here’s the mapping I used: each IP endpoint of a flow gets its own entry; the IP address maps to the “author” field (because that’s what is displayed), the protocol and port map to the “filename” field, and the octets in the flow map to the “weight” field.
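
To make the mapping concrete, a single TCP flow between two hosts ends up as two events, one per endpoint, roughly like this (the addresses, ports and octet counts below are invented for illustration, and 6 is simply the IP protocol number for TCP):

 <event filename="6_51515" date="1243814400" author="10.0.0.5" weight="12"/>
 <event filename="6_80" date="1243814400" author="192.168.1.10" weight="12"/>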

The following is the netflow report config file. You should save this in the codeswarm directory as netflow_report.config:

stat-report t1
 type ip-source/destination-address/ip-protocol/ip-tos/ip-source/destination-port
 scale 100
 output
  format ascii
  fields +first
stat-definition swarm
 report t1
 time-series 600

If you save some netflow data in data/input, you can test out your report by running this line:

flow-merge data/input/* | flow-report -s netflow_report.config -S swarm

Parsing the netflow

If the report worked out correctly for you, the next logical step is to write the code to create the .XML file that will be parsed by codeswarm. You’ll want to set your input directory (which we’d said would be data/input) and your output file (for instance, data/output.xml).

Here’s the source code for my grabData.rb file:

#!/usr/local/bin/ruby
# Prepare netflow data for codeswarm.
$outputFilePath = "data/output.xml"
$outputFile = File.new($outputFilePath, "w")
$outputFile << "<?xml version=\"1.0\"?>\n"
$outputFile << "<file_events>\n"
# Grab the netflow information using flow-tools
$inputDirectory = "data/input"
$input = `flow-merge #{$inputDirectory}/* | flow-report -s netflow_report.config -S swarm`
# This is the part that gets a bit dicey. I believe that in order to properly visualize
# the traffic, we should add an entry for each party of the flow. That's exactly what we're
# going to do. The "author" in this case is going to be the IP address, the "filename" will
# be the protocol and port, and the weight will be the octets.
$input_array = $input.split("\n")
# Drop any header lines (they contain "recn") so we only keep data rows
$input_array.grep(/recn/).each do |deleteme|
  $input_array.delete(deleteme)
end
$input_array.each do |line|
  fields  = line.split(",")
  last    = fields[0]
  source  = fields[1]
  dest    = fields[2]
  srcport = fields[3]
  dstport = fields[4]
  proto   = fields[5]
  octets  = fields[8].to_i / 1000
  # One event per endpoint of the flow
  $outputFile << " <event filename=\"#{proto}_#{srcport}\" date=\"#{last}\" author=\"#{source}\" weight=\"#{octets}\"/>\n"
  $outputFile << " <event filename=\"#{proto}_#{dstport}\" date=\"#{last}\" author=\"#{dest}\" weight=\"#{octets}\"/>\n"
end
$outputFile << "</file_events>\n"
$outputFile.flush
$outputFile.close

And we’re done! This should generate a file called data/output.xml, which you can then use in your code swarm. You can either edit your data/sample.config file or copy it to a new file, then run ./run.sh.
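
For what it’s worth, here’s roughly what that looks like for me. The netflow.config name is arbitrary, the property is called InputFile in my copy of sample.config (double-check yours), and if your run.sh doesn’t take the config path as an argument, just edit data/sample.config directly:

cp data/sample.config data/netflow.config
# point the InputFile property at our generated XML
sed -i 's|^InputFile=.*|InputFile=data/output.xml|' data/netflow.config
./run.sh data/netflow.config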

Reality Check

I was really excited when running my first doctored code swarm; unfortunately, though the code did work as expected, the performance was terrible. This was because the sample file I used was rather large (over 10K entries), probably considerably more than what the authors had expected for code repository check-ins. I also suspect that my somewhat flimsy graphics card is unable to handle real-time rendering of the animation, so I set up the config file to save each frame to a PNG and reconstituted the animation afterwards with ffmpeg. The syntax for that is:

ffmpeg -r 10 -b 1800 -i %03d.png test1800.mp4

Moreover, I believe my scale was off; I changed the number of milliseconds per frame to 1000 (one frame per second of data).
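
For reference, these are the settings I touched. The property names below are from my copy of sample.config — treat them as a pointer rather than gospel and double-check the comments in your own config:

# one frame covers one second of data
MillisecondsPerFrame=1000
# don't render live; dump each frame to disk instead
TakeSnapshots=true
SnapshotLocation=frames/code_swarm-#####.png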

The second rendering was much more interesting, but it did yield a heck of a lot of noise; let’s not forget that we’re working with hundreds, if not thousands, of IP addresses. However, if we do a little filtering we can probably make the animation significantly more readable.

All in all, this was a rather fun experience but a bit of a letdown. Codeswarm wasn’t meant to handle this high a volume of data, which makes things tricky and less readable than I expected; if you play with your filters you will definitely be able to see some interesting things, but if you’re looking for a means to visually suss out what’s happening on your entire network, you are bound to be disappointed. By next time, I hope to talk a bit about more appropriate real-time visualization tools for netflow and pcap files, maybe even cut some code of my own.

Extracting install files uploaded to Kace

I’ve been working on Kace more and more recently, and I have come to realise that once you upload a binary file for a managed installation, you can’t download it again… At least, not easily. The following is one possible way for you to extract your binary back out of your Kace K1000 box — practical if you’ve lost or deleted your original file and do not wish to lose your work!

In order to proceed, you need to know a little about XML and how files work. You’re going to be working with a hex editor; if you’re not comfortable with that, you may wish to reconsider undertaking this little manipulation.

First, log into your Kace admin console over the web, then go to Settings > Security Settings. Scroll down to the Samba section, enable file sharing by ticking the corresponding checkbox, and set the admin password. Next, go to Settings > Resources > Export K1000 resources. Select your managed install package and, under Actions, click on Export to Samba Share. This will effectively export your entire managed installation package to the \\k1000\clientdrop share.

Kace saves the configuration and binaries in a format that is relatively easy to read — a compressed XML file. It is saved with a .KPKG extension; if you rename the file to .ZIP, you can extract the underlying XML file to a location where you can work on it.
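
Assuming your exported package is called package.kpkg (the name will obviously differ), that boils down to:

cp package.kpkg package.zip
unzip package.zip -d extracted/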

As mentioned before, you’ll need a hex editor in order to proceed. When working on Windows I’ve used Olly, even though it’s not really intended as a hex editor. If you’re a Linux buff, ghex is a great little tool, very simple and straightforward. For my experimentation, I went with HxD, which is free and very much like ghex in terms of its simplicity.

Open up the XML and locate the beginning of your file. This is relatively simple if you’re used to working with raw files; if you’re not, you may find that this site might help you. I suspect that most of your binaries, like mine, will be self-extracting files — in other words, executables — in which case the file header that you’re looking for is ‘4D 5A’ (that’s “MZ” in ASCII). If you strip out everything before that point, so that the file starts with the MZ header, you should be good to go!
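
If you’d rather not do the carving by hand, here’s a quick-and-dirty Ruby sketch of the same idea — it simply finds the first occurrence of the MZ signature and writes everything from that offset onward to a new file. Treat it as a starting point rather than a finished tool: if the XML happens to contain the bytes “MZ” before the real payload, you’ll be reaching for the hex editor after all.

#!/usr/local/bin/ruby
# carve_mz.rb -- carve an embedded executable out of the extracted KPKG XML
# Usage: ruby carve_mz.rb input.xml output.exe
input, output = ARGV[0], ARGV[1]
data = File.open(input, "rb") { |f| f.read }
offset = data.index("MZ")   # 0x4D 0x5A
abort("No MZ header found") if offset.nil?
File.open(output, "wb") { |f| f.write(data[offset..-1]) }
puts "Wrote #{data.length - offset} bytes starting at offset #{offset}"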

Quick analysis of a trojan targeting Swiss users


We’ve seen a couple of cases of this trojan hitting client computers lately; unfortunately, the security bulletin by the CYCO doesn’t have much yet in terms of information on IP addresses, domain names, or what else the trojan might be doing in the background, so I dusted off the old forensics toolkit and did a bit of digging.
Look at this bad boy! Innit unreal? Brilliant 🙂 I knew this kind of stuff was around but I must admit it’s the first time I’ve encountered ransomware this targeted…

My colleague confirmed that this was only happening on the user’s account – not the local admin account present on the computer. So the first thing we did was run Sysinternals’ Process Monitor to identify what was causing the screen to appear. Note that we use Deep Freeze on users’ computers and the machine was frozen at the time of the infection, so it was likely that whatever was running had persisted to the user’s drive. I really wish that we could freeze everything but the user’s Desktop, My Docs, and Favorites – however, that seems to royally piss off our users. It would have prevented this from happening, though. Anyway, moving on. If you know that the only location where this executable could possibly exist is the user’s drive, it’s easy to identify the culprit:

No big surprise there — it’s running in the user’s Temp folder. Unsurprisingly as well, the user’s Software\Microsoft\Windows NT\CurrentVersion\Winlogon registry key has been modified to point the shell to that upd executable – that’s easily sussed out by using regripper or regdump. With regripper, we even get a timestamp of when this was done, which will be useful for cross-referencing information later.
OK great, so now we know where this thing is – how did it get there?
It was a bit harder to figure out how the hell the trojan got on the user’s computer, I’ll admit. I used Web Historian at first to identify any suspicious sites. I don’t know about the rest of you out there, but my experience is that when malware shows up on users’ computers, it’s typically because they’ve been downloading something illegal or, er, carnal. However, when looking at the user’s web history no alarm bells were going off. All good, clean, unremarkable sites. I went as far as to investigate the user’s mail store to see if the machine could have gotten infected by email – nothing suspicious there either. USB keys would have left a trace in the registry but since the machine was frozen, I wouldn’t be able to figure out if a key was inserted at the time of the infection. I therefore switched tactics and ran a timeline analysis of the user drive using sleuthkit. That’s when I found this:

The same minute the executable was written, something was written to the Java cache. Coincidence? Yeah right. I took a look at the index file, guess what I found?

If you decompile the JAR using jad, you get something like this:

If you check out the domain and IP address written in the index file, you’ll see that the domain is registered to a Russian registrant; the IP address traces back to the domain, but is hosted in the Netherlands.
That’s all the JAR file seems to do. I haven’t messed around with the upd.exe file yet, will probably do so sometime soon. In the meantime, I hope that you found this entertaining 😀 Should I be looking at anything else? Let me know.

Ironkey settings stick, even in read-only mode

I am writing this post as a bit of a sanity check, perhaps someone out there can help me by comparing notes or providing explanations 🙂

Yesterday, I was using my IK to perform a memory dump for forensic analysis on a system infected with a trojan. I’ve used a CD for this in the past but figured “why not just use my IK in read-only mode” — I popped my IK in, making sure I ticked the “read-only mode” checkbox. No problems there, of course. Performed a memory dump, which I wrote to a throw-away USB stick, then ejected my IK.

You know how your settings stick from one session to another? I’d figured those were recorded when the IK checked into the management console. However, when I popped my IK into another machine this morning, I noticed that the settings from that offline, read-only session had stuck.

I do my forensic analyses at a different location from client sites, which is why I am 100% certain that the machine was not connected to the Internet. Wifi was off in any case (though the wifi switch on laptops is sometimes software-managed), but even if it were on, the machine wouldn’t have had any AP to connect to. No ethernet or bluetooth connection either, of course.

My theory, therefore, is that the settings are stored on some RW volume on the IK. Can anyone tell me more about this? Is there some part of the manual that I’ve overlooked? What gets written to that volume? What FS does it have, and can it be infected with malware? This would be disconcerting.

Any insight would be very much appreciated 🙂

Private browsing and forensics


Ever wondered whether the “private browsing” feature in your browser actually works?  

This article may shed some light on this topic for you. On a sour note, I’m completely shocked that Microsoft’s implementation of private browsing leaves something to be desired.

From a privacy advocate and defensive security perspective, I’m all for private browsing, in both the private and corporate worlds, and here’s the main reason: cookies, cached files and the like represent a significant security issue and a potential data leak. If your company uses webmail or an intranet and you’re consulting confidential files on the fly, that data gets stored locally on the machine. This constitutes a risk at the enterprise level that trumps the need for a forensically viable audit trail.

Private browsing isn’t a panacea, though: since data is stored in memory, malware that is already installed on the PC could scrape memory in search of interesting data (credit card numbers, credentials, etc.) — and not just malware, either. If you were at the European SANS forensics summit this year, you might have heard this guy talk about retrieving the contents of a machine’s memory using forensics tools. Nor does it protect the user against a traditional network sniffer / MITM attack. Finally, it assumes that you actually bother to close your browser to clear that memory of sensitive data.

A lot of this is abstract for the layperson, so let’s provide a real-world scenario:

Let’s say you work for a pharma company and you’re waiting for a flight.  You’re bored, so you go to an internet café and open up your webmail. Your teammate’s sent you the latest draft of that report you’ve been working on, internally disclosing the findings of your latest research. You review the document, and fire her back an e-mail with your comments; you then leave the café and proceed to your gate. 

Risk #1: the PC you use isn’t an enterprise PC: to quote a memorable Mike Myers film, it’s the village bicycle of IT — everyone’s had a ride. What’s the café’s policy on updating its A/V? Is there regular maintenance? Does the machine get re-ghosted after every use? Is there a slot for a USB drive (and therefore a vector of infection)? Is the network traffic being sniffed (i.e. monitored)? It all depends on the owner of the café — there aren’t any laws or standards that oblige internet café owners to comply with basic security measures. For this risk, no amount of “private browsing” can help you – you may as well have broadcast your enterprise password and files on Facebook.

Risk #2: that report you just looked at has pretty much become public property the minute you opened it up on that public machine. Not only can subsequent users of that PC retrieve your report, but the law will not be on your side (“you should have known better” will be the de facto response). Private browsing can help you there, provided that you close the browser, because the data is stored in memory and not on disk.

Risk #3: how often do people forget to log off? Very often. As a matter of fact, I don’t think there’s a single person on this planet that’s used a computer and has never, ever forgotten to log off. And yet, if you forget to log off when you walk away from that public PC, all of your company’s past, present and future secrets could be compromised. Ever heard of the switchblade USB key? It retrieves cached passwords very nicely, and almost instantaneously. Very difficult to use: you insert the key in the computer, wait thirty seconds, pull it out — voilà, passwords du jour. In this scenario as well, private browsing can be extremely useful, because it doesn’t allow cached passwords to be written to the disk.

So there you have it, straight from the horse’s mouth: private browsing may well make forensics more difficult, but it doesn’t make it impossible. That is an acceptable risk to me, given that it mitigates enterprise and personal risk of a security breach.