Wednesday, February 11, 2015

Building your own LovelyHorse monitoring system with Maltego (even the free version) - it's easy!

Someone linked me to the [LovelyHorse] thingy. If you missed it - it's basically a leaked GCHQ/NSA document containing a list of a few security-related Twitter accounts that GCHQ/NSA was supposedly monitoring. Seeing that, since the last release, we have some interesting Twitter functionality in Maltego, I figured it would be interesting to see how we can replicate their work.

First - manually

Before even starting with Maltego I first spent some time thinking about what I really wanted from this and did it all by hand (still in Maltego, but before we start to automate the process). As a start I'd need to get the people's Twitter handles. Well that's easy - the document lists them all. In Maltego I can start with an alias and run the transform 'AliasToTwitterUser' to get the actual Twitter handle:


I want to get the Tweets that the people wrote. There's a transform for that too - 'To Tweets [that this person wrote]'.


OK great - now I have the last 12 Tweets (my slider was set to 12). What information can I extract from the Tweet itself - keeping in mind that I want to end up doing this across 36 different handles? Well - possibly extract any hashtag, any URL mentioned in the Tweet and any other Twitter user's handle. There are transforms for all those.
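To make the idea of pulling hashtags, URLs and handles out of a Tweet concrete, here is a minimal Python sketch. It is not how the Maltego transforms are implemented; the regular expressions and the `extract_entities` name are my own assumptions, and unlike the transforms it does not resolve shortened t.co links.

```python
import re

# Hypothetical patterns for illustration; real Tweet parsing has more edge cases.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet_text):
    """Return the hashtags, URLs and mentioned handles found in one Tweet."""
    # Pull the URLs out first so a "#fragment" inside a link is not
    # mistaken for a hashtag.
    urls = URL_RE.findall(tweet_text)
    text_without_urls = URL_RE.sub(" ", tweet_text)
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(text_without_urls)],
        "urls": urls,
        "aliases": MENTION_RE.findall(text_without_urls),
    }


sample = "New #malware write-up by @example_user http://t.co/abc123 #InfoSec"
print(extract_entities(sample))
```

Lower-casing the hashtags is a design choice in this sketch so that '#InfoSec' and '#infosec' end up as the same node.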


Running those on a single Tweet we get something like this:


Note how the http://t.co links are nicely resolved. This Tweet didn't contain any other aliases - so you only see the hashtags and the URLs.

If we select all the Tweets and run the 3 transforms across them we see that there are some matches - in this case on the hashtags 'infosec' and 'malware':


Now, this is not really interesting at all but it's a starting point. When I do the same for the last 12 Tweets of all of the lovely horses (as I'll call this group of Twitter handles) I might see some pattern.

Now - all the horses

I copy the text from the PDF and paste it into a text editor - Notepad will do. Clean it up a bit and we have:

Select all and paste into Maltego. It will result in every line being mapped as a phrase. Select all the entities (control A) and change the type to 'Alias' and run the 'AliasToTwitterUser' transform on all the phrases - like we did at the beginning, except now we're doing it on all the aliases. It should look something like this:

At this stage I can get rid of the Aliases because I am not going to use them anymore. I click on 'Select by Type' on the ribbon (Investigate) and select 'Alias'. Delete - and they're gone. I do a re-layout, select them all and run the 'To Tweets [that this person wrote]' - this time on all of them. Essentially I am repeating the entire process we did - but this time on all the lovely horses.

When the transforms complete the graph now looks like this:


All that's left is to run the 3 transforms (Pull URL/hashtag/alias) on all the Tweets. To select all the Tweets quickly I use 'Select by type' - Twit again. This takes a while to complete...but when Maltego has pulled out all the hashtags, URLs and aliases from the last 12 Tweets of all the lovely horses it looks like this:

No doubt this looks like ass. It's because the block layout is not really suited for this type of graph. But click on Bubble View:


and you get:


Let's get real

I won't lie - I've been spoon-feeding you up to now. Let's stop now - else this blog post is going to morph into a book. I am going to assume that you have a bit of Maltego experience under the belt by now.

The way we've been doing it up to now is really not terribly interesting or accurate. We're getting the last 12 Tweets. What we really want is all the Tweets in the last X seconds. Imagine that one horse hasn't been on Twitter in 14 days - then matching his/her Tweets to what's happening right now does not make a lot of sense (in a monitoring scenario). We need to introduce the idea of a sliding time window. The Twitter transforms did not support that.

Didn't. Does now. Well - the 'To Tweets [that this person wrote]' does now. I hacked it quickly. Anton will not approve...but it works as it says on the tin. I've added a transform setting called 'Window' - by default 0 but when changed implements this 'in the last X seconds'. 
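For those who think better in code, the 'in the last X seconds' idea looks roughly like the Python sketch below. It is not the transform's actual code; the `tweets_in_window` name and the tuple layout are assumptions, and I am assuming that a window of 0 simply means 'no time filter'.

```python
from datetime import datetime, timedelta, timezone


def tweets_in_window(tweets, window_seconds, now=None):
    """Keep only Tweets written in the last `window_seconds` seconds.

    `tweets` is a list of (created_at, text) tuples with timezone-aware
    datetimes. A window of 0 disables the filter (assumed default behaviour).
    """
    if window_seconds == 0:
        return list(tweets)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=window_seconds)
    return [(created, text) for created, text in tweets if created >= cutoff]


now = datetime(2015, 2, 11, 12, 0, tzinfo=timezone.utc)
tweets = [
    (now - timedelta(minutes=10), "fresh Tweet"),
    (now - timedelta(days=14), "stale Tweet"),
]
# Only the Tweet from ten minutes ago survives a half-hour window.
print(tweets_in_window(tweets, 1800, now=now))
```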

Now it becomes interesting when combined with machines (scripted transforms) - especially perpetual machines.  Consider the following machine:

machine("axeaxe.LovelyHorse", 
    displayName:"LovelyHorse", 
    author:"RT",
    description: "Simulates the GCHQ's LH program") {

    onTimer(240) {
        type("maltego.affiliation.Twitter",scope:"global")
        run("paterva.v2.twitter.tweets.from",slider:15,"window":"1800")
        paths{
            run("paterva.v2.pullAliases")
            run("paterva.v2.pullHashTags")
            run("paterva.v2.pullURLs")
            run("paterva.v2.TweetToWords")
        }

        //half hour + half hour = one hour
        //the entities will be deleted if older than half hour
        //but the transforms time frame adds another half hour
        age(moreThan:1800, scope:"global")
        type("maltego.Twit")
        delete()
        
        age(moreThan:1800, scope:"global")
        incoming(0)
        outgoing(0)
        delete()
    }
}

Let's take a look. We run our sequence every 4 minutes (4 x 60s = 240s). We grab all the Twitter handles and get the Tweets - but no more than 15 (the slider value), and only those written in the last half an hour (30 x 60 = 1800s - we set the window parameter to 1800). After this it's plain sailing - we get the aliases, hashtags, URLs and words.

At some stage we need to get rid of old Tweets - else our graph will just grow and grow and grow. So there's a little logic to delete nodes when they're older than half an hour. This means that at any stage we have a one hour view on the activity of the horses. One hour - because at the limit the initial transform can return a Tweet that's 30 minutes old and it will stay on the graph for another 30 minutes.
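The two-step clean-up can be sketched in plain Python as follows. This is only a model of the idea: the `added` timestamp, the node type names and the `prune` function are hypothetical, and how Maltego itself computes an entity's age is not modelled. The sketch also never prunes the handle nodes, which is my own assumption so that the monitored accounts stay on the graph.

```python
def prune(nodes, edges, now, max_age):
    """Drop stale Tweets, then stale nodes that are left without any links.

    nodes: dict of name -> {"type": str, "added": seconds}
    edges: set of (source, target) pairs
    """
    def is_old(name):
        return now - nodes[name]["added"] > max_age

    # Step 1: delete Tweets older than max_age, together with their links.
    stale = {n for n in nodes if nodes[n]["type"] == "Twit" and is_old(n)}
    edges = {(s, t) for s, t in edges if s not in stale and t not in stale}
    nodes = {n: v for n, v in nodes.items() if n not in stale}

    # Step 2: delete old nodes with no incoming and no outgoing links.
    # Handle nodes are exempt (an assumption of this sketch).
    linked = {s for s, _ in edges} | {t for _, t in edges}
    nodes = {
        n: v for n, v in nodes.items()
        if v["type"] == "Twitter" or n in linked or not is_old(n)
    }
    return nodes, edges


nodes = {
    "@horse_a": {"type": "Twitter", "added": 0},
    "old tweet": {"type": "Twit", "added": 100},
    "new tweet": {"type": "Twit", "added": 3000},
    "#old": {"type": "Hashtag", "added": 100},
    "#new": {"type": "Hashtag", "added": 3000},
}
edges = {
    ("@horse_a", "old tweet"), ("old tweet", "#old"),
    ("@horse_a", "new tweet"), ("new tweet", "#new"),
}
kept_nodes, kept_edges = prune(nodes, edges, now=3600, max_age=1800)
print(sorted(kept_nodes))
```

The old Tweet goes first, which leaves '#old' without links, so it is removed in the second step.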

The resulting graph will show us when they Tweet the same keyword (courtesy of the 'TweetToWords' transform) or hashtag, or mention the same website or the same Twitter handle in their Tweets. And if they are not active on Twitter - then the graph won't contain outdated info.

Of course - the values can be changed depending on how closely you want to monitor the situation - if the resolution is a day then the values should be (24 x 60 x 60) /2 and you should 1) up the number of Tweets returned in the slider value (as TheGrugq Tweets waaay more than 15 Tweets in a day) and 2) you shouldn't have to poll every 4 minutes.
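The arithmetic is simple enough, but as a quick sanity check here it is in Python. The `machine_settings` helper is hypothetical; it just halves the resolution you want, giving one half to the transform's window and the other half to the maximum age of entities on the graph.

```python
def machine_settings(resolution_seconds):
    """Split the desired view into equal window and maximum-age halves."""
    half = resolution_seconds // 2
    return {"window": half, "max_age": half}


print(machine_settings(60 * 60))       # one hour view
print(machine_settings(24 * 60 * 60))  # one day view
```

For a one hour view both values come out at 1800, and for a one day view at 43200.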

Advanced
Right - so what we REALLY want is something that can tell us when more than X horses' Tweets are linking to the same thing (be that a website/hashtag/whatever). For that we can't just use the 'incoming()' filter because one person could be sending ten Tweets mentioning the same website and it would mean that the website has ten incoming links. No - it has to have unique starting nodes (the horses).
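To see the difference between counting incoming links and counting unique starting nodes, here is a small Python sketch on a toy graph. It says nothing about how Maltego does this internally; the function names and the edge-set representation are assumptions made for illustration.

```python
def incoming_count(edges, node):
    """Number of links pointing at `node`."""
    return sum(1 for _, target in edges if target == node)


def root_ancestors(edges, node):
    """Ancestors of `node` that have no parents themselves (the horses)."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    roots, stack, seen = set(), [node], {node}
    while stack:
        current = stack.pop()
        ups = parents.get(current, set())
        if not ups and current != node:
            roots.add(current)
        for parent in ups:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return roots


edges = {
    # one horse Tweets the same website three times
    ("horse_a", "t1"), ("horse_a", "t2"), ("horse_a", "t3"),
    ("t1", "example.com"), ("t2", "example.com"), ("t3", "example.com"),
    # three different horses use the same hashtag
    ("horse_b", "t4"), ("horse_c", "t5"),
    ("t1", "#security"), ("t4", "#security"), ("t5", "#security"),
}
for entity in ("example.com", "#security"):
    print(entity, incoming_count(edges, entity), len(root_ancestors(edges, entity)))
```

Both entities have three incoming links, but only '#security' traces back to three different horses.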

We have that filter. It's called 'rootAncestorCount()'. So now - with a combo of bookmarks and this filter hackery we build something like:

        //if an entity links to moreThan 2 horses & 
        //we haven't seen it before  - mail
        incoming(moreThan:1, scope:"global")
        rootAncestorCount(moreThan:2)
        bookmarked(1,invert:true)
        run("paterva.v2.sendEmailFromEntity",EmailAddress:"roelof@paterva.com",EmailMessage:"Multiple horses mentioned: !value!",EmailSubject:"Horse Alert")
       
        //this is to ensure we don't email over & over
        incoming(moreThan:1, scope:"global")
        rootAncestorCount(moreThan:2)
        bookmark(1)


Basically what happens here is that we check for all entities with more than one incoming link (this can only be hashtags/URLs/words/aliases) and find the ones that have more than 2 unique grandparents (i.e. horses). If we find them, and we haven't seen them before (this Boolean flag is implemented with a bookmark) we mail the value out. We do the mailing with a transform that we wrote (and for obvious reasons cannot make public else it will be used for spam). It's not rocket science tho.
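The 'alert only once' logic can be sketched in Python like this. It is an illustration, not the machine itself: the `new_alerts` function and its inputs are hypothetical, and a plain set called `seen` plays the role of the bookmark.

```python
def new_alerts(roots_by_entity, seen, more_than=2):
    """Return entities linked to more than `more_than` horses, once each.

    roots_by_entity: dict of entity value -> set of horses linking to it
    seen: set of entities already alerted on (stands in for the bookmark)
    """
    alerts = []
    for entity, roots in sorted(roots_by_entity.items()):
        if len(roots) > more_than and entity not in seen:
            alerts.append(entity)
            seen.add(entity)  # remember it so we don't mail over and over
    return alerts


seen = set()
graph_state = {
    "#security": {"horse_a", "horse_b", "horse_c"},
    "example.com": {"horse_a"},
}
print(new_alerts(graph_state, seen))  # first pass
print(new_alerts(graph_state, seen))  # second pass, nothing new
```

On the first pass only '#security' qualifies; on the second pass nothing is returned because it is already in `seen`.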

Such a machine can run for days...resultant graph for today (it's almost midnight), when configured with a one day window looks like this:


Highlighted with its entire path here is the hashtag 'security' - no surprise here. The other one was the alias DaveAitel (again not surprising). Below is the email received. Remember that we'll only receive email ONCE per alert, that it's only when more than 2 horses link to it and only if it happened within a day.



The complete machine looks like this (please change values as needed - speed / resolution /etc):

machine("axeaxe.LovelyHorse", 
    displayName:"LovelyHorse", 
    author:"RT",
    description: "Simulates the GCHQ's LH program") {

    onTimer(600) {
        //find Twitter handles on graph
        type("maltego.affiliation.Twitter",scope:"global")
        
        //run to Tweets transform
        run("paterva.v2.twitter.tweets.from",slider:30,"window":"43200")
        
        //extract Alias/Hashtags/URLs and uncommon words
        paths{
            run("paterva.v2.pullAliases")
            run("paterva.v2.pullHashTags")
            run("paterva.v2.pullURLs")
            run("paterva.v2.TweetToWords")
        }

        //if an entity links to more than 2 unique horses & 
        //we haven't seen it before  - mail it out
        //comment this entire section if you don't have a mailing transform

        incoming(moreThan:1, scope:"global")
        rootAncestorCount(moreThan:2)
        bookmarked(1,invert:true)
        
        run("paterva.v2.sendEmailFromEntity",EmailAddress:"roelof@paterva.com",EmailMessage:"More than 2 horses mentioned: !value!",EmailSubject:"Horse Alert")
        
        //this is to ensure we don't email over & over
        incoming(moreThan:1, scope:"global")
        rootAncestorCount(moreThan:2)
        bookmark(1)
        
        
        //delete nodes when they grow old
        //half day + half day = one day
        //the entities will be deleted if older than half a day
        //but the transform's time frame adds another half day
        age(moreThan:43200, scope:"global")
        type("maltego.Twit")
        delete()
        
        age(moreThan:43200, scope:"global")
        incoming(0)
        outgoing(0)
        delete()
    }
}

I hope you've enjoyed this (waaaaay too long) blog post on how our thinking goes. Of course - you get a lot more understanding of these things if you do it yourself. All of the above functionality exists in the (free) community edition of Maltego too - although there you probably want to monitor shorter intervals (say 15 minutes) as you can only display 12 Tweets per person. All in all - that's probably better.. ;)

'laters,
RT

PS: for more information on machines check out our newly built dev portal at [http://dev.paterva.com/developer]. The machine syntax etc. is located in 'Advanced'. 

And also - we made a video some time ago that shows the same kind of principle - it's [ here ]
