Analyzing a Trac SPAM Attempt

05 July, 2008

One of the best web interfaces for visualizing Subversion repositories (as well as providing integrated project management and ticketing functionality) is the Trac Project, and all Cipherdyne software projects use it. With Trac's success, it also becomes a target for those that would try to subvert it for purposes such as creating spam. Because I deploy Trac with the Roadmap, Tickets, and Wiki features disabled, my deployment essentially does not accept user-generated content (like comments in tickets for example) for display. This minimizes my exposure to Trac spam, which has become a problem significant enough for various spam-fighting strategies to be developed such as configuring mod-security and a plugin from the Trac project itself.

Even though my Trac deployment does not display user-generated content, there are still places where Trac accepts query strings from users, and spammers seem to try and use these fields for their own ends. Let's see if we can find a few examples. Trac web logs are informative, but sifting through huge logfiles can be tedious. Fortunately, simply sorting the logfiles by line length (and therefore by web request length) allows many suspicious web requests to bubble up to the top. Below is a simple perl script sort_len.pl that sorts any ascii text file by line length, and precedes each printed line with the line length followed by the number of lines equal to that length. That is, not all lines are printed since the script is designed to handle large files - we want the unusually long lines to be printed but the many shorter lines (which represent the vast majority of legitimate web requests) to be summarized. This is an important feature considering that at this point there is over 2.5GB of log data specifically from my Trac server.

$ cat sort_len.pl
#!/usr/bin/perl -w
#
# prints out a file sorted by longest lines
#
# $Id: sort_len.pl 1739 2008-07-05 13:44:31Z mbr  $
#

use strict;

my %url       = ();
my %len_stats = ();
my $mlen = 0;
my $mnum = 0;

open F, "< $ARGV[0]" or die $!;
while (<F>) {
    my $len = length $_;
    $url{$len} = $_;
    $len_stats{$len}++;
    $mlen = $len if $mlen < $len;
    $mnum = $len_stats{$len}
        if $mnum < $len_stats{$len};
}
close F;

$mlen = length $mlen;
$mnum = length $mnum;

for my $len (sort {$b <=> $a} keys %url) {
    printf "[len: %${mlen}d, tot: %${mnum}d] %s",
            $len, $len_stats{$len}, $url{$len};
}

exit 0;

To illustrate how it works, below is the output of the sort_len.pl script used against itself. Note that at the top of the output the more interesting code appears whereas the most uninteresting code (such as blank lines and lines that contain closing "}" characters) are summarized away at the bottom:

$ ./sort_len.pl sort_len.pl
[len: 51, tot: 1] # $Id: sort_len.pl 1739 2008-07-05 13:44:31Z mbr  $
[len: 50, tot: 1]     printf "[len: %${mlen}d, tot: %${mnum}d] %s",
[len: 48, tot: 1]             $len, $len_stats{$len}, $url{$len};
[len: 44, tot: 1] # prints out a file sorted by longest lines
[len: 43, tot: 1] for my $len (sort {$b <=> $a} keys %url) {
[len: 37, tot: 1]         if $mnum < $len_stats{$len};
[len: 34, tot: 1]     $mlen = $len if $mlen < $len;
[len: 32, tot: 1] open F, "< $ARGV[0]" or die $!;
[len: 29, tot: 1]     $mnum = $len_stats{$len}
[len: 25, tot: 1]     my $len = length $_;
[len: 24, tot: 1]     $len_stats{$len}++;
[len: 22, tot: 2] $mnum = length $mnum;
[len: 21, tot: 1]     $url{$len} = $_;
[len: 20, tot: 2] my %len_stats = ();
[len: 19, tot: 1] #!/usr/bin/perl -w
[len: 14, tot: 3] while (<F>) {
[len: 12, tot: 1] use strict;
[len:  9, tot: 1] close F;
[len:  8, tot: 1] exit 0;
[len:  2, tot: 5] }
[len:  1, tot: 6]

Now, let's execute the sort_len.pl script against the trac_access_log file and look at one of the longest web requests. (The sort_len.pl script was able to reduce the 12,000,000 web requests in my Trac logs to a total of 610 interesting lines.) This particular request is 888 characters long, but there were some other similar suspicious requests that had over 4,000 characters that are not displayed for brevity:


[len: 888, tot: 1] 195.250.160.37 - - [02/Mar/2008:00:30:17
-0500] "GET /trac/fwsnort/anydiff?new_path=%2Ffwsnort%2Ftags%2Ffwsnort
-1.0.3%2Fsnort_rules%2Fweb-cgi.rules&old_path=%2Ffwsnort%2Ftags%2F
fwsnort-1.0.3%2Fsnort_rules%2Fweb-cgi.rules&new_rev=http%3A%2F%2Ff
1234.info%2Fnew5%2Findex.html%0Ahttp%3A%2F%2Fa1234.info%2Fnew4%2F
map.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew2%2Findex.html%0Ahttp%3A%2F
%2Fs1234.info%2Fnew9%2Findex.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew6%2F
map.html%0A&old_rev=http%3A%2F%2Ff1234.info%2Fnew5%2Findex.html%0Ahttp
%3A%2F%2Fa1234.info%2Fnew4%2Fmap.html%0Ahttp%3A%2F%2Ff1234.info%2Fnew2
%2Findex.html%0Ahttp%3A%2F%2Fs1234.info%2Fnew9%2Findex.html%0Ahttp%3A
%2F%2Ff1234.info%2Fnew6%2Fmap.html%0A HTTP/1.1" 200 3683 "-"
"User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR
1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"

My guess is that the above request is a bot that is trying to do one of two things: 1) force Trac to accept the content in the request (which contains a bunch of links to pages like "http://f1234.info/new/index.html" - note that I altered the domain so as to not legitimize the original content) and display it for other Trac users or to search engines, or 2) force Trac itself to generate web requests to the provided links (perhaps as a way to increase hit or referrer counts from domains - like mine - that are not affiliated with the spammer). Either way, the strategy is flawed because the request is against the Trac "anydiff" interface which doesn't accept user content other than svn revision numbers, and (at least in Trac-0.10.4) such requests do not cause Trac to issue any external DNS or web requests - I verified this with tcpdump on my Trac server after generating similar requests against it.

Still, in all of my Trac web logs, the most suspicious web requests are against the "anydiff" interface, and specifically against the "web-cgi.rules" file bundled within the fwsnort project. But, the requests never come from the same IP address, the "anydiff" spam attempts never hit any other link besides the web-cgi.rules page, and they started with regularity in March, 2008. This makes a stronger case for the activity coming from bot that is unable to infer that its activities are not actually working (no surprise there). Finally, I left the original IP address 195.250.160.37 of the web request above intact so that you can look for it in your own web logs. Although 195.250.160.37 is not listed in the Spamhaus DNSBL service, a rudimentary Google search indicates that 195.250.160.37 has been noticed before by other sites as a comment spammer.

05 July, 2008 | Trac | System Administration | By: Michael Rash

« SPA Talk at the Last HOPE Computer Security Conference

Mitigating DNS Cache Poisoning Attacks with iptables »

cipherdyne.org

Analyzing a Trac SPAM Attempt

Software

Linux Firewalls Book

Recent Blog Posts

Categories