Server Log Spam? Are You Kidding Me?

I've been busy setting up a new web server after an unhappy parting with my previous hosting provider (for the record, I'm now delighted to be with Rimu Hosting, the most professional and technically savvy service I've ever experienced in the ten-plus years I've been playing at these things).

So after tweaking and poking at things to make sure they were running as expected, I came across some very strange entries in my web server logs. Basically the pattern is something like this:

"GET /index.html?id=http://spam.example.com/ HTTP/1.0" 200 3167

And there are dozens and dozens of them. But why would anyone want to append such a strange querystring onto my homepage URL? Looking closely I notice the querystrings all contain very dodgy looking web addresses, the kind of addresses security experts warn you to stay away from. But what possible advantage would anyone gain from doing this?

After a little thought and some googling around I now know this is yet another way that spammers are poisoning the collective well that is our internet. Keep in mind the actual cost of getting their web address into my server log is nearly nothing: all they have to do is make a request to my homepage with an added querystring. This process could easily be automated and executed on hundreds or thousands of web sites every day. But even if this task is virtually free, what is the gain for the spammer?

Not surprisingly, in order for this to be of any advantage at all to a spammer, it requires a certain level of stupidity on the part of the web site owner. In this case the deserving victim needs to break the number one single most important law of web security:

Do not ever ever automatically post user-submitted content onto your web site. No, not ever!

The user-submitted content in this case is the querystring, and how is it getting automatically posted to web sites? Apparently some misguided folks actually link to their server log analytics, which will show that X many requests were made to /index.html?id=http://spam.example.com/.

Look, you have, on your web site, a public reference to the spammer's website - yay! And if your website is crawled by a search engine, they will see that you are endorsing that spammer by including his URL on your site. And this will raise the spammer's search engine rankings - or at least that's the theory. I doubt very much if any reputable search engine would actually fall for that.

So mystery solved, but is this such a bad thing? I mean no one's getting hurt, right? Um, actually yes, I'm getting hurt, and so is everyone who maintains a website. The damage comes from all the bandwidth and CPU cycles wasted fulfilling the requests from automated spam bots. If, like me, you pay your hosting provider based on bandwidth and CPU usage, then this cost is direct, even though I have never published my web logs and never would. And if you think you're getting a certain number of visitors to your web site based on your log analysis, you can bet the real number (after taking the bots away) is actually much lower.

I don't want to waste any resources generating pages for spam bots. Some of my pages require database connections and web application code to run, so why should I do that dozens of times a day for no reason?

Solution: The following code can be added to an .htaccess file and will immediately return a Forbidden result to any request that includes a fully-qualified URL in the querystring.

RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*https?://.*$ [NC]
RewriteRule .* - [F,L]
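
To get a feel for which requests the rule above would catch, here is a rough JavaScript sketch that applies the same pattern to a few made-up query strings. The function name looksLikeLogSpam and the sample strings are invented for illustration, and the sketch assumes the query string is compared exactly as it was sent, without any URL decoding.

```javascript
// Same pattern as the RewriteCond above; the "i" flag mirrors [NC].
var spamPattern = /^.*https?:\/\/.*$/i;

// Hypothetical helper: true if the pattern matches this query string.
function looksLikeLogSpam(queryString) {
    return spamPattern.test(queryString);
}

console.log(looksLikeLogSpam("id=http://spam.example.com/"));         // true
console.log(looksLikeLogSpam("ID=HTTPS://SPAM.EXAMPLE.COM/"));        // true
console.log(looksLikeLogSpam("page=2&sort=date"));                    // false
console.log(looksLikeLogSpam("id=http%3A%2F%2Fspam.example.com%2F")); // false: no literal "://"
```

The last case suggests that a percent-encoded address might slip past the pattern, which could be worth a second condition if such requests ever turn up in the logs.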

Of course these requests will still be logged, but my goal is to prevent my server's resources being stolen. Anyone who automatically makes their log statistics public deserves whatever pain they get as a result, but I'm not one of them.


permalink | Tags: apache.

Two ways of Constructing a Software Design

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

Lean Software Strategies By Peter Middleton, James Sutton


permalink | Tags: quotes.

Credit Crunch, Finally Explained

Why is my bank balance being punished by this thing nicknamed a "credit crunch?" What the hell even is that? I've thought that plenty of times during the last year and never actually tried to find the answer. The science of Economics is too full of gobbly-talk to be readable to simple little me.

Well, it turns out the answer is not all that complicated, and for the first time I've found a plain and simple explanation.


Movember

Few families have not been affected by the cancers that are specific to men (testicular cancer and prostate cancer, for example) yet, compared to women, men are far less informed about how to look after their own health in this regard.

Every year about 35,000 men in the U.K. are diagnosed with prostate cancer and about 10,000 men die from the disease. A British man has a 1 in 11 lifetime risk of developing prostate cancer. It is now the most common cancer diagnosed in men in the U.K., with at least one man dying every hour from the disease.

Testicular cancer mainly occurs in men aged 18 to 35, but can occur any time after the age of 15. The number of men diagnosed with testicular cancer has grown by approximately 88% over the last 34 years, but the good news is testicular cancer is highly curable when found early and treated promptly.

Yeah, I know, it's not the kind of thing we guys like to talk about but that's why there's Movember. The aim of Movember is to make men's health a bit more fun by putting the "Mo" back on the faces of English men, and in the process raise awareness and funds for key male health issues.

I've had a beard of some sort since I was 17 so there's not much more I can do to support the facial hair cause than I already have, but if I were to follow the rules of Movember exactly I would start the month clean-shaven, and spend 30 days letting the 'tache be its own bad self. Most people have never seen me clean-shaven, but you will now...

My son cried.

My daughter said I "looked stupid."

Which reaction will you have?

You can support the campaign that makes children cry by clicking the following link: https://www.movember.com/uk/donate/donate-details.php?action=sponsorlink&rego=2034166&country=uk and making a small donation to men's cancer charities online. Or simply visit the website and learn more about men's cancers. Oh and why not join in on the fun and grow your own? Who knows, you might not look as stupid as I do!

The Sweet Smell of Movember


The Best Day Ever


permalink | Tags: Obama.

Americans Abroad for Obama

Like many American citizens living outside of America, I am pretty much an afterthought when it comes to US election campaigns. I'm glad to say that is beginning to change.

But why shouldn't politicians be interested in my vote? I do vote just like any other US citizen, and have in every presidential election since I was 18. Actually, considering the appalling rate of voter turnout in the States, I vote more than the average US citizen.

And as a member of Democrats Abroad in the UK, I can vote in primary elections too. This year we sent 22 delegates to Denver, including "super-delegates."

And no, the fact that my house isn't in the US does not mean I can't legally donate dollars to the US candidate of my choice. Yet it is impossible to find any online form that will process a campaign donation if you don't enter the name of the US state you live in. This limitation is a complete oversight on the candidate's part; it is not based on any legal restriction whatsoever. Basically, we Americans living abroad have just been ignored when it comes to fundraising.

Well, that's how it used to be. But I'm thrilled to say that the candidate I support, Barack Obama, is doing things differently this year. Not only does his website accept contributions from Americans overseas (with a valid US passport number), but he is actively encouraging our participation in his election efforts.

Nice to see the "change" for the better beginning already.


permalink | Tags: Obama.

"Good on you"

I'm starting a new blog tag to collect my experiences with the two different languages that Americans and the British use. Not that I expect anyone else to find it all that interesting, but I do, and there's so much of it that I want to keep track.

So "good on me!" Um, yeh. Sounds fruity to the American ear, where we want to say "good for me." Not sure why the goodness should be on me in the UK and for me in the US, but there seldom seems to be any reason to these things. I suspect most of these posts will come to that conclusion.


Peacock

I saw an Inachis io darting around our back yard yesterday. It didn't take long to identify because of its distinctive marks.

The photos make it look gorgeous, but in reality it looks even better!


Simulating Pass By Reference in JavaScript

I've tried to clarify for myself something about how values are passed to functions in JavaScript, and it goes something like this:

  • Variables are always passed by value.
  • When a variable holds an object, its value is a reference to that object.
  • Passing an object to a function passes a copy of that reference.
  • The copied reference points to the same place in memory, so changes to the object's properties are visible to the caller.

Okay, so how does this help us simulate pass-by-reference in JavaScript? Simple: if you want to pass a reference to a variable, you just need to make the variable a property of an object. Pass that object to the function and you indirectly pass a reference to the variable.

This is especially useful when you want to write a function that works as a mutator -- a function that directly changes the variable that is passed to it.

The usual pattern in JavaScript is that you pass the value of the variable, and the function returns a brand new value. This is really the only option if you are limited to pass-by-value. But if you want to modify the passed-in variable directly you must have pass-by-reference, and that's what this post has been leading up to.

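// "name" is avoided as a global here: in browsers it collides with window.name,
// which only ever holds a string
var person = {ref: "Bob"}; // create a reference

function mutate(n) {
    if (n.ref) n.ref += "-O!";

    // no need to return anything, we changed the name directly
}

mutate(person);
alert(person.ref); // dereference: "Bob-O!"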
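
The same trick covers the classic swap function, which can't be written with plain pass-by-value. A small sketch; the names swap, first and second are just for illustration:

```javascript
// Swap the contents of two reference objects in place.
function swap(a, b) {
    var tmp = a.ref;
    a.ref = b.ref;
    b.ref = tmp;
}

var first = {ref: "Bob"};
var second = {ref: "Alice"};

swap(first, second);

console.log(first.ref);  // Alice
console.log(second.ref); // Bob
```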

Mutators in JavaScript

I was surprised today by an unexpected behavior of JavaScript. In most computing languages the idea of references is often confusing to new programmers, but for some reason this is not emphasized as much in JavaScript. However it is not a topic that can be safely ignored.

var arr = [1, 2, 3];

function foo(arr) {
    arr.shift();   // modifies the array "arr" refers to
    arr.push("a");
    arr[3] = "b";
}
foo(arr);

console.log(arr); // [2, 3, "a", "b"]
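
The behavior above seems to come down to the difference between changing the array a parameter refers to and pointing the parameter at a brand new array. A quick sketch of the contrast (the function names mutate and reassign are made up for this example):

```javascript
function mutate(list) {
    list.push("a");      // changes the array the caller also sees
}

function reassign(list) {
    list = ["x", "y"];   // only rebinds the local parameter
    list.push("z");      // modifies the new local array, not the caller's
}

var original = [1, 2, 3];

mutate(original);
console.log(original);   // [1, 2, 3, "a"]

reassign(original);
console.log(original);   // still [1, 2, 3, "a"]
```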

A Billion Octothorps

I can tell you from experience that there can be a lot of confusion when Americans speak to Brits about this symbol here → #

You can't use the word "pound," because they've already got a different symbol described by that word: £

The word info website lists some other possible names:

... crosshatch, hash, numeral sign and number sign;

In order to be equally unfair to all, I propose we start using the term "octothorp" instead. From Robert Bringhurst, The Elements of Typographic Style; 2nd edition:

Otherwise known as the numeral sign. It has also been used as a symbol for the pound avoirdupois, but this usage is now archaic. In cartography, it is also a symbol for village: eight fields around a central square, and this is the source of its name. Octothorp means eight fields.

And while I'm on the subject, there are the occasional Britcentrics who will correct you if you try to use the term "billion" to mean this number: 1,000,000,000. Apparently, in the past, Britain used what's called the "long scale" numbering system in which:

  • 1 billion = 1 million millions, 1,000,000,000,000
  • 1 trillion = 1 million billions, 1,000,000,000,000,000,000
  • and so on.

For the life of me I can't imagine why the UK would want to use a numbering system where each unit becomes exponentially larger than the same unit used by the US. Even the British government realized that was a bad idea and switched to the "short scale" numbering system in 1974 (presumably that was about the time their budgets started regularly getting up into those figures). Today all UK government and national media (including the BBC) use the same system as the US: the short scale.


The Fifth of July

Rowan and Molly standing outside in front of their homemade American flag.

Not surprisingly England doesn't officially celebrate the rebellion of its former break-away colony, so we were a little on our own in the festivities today. But we tried: we had veggie hotdogs, pink lemonade and watermelon under a paper American flag.

The kids are getting small lessons in their New York City heritage from me. This morning we watched the Coney Island hotdog eating contest on YouTube. You want heritage? I got yer heritage right here!


Properties of Primitives

I was tripped up momentarily today, swept away by the flexibility of JavaScript, where you can add and modify properties of anything whenever you feel the whim to do so. Well, it turns out that there are a few exceptions to that rule: you can't add properties to primitive data types. JavaScript has the following primitives:

  • Number
  • String
  • Boolean
  • Undefined
  • Null

And the example that shows a primitive resisting my attempt to stick a property onto it:

var s1 = "quick brown fox";
s1.lang = "en";
alert(s1.lang); // undefined

But there is a solution if you want to do this sort of thing: you must wrap your primitive data up in an object-instance cloak.

var s2 = new String("quick brown fox");
s2.lang = "en";
alert(s2.lang); // en

A few more examples of primitive wrappers (you cannot construct a new Undefined or Null):

  • var value = new Number(3.14);
  • var value = new String("hello");
  • var value = new Boolean(false);
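
As a rough illustration of the same idea with a number, a wrapped value can carry a property and still take part in arithmetic. The names pi and units are arbitrary choices for this sketch:

```javascript
var pi = new Number(3.14);
pi.units = "radians";             // the property sticks to the wrapper object

console.log(pi.units);            // radians
console.log(pi * 2);              // 6.28
console.log(typeof pi.valueOf()); // number
```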

Interestingly, these wrappers affect the result returned by typeof: each one reports "object", and each is an instance of Number, String or Boolean. (Comparing constructor won't tell a wrapper from a primitive, because a primitive is wrapped temporarily whenever you read a property from it, so "hello".constructor == String is true as well.) You can promote a string to an object and then coerce it back to a primitive, like so:

// start with a primitive
var s1 = "hello";
alert(typeof s1); // string
alert(s1 instanceof String); // false

// promote to an object
var s2 = new String(s1);
alert(typeof s2); // object
alert(s2 instanceof String); // true

// demote back to a primitive
var s3 = ""+s2;
alert(typeof s3); // string
alert(s3 instanceof String); // false

Simulating Recursive Regex in JavaScript

I've been playing around with the idea of writing lexers in JavaScript; it seems like a language that should be well suited to the task. Unfortunately Rhino's regex engine isn't the most powerful on Earth.

While researching this I found an interesting link on Jon Aquino's Blog that discusses two features I wish Rhino had: recursive references to captured matches and named references to captured matches. Both are apparently available in PHP and other languages.

I tried to simulate something similar in JavaScript with questionable success: this example shows a method for working outwards from the innermost parenthesized content, out to the outermost, evaluating as it goes.

var str = "(10+(6+(1+1)*(3+2)))";

// keep evaluating the innermost parenthesized groups until none remain
while (str.indexOf("(") > -1) {
    print(str);
    str = str.replace(
        /\(([^()]+)\)/g,
        function(match, inner) { return eval(inner); }
    );
}

print(str);

// (10+(6+(1+1)*(3+2)))
// (10+(6+2*5))
// (10+16)
// 26

Edit: Is Steven Levithan's XRegExp library a solution?


Resig: BBC Removing Microformat Support

John Resig mentions the recent discussion at the BBC regarding some of the issues with Microformats. I know it wasn't a discussion that was undertaken lightly: there are many developers at The Beeb who are passionate about standards, accessibility and Microformats.

The important point to remember here is that, unlike nearly every commercial company, publicly funded organizations such as the BBC are mandated to make their content as accessible as possible.

So what happens when a screen reader sees an example of a Microformat date like this?

Am I childish because I am looking forward to a big party on
<abbr class="date" title="2008-07-16T13:06:00EST">my birthday</abbr>?

Apparently some will read out that long string of numbers in the title. It's hard to fault them: it is labeled a "title" after all. But it's not being used as a title here, and the phrase isn't even an abbreviation. Hopefully this discussion will inspire a more accessible solution.

Edit: Seems the BBC are looking into the RDFa format while the Microformats people debate what to do about their dates.


Detecting JavaScript Arrays

It can be difficult dealing with JavaScript's duck-typing when you just want to know if a given object is or isn't an array, especially as the typeof operator will return "object" for an array -- true but not very specific.

Douglas Crockford suggests the following as a good (but not perfect) technique to determine if you have an array:

function isArray(value) {
    return value &&
        typeof value === 'object' &&
        typeof value.length === 'number' &&
        typeof value.splice === 'function' &&
        !(value.propertyIsEnumerable('length'));
};
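
A few quick checks show how the function behaves. This is only a usage sketch: the sample values are invented, and the comparison with the built-in Array.isArray (standardized in ECMAScript 5) assumes a runtime that provides it.

```javascript
// Crockford's test, repeated here so the example runs on its own.
function isArray(value) {
    return value &&
        typeof value === 'object' &&
        typeof value.length === 'number' &&
        typeof value.splice === 'function' &&
        !(value.propertyIsEnumerable('length'));
}

console.log(isArray([1, 2, 3]));    // true
console.log(isArray({length: 3})); // false: no splice method
console.log(isArray("abc"));       // false: typeof is "string"

// An array-like impostor shows why the test is "not perfect".
var fake = {splice: function() {}};
Object.defineProperty(fake, "length", {value: 0, enumerable: false});

console.log(isArray(fake));        // true
console.log(Array.isArray(fake));  // false
```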

JavaScript Arrays or Objects?

If you want to keep a series of data together in a collection JavaScript provides two built-in choices: array or object. Douglas Crockford writes in his book JavaScript: The Good Parts:

The rule is simple: when the property names are small sequential integers, you should use an array. Otherwise, use an object.

It's not that simple. In practice this decision is going to be weighted one way or the other based on your usage of the data.

For example, if you have a collection of employee objects where the employee id numbers happen to be small sequential integers, you might decide to go with an array. But what if you find that most of the time you need to extract employees based on their name property?

function getElementsByProperty(property, value, array) {
    var found = [];
    for (var i = 0; i < array.length; i++) {
        if (array[i][property] == value) {
            found.push(array[i]);
        }
    }
    return found;
}

var susan = getElementsByProperty("name", "Susan Smith", employees)[0];

Not pretty and not fast, especially on big arrays. But if the property you are searching on is unique to each element, it becomes a primary key and you can do something much more elegant by creating employees as an object:

var susan = employees["Susan Smith"];
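
For that lookup to work, the object has to be keyed by name in the first place. One way this might be set up, assuming the data starts out as an array and that every name is unique (the sample records and the indexBy helper are invented for this sketch):

```javascript
// Hypothetical sample data.
var employeeList = [
    {id: 1, name: "Susan Smith"},
    {id: 2, name: "Raj Patel"}
];

// Build an object whose keys are the values of one property.
function indexBy(property, array) {
    var index = {};
    for (var i = 0; i < array.length; i++) {
        index[array[i][property]] = array[i];
    }
    return index;
}

var employees = indexBy("name", employeeList);
var susan = employees["Susan Smith"];

console.log(susan.id); // 1
```

If two employees shared a name, the later one would silently overwrite the earlier one, which is why the key needs to be unique.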

Okay that looks a lot nicer, so maybe we should try to use objects with primary keys? There are two problems you could have. The first has to do with order: in JavaScript the order of keys in an object is not guaranteed to be the same as the order in which you added them, so if the order of your elements is important you can't use an object.

And there's a second problem: what if you frequently need to read the total number of employees as it changes? Unlike arrays, objects have no magical length property; instead you'll have to loop over every key in the object, incrementing a counter as you go. We're back to "not pretty and not fast" again.
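
The counting loop looks something like this. The countKeys name and the sample data are invented for illustration; the hasOwnProperty check is there to skip anything inherited from a prototype. Runtimes that support ECMAScript 5 also offer Object.keys, which shortens the code but still has to visit every key.

```javascript
function countKeys(obj) {
    var count = 0;
    for (var key in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, key)) {
            count++;
        }
    }
    return count;
}

var employees = {
    "Susan Smith": {id: 1},
    "Raj Patel": {id: 2}
};

console.log(countKeys(employees));          // 2
console.log(Object.keys(employees).length); // 2
```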

So your choices look more like this:

  • If the order of the elements must be predictable: use an array.
  • If you will need a fast, simple way to get the number of elements: use an array.
  • But if you will need a fast, simple way to access elements by a string primary key: use an object.

And if you need some combination, you must resort to one of the not pretty, not fast approaches.


A Handful of Darkness

I just finished reading the Philip K Dick collection of short stories A Handful of Darkness today and I'm glad to say there were some real gems in there. Strangely the ones I liked best were all atypical of the futuristic sociopolitical robot thriller stories he became famous for. "The Cookie Lady" for example is as subtle and reminiscent as any story by Ray Bradbury or Clifford D Simak. Most surprising of all was the story "Upon the Dull Earth" which has vampiric angels and parallel universes that reminded me of Matheson's I Am Legend.

I also have to mention another great story called "The Builder" which also manages to blend religion and realistic fantasy. Like most of the stories in this collection, this one has a tightly twisted ending which is easy to see coming. Nevertheless, when the ending came, it was delivered with such sure-handed deftness I just had to smile and go back and read it again. Absolutely superb!

A great collection that really shows off Dick's creativity and range. I borrowed this copy from a friend but now I think I want to get a copy for myself.

