micmath blog
Recent Microsoft Mime Types
I had a problem with some of the more recent Microsoft document formats not being recognized when downloaded from an Apache server. Ironically, it was IE 8 that had the problem (I think it was trying to be secure), while Firefox knew what to do just fine. Anyway, here's an .htaccess addition that will help our IE 8 friends out (with thanks to mcupples):
AddType application/vnd.ms-word.document.macroEnabled.12 docm
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.document docx
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.template dotx
AddType application/vnd.ms-powerpoint.template.macroEnabled.12 potm
AddType application/vnd.openxmlformats-officedocument.presentationml.template potx
AddType application/vnd.ms-powerpoint.addin.macroEnabled.12 ppam
AddType application/vnd.ms-powerpoint.slideshow.macroEnabled.12 ppsm
AddType application/vnd.openxmlformats-officedocument.presentationml.slideshow ppsx
AddType application/vnd.ms-powerpoint.presentation.macroEnabled.12 pptm
AddType application/vnd.openxmlformats-officedocument.presentationml.presentation pptx
AddType application/vnd.ms-excel.addin.macroEnabled.12 xlam
AddType application/vnd.ms-excel.sheet.binary.macroEnabled.12 xlsb
AddType application/vnd.ms-excel.sheet.macroEnabled.12 xlsm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.sheet xlsx
AddType application/vnd.ms-excel.template.macroEnabled.12 xltm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.template xltx
Server Log Spam? Are You Kidding Me?
I've been busy setting up a new web server after an unhappy parting with my previous hosting provider (for the record, I'm now delighted to be with Rimu Hosting, the most professional and technically savvy service I've ever experienced in the ten-plus years I've been playing at these things).
So after tweaking and poking at things to make sure they were running as expected, I came across some very strange entries in my web server logs. Basically the pattern is something like this:
"GET /index.html?id=http://spam.example.com/ HTTP/1.0" 200 3167
And there are dozens and dozens of them. But why would anyone want to append such a strange querystring onto my homepage URL? Looking closely, I notice the querystrings all contain very dodgy-looking web addresses, the kind of addresses security experts warn you to stay away from. But what possible advantage would anyone gain from doing this?
After a little thought and some googling around, I now know this is yet another way that spammers are poisoning the collective well that is our internet. Keep in mind that the actual cost of getting their web address into my server log is nearly nothing: all they have to do is make a request to my homepage with an added querystring. This process could easily be automated and executed on hundreds or thousands of web sites every day. But even if this task is virtually free, what is the gain for the spammer?
Not surprisingly, in order for this to be of any advantage at all to a spammer, it requires a certain level of stupidity on the part of the web site owner. In this case the deserving victim needs to break the number one single most important law of web security:
Do not ever ever automatically post user-submitted content onto your web site. No, not ever!
The user-submitted content in this case is the querystring, and how is it getting automatically posted to web sites? Apparently some misguided folks actually link to their server log analytics, which will show that X many requests were made to /index.html?id=http://spam.example.com/.
Look, you have, on your web site, a public reference to the spammer's website - yay! And if your website is crawled by a search engine, they will see that you are endorsing that spammer by including his URL on your site. And this will raise the spammer's search engine rankings - or at least that's the theory. I doubt very much if any reputable search engine would actually fall for that.
So mystery solved, but is this such a bad thing? I mean, no one's getting hurt, right? Um, actually yes, I'm getting hurt, and so is everyone who maintains a website. The damage comes from all the bandwidth and CPU cycles wasted fulfilling requests from automated spam bots. If, like me, you pay your hosting provider based on bandwidth and CPU usage, then this cost is direct, even though I never have published my web logs publicly and never would. And if you think you're getting a certain number of visitors to your web site based on your log analysis, you can bet the real number (after taking the bots away) is actually much lower.
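To get a rough feel for how much of your traffic is this kind of log spam, something like the following Python sketch could be run over your access log. It assumes request lines shaped like the sample entry above (method, path, protocol, status, size); the names `summarize`, `REQUEST_RE` and `URL_IN_QUERY_RE` are made up for this illustration, and the sample lines are invented.

```python
import re

# Assumed log shape, as in the sample entry:
# "GET /index.html?id=http://spam.example.com/ HTTP/1.0" 200 3167
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\d+|-)'
)
# A fully-qualified http(s) URL anywhere in the querystring.
URL_IN_QUERY_RE = re.compile(r"https?://", re.IGNORECASE)


def summarize(lines):
    """Return (total_requests, spam_requests, spam_bytes) for the log lines."""
    total = spam = spam_bytes = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line in the assumed format
        total += 1
        # Everything after the first "?" is the querystring (empty if none).
        _, _, query = match.group("path").partition("?")
        if URL_IN_QUERY_RE.search(query):
            spam += 1
            size = match.group("size")
            spam_bytes += int(size) if size != "-" else 0
    return total, spam, spam_bytes


sample = [
    '"GET /index.html?id=http://spam.example.com/ HTTP/1.0" 200 3167',
    '"GET /index.html HTTP/1.1" 200 3167',
    '"GET /about.html?ref=newsletter HTTP/1.1" 200 1024',
]
print(summarize(sample))  # (3, 1, 3167)
```

Subtracting the spam count from the total gives a slightly more honest request figure, though it only catches this one kind of bot.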
I don't want to waste any resources generating pages for spam bots. Some of my pages require database connections and web application code to run, so why should I do that dozens of times a day for no reason?
Solution: The following code can be added to an .htaccess file and will immediately return a 403 Forbidden response to any request that includes a fully-qualified URL in the querystring.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*https?://.*$ [NC]
RewriteRule .* - [F,L]
Of course these requests will still be logged, but my goal is to prevent my server's resources being stolen. Anyone who automatically makes their log statistics public deserves whatever pain they get as a result, but I'm not one of them.
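If you want to see which querystrings the condition would catch before putting it on a live server, the pattern can be tried out in Python. This is only a sketch: Python's `re` module stands in for Apache's regex engine, `re.IGNORECASE` stands in for the [NC] flag, and the function name `would_be_forbidden` is made up for this example. As far as I know, mod_rewrite tests the raw, undecoded querystring, so the last case (a percent-encoded URL) would slip through.

```python
import re

# Same pattern as the RewriteCond; re.IGNORECASE plays the role of [NC].
RULE = re.compile(r"^.*https?://.*$", re.IGNORECASE)


def would_be_forbidden(query_string):
    """True if the querystring contains a literal http:// or https:// URL."""
    return RULE.search(query_string) is not None


examples = [
    "id=http://spam.example.com/",          # the pattern from my logs
    "id=HTTPS://spam.example.com/",         # case-insensitive match
    "page=2",                               # ordinary querystring
    "id=http%3A%2F%2Fspam.example.com%2F",  # percent-encoded, not matched
]
for qs in examples:
    print(qs, would_be_forbidden(qs))
```

The first two print True and the last two print False. Catching the encoded form would need an extra condition, which I haven't needed so far.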
