micmath blog
Resig: BBC Removing Microformat Support
John Resig mentions the recent discussion at the BBC regarding some of the issues with Microformats. I know it wasn't a discussion that was undertaken lightly: there are many developers at The Beeb who are passionate about standards, accessibility and Microformats.
The important point to remember here is that, unlike nearly every commercial company, publicly funded organizations such as the BBC are mandated to make their content as accessible as possible.
So what happens when a screen reader sees an example of a Microformat date like this?
Am I childish because I am looking forward to a big party on
<abbr class="date" title="2008-07-16T13:06:00EST">my birthday</abbr>?
Apparently some will read out that long string of numbers in the title. It's hard to fault them: it is labeled a "title", after all. But it isn't being used as a title here, and the phrase isn't even an abbreviation. Hopefully this discussion will inspire a more accessible solution.
Edit: Seems the BBC are looking into the RDFa format while the Microformats people debate what to do about their dates.
Two ways of Constructing a Software Design
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
Lean Software Strategies By Peter Middleton, James Sutton
Simulating Recursive Regex in JavaScript
I've been playing around with the idea of writing lexers in JavaScript; it seems like a language that should be well suited to the task. Unfortunately Rhino's regex engine isn't the most powerful on Earth.
While researching this I found an interesting link on Jon Aquino's Blog that discusses two features I wish Rhino had: recursive references to captured matches and named references to captured matches. Both are apparently available in PHP and other languages too.
I tried to simulate something similar in JavaScript, with questionable success: this example shows a method for working outward from the innermost parenthesized content to the outermost, evaluating as it goes.
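var str = "(10+(6+(1+1)*(3+2)))";
// print() is Rhino's output function
while (str.indexOf("(") > -1) {
    print(str);
    // replace each innermost "(...)" group (one containing no other parentheses) with its value
    str = str.replace(/\(([^()]+)\)/g, function (match, inner) {
        return eval(inner);
    });
}
print(str);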
// (10+(6+(1+1)*(3+2)))
// (10+(6+2*5))
// (10+16)
// 26
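The same innermost-first idea can be pulled out into a reusable helper that doesn't depend on eval. This is only a sketch: reduceInnermost and evalFlat are hypothetical names, and evalFlat assumes the flat expressions contain nothing but non-negative numbers, "+" and "*". It also stops as soon as a pass changes nothing, so an unbalanced "(" can't loop forever.

```javascript
// Hypothetical helper: repeatedly replaces every innermost "(...)" group
// with callback(inner) until a pass changes nothing.
function reduceInnermost(str, callback) {
  var innermost = /\(([^()]*)\)/g; // a group with no parentheses inside it
  var passes = 0;
  for (;;) {
    var next = str.replace(innermost, function (match, inner) {
      return callback(inner);
    });
    if (next === str) break; // nothing left to reduce
    str = next;
    passes++;
  }
  return { result: str, passes: passes };
}

// Hypothetical evaluator for flat expressions using only numbers, "+" and "*".
// Multiplication binds tighter than addition, as usual.
function evalFlat(expr) {
  return expr.split("+").reduce(function (sum, term) {
    return sum + term.split("*").reduce(function (product, factor) {
      return product * Number(factor);
    }, 1);
  }, 0);
}

var reduced = reduceInnermost("(10+(6+(1+1)*(3+2)))", function (inner) {
  return String(evalFlat(inner));
});
console.log(reduced.result, reduced.passes); // 26 3
```

Anything left at the top level, such as the "2*3" that "(1+1)*3" reduces to, can be passed to evalFlat as a final step.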
Edit: Is Steven Levithan's XRegExp library a solution?
What's The Diff?
This seems like a very simple trick, but it's one that I've found extremely useful: generating a list of every file on a remote server that's not the same as the copy on my local machine.
As part of my workflow I always keep an entire working copy of any websites I work on, databases included, on my development machine. This allows me to work on the site without fear of breaking the live site. When I'm happy with the way my local copy is working I can then copy any files I've changed up to the live server via SFTP.
At this point I'd really like a list of exactly which files have changed, either on my local copy or on the live site. The command line tool diff is perfect for this kind of job, but not so good if you only have SFTP access to the live site. The answer is to use SSHFS and FUSE to mount your remote folder onto your desktop. Then you can use any tool or application on your local machine against that mounted folder, exactly as if it were sitting on your local computer.
There are great tools that give you a nice GUI for SSHFS, but I already use Panic's Transmit to connect to FTP servers, and it comes with a "Mount favorite as disk" button that just works. So once you have your live site sitting on your desktop, you can run diff commands like the following, which will list every file that differs between the two folders.
diff -rq /my/local/copy/ /Volumes/example.com/public_html/
(Yes, I know Transmit has a "Synchronise" feature that purports to do this same task, but unfortunately it uses the file modified times to decide if something has changed, and at least on my servers, the remote system times frequently get out of sync with my desktop, causing Transmit to report that every single file is different. The diff tool on the other hand compares the contents of each file, which is a much more robust approach.)
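To make the "compare contents, not timestamps" point concrete, here is a rough Node.js sketch of what diff -rq reports. It is not how diff itself works internally; listFiles and differingFiles are hypothetical names, it assumes Node 10.10 or later (for withFileTypes), it reads whole files into memory, and it skips symbolic links.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively collects regular files below dir, as paths relative to root.
function listFiles(root, dir = root) {
  let files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files = files.concat(listFiles(root, full));
    } else if (entry.isFile()) {
      files.push(path.relative(root, full));
    }
  }
  return files;
}

// True if p exists and is a regular file.
function isFile(p) {
  return fs.existsSync(p) && fs.statSync(p).isFile();
}

// Sorted relative paths that exist in only one tree, or whose bytes differ.
// Modification times are never consulted.
function differingFiles(localDir, remoteDir) {
  const all = new Set(listFiles(localDir).concat(listFiles(remoteDir)));
  const differing = [];
  for (const rel of all) {
    const a = path.join(localDir, rel);
    const b = path.join(remoteDir, rel);
    if (!isFile(a) || !isFile(b)) {
      differing.push(rel); // missing (or not a regular file) on one side
    } else if (!fs.readFileSync(a).equals(fs.readFileSync(b))) {
      differing.push(rel); // same name, different contents
    }
  }
  return differing.sort();
}

// Example call, using the paths from the diff command above:
// console.log(differingFiles("/my/local/copy/", "/Volumes/example.com/public_html/"));
```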
Is Lightbox the New Popup Window?
Web-based advertising has always been the unloved stepson of web development. Well I say unloved but actually there are lots of people who feel a sort of passion about things like "branding", "click-through rates" and "return on investment". The fact that these things personally bore me doesn't change the fact that there has always been an understanding in this business that if we want to have free-to-view web pages, we must then have a load of ad banners all over those pages. I'm going to concede that point, at least for the sake of my bigger argument.
And I'm conceding the point even though I have no personal experience to base it on: I myself have been using the World Wide Web since its earliest days and can safely say I have never, not once ever, clicked on a single web ad. Not even one. Am I living this lifestyle to make some socio-political point? Nope, I'm just still waiting to see a web ad that is interesting enough to click on.
The ad mongers obviously know about uncooperative potential-customers like me, but when they analyze the problem of "he's still waiting to see a web ad that is interesting enough to click on" they come to the wrong conclusion that the solution is to make their uninteresting ads easier to see. If the blinking banners aren't appealing, they reason, then waving a popup window about in front of the content will surely make him want to click; it's worth a try anyway, right? I mean what's the harm?
So here's the harm: since 2004 every major web browser has had a built-in option to "block popup windows". And if that's not enough, users can choose from a wide variety of add-ons and plugins to do the same. If you, wayward web developer, have ever put an uninteresting ad in a popup window, you have helped to practically remove the popup window feature from every modern web browser. Well done, I hope you're proud of yourself.
But in the arms race for greater "CPMs" and "ROIs" there can be no surrender. If the user has developed an immunity to popup windows then there are other fronts to attack.
Now, based on the title of this blog entry, you might guess that I'm suggesting that Lightbox style ads are the latest front, and you'd be right. But I don't intend to criticize the technique used by JavaScript Lightbox implementations; they are no more evil than popup windows were ten years ago. Lightboxes don't annoy people, but annoying Lightboxes do. And here we are standing at the start of yet another very predictable path: how long before we see advice on how to disable JavaScript Lightboxes?
The problem is that a JavaScript Lightbox is not as distinct and self-contained as a popup window. Probably the most practical way to disable all JavaScript Lightboxes will be to disable JavaScript altogether. That is a race back to the 1990s in terms of web development, and it should concern all of us.
But it's hard to know who to address this point to: I don't have the illusion that aggressive web advertisers working for financially desperate companies are going to suddenly develop a long-view ethos about this issue based on my warnings. In-your-face, hard-sell, intrusive web advertising will always be around, but maybe it's up to all of us to enforce a new social contract. If these offenses are offered up by people who speak in terms of dollars and "clicks", then we can give our answer: don't click on those damned things. In fact, add those products to your own blacklist of bad members of our web community. Avoid the hosting web sites altogether if possible. This is a campaign for all of us; if you're a web developer, you may be helping yourself in the long run.
JavaScript Arrays or Objects?
If you want to keep a series of data together in a collection, JavaScript provides two built-in choices: an array or an object. Douglas Crockford writes in his book JavaScript: The Good Parts:
The rule is simple: when the property names are small sequential integers, you should use an array. Otherwise, use an object.
It's not that simple. In practice this decision is going to be weighted one way or the other based on your usage of the data.
For example, if you have a collection of employee objects where the employee id numbers happen to be small sequential integers, you might decide to go with an array. But what if you find that most of the time you need to extract employees based on their name property?
function getElementsByProperty(property, value, array) {
var found = [];
for (var i = 0; i < array.length; i++) {
if (array[i][property] == value) {
found.push(array[i]);
}
}
return found;
}
var susan = getElementsByProperty("name", "Susan Smith", employees)[0];
Not pretty, and not fast, especially on big arrays. But if the property you are searching on is unique across all elements, it becomes a primary key, and you can do something much more elegant by creating employees as an object keyed on that property:
var susan = employees["Susan Smith"];
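If the data arrives as an array, building that keyed object is a single pass. Here's a sketch with a hypothetical helper, indexByProperty, and made-up sample records; it assumes the property really is unique, because a later duplicate silently overwrites an earlier one. Object.create(null) (ECMAScript 5) is used so that inherited names like "toString" can't be mistaken for keys.

```javascript
// Hypothetical helper: builds a lookup object keyed on one property of each element.
function indexByProperty(property, array) {
  var index = Object.create(null); // no inherited keys such as "toString"
  for (var i = 0; i < array.length; i++) {
    index[array[i][property]] = array[i]; // duplicates overwrite earlier entries
  }
  return index;
}

// Made-up sample data for illustration.
var staff = [
  { id: 0, name: "Susan Smith" },
  { id: 1, name: "Raj Patel" }
];
var employeesByName = indexByProperty("name", staff);
var susanRecord = employeesByName["Susan Smith"]; // the record with id 0
```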
Okay that looks a lot nicer, so maybe we should try to use objects with primary keys? There are two problems you could have. The first has to do with order: in JavaScript the order of keys in an object is not guaranteed to be the same as the order in which you added them, so if the order of your elements is important you can't use an object.
And there's a second problem: what if you frequently need to modify and read the total number of employees? Unlike arrays, objects have no magical length property; instead you'll have to loop over every key in the object, incrementing a counter as you go. We're back to "not pretty and not fast" again.
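The counting loop looks something like this sketch (countOwnKeys is a hypothetical name). The hasOwnProperty check keeps properties inherited through the prototype chain out of the count. In ECMAScript 5 engines, Object.keys(object).length is shorter to write, but it still has to visit every key to build its array.

```javascript
// Counts an object's own enumerable keys; the cost grows with the object's size.
function countOwnKeys(object) {
  var count = 0;
  for (var key in object) {
    // skip keys inherited from the prototype chain
    if (Object.prototype.hasOwnProperty.call(object, key)) count++;
  }
  return count;
}
```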
So your choices look more like this:
- If the order of the elements must be predictable: use an array.
- If you will need a fast, simple way to get the number of elements: use an array.
- But if you will need a fast, simple way to access elements by a string primary key: use an object.
And if you need some combination, you must resort to one of the not pretty, not fast approaches.
Simulating Pass By Reference in JavaScript
I've tried to clarify for myself something about how values are passed to functions in JavaScript, and it goes something like this:
- Variables are always passed by value.
- When a variable holds an object, its value is a reference to that object.
- So passing an object to a function passes a copy of that reference, not a copy of the object.
- Both copies of the reference point to the same object in memory, so changes made to its properties inside the function are visible outside it.
Okay, so how does this help us simulate pass-by-reference in JavaScript? Simple: if you want to pass a reference to a variable, make the variable a property of an object. Pass that object to the function and you indirectly pass a reference to the variable.
This is especially useful when you want to write a function that works as a mutator -- a function that directly changes the variable that is passed to it.
The pattern typically used in JavaScript is that you pass the value of the variable and the function returns a brand new value. That is really the only option if you are limited to pass-by-value. But if you want to modify the variable passed in directly, you need pass-by-reference, and that's what this whole post has been leading up to.
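var myName = {ref: "Bob"}; // wrap the value in a holder object
// (avoid a global called "name": in browsers it collides with window.name)
function mutate(n) {
if (n.ref) n.ref += "-O!";
// no need to return anything, we changed the value directly
}
mutate(myName);
alert(myName.ref); // dereference: shows "Bob-O!"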
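The classic test of pass-by-reference is a swap function, which is impossible with plain pass-by-value. Using the same { ref: ... } holder shape, a sketch might look like this (swapRefs is a hypothetical name):

```javascript
// Swaps the values held by two holder objects.
function swapRefs(a, b) {
  var temp = a.ref;
  a.ref = b.ref;
  b.ref = temp;
}

var first = { ref: "left" };
var second = { ref: "right" };
swapRefs(first, second);
console.log(first.ref, second.ref); // right left
```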
Detecting JavaScript Arrays
It can be difficult dealing with JavaScript's duck typing when you just want to know whether a given object is or isn't an array, especially as the typeof operator will return "object" for an array: true, but not very specific.
Douglas Crockford suggests the following as a good (but not perfect) technique to determine if you have an array:
function isArray(value) {
return value &&
typeof value === 'object' &&
typeof value.length === 'number' &&
typeof value.splice === 'function' &&
!(value.propertyIsEnumerable('length'));
};
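Engines that implement ECMAScript 5 have a built-in Array.isArray, and a widely used fallback for older engines checks the internal class via Object.prototype.toString, which reports "[object Array]" for real arrays (including arrays created in another frame, where instanceof Array fails). A sketch combining the two, with a hypothetical name:

```javascript
// Prefer the built-in check; fall back to the toString class check.
function isArrayValue(value) {
  if (typeof Array.isArray === "function") {
    return Array.isArray(value);
  }
  return Object.prototype.toString.call(value) === "[object Array]";
}
```

Unlike the duck-typed version, this always returns a boolean, so isArrayValue(null) is false rather than null.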
Server Log Spam? Are You Kidding Me?
I've been busy setting up a new web server after an unhappy parting with my previous hosting provider (for the record, I'm now delighted to be with Rimu Hosting, the most professional and technically savvy service I've ever experienced in the ten-plus years I've been playing at these things).
So after tweaking and poking at things to make sure they were running as expected, I came across some very strange entries in my web server logs. Basically the pattern is something like this:
"GET /index.html?id=http://spam.example.com/ HTTP/1.0" 200 3167
And there are dozens and dozens of them. But why would anyone want to append such a strange querystring onto my homepage URL? Looking closely I notice the querystrings all contain very dodgy looking web addresses, the kind of addresses security experts warn you to stay away from. But what possible advantage would anyone gain from doing this?
After a little thought and some googling around I now know this is yet another way that spammers are poisoning the collective well that is our internet. Keep in mind the actual cost of getting their web address into my server log is nearly nothing, all they have to do is make a request to my homepage with an added querystring. This process could easily be automated and executed on hundreds or thousands of web sites every day. But even if this task is virtually free, what is the gain for the spammer?
Not surprisingly, in order for this to be of any advantage at all to a spammer, it requires a certain level of stupidity on the part of the web site owner. In this case the deserving victim needs to break the number one single most important law of web security:
Do not ever ever automatically post user-submitted content onto your web site. No, not ever!
The user-submitted content in this case is the querystring, and how is it getting automatically posted to web sites? Apparently some misguided folks actually link to their server log analytics, which will show that X many requests were made to /index.html?id=http://spam.example.com/.
Look, you have, on your web site, a public reference to the spammer's website - yay! And if your website is crawled by a search engine, they will see that you are endorsing that spammer by including his URL on your site. And this will raise the spammer's search engine rankings - or at least that's the theory. I doubt very much if any reputable search engine would actually fall for that.
So mystery solved, but is this such a bad thing? I mean, no one's getting hurt, right? Um, actually yes: I'm getting hurt, and so is everyone who maintains a website. The damage comes from all the bandwidth and CPU cycles wasted fulfilling requests from automated spam bots. If, like me, you pay your hosting provider based on bandwidth and CPU usage, then this cost is direct, even though I never have published, and never would publish, my web logs. And if you think you're getting a certain number of visitors to your web site based on your log analysis, you can bet the real number (after taking the bots away) is actually much lower.
I don't want to waste any resources generating pages for spam bots. Some of my pages require database connections and web application code to run, so why should I do that dozens of times a day for no reason?
Solution: The following code can be added to an .htaccess file and will immediately return a Forbidden result to any request that includes a fully-qualified URL in the querystring.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^.*https?://.*$ [NC]
RewriteRule .* - [F,L]
Of course these requests will still be logged, but my goal is to prevent my server's resources being stolen. Anyone who automatically makes their log statistics public deserves whatever pain they get as a result, but I'm not one of them.
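For sites that aren't served by Apache, the same rule is easy to express in application code. This is a sketch assuming a plain Node.js http server; hasUrlInQuery is a hypothetical name and the port is an assumption. Like the RewriteCond, it inspects the raw, undecoded query string case-insensitively, so a URL-encoded form such as http%3A%2F%2F would slip past both.

```javascript
const http = require("http");

// True when the query string part of a request URL contains "http://" or "https://".
function hasUrlInQuery(requestUrl) {
  const q = requestUrl.indexOf("?");
  if (q === -1) return false; // no query string at all
  return /https?:\/\//i.test(requestUrl.slice(q + 1));
}

const server = http.createServer(function (req, res) {
  if (hasUrlInQuery(req.url)) {
    res.writeHead(403); // refuse before doing any expensive work
    res.end("Forbidden");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello");
});
// server.listen(8080); // assumed port; uncomment to run
```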
Running A Personal SVN Server on Mac Snow Leopard
I love version control. I use it more than is probably appropriate, to be honest. For personal projects that don't warrant their own page on GitHub or Google Code, I like to use a quick and simple svn repo right on my laptop. I have Apple's Time Machine running on there and full weekly snapshots saved to an external hard drive, so I'm not too concerned about the repo itself getting hosed.
I do however want it to be easy to make frequent commits to files I'm working on. And I specifically don't want to run svn with apache -- that is simply overkill for my personal usage. Finding decent instructions on how to set that up on Snow Leopard is difficult, so here's how I did it:
1. If you haven't already done so, download the installer for the excellent My Subversion-1.5.5 Package. It's simply a double click job and you then have an svn server and client all ready to go.
2. Create a folder to hold your repo. Adjust the filepaths to match your own system, obviously:
mkdir /Users/michael/svn
3. Start the svn server, pointing it at the repository directory you just created:
svnserve -d -r /Users/michael/svn
4. Create a new repository:
svnadmin create /Users/michael/svn/projects
5. Set up users for that new repository. This may be the most complicated step, but even this is very straightforward if you can edit an ini file:
bbedit /Users/michael/svn/projects/conf/svnserve.conf
It's just an ini type of file and has loads of useful comments (so read them). You'll want it to say this:
[general]
anon-access = read
auth-access = write
password-db = passwd
realm = projects
That means anonymous users can only read, while authenticated users can read and write. Also the password file is going to be in conf/passwd. So let's edit that now:
bbedit /Users/michael/svn/projects/conf/passwd
Again, a very simple ini type of file. Add yourself to it and save.
[users]
michael = secretpassword
6. Checkout your repository:
cd ~/Workarea
svn co svn://localhost/projects
7. Make changes, commit:
cd ~/Workarea/projects
bbedit README.txt
svn add README.txt
svn commit README.txt -m 'Initial commit.'
8. Finally, you can always stop the svn server, if you want to save a little battery juice:
killall svnserve
Publish A Developer Blog on GitHub
If you use GitHub you may not realise you already have access to website hosting, accessed via your user name. You can publish content there, once it is set up, but I want to go a step further and set up a blog. This post documents the steps I followed to do that.
Set up your GitHub user page
First you need a user account on GitHub. If you go to a URL like http://yourUserName.github.com you will see the instructions you need to follow to set up your user page. I'll detail how I created mine:
Create a pages repo on github. Mine will be at http://micmath.github.com/
$ mkdir micmath.github.com
$ cd micmath.github.com
$ git init
$ echo "My GitHub Page" > index.html
$ git add index.html
$ git commit -a -m 'Initial commit.'
$ git remote add origin git@github.com:micmath/micmath.github.com.git
$ git push origin master
It can take about 10 minutes for the initial setup. That will give me time to install a few other things on my local machine...
Install the template system
GitHub supports Tom Preston-Werner's Jekyll engine and Pygments natively, so it makes sense to use those for a blog hosted on GitHub. I'm using gem and port to get the software on my Mac; you may need to use a different installation process supported by your own OS.
Install Jekyll for templates and site generation.
$ gem install jekyll
Install Pygments, for source code highlighting.
$ sudo port install python25 py25-pygments
Set up the site file structure.
$ mkdir _posts
$ mkdir _site
$ mkdir _layouts
$ touch _config.yml
$ mkdir css
Add the Pygments stylesheet
Pygments can generate its own stylesheet for source code highlighting, using the command shown below. Jekyll uses the CSS selector ".highlight code", so add that in and create the file, like so:
$ pygmentize -S default -f html -a ".highlight code" > css/syntax.css
Configure Jekyll
You can configure Jekyll to your liking. For example I have my own Apache server running locally so I don't need Jekyll to run one.
$ cat _config.yml
auto: false
server: false
pygments: true
Create the default layout and css
$ touch _layouts/default.html
$ touch css/default.css
In _layouts/default.html, add some template code. Jekyll uses the Liquid template syntax, plus a few Jekyll-specific extensions.
$ cat _layouts/default.html
<!DOCTYPE html>
<html>
<head>
<title>{{ page.title }}</title>
<meta http-equiv="Content-Type"
content="text/html;charset=utf-8">
<link rel="stylesheet" type="text/css" media="screen"
href="/css/default.css" />
<link rel="stylesheet" type="text/css" media="screen"
href="/css/syntax.css" />
</head>
<body>
<h1>{{ page.title }}</h1>
{{ content }}
</body>
</html>
Create some content
Finally, all the infrastructure is now in place and we can start blogging! To create the content for a blog post, simply add a new markdown file to your _posts folder. The name of the file should be formatted like yyyy-mm-dd-your-title.markdown.
$ cat _posts/2010-04-05-hello-world.markdown
---
layout: default
title: Hello World
---
This is an example blog post.
This is just a _test_. Here's some code, which will get syntax highlighting from Pygments:
{% highlight js %}
function Foo(x, y) {
this.x = x;
this.y = y;
}
{% endhighlight %}
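Post filenames must follow the yyyy-mm-dd-your-title.markdown pattern, and the zero-padding is easy to get wrong by hand. Here's a sketch of a hypothetical helper, postFilename, that builds the name from a Date and a title. The slug rule (lowercase, with each run of other characters turned into a single hyphen) is my own assumption, not something Jekyll requires.

```javascript
// Hypothetical helper: builds a Jekyll post filename from a Date and a title.
function postFilename(date, title) {
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  var slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  return date.getFullYear() + "-" + pad(date.getMonth() + 1) + "-" +
    pad(date.getDate()) + "-" + slug + ".markdown";
}

// Months are zero-based in the Date constructor, so 3 means April.
console.log(postFilename(new Date(2010, 3, 5), "Hello World")); // 2010-04-05-hello-world.markdown
```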
Generate the HTML output
If you want to preview your formatted blog post you can run jekyll locally. This step is technically unnecessary, since GitHub will automatically run this same command for you on the server whenever you push new content. However it's handy if you want to proofread and tweak anything before publishing.
$ jekyll
Configuration from ./_config.yml
Building site: . -> ./_site
Successfully generated site: . -> ./_site
There is no need to publish the HTML generated in your own workarea, since GitHub will automatically regenerate it whenever you push anyway. So, if you haven't already done so, add the _site folder to your .gitignore file.
$ cat .gitignore
.DS_Store
_site
Publish the content
Publishing is easy: just commit and push any new files. If you haven't already done so, add all the new folders and files to git. Finally, commit and push:
$ git add .
$ git commit -a -m 'Added source code example.'
$ git push
Mutators in JavaScript
I was surprised today by an unexpected behavior of JavaScript. In most programming languages the idea of references is often confusing to new programmers, but for some reason it is not emphasized as much in JavaScript. However, it is not a topic that can be safely ignored.
var arr = [1, 2, 3];
function foo(arr) {
arr.shift(); // modifies the array "arr" refers to
arr.push("a");
arr[3] = "b";
}
foo(arr);
console.log(arr); // [2, 3, "a", "b"]
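The flip side is worth seeing too. In this sketch (the function names are hypothetical), changing the array through the parameter is visible to the caller, but assigning a brand new array to the parameter only rebinds the function's local copy of the reference:

```javascript
function replaceArray(list) {
  list = ["replaced"]; // rebinds the local parameter only
  return list;
}
function appendItem(list) {
  list.push("added"); // changes the array the caller also refers to
}

var items = [1, 2];
replaceArray(items);
console.log(JSON.stringify(items)); // [1,2]
appendItem(items);
console.log(JSON.stringify(items)); // [1,2,"added"]
```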
HTML DOM Manipulation with PHP
When discussing HTML DOM (Document Object Model) manipulation it is common to think JavaScript, but there are cases where you might want to do this on the server.
If you're like me, you might think that HTML, or at least XHTML, is a job for simplexml. Loading and parsing an XHTML file is certainly possible with simplexml_load_file, and you can easily use the various simplexml methods to manipulate the DOM. You will, however, encounter a small problem if you need to output HTML. Because simplexml is designed to deal with XML, it sees nothing wrong with outputting compacted empty tags. In other words, if you give it an empty textarea, like so:
$xml = simplexml_load_string('<textarea></textarea>');
and a little while later try to get that string back again, you will find that it's taken some liberties:
echo $xml->asXML(); // <textarea />
This is perfectly fine XML but as HTML it breaks at least a few browsers. If you want to avoid this you'll need to start thinking up clever ways to circumvent what simplexml wants to do.
Or we can just not use simplexml.
The DOMDocument class is, perhaps unsurprisingly, more feature-rich than its simpler little brother. For example, it has a way to preserve opening and closing tags:
$dom = new DOMDocument();
$dom->loadXML('<textarea></textarea>');
echo $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG); // <textarea></textarea>
But even more usefully, it has loadHTMLFile and saveHTML methods, which treat HTML more like, well, HTML.
Here's a quick example, in which we want to get the names of any required textareas:
$dom = new DomDocument();
$dom->loadHTMLFile($myHtmlFile);
$requireds = array();
$nodes = $dom->getElementsByTagName('textarea');
foreach ($nodes as $node) {
if ($node->hasAttribute('required')) {
array_push($requireds, $node->getAttribute('name'));
}
}
Properties of Primitives
I was tripped up momentarily today, swept away by the flexibility of JavaScript, where you can add and modify properties of anything whenever you feel the whim to do so. Well, it turns out that there are a few exceptions to that rule: you can't add properties to primitive data types. JavaScript has the following primitives:
- Number
- String
- Boolean
- Undefined
- Null
And the example that shows a primitive resisting my attempt to stick a property onto it:
var s1 = "quick brown fox";
s1.lang = "en";
alert(s1.lang); // undefined
But there is a solution if you want to do this sort of thing: you must wrap your primitive data up in an object-instance cloak.
var s2 = new String("quick brown fox");
s2.lang = "en";
alert(s2.lang); // en
A few more examples of primitive wrappers (you cannot construct a new Undefined or Null):
- var value = new Number(3.14);
- var value = new String("hello");
- var value = new Boolean(false);
Interestingly, these wrappers change the result returned by typeof: they all report "object", even though they are instances of Number, String or Boolean. (Checking .constructor won't tell the two apart, because reading "hello".constructor temporarily wraps the primitive and also gives String; instanceof does distinguish them.) You can use a little coercion to promote a string to an object and then demote it back to a primitive, like so:
// start with a primitive
var s1 = "hello";
alert(typeof s1); // string
alert(s1 instanceof String); // false
// promote to an object
var s2 = new String(s1);
alert(typeof s2); // object
alert(s2 instanceof String); // true
// demote back to a primitive
var s3 = ""+s2;
alert(typeof s3); // string
alert(s3 instanceof String); // false
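Because typeof gives different answers for the primitive and its wrapper, a check that accepts both can be handy. Here's a sketch with a hypothetical isString helper; it also shows valueOf(), which is another way to demote a wrapper back to a primitive. One caveat: instanceof only recognizes wrappers created in the same window or frame.

```javascript
// Hypothetical helper: true for string primitives and String wrapper objects.
function isString(value) {
  return typeof value === "string" || value instanceof String;
}

var wrapped = new String("hello");
var primitive = wrapped.valueOf(); // demote the wrapper to a primitive
console.log(typeof primitive); // string
console.log(isString(primitive), isString(wrapped), isString(42)); // true true false
```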
Limiting Variable Visibility in JavaScript
My momma always told me to keep the visibility of my variables as small as possible. Well, okay I can confidently say those words never passed her lips, but I can easily imagine some stern-faced Code Matron wagging a finger and chiding, "Global variables make baby Jesus cry." I won't go into all the whys of limiting the visibility of your variables, the point of this post is to compare a few different hows.
We have all been told that if we want to limit the visibility of a variable in JavaScript there is only one way to do it, and that is to use var inside a function. For example, if you wanted to ensure that a variable x did not touch the larger scope of your program, you could do something like this:
(function () {
var x; // reference to variable x exists in here
alert(x); // undefined
})();
alert(x); // throws ReferenceError: x does not exist here
It is worth mentioning at this point that JavaScript has a very relevant little feature known as "variable hoisting," meaning that variables are effectively always declared at the top of the function, regardless of where you write the var x bit. The example below is effectively identical to the example above:
(function () {
// reference to variable x exists in here
alert(x); // undefined
var x;
})();
alert(x); // throws ReferenceError: x does not exist here
So you can't limit the visibility of a variable lexically, by writing the var stuff near the end of a function. If you declare a variable anywhere in a function the effect is the same as declaring it at the top of that function. For that reason it is arguably good practice to actually declare variables at the top of functions--you may as well make it look the same as the way it works, right?
But using var isn't the only way to accomplish our goal. You can also use function parameters.
(function (x) { // reference to parameter x exists
alert(x); // x is undefined
})();
alert(x); // throws ReferenceError
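This time the x listed as a parameter of the function pops into existence right before the rest of the function body runs. Note that a var x anywhere inside the function does not create a second variable: it refers to this same parameter, so var x; on its own leaves the passed-in value alone, while var x = something; simply assigns to the parameter when that line runs.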
And that would appear to be the end of our options, but in fact there is yet another way to achieve this same goal, and it doesn't use functions at all. This time we'll use with:
// assume undefined === void(0);
with( {x: undefined} ) { // reference to property x exists
alert(x); // x is undefined
}
alert(x); // throws ReferenceError
People don't use with very much (for reasons I'll discuss a bit later) but it is designed to do exactly what we are after here: alter the scope chain for just a little while.
But let's compare these two techniques in a more real world situation: here I want to fill an array with the first few integers squared; I need to create a temporary variable named i but I don't want that i to exist or have any effect whatsoever outside my limited 2 or 3 lines of code.
First the functional way:
var squares = [];
(function() {
var i = 5;
do { squares[i] = i * i } while (i--);
})();
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
or alternatively:
var squares = [];
(function(i) {
do { squares[i] = i * i } while (i--);
})(5);
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
And the same using with:
var squares = [];
with( {i: 5} ) {
do { squares[i] = i * i } while (i--);
}
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
You could argue that the functional techniques look a little more sugary than the with technique, and thus with is the cleaner way to go, but there are several other points against using with. First is a slightly circular observation: people are likely to be unfamiliar with with, and so don't often use it. I suppose we could start a campaign to get more people to use it and therefore solve that problem, but here's why I wouldn't: with, like its ginger-haired stepbrother eval, is an easy-to-abuse little sucker, and you don't have to use it very much before you start to find it confusing, at the very least. It is also a syntax error in ECMAScript 5 strict mode, so it can't appear in strict code at all. What's more, it's slower than the functional equivalents shown here. Well, mostly. In some browsers it's actually slightly faster, but this is more than offset by the fact that it is incredibly humongously slower in almost all other browsers.
Of course, in the big picture, we're still only talking small fractions of seconds difference here so if you really just love the syntax of with shown above, you could still get away with using it. I leave that decision to you.
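For code that can rely on ECMAScript 2015 or later, there is one more option that needs neither a function nor with: a plain block containing a let declaration. The variable exists only between the braces. A sketch of the same squares example:

```javascript
var squares = [];
{
  let i = 5; // block-scoped: invisible outside these braces
  do { squares[i] = i * i; } while (i--);
}
console.log(squares.join(", ")); // 0, 1, 4, 9, 16, 25
console.log(typeof i); // undefined (no i exists out here)
```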
Mysterious At-Sign: Extended Attributes in Mac Files
I was attempting to run an old perl script today, and doing a quick ls -la to check the permissions revealed a mysterious at-sign @ in the permissions.
-rw-r--r--@ 1 michael michael 467 8 May 13:16 cvsToSql.pl
Hi there little at-sign. What are you doing in there?
And do you know how hard it is to google the meaning of an at-sign in ls -la? Not easy. I did learn that a filename with a trailing at-sign can indicate a symbolic link on Linux. But this is Mac OS X Leopard and the at-sign is in the permissions. More googling....
The best answer is from dev.netcetera.org and Jeff Seitz’s Blog, and it is that the at-sign indicates that the file has extended file attributes. Hmmm... I want to get rid of those I think. Step one is to find out the name of the extended attribute.
$ xattr -l cvsToSql.pl
com.apple.FinderInfo:
0000 54 45 58 54 52 2A 63 68 00 00 00 00 00 00 00 00 TEXTR*ch........
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Okay, so my file has some extra data attached to the "com.apple.FinderInfo" attribute. I suppose there is some important nerd-reason for that being there but I'm not interested: step two is to get it off.
$ xattr -d com.apple.FinderInfo cvsToSql.pl
And no more at-sign! All that file's attributes are now of the non-extended variety.
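For the curious, here's one possible nerd-reason. As far as I know, the first eight bytes of com.apple.FinderInfo hold the classic Mac OS file type and creator codes, four bytes each. This sketch decodes them from the first line of the xattr -l dump above; decodeFinderInfoLine is a made-up helper, and it assumes the exact "offset, then 16 hex bytes" layout shown there.

```javascript
// Hypothetical helper: decode one line of `xattr -l` output for
// com.apple.FinderInfo, assuming the layout shown above:
// a hex offset, 16 hex bytes, then an ASCII preview.
function decodeFinderInfoLine(line) {
  var fields = line.trim().split(/\s+/);
  var bytes = fields.slice(1, 17).map(function (hex) {
    return parseInt(hex, 16);
  });
  function toAscii(codes) {
    return String.fromCharCode.apply(null, codes);
  }
  return {
    fileType: toAscii(bytes.slice(0, 4)), // bytes 0-3
    creator: toAscii(bytes.slice(4, 8))   // bytes 4-7
  };
}

var info = decodeFinderInfoLine(
  "0000 54 45 58 54 52 2A 63 68 00 00 00 00 00 00 00 00 TEXTR*ch........"
);
// info.fileType is "TEXT", info.creator is "R*ch"
```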
JavaScript Constructors and Factories
One of JavaScript's peculiarities is the 'new' expression: when a function invocation is included in that expression, that function behaves like a constructor. But the function, by itself, is not intrinsically a "constructor"; it's still just a function, and as such any function intended to be called with new can just as easily be called without new. Of course, if you, the function's author, are relying on the side-effects provided by new, things may not work as intended when those effects are missing. There are, however, many built-in examples of functions that work equally well, if perhaps slightly differently, when invoked as a constructor or as an ordinary function:
Objects are created by using constructors in new expressions; for example, new String("A String") creates a new String object. Invoking a constructor without using new has consequences that depend on the constructor. For example, String("A String") produces a primitive string, not an object. -- ECMAScript Language Specification, 4.2.1 Objects
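Here's a minimal sketch of what goes wrong when a function that relies on new's side-effects is called without it. Point is a hypothetical constructor; the "use strict" directive is only there so the failure is a loud TypeError rather than a silent write to the global object.

```javascript
// Hypothetical constructor that relies on new supplying `this`.
function Point(x) {
  "use strict"; // a plain call now leaves `this` undefined
  this.x = x;
}

var p = new Point(1); // new creates the object; p.x is 1

var failure = null;
try {
  Point(2); // no new: `this` is undefined, so this.x = x throws
} catch (e) {
  failure = e;
}
// failure is a TypeError
```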
Constructors are not the only way to build objects; there is a similar pattern, called a "factory," that also involves a function. The difference is that a factory is not called with new, and it doesn't return a this value, as a constructor does. You could say that String is an example of a function that can be invoked as a constructor to return a String object, or as a factory to return a string primitive. This design blurs the distinction between factories and constructors. What can we do in this blurry area?
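The String example from the specification can be checked directly, next to a plain factory for comparison (makePoint is a hypothetical name used only for illustration):

```javascript
var obj = new String("A String"); // invoked as a constructor
var prim = String("A String");    // invoked as a factory

typeof obj;             // "object"
typeof prim;            // "string"
obj instanceof String;  // true
prim instanceof String; // false
obj == prim;            // true: loose equality unwraps the object
obj === prim;           // false: an object is never === a primitive

// A hypothetical plain factory: no new, no `this`, just a returned object.
function makePoint(x, y) {
  return { x: x, y: y };
}
var pt = makePoint(1, 2); // pt.x is 1, pt.y is 2
```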
One thing we can do is to create a wrapper for an existing constructor, where we add on our own extensions:
function ArrayPlus() {
// copy the arguments into a real Array
var self = Array.prototype.slice.call(arguments);
// add our own extension method (returns undefined for an empty array)
self.last = function() {
if (this.length) return this[this.length-1];
};
return self;
}
var a1 = new ArrayPlus(1, 2, 3); // like a constructor
var a2 = ArrayPlus(1, 2, 3); // like a factory
// the result is the same, arrays that can last()
a1.push(4); alert( a1.last() ); // alerts 4
a2.push(4); alert( a2.last() ); // alerts 4
When we call ArrayPlus as a factory, it returns an instance of an Array with an extra method added on. However, the same thing happens when we call ArrayPlus as a constructor. How is that possible?
Typically, when a function is invoked as a constructor, it automatically returns the constructor's own this value; however, in ArrayPlus we are returning our own object, named self. This is allowed under the rules of the ECMAScript Language Specification (section 13.2.2), but it is important to understand the consequence: self is an instance of Array, not of ArrayPlus. In essence we are overriding the typical behavior of a constructor and forcing ArrayPlus to always behave like a factory, whether it is called with or without new.
I referred to the ECMAScript Language Specification a moment ago, and it does indeed allow arbitrary objects to be returned from constructors, but there is a related caveat worth knowing: a constructor (called with new) cannot return a non-object. For example:
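A quick way to see that difference for yourself, repeating ArrayPlus so the snippet stands alone: the object that comes back from new ArrayPlus() passes an Array check but fails an ArrayPlus check, because the this value that new created was thrown away.

```javascript
function ArrayPlus() {
  var self = Array.prototype.slice.call(arguments);
  self.last = function () {
    if (this.length) return this[this.length - 1];
  };
  return self;
}

var a = new ArrayPlus(1, 2, 3);
a instanceof Array;     // true
a instanceof ArrayPlus; // false: new's own `this` was discarded
Array.isArray(a);       // true
a.last();               // 3
```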
function MyNumber(value) {
var self = new Number(value); // an object
return self;
}
var one = new MyNumber(1);
function Three(value) {
var self = 3; // a primitive
return self;
}
var three = new Three();
alert(one); // alerts 1
alert(three); // alerts "[object Object]"
What happened? The rules of ECMAScript say that if a constructor returns an object, as MyNumber does, then that object becomes the result of the new expression. But if a constructor returns a non-object, as Three does, then the return value is ignored entirely and the constructor's this value is returned instead. In the case of Three, the this value is a newly created instance of Three; in any case it is not the number 3, as you might expect.
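The flip side is easy to demonstrate with a trimmed-down Three (no parameter this time): the very same function returns its primitive when called as an ordinary function, and has that primitive ignored when called with new.

```javascript
function Three() {
  return 3; // a primitive
}

var viaNew = new Three(); // 3 is ignored; new returns its own `this`
var viaCall = Three();    // an ordinary call returns 3 as written

typeof viaNew;           // "object"
viaNew instanceof Three; // true
viaCall;                 // 3
```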
Understanding Variable Instantiation
Quick, just from examining this code (no fair testing it), what will the following alert?
var f = 1;
function f() { }
alert(f);
The fact that I consider this code interesting (or at least instructive) may tip you off to what the answer will be: it's 1. Surprised? If you are, you're likely suffering from the common misconception that your JavaScript code is executed in the order you wrote it. If that were true, the variable named f would first be declared, then assigned the value 1. Next the function named f would be declared, replacing that existing f, and finally the alert would show us the function. But we know that's not what happens.
What really happens is explained in section 10.1.3 of the ECMAScript specification, entitled "Variable Instantiation". If you imagine there is a Variable Object and the JavaScript engine is adding properties to it based on your source code, the specification defines the following distinct steps and the order in which they must happen:
- Function declarations
- Variable declarations
- The rest
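Note that when you write var f = 1; you are really doing two things together: declaring a variable named f and then assigning the value 1 to it. Also, declaring a variable whose name already exists on the Variable Object, as f does here, leaves its current value untouched. So, putting all that together, the code in our previous example is effectively the same as this: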
function f() { } // function declarations first
var f; // variable declarations next
f = 1; // the rest
alert(f);
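To watch the ordering happen, here's a sketch that checks typeof f both before and after the assignment. It's wrapped in a hypothetical demo function so it behaves the same whether it runs as a classic script or as a module. The second function shows one more step that applies only inside functions: formal parameters are instantiated even before function declarations, so a function declaration replaces a parameter of the same name.

```javascript
// Wrapped in functions so the result is the same whether this runs as a
// classic script or as a module (where top-level rules differ).
function demo() {
  var before = typeof f; // declarations were instantiated before this line ran
  var f = 1;             // only the "f = 1" part executes here
  function f() { }
  var after = typeof f;
  return [before, after];
}

function withParam(f) {
  function f() { } // instantiated after the parameter, so it replaces it
  return typeof f;
}

demo();       // ["function", "number"]
withParam(1); // "function"
```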
Useless: Form Validation in Safari 5
To my shock today I noticed that on some of my live web sites many of the HTML forms were no longer working. By "no longer working" I mean they simply did nothing when you pressed the submit button. By "simply did nothing" I mean you clicked the submit button and the browser just ignored you. There was no perceivable reaction, no error message, no form submission and no explanation why not. I am practiced at debugging, but this mystery didn't give me much to start with.
One clue was that these same forms worked fine on another browser. The second clue was that I'd recently upgraded to Safari 5. And then I remembered I was doing something slightly interesting with these forms: I was using JavaScript on the client (and PHP on the server) to validate the form input. As part of that system I was using some HTML5 attributes before they were supported in any browsers, the new pattern attribute, for one:
<input type="text" id="email" name="email" required="required"
pattern="^[^@]+@[^@]+\..+$" hint="Email must be a valid email address." />
My scripts would check each input element for a pattern attribute and perform a validation check against the corresponding submitted value. If the value did not match the pattern then the hint would be displayed as an explanation for what correction was needed from the user. I purposely chose the name pattern for my little system because I knew it would eventually get supported by new browsers that could do HTML5 validation -- pattern is a part of the HTML5 spec.
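My actual scripts aren't reproduced here, but the check itself can be sketched as a pure function. In this hypothetical version, plain objects with name, value, pattern and hint properties stand in for the input elements. Since HTML5 treats pattern as matching the entire value, the sketch anchors each pattern before testing it.

```javascript
// Hypothetical stand-ins for <input> elements:
// { name: ..., value: ..., pattern: ..., hint: ... }
// Returns the hints for every input whose value fails its pattern.
function validateInputs(inputs) {
  var hints = [];
  inputs.forEach(function (input) {
    if (!input.pattern) return; // no pattern attribute: nothing to check
    // HTML5 requires the pattern to match the whole value, so anchor it.
    var re = new RegExp("^(?:" + input.pattern + ")$");
    if (!re.test(input.value)) hints.push(input.hint);
  });
  return hints;
}

var email = {
  name: "email",
  value: "not-an-address",
  pattern: "^[^@]+@[^@]+\\..+$", // same pattern as the markup above
  hint: "Email must be a valid email address."
};
validateInputs([email]); // ["Email must be a valid email address."]
email.value = "me@example.com";
validateInputs([email]); // []
```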
So what was happening here? Well, it turns out that Safari 5 was exactly what I was waiting for: a browser that supported HTML5 form validation natively. The problem is that Safari does its validation before my code runs, and when a validation check fails, Safari just does nothing. Well, actually it does give the failing input element focus, meaning a hazy blue halo appears around it, but this is very subtle and may as well be imperceptible to anyone who isn't sharp-eyed and trained to look for it.
So Safari's native validation is useless because it provides no feedback. In fact, it's worse than useless, because it prevents my scripts from providing any feedback either, since they never get a chance to run. I can't find any documentation on how to control Safari 5's new form validation features, so for now I must change my forms so they no longer use any HTML5 validation attributes.
Safari 5: making my web pages less HTML5 compliant.
Transparent Content Filtering for Web Developers
Imagine you want to play around with some JavaScript resource that is installed on a client's live server. You might want to do this in order to test some changes you've made to that file: it may work on your development server, but you want to see if that will still be true when it goes live.
This can be tricky. You definitely don't want to upload and pray. Maybe some configuration that exists only on the live server will cause your AJAX features to behave in unexpected ways? How could you see those errors for yourself without anyone else seeing them at the same time?
One way is to use a content-filtering proxy. This is basically an application that sits between your web browser and your internet connection. Any request the browser makes goes through the proxy, and any response from the internet comes back through that same proxy. Once you configure it, the proxy is invisible to the browser. So, to accomplish what we need, we could configure the proxy to replace any SCRIPT tags in the page returned from the live server with a different SCRIPT tag that points to the new file on your development server. Everything else will be exactly as it is on the live server, because the page you see will be coming from the live server. Only you, thanks to your proxy, will see the effects of the new JavaScript.
To illustrate, I'm going to show how I could use this technique to insert my own picture into my view of the New York Times Technology Page. Neither the browser nor the server will know that anything has been changed; only I will see the alteration. Naturally it's possible to change anything on the page; I'm changing an image only because it's easy to see.
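The substitution the proxy performs can be sketched as an ordinary string rewrite. Both host names here are hypothetical placeholders (live.example.com for the client's server, 192.168.0.3 for a development machine), and a real proxy rule would be written in the proxy's own syntax rather than in JavaScript:

```javascript
// Hypothetical hosts: live.example.com is the client's server,
// 192.168.0.3 is a development machine on the local network.
var LIVE_SCRIPT =
  /(<script[^>]*\bsrc=["'])https?:\/\/live\.example\.com\/js\/([^"']+)(["'])/gi;

function swapScripts(html) {
  // Keep the tag and the file name; only the host/path prefix changes.
  return html.replace(LIVE_SCRIPT, "$1http://192.168.0.3/js/$2$3");
}

swapScripts('<script src="http://live.example.com/js/app.js"></script>');
// '<script src="http://192.168.0.3/js/app.js"></script>'
```

Everything outside the matched script URLs passes through unchanged, which is exactly the property that makes the proxy approach safe for testing against a live page.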
To accomplish this you'll need two prerequisites: You'll need to be able to install a proxy server, and you'll need to be able to write Perl-style regular expressions.
Let's start by installing a proxy server. I'm on a Mac, but you may need to adjust this for your own operating system. I've decided to use Privoxy because it's free and works on a wide range of computers. The easiest way to install Privoxy on Mac is to use the Darwin Ports package manager.
$ sudo port install privoxy
Once that completes, you will, by default, have a privoxy binary installed in /opt/local/sbin/privoxy. Before you start it, I would suggest a couple of tweaks to the configuration file.
$ sudo vi /opt/local/etc/privoxy/config
We want to tell privoxy to use our own personal files as part of the configuration; this will make it easier and safer to tweak the configuration going forward. My personal configuration files will live in ~/.privoxy/, so I'll add the following two lines to /opt/local/etc/privoxy/config:
filterfile /Users/michael/.privoxy/user.filter
actionsfile /Users/michael/.privoxy/user.action
Save and close the main configuration file now; that's the last time we'll need to touch it. Now we can create a folder to hold our personal configuration, and add a couple of files to it, like so:
$ mkdir /Users/michael/.privoxy
$ touch /Users/michael/.privoxy/user.filter
$ touch /Users/michael/.privoxy/user.action
Ok, we can now start privoxy up, and it will include our own (temporarily empty) configuration files.
$ sudo /opt/local/sbin/privoxy /opt/local/etc/privoxy/config
Now you need to configure your web browsers to use your new proxy. On Mac, go to System Preferences > Network > Advanced > Proxies and set the Web Proxy (HTTP) and Secure Web Proxy (HTTPS) items to use 127.0.0.1:8118. Save and close the Preferences, and open up a web browser like Safari.
To test that your proxy is running and that your web browser is using it, go to the following URL: http://config.privoxy.org/. You should see some information about your running Privoxy application. By default privoxy will filter out most web ads, though you can adjust this. The important thing is we can add our own filtering. To do that we need to edit those two configuration files we created earlier.
First, let's create a filter, one that will replace the URL of an image on the NYTimes domain with the URL of an image on my own development server. The rule for that goes in my /Users/michael/.privoxy/user.filter file and looks like this:
FILTER: justtesting This is a test.
s%http://graphics\.nytimes\.com/example\.jpg%http://192.168.0.3/example.jpg%g
(Those aren't real URLs; I've shortened them for purposes of illustration. The point is that you can match any pattern in the page source and replace it with any text of your own.)
If you grok Perl regular expressions, then this substitution syntax should be familiar to you. Here I'm defining a filter named "justtesting" which swaps my own example.jpg URL in wherever it sees a URL matching the one for the example.jpg on the NYTimes server.
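If Perl isn't your thing, the same rule translates almost directly into JavaScript's String replace, using the same shortened URLs. In the Privoxy rule the % characters play the role Perl's slashes usually do, which is why the slashes in the URL need no escaping there. The sketch also shows why the dots are escaped: an unescaped dot matches any character, so a look-alike host would be rewritten too.

```javascript
// JavaScript equivalent of the Privoxy rule above (same shortened URLs).
// s%PATTERN%REPLACEMENT%g becomes str.replace(/PATTERN/g, "REPLACEMENT").
var escaped = /http:\/\/graphics\.nytimes\.com\/example\.jpg/g;
var unescaped = /http:\/\/graphics.nytimes.com\/example.jpg/g; // "." matches any character

function swapImage(html, pattern) {
  return html.replace(pattern, "http://192.168.0.3/example.jpg");
}

var page = '<img src="http://graphics.nytimes.com/example.jpg">' +
           '<img src="http://graphicsXnytimes.com/example.jpg">';

swapImage(page, escaped);   // rewrites only the first src
swapImage(page, unescaped); // rewrites both, including the look-alike host
```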
Save that, and now you're halfway done. The second part of this process is to add that filter to the user.action file. So I will add the following to my /Users/michael/.privoxy/user.action file:
{+filter{justtesting}}
/
The first line turns on the filter named "justtesting", and the second line is a glob pattern saying which web addresses to apply the filter to. Use a forward slash if you want to apply the filter to all web addresses. But if I'd wanted to limit my filter to just the nytimes server, I could have written this instead:
{+filter{justtesting}}
*.nytimes.com
Save that, and you're done. You don't need to restart privoxy for those changes to be applied. Now, if I go to the live web site, I should see the effects of my switcheroo. Can you spot my somewhat Simpsonsesque family on my view of the live New York Times Technology Page?

Note: If you installed privoxy with port and want it to run all the time, run the following commands on Mac (the second one simply confirms that it is loaded):
$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.Privoxy.plist
$ sudo launchctl list | grep privoxy
Microsoft Mime Types
I had a problem with some of the more recent Microsoft document formats not being recognized when downloaded from an Apache server. Ironically, it was IE 8 that had the problem (I think it was trying to be secure), while Firefox knew what to do just fine. Anyway, here's an .htaccess addition that will help our IE 8 friends out (with thanks to mcupples):
AddType application/vnd.ms-word.document.macroEnabled.12 .docm
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.document docx
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.template dotx
AddType application/vnd.ms-powerpoint.template.macroEnabled.12 potm
AddType application/vnd.openxmlformats-officedocument.presentationml.template potx
AddType application/vnd.ms-powerpoint.addin.macroEnabled.12 ppam
AddType application/vnd.ms-powerpoint.slideshow.macroEnabled.12 ppsm
AddType application/vnd.openxmlformats-officedocument.presentationml.slideshow ppsx
AddType application/vnd.ms-powerpoint.presentation.macroEnabled.12 pptm
AddType application/vnd.openxmlformats-officedocument.presentationml.presentation pptx
AddType application/vnd.ms-excel.addin.macroEnabled.12 xlam
AddType application/vnd.ms-excel.sheet.binary.macroEnabled.12 xlsb
AddType application/vnd.ms-excel.sheet.macroEnabled.12 xlsm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.sheet xlsx
AddType application/vnd.ms-excel.template.macroEnabled.12 xltm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.template xltx
Fun with Partitions
Last year I partitioned the hard drive on my PC laptop so I could install Kubuntu and have it dual boot. It worked quite well except for the fact that there was an extra step for everyone in the family (none of whom have any interest in Linux) every time they turned on the computer: they had to choose the WindowsXP option from the menu presented by the GRUB loader. The fact is I almost always prefer to use my Mac OS X anyway, so, from their point of view, I had made all their lives a teensy bit harder for no good reason.
Then they ran out of room. The 30 GB I had left them for their Windows partition was quickly filled to the brim with iTunes music and family photos. I've been using Sun's VirtualBox application at work and was so impressed I decided I would delete that mostly unused Linux partition and install Kubuntu on a virtual hard drive instead. I'd free up a lot of disk space and make logging in easier for everyone in the bargain; you'd expect me to be a family hero, right? The family looked dubious.
Frankly no one seemed even a little surprised when I announced a short while later that I had--once again--rendered the PC completely unusable. "Uh, we do have backups of those photos from Molly's birthday party right?" "Um, I think so... do we?"
The problem was I'd deleted the Linux partition but didn't tell GRUB about it. This made it impossible to boot the computer. Not a time to panic: the truth is I'd been in this sort of situation more times than I'd care to admit, and there is always a way out, if you just google it hard enough.
Answer: Super Grub Disk. I downloaded the ISO file from the website onto my Mac, burned it to a CD using the built-in Mac Disk Utility tool, popped that CD into my PC and restarted. I was presented with a list of options, one of which was to rebuild the WindowsXP master boot record (MBR). No, I don't know what an MBR is (how do you think I got myself into this situation in the first place?); all I know is Windows is rather unhappy without one. Anyway, now that the MBR was, um, mastered again, the old laptop booted right up! Even after all that, I'm still not the family hero, though. Weird.
