micmath blog
What's The Diff?
This seems like a very simple trick, but it's one that I've found extremely useful: generating a list of every file on a remote server that's not the same as the copy on my local machine.
As part of my workflow I always keep an entire working copy of any websites I work on, databases included, on my development machine. This allows me to work on the site without fear of breaking the live site. When I'm happy with the way my local copy is working I can then copy any files I've changed up to the live server via SFTP.
At this point I'd really like a list of exactly which files have changed, either on my local copy or on the live site. The command line tool diff is perfect for this kind of job, but not so good if you only have SFTP access to the live site. The answer is to use SSHFS and FUSE to mount your remote folder onto your desktop. Then you can use any tool or application on your local machine against that mounted folder, exactly as if it were sitting on your local computer.
There are great tools that give you a nice GUI for SSHFS but I already use Panic's Transmit to connect with FTP servers, and that comes with a "Mount favorite as disk" button that just works. So once you have your live site sitting on your desktop, you can then run diff commands like the following, which will list every file that is not the same in two folders.
diff -rq /my/local/copy/ /Volumes/example.com/public_html/
(Yes, I know Transmit has a "Synchronise" feature that purports to do this same task, but unfortunately it uses the file modified times to decide if something has changed, and at least on my servers, the remote system times frequently get out of sync with my desktop, causing Transmit to report that every single file is different. The diff tool on the other hand compares the contents of each file, which is a much more robust approach.)
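To make the contents-versus-timestamps point concrete, here is a minimal Node.js sketch of the same idea as diff -rq: it walks two folders and compares files byte for byte, never looking at modification times. The function names listFiles and diffTrees are my own, hypothetical ones, and this is only an illustration, not a replacement for diff.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect the relative paths of all regular files under `root`.
function listFiles(root, dir = '') {
  const found = [];
  for (const entry of fs.readdirSync(path.join(root, dir), { withFileTypes: true })) {
    const rel = path.join(dir, entry.name);
    if (entry.isDirectory()) found.push(...listFiles(root, rel));
    else if (entry.isFile()) found.push(rel);
  }
  return found;
}

// Report files that differ in content or exist on one side only.
// Modification times are never consulted.
function diffTrees(left, right) {
  const inLeft = new Set(listFiles(left));
  const inRight = new Set(listFiles(right));
  const report = [];
  for (const rel of [...new Set([...inLeft, ...inRight])].sort()) {
    if (!inRight.has(rel)) {
      report.push('only in left: ' + rel);
    } else if (!inLeft.has(rel)) {
      report.push('only in right: ' + rel);
    } else {
      const a = fs.readFileSync(path.join(left, rel));
      const b = fs.readFileSync(path.join(right, rel));
      if (!a.equals(b)) report.push('differs: ' + rel);
    }
  }
  return report;
}
```

Assuming the mounted folder from above, diffTrees('/my/local/copy/', '/Volumes/example.com/public_html/') would return one line per file that is not the same on both sides.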
Useless: Form Validation in Safari 5
To my shock today I noticed that on some of my live web sites many of the HTML forms were no longer working. By "no longer working" I mean they simply did nothing when you pressed the submit button. By "simply did nothing" I mean you clicked the submit button and the browser just ignored you. There was no perceivable reaction, no error message, no form submission and no explanation why not. I am practiced at debugging but this mystery didn't provide me much to get started with.
One clue was that these same forms worked fine on another browser. The second clue was that I'd recently upgraded to Safari 5. And then I remembered I was doing something slightly interesting with these forms: I was using JavaScript on the client (and PHP on the server) to validate the form input. As part of that system I was using some HTML5 attributes before they were supported in any browsers, the new pattern attribute, for one:
<input type="text" id="email" name="email" required="required"
pattern="^[^@]+@[^@]+\..+$" hint="Email must be a valid email address." />
My scripts would check each input element for a pattern attribute and perform a validation check against the corresponding submitted value. If the value did not match the pattern then the hint would be displayed as an explanation for what correction was needed from the user. I purposely chose the name pattern for my little system because I knew it would eventually get supported by new browsers that could do HTML5 validation -- pattern is a part of the HTML5 spec.
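The check described above can be sketched roughly like this. It is not my actual script: the plain objects stand in for input elements, and the function name collectHints is hypothetical.

```javascript
// Each descriptor mimics an <input>: its pattern and hint attributes,
// plus the submitted value.
function collectHints(fields) {
  var hints = [];
  for (var i = 0; i < fields.length; i++) {
    var field = fields[i];
    // Only fields that carry a pattern are checked.
    if (field.pattern && !new RegExp(field.pattern).test(field.value)) {
      hints.push(field.hint);
    }
  }
  return hints;
}

var emailField = {
  pattern: '^[^@]+@[^@]+\\..+$',
  hint: 'Email must be a valid email address.',
  value: 'not-an-email'
};

var hints = collectHints([emailField]);
```

Here hints holds the single hint string, because the value has no @ in it. A value such as me@example.com would produce an empty list.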
So what was happening here? Well it turns out that Safari 5 was exactly what I was waiting for: a browser that supported HTML5 form validation natively. The problem is that Safari does its validation before my code runs, and when Safari sees a validation fail it just does nothing. Well actually it does give the failing input element focus, meaning a hazy blue halo appears around it, but this is very subtle and may as well be considered imperceptible by anyone who isn't sharp-eyed and trained to look for it.
So Safari's native validation is useless because it provides no feedback. In fact it's worse than useless because it prevents my scripts from providing any feedback either, since they never get a chance to run. I can't find any documentation on how to control Safari 5's new form validation features so for now I must change my forms so they will no longer use any HTML5 validation attributes.
Safari 5: making my web pages less HTML5 compliant.
Limiting Variable Visibility in JavaScript
My momma always told me to keep the visibility of my variables as small as possible. Well, okay I can confidently say those words never passed her lips, but I can easily imagine some stern-faced Code Matron wagging a finger and chiding, "Global variables make baby Jesus cry." I won't go into all the whys of limiting the visibility of your variables, the point of this post is to compare a few different hows.
We have all been told that if we want to limit the visibility of a variable in JavaScript there is only one way to do it, and that is to use var inside a function. For example, if you wanted to ensure that a variable x did not touch the larger scope of your program, you could do something like this:
(function () {
var x; // reference to variable x exists in here
alert(x); // undefined
})();
alert(x); // throws ReferenceError: x does not exist here
It is worth mentioning at this point that JavaScript has a very relevant little feature known as "variable hoisting," meaning that variables are effectively always declared at the top of the function, regardless of where you write the var x bit. The example below is effectively identical to the example above:
(function () {
// reference to variable x exists in here
alert(x); // undefined
var x;
})();
alert(x); // throws ReferenceError: x does not exist here
So you can't limit the visibility of a variable lexically, by writing the var stuff near the end of a function. If you declare a variable anywhere in a function the effect is the same as declaring it at the top of that function. For that reason it is arguably good practice to actually declare variables at the top of functions--you may as well make it look the same as the way it works, right?
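Here is a small sketch of hoisting in action. The function name hoistingDemo is just for illustration. Reading an undeclared variable would throw a ReferenceError, so the fact that these reads succeed shows the declarations already exist at the top of the function, even the one inside a block that never runs.

```javascript
function hoistingDemo() {
  var seen = [];
  seen.push(x);   // no ReferenceError: x is declared already, but holds undefined
  if (false) {
    var y = 'never assigned'; // this line never runs...
  }
  seen.push(y);   // ...yet y is declared for the whole function
  var x = 10;     // only the assignment happens at this point
  seen.push(x);
  return seen;    // [undefined, undefined, 10]
}
```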
But using var isn't the only way to accomplish our goal. You can also use function parameters.
(function (x) { // reference to parameter x exists
alert(x); // x is undefined
})();
alert(x); // throws ReferenceError
This time the x listed as a parameter of the function actually pops into existence right before the rest of the function body. So it's possible a var somewhere (anywhere) inside the function could overwrite the parameter x everywhere.
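To be a bit more precise about that, here is a sketch with two hypothetical functions. A bare var x; inside the function refers to the same variable as the parameter and leaves its value alone. It is a var with an assignment that replaces the value that was passed in.

```javascript
function redeclared(x) {
  var x;                 // declaration only: the parameter keeps its value
  return x;
}

function reassigned(x) {
  var before = x;        // still the value that was passed in
  var x = 'replaced';    // same variable as the parameter, now overwritten
  return [before, x];
}

var kept = redeclared('passed');     // 'passed'
var swapped = reassigned('passed');  // ['passed', 'replaced']
```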
And that would appear to be the end of our options, but in fact there is yet another way to achieve this same goal, and it doesn't use functions at all. This time we'll use with:
// assume undefined === void(0);
with( {x: undefined} ) { // reference to property x exists
alert(x); // x is undefined
}
alert(x); // throws ReferenceError
People don't use with very much (for reasons I'll discuss a bit later) but it is designed to do exactly what we are after here: alter the scope chain for just a little while.
But let's compare these two techniques in a more real world situation: here I want to fill an array with the first few integers squared; I need to create a temporary variable named i but I don't want that i to exist or have any effect whatsoever outside my limited 2 or 3 lines of code.
First the functional way:
squares = [];
(function() {
var i = 5;
do { squares[i] = i * i } while (i--);
})();
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
or alternatively:
squares = [];
(function(i) {
do { squares[i] = i * i } while (i--);
})(5);
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
And the same using with:
squares = [];
with( {i: 5} ) {
do { squares[i] = i * i } while (i--);
}
alert( squares.join(', ') ); // 0, 1, 4, 9, 16, 25
alert(i); // ReferenceError
You could argue that the functional techniques look a little more sugary than the with technique and thus with is the cleaner way to go, but there are several other points against using with. First is a slightly circular observation that people are likely to be unfamiliar with with and so don't often use it. I suppose we could start a campaign to get more people to use it and therefore solve that problem, but here's why I wouldn't: with, like its ginger-haired stepbrother eval, is an easy to abuse little sucker, and you don't have to use it very much before you start to consider it confusing at least. What's more, it's slower than the functional equivalents shown here. Well, mostly. In some browsers it's actually slightly faster, but this is more than offset by the fact that it is incredibly humongously slower in almost all other browsers.
Of course, in the big picture, we're still only talking small fractions of seconds difference here so if you really just love the syntax of with shown above, you could still get away with using it. I leave that decision to you.
Publish A Developer Blog on GitHub
If you use GitHub you may not realise you already have access to website hosting, accessed via your user name. You can publish content there, once it is set up, but I want to go a step further and set up a blog. This post documents the steps I followed to do that.
Set up your GitHub user page
First you need a user account on GitHub. If you go to a URL like http://yourUserName.github.com you will see the instructions you need to follow to set up your user page. I'll detail how I created mine:
Create a pages repo on github. Mine will be at http://micmath.github.com/
$ mkdir micmath.github.com
$ cd micmath.github.com
$ git init
$ echo "My GitHub Page" > index.html
$ git add index.html
$ git commit -a -m 'Initial commit.'
$ git remote add origin git@github.com:micmath/micmath.github.com.git
$ git push origin master
It can take about 10 minutes for the initial setup. That will give me time to install a few other things on my local machine...
Install the template system
GitHub supports Tom Preston-Werner's Jekyll engine and Pygments natively, so it makes sense to use those for a blog hosted on GitHub. I'm using gem and port to get the software on my Mac; you may need to use a different installation process supported by your own OS.
Install Jekyll for templates and site generation.
$ gem install jekyll
Install Pygments, for source code highlighting.
$ sudo port install python25 py25-pygments
Set up the site file structure.
$ mkdir _posts
$ mkdir _site
$ mkdir _layouts
$ touch _config.yml
$ mkdir css
Add the Pygments stylesheet
Pygments can generate its own stylesheet, to be used for source code highlighting, using the command shown below. Jekyll uses the CSS selector ".highlight code", so add that in and create the file, like so:
$ pygmentize -S default -f html -a ".highlight code" > css/syntax.css
Configure Jekyll
You can configure Jekyll to your liking. For example I have my own Apache server running locally so I don't need Jekyll to run one.
$ cat _config.yml
auto: false
server: false
pygments: true
Create the default layout and css
$ touch _layouts/default.html
$ touch css/default.css
In _layouts/default.html, add some template code. Jekyll uses the Liquid template syntax, plus a few Jekyll-specific extensions.
$ cat _layouts/default.html
<!DOCTYPE html>
<html>
<head>
<title>{{ page.title }}</title>
<meta http-equiv="Content-Type"
content="text/html;charset=utf-8">
<link rel="stylesheet" type="text/css" media="screen"
href="/css/default.css" />
<link rel="stylesheet" type="text/css" media="screen"
href="/css/syntax.css" />
</head>
<body>
<h1>{{ page.title }}</h1>
{{ content }}
</body>
</html>
Create some content
Finally, all the infrastructure is now in place and we can start blogging! To create the content for a blog post, simply add a new markdown file to your _posts folder. The name of the file should be formatted like yyyy-mm-dd-your-title.markdown.
$ cat _posts/2010-04-05-hello-world.markdown
---
layout: default
title: Hello World
---
This is an example blog post.
This is just a _test_. Here's some code, which will get syntax highlighting from Pygments:
{% highlight js %}
function Foo(x, y) {
this.x = x;
}
{% endhighlight %}
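If you'd rather not type the yyyy-mm-dd-your-title.markdown file names by hand, a helper along these lines could build them. This is only a sketch: postFilename is a hypothetical name, and the rule for turning a title into a slug (lowercase, everything else collapsed to hyphens) is my own assumption, not something Jekyll requires.

```javascript
function postFilename(date, title) {
  // Zero-pad month and day to two digits.
  function pad(n) { return (n < 10 ? '0' : '') + n; }

  // Assumed slug rule: lowercase, runs of other characters become one hyphen,
  // and leading or trailing hyphens are trimmed.
  var slug = title.toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');

  return date.getFullYear() + '-' + pad(date.getMonth() + 1) + '-' +
    pad(date.getDate()) + '-' + slug + '.markdown';
}

// JavaScript months are zero-based, so 3 means April.
var filename = postFilename(new Date(2010, 3, 5), 'Hello World');
// filename is '2010-04-05-hello-world.markdown'
```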
Generate the HTML output
If you want to preview your formatted blog post you can run jekyll locally. This step is technically unnecessary, since GitHub will automatically run this same command for you on the server whenever you push new content. However it's handy if you want to proofread and tweak anything before publishing.
$ jekyll
Configuration from ./_config.yml
Building site: . -> ./_site
Successfully generated site: . -> ./_site
There is no need to publish the HTML generated in your own workarea, since GitHub will automatically regenerate it whenever you push anyway. So, if you haven't already done so, add the _site folder to your .gitignore file.
$ cat .gitignore
.DS_Store
_site
Publish the content
Publishing is easy: just commit and push any new files. If you haven't already done so, add all the new folders and files to git. Finally commit and push:
$ git commit -a -m 'Added source code example.'
$ git push
Transparent Content Filtering for Web Developers
Imagine you want to play around with some JavaScript resource that is installed on a client's live server. You might want to do this in order to test some changes you've made to that file: it may work on your development server, but you want to see if that will still be true when it goes live.
This can be tricky. You definitely don't want to upload and pray. Maybe some specific configuration that is only on the live server will cause your AJAX features to behave in unexpected ways? How could you see those errors for yourself without anyone else seeing them at the same time?
One way is to use a content-filtering proxy. This is basically an application that sits between your web browser and your internet connection. Any request the browser makes will go through the proxy and any response from the internet will come back through that same proxy. Once you configure this, it's invisible to the browser. So, to accomplish what we need we could configure the proxy to replace any SCRIPT tags in the page returned from the live server with a different SCRIPT tag that points to the new file on your development server. Everything else will be exactly as it is on the live server, because the page you will see will be coming from the live server. Only you, thanks to your proxy, will see the effects of the new JavaScript.
To illustrate I'm going to show how I could use this technique to insert my own picture into my view of the New York Times Technology Page. Neither the browser nor the server will know that anything has been changed; only I will see the alteration. Naturally it's possible to change anything on the page; I'm changing an image only because it's easy to see.
To accomplish this you'll need two prerequisites: You'll need to be able to install a proxy server, and you'll need to be able to write Perl-style regular expressions.
Let's start by installing a proxy server. I'm on a Mac, but you may need to adjust this for your own operating system. I've decided to use Privoxy because it's free and works on a wide range of computers. The easiest way to install Privoxy on Mac is to use the Darwin Ports package manager.
$ sudo port install privoxy
Once that completes, you will, by default, have a privoxy binary installed in /opt/local/sbin/privoxy. Before you start it, I would suggest a couple tweaks to the configuration file.
$ sudo vi /opt/local/etc/privoxy/config
We want to tell privoxy to use our own personal files as part of the configuration. This will make it easier and safer to tweak the configuration going forward. My personal configuration files will live in ~/.privoxy/, so I'll add the following two lines to /opt/local/etc/privoxy/config:
filterfile /Users/michael/.privoxy/user.filter
actionsfile /Users/michael/.privoxy/user.action
Save and close the main configuration file now; that's the last time we'll need to touch it. Now we can create a folder to hold our personal configurations, and add a couple files to it, like so:
$ mkdir /Users/michael/.privoxy
$ touch /Users/michael/.privoxy/user.filter
$ touch /Users/michael/.privoxy/user.action
Ok, we can now start privoxy up, and it will include our own (temporarily empty) configuration files.
$ sudo /opt/local/sbin/privoxy /opt/local/etc/privoxy/config
Now you need to configure your web browsers to use your new proxy. On Mac, go to System Preferences > Network > Advanced > Proxies and set the Web Proxy (HTTP) and Secure Web Proxy (HTTPS) items to use 127.0.0.1:8118. Save and close the Preferences, and open up a web browser like Safari.
To test that your proxy is running and that your web browser is using it, go to the following URL: http://config.privoxy.org/. You should see some information about your running Privoxy application. By default privoxy will filter out most web ads, though you can adjust this. The important thing is we can add our own filtering. To do that we need to edit those two configuration files we created earlier.
First let's create a filter, one that will replace the URL to an image on the NYTimes domain with an image on my own development server. The rule for that will go in my /Users/michael/.privoxy/user.filter file and looks like this:
FILTER: justtesting This is a test.
s%http://graphics\.nytimes\.com/example\.jpg%http://192.168.0.3/example.jpg%g
(Those aren't real URLs; I've shortened them for purposes of illustration. The point is you can match any pattern in the page source and replace it with any of your own text.)
If you grok Perl regular expressions, then this substitution syntax should be familiar to you. Here I'm defining a filter named "justtesting" which swaps my own example.jpg URL in wherever it sees a URL matching the one for the example.jpg on the NYTimes server.
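If Perl isn't your thing, the same substitution can be sketched in JavaScript. This is not how Privoxy works internally, and the function name applyFilter is hypothetical. It only shows what the s%...%...%g rule does to the page source: the dots are escaped so they match literally, and the g flag replaces every occurrence.

```javascript
function applyFilter(pageSource) {
  // Same pattern and replacement as the justtesting filter.
  return pageSource.replace(
    /http:\/\/graphics\.nytimes\.com\/example\.jpg/g,
    'http://192.168.0.3/example.jpg'
  );
}

var before = '<img src="http://graphics.nytimes.com/example.jpg">' +
  '<img src="http://graphics.nytimes.com/other.jpg">';
var after = applyFilter(before);
// Only the first image now points at 192.168.0.3; other.jpg is untouched.
```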
Save that, and now you're halfway done. The second part of this process is to add that filter to the user.action file. So I will add the following to my /Users/michael/.privoxy/user.action file:
{+filter{justtesting}}
/
The first line turns on the filter named "justtesting" and the second line is a glob pattern saying which web addresses to apply the filter to. Use a forward slash if you want to apply the filter to all web addresses. But, if I'd wanted to limit my filter just to the nytimes server I could have written this instead:
{+filter{justtesting}}
*.nytimes.com
Save that, and you're done. You don't need to restart privoxy for those changes to be applied. Now, if I go to the live web site I should see the effects of my switcheroo. Can you spot my somewhat Simpsonsesque family on my view of the live New York Times Technology Page?

Note: If you want to always run privoxy and you've used port to install it, run the following command on Mac:
$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.Privoxy.plist
$ sudo launchctl list | grep privoxy
Running A Personal SVN Server on Mac Snow Leopard
I love version control. I use it more than is probably appropriate to be honest. For personal projects that don't warrant their own page on github or google projects I like to use a quick and simple svn repo right on my laptop. I have Apple's Time Machine running on there and full weekly snapshots saved to an external hard drive, so I'm not too concerned about the repo itself getting hosed.
I do however want it to be easy to make frequent commits to files I'm working on. And I specifically don't want to run svn with apache -- that is simply overkill for my personal usage. Finding decent instructions on how to set that up on Snow Leopard is difficult, so here's how I did it:
1. If you haven't already done so, download the installer for the excellent My Subversion-1.5.5 Package. It's simply a double click job and you then have an svn server and client all ready to go.
2. Create a folder to hold your repo. Adjust the filepaths to match your own system, obviously:
mkdir /Users/michael/svn
3. Start the svn server, pointing it at the repository directory you just created:
svnserve -d -r /Users/michael/svn
4. Create a new repository:
svnadmin create /Users/michael/svn/projects
5. Set up users for that new repository. This may be the most complicated step, but even this is very straightforward if you can edit an ini file:
bbedit /Users/michael/svn/projects/conf/svnserve.conf
It's just an ini type of file and has loads of useful comments (so read them). You'll want it to say this:
[general]
anon-access = read
auth-access = write
password-db = passwd
realm = projects
That means anonymous users can only read, while authenticated users can read and write. Also the password file is going to be in conf/passwd. So let's edit that now:
bbedit /Users/michael/svn/projects/conf/passwd
Again, a very simple ini type of file. Add yourself to it and save.
[users]
michael = secretpassword
6. Checkout your repository:
cd ~/Workarea
svn co svn://localhost/projects
7. Make changes, commit:
cd ~/Workarea/projects
bbedit README.txt
svn add README.txt
svn commit README.txt -m 'Initial commit.'
8. Finally, you can always stop the svn server, if you want to save a little battery juice:
killall svnserve
JavaScript Constructors and Factories
One of JavaScript's peculiarities is the 'new' expression: when a function invocation is included in that expression that function behaves like a constructor. But the function, by itself, is not intrinsically a "constructor," it's still just a function, and as such any function intended to be called with new can just as easily be called without new. Of course if you, the function's author, are relying on the side-effects provided by new, things may not work as intended when those effects are missing. There are however many built-in examples of functions which work equally well, if perhaps slightly differently, when invoked as a constructor or as an ordinary function:
Objects are created by using constructors in new expressions; for example, new String("A String") creates a new String object. Invoking a constructor without using new has consequences that depend on the constructor. For example, String("A String") produces a primitive string, not an object. -- ECMAScript Language Specification, 4.2.1 Objects
Constructors are not the only way to build objects; there is a similar pattern that involves a function, called a "factory." The difference is that a factory is not called with new and factories don't return their this value, as constructors do. You could say that String is an example of a function that can be invoked as a constructor to return a String object, or as a factory to return a string primitive. This design blurs the distinction between factories and constructors. What can we do in this blurry area?
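A quick sketch of String playing both roles. The variable names are only for illustration.

```javascript
var asConstructor = new String('A String'); // called with new: a String object
var asFactory = String('A String');         // called without new: a primitive

var kinds = [typeof asConstructor, typeof asFactory]; // ['object', 'string']

// The object wraps the same text as the primitive.
var sameText = asConstructor.valueOf() === asFactory; // true

// But they are not strictly equal, because one is an object.
var strictlyEqual = asConstructor === asFactory; // false
```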
One thing we can do is to create a wrapper for an existing constructor, where we add on our own extensions:
function ArrayPlus() {
var self = Array.prototype.slice.call(arguments);
self.last = function() {
if (this.length) return this[this.length-1];
};
return self;
}
var a1 = new ArrayPlus(1, 2, 3); // like a constructor
var a2 = ArrayPlus(1, 2, 3); // like a factory
// the result is the same, arrays that can last()
a1.push(4); alert( a1.last() ); // alerts 4
a2.push(4); alert( a2.last() ); // alerts 4
When we call ArrayPlus as a factory it returns an instance of an Array with an extra method added on. However the same thing happens when we call ArrayPlus as a constructor. How is that possible?
Typically when a function is invoked as a constructor it automatically returns the constructor's own this value, however in ArrayPlus we are returning our own object, named self. This is allowed under the rules of the ECMAScript Language Specification (section 13.2.2), but it is important to know what the difference is: self is an instance of an Array, not an ArrayPlus. In essence we are overriding the typical behavior of a constructor and forcing ArrayPlus to always behave like a factory, whether it is called with or without new.
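You can confirm that difference with instanceof. The sketch below repeats ArrayPlus so it can run on its own; the variable names made and checks are just for illustration.

```javascript
function ArrayPlus() {
  var self = Array.prototype.slice.call(arguments);
  self.last = function () {
    if (this.length) return this[this.length - 1];
  };
  return self; // an object, so new hands this back instead of `this`
}

var made = new ArrayPlus(1, 2, 3);

var checks = {
  isArray: made instanceof Array,         // true
  isArrayPlus: made instanceof ArrayPlus, // false, even though we used new
  last: made.last()                       // 3
};
```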
I referred to the ECMAScript Language Specification a moment ago, and it does indeed allow arbitrary objects to be returned from constructors, but there is a caveat worth knowing related to this: a constructor (called with new) cannot return a non-object. For example:
function MyNumber(value) {
var self = new Number(value); // an object
return self;
}
var one = new MyNumber(1);
function Three(value) {
var self = 3; // a primitive
return self;
}
var three = new Three();
alert(one); // alerts 1
alert(three); // alerts "[object Object]"
What happened? The rules of ECMAScript say that if a constructor returns an object, as MyNumber does, then proceed as expected. But if a constructor returns a non-object, as Three does, then ignore the return statement entirely and instead return whatever the constructor's this value is. In the case of Three the this value is an instance of Three; in any case it is not the number 3, as you might think it is.
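Here is a sketch of that rule with the two call styles side by side. Three is repeated in a shortened form so the example runs on its own, and the other names are hypothetical.

```javascript
function Three() {
  return 3; // a primitive: ignored when Three is called with new
}

var built = new Three(); // the return value is discarded, we get `this`
var called = Three();    // an ordinary call returns the primitive

var outcome = {
  builtType: typeof built,                // 'object'
  builtIsThree: built instanceof Three,   // true
  called: called                          // 3
};
```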
