Friday, February 22, 2013

Reflections on Writing JavaScript

I've been working with JavaScript for a little while now & I want to briefly share changes I've made in my coding style. These changes, while seemingly pedantic, can be very meaningful in constructing a maintainable script.

Use Anonymous Functions Sparingly

When I first started writing semi-serious JavaScript using jQuery, I was passing anonymous functions as parameters frequently. It's a pattern that's encouraged by Codecademy & all the brief jQuery API examples, but it gets messy & unsustainable quickly. Throwing anonymous functions around all the time misses the entire point of functions, i.e. that they're named, reusable chunks of code. Which is clearer here?

// anonymous
$.getJSON( "http://some.api.url/gimmejson", { q: "search+term" }, function ( response ) {
    var len = response.len;
    if ( len > 0 ) {
        console.log( "Well, at least it's not empty..." );
    } else {
        return "ERROR ERROR DEATH FATAL ERROR";
    }
    var dataset = [];
    for ( var i = 0; i < len; i++ ) {
        dataset.push( response.items[ i ].text );
    }
    return dataset;
} );

// named
$.getJSON( "http://some.api.url/gimmejson", { q: "search+term" }, processResponse );

Having ten lines of anonymous function pasted into a function call as a parameter is probably the least readable code pattern commonly in use. In particular, if other parameters also span multiple lines (e.g. if I pass a much larger object in the second parameter above) it is a chore to differentiate between commas that separate items within objects & arrays & the commas that separate the parameters you're passing. Debugging is also easier with named functions; you can look back through a call stack that makes sense, rather than discovering that the last function called before an error was but one of the twelve anonymous ones sprinkled throughout your code.
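As a rough sketch, the callback from the anonymous example could be pulled out into a named function along these lines. The response shape (an items array of objects with a text property) is my assumption based on the example above, & the sample object is a made-up stand-in for a real API response so the function can be tried without jQuery.

```javascript
// Hypothetical named callback. The response shape
// ( { items: [ { text: "..." } ] } ) is assumed, not a real API's format.
function processResponse ( response ) {
    // Fall back to an empty list if the response has no items.
    var items = ( response && response.items ) || [];
    var dataset = [];
    for ( var i = 0; i < items.length; i++ ) {
        dataset.push( items[ i ].text );
    }
    return dataset;
}

// Made-up sample data standing in for what the server might send back.
var sample = { items: [ { text: "first" }, { text: "second" } ] };
var result = processResponse( sample );
console.log( result );
```

Because the function has a name, it shows up as processResponse in a stack trace & can be called directly with test data, as the last few lines do.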

The one disadvantage is that it's not immediately evident that processResponse is a function; it looks like it could be any type of variable. That's why the best, most readable way to use most functions is by passing parameters in an object, which jQuery makes extensive use of:

// passed in an object
$.ajax( {
    url: "http://some.api.url/gimme?json=yesyesyes",
    dataType: "json",
    data: { q: "search+term" },
    success: processResponse,
    error: displayError
} );

This makes the role of processResponse much clearer; it's a callback function called upon a successful request. If the $.getJSON function let me pass in both a success & an error callback, I'd have to look up the function's syntax every time just to figure out which anonymous function was assigned to each. With the object parameter, their roles are doubly evident both from the name of their key as well as the name I've given the function.

&& and ||

&& and || are frequently used in assignment expressions, even though intuitively they seem to belong only inside conditions. It's not something I do a lot but it's incredibly frequent in code libraries, so understanding the usage is important. Basically, && and || are not merely boolean tests; they are operators that return one of their two operands. && returns the first value if it is falsey & the second if the first is truthy; || is the opposite in that it returns the second value if the first is falsey & the first if it is truthy. You can see how this works in typical conditions, where && is used to mean "and" & || is used to mean "or". Example:

if ( false && true ) // -> false because 1st is falsey, code won't execute
if ( false || true ) // -> true because 2nd is truthy, code will execute

We know intuitively that these make sense, because "and" usage means that both the first and the second conditions must be true while "or" usage is happy if either the first or the second is true. But what do you think this code, adapted from the Google Analytics snippet (which initializes an empty array rather than an object), does?

var _gaq = _gaq || {};

Does it make sense to have a || outside of a conditional statement such as if? Here, || returns _gaq if _gaq is truthy (e.g. if it already exists as an object) but it will return an empty object literal if _gaq is falsey. Then, later on in my code, if I add a method or property to _gaq I've guaranteed that it exists, so I won't get an error for touching a property of undefined. So a more verbose but less tricksy rewriting is:

var _gaq; // re-declaring an existing variable keeps its current value
if ( _gaq ) {
    _gaq = _gaq;
} else {
    _gaq = {};
}

Writing one line as opposed to six makes sense; an if-else condition is overkill here, when we just want to check if our object already exists & initialize it as empty if not.
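The value-returning behavior is easy to check directly. Here's a small sketch; the variable names & the settings object are made up purely for illustration.

```javascript
// Each comment says which operand the operator hands back.
var a = 0 || "fallback";      // 0 is falsey, so || returns the second operand
var b = "first" || "second";  // "first" is truthy, so || returns it
var c = null && "never";      // null is falsey, so && returns it
var d = "first" && "second";  // "first" is truthy, so && returns the second operand

// Hypothetical settings object, defaulted the same way as _gaq.
var settings;
settings = settings || {};
settings.debug = true;        // safe: settings is guaranteed to be an object

console.log( a, b, c, d, settings );
```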

Spaces

Spaces are good. I like an abundance of spaces in my code. I pad array brackets, object curly braces, & parentheses wrapped around control flow expressions or function parameters with spaces. So I write

var obj = {
    nums: [ "one", 2, "three" ],
    funk: function ( param ) {
        if ( param.toLowerCase() === 'parliament' ) {
            return 'Give up the funk.';
        }
    }
};

instead of

var obj = {nums:["one", 2, "three"],
    funk:function(param){
        if (param.toLowerCase() === 'parliament') return 'Give up the funk.';
}};

One telling space is the one before the parentheses that wrap a function's parameters. I try to always put a space between the function keyword (or the function's name) & the parameters in a function definition, while there's no space when the function is being executed.

var funk = function ( args ) { ... }; // function expression assigned to variable
function funkyFunk ( args ) { ... } // function declaration
funk(); // function being executed

Functions are thrown around so frequently in JavaScript that this subtle difference, if consistently enforced, can go a long way toward helping you read whether a piece of code is being executed or defined for later use.

Switch

I generally avoid the switch statement; its syntax is weird. I find it uncharacteristic that the code blocks following "case foo" aren't wrapped in curly braces. If I had to guess how a switch statement would be done, the cases would look more like:

switch ( foo ) {
    case ( bar ) {
        doSomething();
        break;
    }
    case ( bah ) {
        doSomethingElse();
        break;
    }
}

which would parallel the other control flow statements. switch doesn't save much space over a series of if comparisons & carries the potential hazard of unintentional fallthrough.
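To make the fallthrough hazard concrete, here's a small sketch using the real switch syntax next to the equivalent series of if comparisons. The function names & string values are invented for the example.

```javascript
// Real switch syntax: cases are labels followed by a colon, not braced blocks.
function withSwitch ( foo ) {
    var calls = [];
    switch ( foo ) {
        case "bar":
            calls.push( "doSomething" );
            // Missing break: execution falls through into the "bah" case.
        case "bah":
            calls.push( "doSomethingElse" );
            break;
    }
    return calls;
}

// The same branching as a series of if comparisons; nothing can fall through.
function withIf ( foo ) {
    var calls = [];
    if ( foo === "bar" ) {
        calls.push( "doSomething" );
    } else if ( foo === "bah" ) {
        calls.push( "doSomethingElse" );
    }
    return calls;
}

var viaSwitch = withSwitch( "bar" ); // [ "doSomething", "doSomethingElse" ]
var viaIf = withIf( "bar" );         // [ "doSomething" ]
console.log( viaSwitch, viaIf );
```

One forgotten break is all it takes for the "bar" case to run both branches, which is exactly the kind of bug that's hard to spot on a skim.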

++ and ?

I follow a lot of Douglas Crockford's advice, but not his avoidance of ++. I use ++ in for or while loops & it hasn't come back to bite me. Sometimes I'll use it to increment a value outside of a loop. I think I understand its usage in these limited contexts & while it isn't a huge gain in terms of saving space, it's nice to put all my loop details in one expression. I also don't think the ternary operator is worth avoiding; it's very handy during variable initialization even if it's a little opaque, much like || and &&. The ternary operator looks like:

var someVariable = ( expression ) ? "value if expression evaluates to true" : "value if expression evaluates to false";

We could rewrite the Google Analytics code:

var _gaq = ( _gaq ) ? _gaq : {};

It does the exact same thing: it checks whether _gaq exists & initializes it to an empty object literal if not.
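For what it's worth, here's a sketch of the limited ways I mean. One commonly cited source of confusion with ++ is the difference between its prefix & postfix forms, which only matters when the result of the expression is used; the variable names are just for illustration.

```javascript
var count = 0;
var postfix = count++; // postfix returns the old value: postfix is 0, count is 1
var prefix = ++count;  // prefix increments first: count is 2, prefix is 2

// In a loop header the returned value is ignored, so either form behaves the same.
var total = 0;
for ( var i = 0; i < 3; i++ ) {
    total += i; // adds 0, then 1, then 2
}

// Ternary during initialization.
var label = ( total > 2 ) ? "big" : "small";

console.log( postfix, prefix, total, label );
```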

You Don't Hate JavaScript, You Hate the DOM

Like many JavaScript programmers before me, I have discovered that JavaScript is really not so bad a language. It has its peculiar flaws (the unreliability of typeof & the leading zero issue with parseInt come to mind) but it also has gorgeous features. In particular, the first-class nature of functions is wonderful & I can't live without it. Passing functions as parameters to other functions is mind-blowing once you realize how much you can achieve with it.
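A quick sketch of the two quirks mentioned above, along with the usual workarounds. Array.isArray assumes an ES5-capable engine; the variable names are mine.

```javascript
// typeof gives surprising answers for several common values.
var typeOfNull = typeof null;   // "object", although null is not an object
var typeOfArray = typeof [];    // "object", so typeof can't pick out arrays
var typeOfNaN = typeof NaN;     // "number", despite the name

// Array.isArray (ES5) is a more reliable check for arrays.
var isArray = Array.isArray( [] );

// Older, pre-ES5 engines could read a leading zero as octal when no radix
// was given, so always passing the radix sidesteps the problem.
var month = parseInt( "08", 10 );

console.log( typeOfNull, typeOfArray, typeOfNaN, isArray, month );
```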

But JavaScript's biggest issue isn't the language itself, it's the way it interacts with HTML pages via the DOM. DOM manipulation is tough, the commands are verbose, & cross-browser incompatibilities abound. There's a reason why people love jQuery; it removes the pain of accessing & altering the DOM, scaffolding on top of CSS selectors that most web developers already know. The single biggest piece of advice I give to people who want to learn JavaScript is to start with jQuery. With a nice layer of abstraction, you can actually do something on a website, which is amazingly gratifying. The building blocks of the language are easier to acquire when you see their utility on the web, as opposed to repeatedly printing text to the console.

Conclusion: Steps to Learning a Language

There are a few steps you go through when learning a programming language. The very first step is simply understanding what syntax is valid. Writing echo "Hello world!" will result in an error in JavaScript. The next step is understanding the advantages of specific syntax choices; knowing whether a particular situation calls for a particular control flow statement, for instance.

The next step after that is meaningless in terms of how the code executes but of paramount importance to programmers, who tend to be human; knowing how to write clear code. Once I had the basics out of the way, I found myself having lots of opinions on what makes a piece of JavaScript understandable. Now, every time I go back & look at something I wrote previously, I find myself employing all sorts of conventions (spaces! fewer anonymous functions!) that I've discovered or come to appreciate. While much of JavaScript: The Good Parts went over my head initially, I now understand its essence; that deliberate choices when writing JavaScript can not only avoid common programming pitfalls but increase clarity.

Friday, February 15, 2013

Optimizing IIS for Performance & Security

My college uses Microsoft's IIS 7 for its servers instead of the more common Apache. That's fine; IIS is probably a good server. I don't know, I'm not qualified to say which is better. But one thing's for sure: Apache is easier to use & learn simply because of the availability of documentation. If you're a full stack web person starting a new project, please use something with community support & documentation. Apache plays nice with Drupal, there are tons of security & performance tweaks documented online, & it has some great add-ons for any situation.

But hey, I'm stuck with IIS. This post is mostly a note-to-self on how to optimize IIS. I'm not at all a server configuration expert, so please don't take it as gospel. Most especially, if I'm flat-out wrong about something, I'd like to hear about it.

For the tl;dr & the resulting file, see my web.config GitHub repo.

Caching

The hardest part is caching correctly. The goal is to use far-future expires headers, similar to Cache-Control: max-age=9000000. There are many different means of caching in HTTP but far-future expires is both simple (the server just says "hey, you can hang onto this content for X seconds") & effective. Some other caching methods end up sending "conditional GET" requests, essentially saying "hey, server, I have version 3.2 of this file, is that current?" & the server sends a response back saying either "yup, carry on" or "nope, here's the current version." That is slightly less error-prone, because you can update a file on the server & it'll still make its way to clients that have cached the content, but that extra HTTP request adds up quickly. To update files with max-age or other far-future type caching schemes, I use filename-based versioning, essentially bumping a version number like "style.1.css" to "style.2.css" every time I change a file. Because remembering to change filenames is tedious, I either have a CMS (Drupal's built-in caching) or a build script (Yeoman) handle it for me.
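To illustrate the filename-based versioning idea, here's a minimal sketch of what such a bump might look like. This is not how Drupal or Yeoman actually do it (they typically work differently, e.g. with content hashes); bumpVersion is a hypothetical helper that only understands names shaped like "name.N.ext".

```javascript
// Hypothetical helper: turns "style.1.css" into "style.2.css".
// Filenames without a numeric version segment are returned unchanged.
function bumpVersion ( filename ) {
    // Capture: base name, numeric version, file extension.
    var match = /^(.+)\.(\d+)\.([^.]+)$/.exec( filename );
    if ( !match ) {
        return filename;
    }
    var next = parseInt( match[ 2 ], 10 ) + 1;
    return match[ 1 ] + "." + next + "." + match[ 3 ];
}

var bumped = bumpVersion( "style.1.css" );
console.log( bumped );
```

Since the new name is a URL the browser has never seen, it can't be served from a stale cache, no matter how far in the future the old file's expiry was set.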

In IIS 7, unfortunately, it looks like you can either set static content caching on or off with little in between (Apache lets you specify expires time by MIME type). If there's a particular static MIME type that you don't want to get cached, too bad. That's problematic for at least two types: text/html & text/cache-manifest. These are both static, text types but the files need to be able to change without changing their name. If you altered your HTML file's name every time it changed, you'd constantly break incoming links. The appcache manifest can't change its name because that causes this weird loop wherein clients that have previously visited the site & primed their cache can never get an updated version because they always look in the wrong place; Jake Archibald covers this brilliantly in Application Cache is a Douchebag.

So to get around this conundrum, I use two layers of web.config files: in the site's root where HTML, server-side scripts, & the cache manifest reside I use a config with no caching whatsoever, i.e. <clientCache cacheControlMode="DisableCache" />. Then, in any subdirectory where static content (images, CSS, JavaScript, fonts, etc.) might reside, I override that setting with an aggressive, far-future expires header.

Finally, I remove ETags with a two-part rule. The HTML5 Boilerplate server configs botch this horribly, ruining the X-UA-Compatible header in the process, but some searching around Stack Overflow found me the right combination of rules to remove ETags per performance best practice (see Steve Souders' book).

GZIP

I just copied this bit from the HTML5 Boilerplate Server Configs & made sure it worked with YSlow & other external tests. It's super important to GZIP content, arguably the biggest performance win you can get, & yet that's not the default in IIS 7.

Security

I'm not an expert at hardening servers but it makes sense to eliminate headers that unnecessarily expose server information without any added benefit. I blank the X-AspNet-Version, X-Powered-By, & Server headers. Another IIS quirk is that you can't simply remove the Server header; all you can do is set its value to an empty string, which is at least enough to keep the version number from being exposed.

Rendering Engines

Since the X-UA-Compatible meta tag doesn't really work, I send it as an HTTP header. This forces IE to use Chrome Frame if it's available or the latest rendering engine (e.g. no IE 8 using the IE 7 engine) if not.

Saturday, February 9, 2013

Eric Explains URLs (video)

I'm teaching a course entitled "The Nature of Knowledge" and we're specifically focusing on what happens to knowledge in a digitized, networked environment. I gave the class a "technology inventory" survey to complete and the hardest question on it proved to be identifying the top-level domain of a given URL. As such, I made this video to explain URLs a little bit more in-depth.



Weaknesses

I didn't do a particularly good job of explaining a few things in this video. I want to make it clear that it's not a flawless intro. Hopefully I can remake it sometime, but for now here are some caveats:

  • What does a scheme mean? I introduce two of them but don't describe their implications, i.e. that they're transfer protocols.
  • Subdomains are basically everything in the domain that's not the TLD. I don't think that's clear from my example.
  • Search can literally be a file, e.g. search.php, search.html, search.pdf (though that wouldn't have a query string). I know that the idea of URLs pointing to files is mostly an antiquated idea in the days of database-driven CMSs & web frameworks like Ruby on Rails. But it's a good starting point to learn more about them.
  • Google is a bad example. I knew that but I didn't realize quite how poor, because Google doesn't use a ? to distinguish the query string, oddly enough, so a Google search actually contradicts how I'm describing a query string.
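The pieces discussed above can also be pulled apart in code. Here's a rough sketch using the standard URL object found in modern browsers & Node; describeUrl & the example.com address are made up, & treating the last dot-separated label as the TLD is a simplification that ignores multi-part suffixes like co.uk.

```javascript
// Hypothetical helper that splits a URL into the parts covered in the video.
function describeUrl ( address ) {
    var url = new URL( address );
    var labels = url.hostname.split( "." );
    return {
        scheme: url.protocol.replace( ":", "" ), // e.g. "http", a transfer protocol
        tld: labels[ labels.length - 1 ],        // naive: last label only
        subdomains: labels.slice( 0, -1 ),       // everything in the domain before the TLD
        path: url.pathname,                      // may or may not map to a real file
        query: url.search.replace( "?", "" )     // the part after the ?
    };
}

var parts = describeUrl( "http://www.example.com/search.php?q=urls" );
console.log( parts );
```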

Anything I missed? Open to criticism but I hope this is a decent overview despite its flaws.

Also, I have a git repo of the site I made to demonstrate the different pieces, totally willing to share if someone wants it.