PataMetaData: April 2012

Thursday, April 26, 2012

An HTML Resume, Part II

Last time, I discussed the general structure of my semantic HTML resume & my usage of the <section> element. For part two, I am focusing on a very specific segment of my resume.

A Publications List

I was excited to work on this <section> of my resume because it is such a unique form of content. My starting point was the <cite> element &, for the first time, I became frustrated with HTML5's semantics. To me, as someone who teaches citation styles & writes academic articles, <cite> should logically wrap an entire citation: author, title, publishing information, etc. But <cite> in HTML5 is only for the title of a work. HTML5 Doctor has a great article on the drama surrounding <cite>'s semantics; basically, HTML5 has limited the scope of <cite> by forbidding the wrapping of an author's name or full citation in <cite>, which leaves us with no single element for an academic citation. This completely destroys the utility of processing data inside of <cite> tags because a work's title is a radically insufficient identifier. Consider a journal article; if you only know the title of the journal, you're a long way from tracking down the individual article, & HTML5 has no means of telling you where a citation begins & ends. The spec even shows an example where a citation is wrapped in a <p>, meaning the citation A) is almost certainly rendered improperly by the user agent, B) isn't part of a list (HTML5 Doctor's example of <cite> in an academic citation, by the way, does show a citation as a list item within an ordered list), & C) is indistinguishable from other paragraphs. How do I tell a main body paragraph with a <cite> in it from a works cited list? For those working with citation information (read: every web librarian out there, but many other academics as well), this is a massive void that has to be filled with more complex approaches. True, it is perhaps asking too much for HTML5 to be as thorough as, say, Zotero's Citation Style Language XML format, but the current state of HTML citations is still disappointing.

However, <cite> (which, according to spec, user agents should render in italics) does correspond with what is usually italicized in a citation, such as a book or journal title. So by marking publication names with <cite> I was able to semantically render publication titles in italics, as opposed to the clearly un-semantic approach of wrapping a title in <em> tags or a-semantic approaches involving styled <span>s. But doing a hanging indent in HTML/CSS still feels very hacky (li { text-indent: -2em; margin-left: 2em; } ...yes, that's right, a negative text indent offset by a positive margin) & the <cite> element appears to be an ideal place for a user agent to render a hanging indent by default, something that no HTML element currently does. If the standard isn't going to do that, then at least a "hanging" flag for the text-indent property in CSS (e.g. { text-indent: 2em hanging; } would replace the negative text-indent hack) should be on the table.

The second part of marking up my publication list was exposing the citations to OpenURL applications. I read up on COinS & attempted to implement the standard, which is an a-semantic approach involving empty <span>s with a string inside the title attribute that can be tacked onto an OpenURL resolver domain. Producing the code is somewhat nontrivial &, while I'm glad I read the spec & developed my inchoate understanding, I used the COinS Generator to speed up the process. Here's an example <span> from my publications list:

<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rfr_id=info%3Asid%2FoCOinS.info%3Agenerator&rft.genre=article&rft.atitle=Hardening+the+Browser%3A+Protecting+Patron+Privacy+on+the+Internet&rft.title=Reference+%26+User+Services+Quarterly&rft.issn=1094-9054&rft.date=2012&rft.volume=51&rft.issue=3&rft.spage=210&rft.epage=214&rft.ssn=spring&rft.aulast=Phetteplace&rft.aufirst=Eric&rft.auinit=EP&rft.au=Eric+EP+Phetteplace"></span>
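For anyone who would rather script this than use a generator, here is a minimal Python sketch of how such a <span> could be assembled. The function name coins_span is made up for illustration, the rfr_id and a few other keys from my example are left out for brevity, & this is just one way to do it using the standard library, not how the COinS Generator itself works.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields):
    """Return an empty COinS <span> for a journal article.

    `fields` maps OpenURL key/value keys (e.g. "rft.atitle") to plain strings.
    (Hypothetical helper, written for this post.)
    """
    # Fixed keys: the ContextObject version and the journal metadata format.
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    pairs.extend(fields.items())
    # urlencode percent-encodes each value, turns spaces into "+",
    # and joins the pairs with "&".
    title = urlencode(pairs)
    # Inside an HTML attribute the "&" separators are written as "&amp;".
    return '<span class="Z3988" title="{}"></span>'.format(escape(title, quote=True))


article = {
    "rft.genre": "article",
    "rft.atitle": "Hardening the Browser: Protecting Patron Privacy on the Internet",
    "rft.title": "Reference & User Services Quarterly",
    "rft.issn": "1094-9054",
    "rft.date": "2012",
    "rft.volume": "51",
    "rft.issue": "3",
    "rft.spage": "210",
    "rft.epage": "214",
    "rft.aulast": "Phetteplace",
    "rft.aufirst": "Eric",
}

print(coins_span(article))
```

The only fiddly parts are the two layers of escaping: percent-encoding for the key/value string, then HTML escaping for the attribute that carries it.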

Could that be any prettier? Adding embedded metadata like COinS is probably not a step most authors take, but it has an instant gratification that mere semantic HTML lacks: you can immediately see your metadata consumed by applications like Zotero & Mendeley. If your library uses an OpenURL resolver, chances are that a "find full text" icon will appear next to your citations when they're viewed from within the library network. Wikipedia employs this tactic to great effect.

Zotero offers to save two articles in my publications list

Ahhh yeah, that's the good stuff.

So here we have a totally a-semantic approach that's still exposing metadata to lots of applications that can reuse it today. That is the promise of all this semantic markup: that machines will be able to process it with greater accuracy & that humans will think up interesting uses for the processed markup. So while COinS may look hideous, it's a pretty great example of the potential for markup to interact with APIs.
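To give a feel for the consuming side, here is a rough Python sketch of what a COinS-aware tool might do: find the Z3988 spans, decode the metadata, & tack the string onto a resolver address. The resolver URL and the names CoinsFinder and find_coins are placeholders I made up; real tools like Zotero surely do much more than this.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsFinder(HTMLParser):
    """Collect the title attribute of every <span class="Z3988">."""

    def __init__(self):
        super().__init__()
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attrs.get("title"):
            # HTMLParser has already turned "&amp;" back into "&" here.
            self.contexts.append(attrs["title"])


def find_coins(html_text):
    """Return the raw key/value string of each COinS span in the markup."""
    finder = CoinsFinder()
    finder.feed(html_text)
    finder.close()
    return finder.contexts


# Hypothetical resolver address; a real one would belong to your library.
RESOLVER = "https://resolver.example.edu/openurl"

# A shortened version of the span from my publications list.
page = (
    '<li><cite>Reference &amp; User Services Quarterly</cite> '
    '<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft.genre=article'
    '&amp;rft.atitle=Hardening+the+Browser&amp;rft.aulast=Phetteplace"></span></li>'
)

for context in find_coins(page):
    metadata = parse_qs(context)         # decoded fields, each value a list
    print(metadata["rft.atitle"][0])     # the article title
    print(RESOLVER + "?" + context)      # a link an OpenURL resolver could act on
```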

Tuesday, April 17, 2012

Ageism, Technology, & Learning

I've seen a lot of library conference sessions & research papers lately that talk about generational differences in librarianship. In general, I don't go to these sessions nor do I read these papers. Why? Because generational differences are dangerous generalizations at their worst & at their best they're mere strawpeople relative to the real issue at hand. In a way, they're just the Myth of the Digital Native spoken in a different dialect.

One real issue in librarianship is the difference in technological skill across our profession. I could be termed a "techie librarian": I work with Drupal, JavaScript, & web services. I love my browser extensions. But at the same time I'm very new to library web services. I've been in my first full-time librarian position for under a year. I'm still learning JavaScript, RESTful APIs, OpenURL, & a host of other standards. Meanwhile, virtually all of the techie librarians I look up to are from a prior generation. I'm no Roy Tennant; I'm no Ed Summers or Dan Chudnov or [insert your favorite librarian here]. So the idea that there's a lack of technological skill in older librarians rings particularly false to me.

Technological skills gaps are an issue; it's just that generations are a poor proxy. There are younger librarians who are luddites & there are older librarians who are coding ninjas. &, to go one layer deeper, the issue isn't one of present skill level but rather willingness to learn. I may be new to the profession & a web neophyte, but I work very hard to learn. In fact, I'd say that each week I acquire new & useful knowledge, whether it's a software tool or a code snippet. So the problem is when people believe it's OK not to evolve, that it's OK to remain the same. We all have to strive to be better, or we risk irrelevance.

A small but incredibly validating experience today inspired this post. An older tutor (not a librarian, but we work in the same building & with the same students) attended a staff training workshop I was teaching on keyboard shortcuts. Now, I'm a keyboard nerd. I love my shortcuts, I love my QuickSilver. & I made a completely ageist, asinine assumption; I thought that everyone would tune me out & get nothing worthwhile out of the workshop. But this tutor, who had previously stated that we had no business going into social media, loved it. She ate it up. At the end, she asked a really interesting question (Q: "Do you think the mouse will go extinct?" A: Yes, but because of touch screens, not the keyboard). & that's precisely the sort of person every profession needs: someone who's curious, who wants to learn, to improve. Age is irrelevant, an egregious red herring. Let's get to learning.

Tuesday, April 10, 2012

A Semantic HTML Resume

A resume is an interesting piece of content, rife with opportunities to apply semantic markup & some more advanced features like COinS (at least in academic resumes). I recently revisited my resume web page, bringing it up to date but also redoing the markup for great justice. I'm going to divide this topic into two posts, because it's not the most exciting & I'm really going to drone on about citations in the second part. Aren't you excited?

High Level Overview

A resume is a perfect use case for the new <section> element, since it's divided into multiple independent pieces, each with its own header. Therefore, instead of the standard, a-semantic <div> markup that my previous page used, I decided to give <section> the job of demarcating my work experience, education, et cetera. Each section has a header that describes its content. I chose the <h3> element for my headers because the <h1> is used for the title of my website & the <h2> is used for the title of the page ("Resume" in this case), so <h3> makes sense as it's the next step down in the hierarchy. Indeed, looking at my resume page in the HTML5 Outliner, it is structured in the most logical manner. Unfortunately, my Drupal theme wraps the main content of a page in an <article> tag, which is inappropriate in this case & throws off the outline, but other than that I'm good.
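As a quick sanity check on that hierarchy, here is a small Python sketch that prints headings indented by rank. It is emphatically not the HTML5 outline algorithm: it only looks at <h1> through <h6> & ignores sectioning elements like <section> & <article> (which is exactly the part my Drupal theme trips over). The sample markup uses a placeholder site title, & the names HeadingOutline and outline are invented for this example.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Record (level, text) for each <h1>..<h6>, ignoring sectioning elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # rank of the heading currently open, if any

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self.headings.append((self._level, ""))

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            self._level = None

    def handle_data(self, data):
        # Only text inside an open heading is kept.
        if self._level is not None:
            level, text = self.headings[-1]
            self.headings[-1] = (level, text + data)


def outline(html_text):
    """Return one indented line per heading, two spaces per rank below h1."""
    parser = HeadingOutline()
    parser.feed(html_text)
    parser.close()
    return ["  " * (level - 1) + text.strip() for level, text in parser.headings]


# Placeholder markup following the structure described above.
sample = """
<h1>Site Title</h1>
<h2>Resume</h2>
<section><h3>Work Experience</h3></section>
<section><h3>Education</h3></section>
"""

for line in outline(sample):
    print(line)
```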

Within each <section>, however, I was uncertain about how to proceed. My previous page used HTML's definition list (<dl> being the root element, followed by a <dt> for a term and a <dd> for a definition) & I stuck with that, albeit after some hesitation. Is a list of previous & current employers really a set of definitions? So I went to my go-to source for HTML5 semantics: the HTML5 Doctor. &, lo & behold, HTML5 has repurposed <dl>, turning it from a "definition list" to a "description list" in a way that better suits my usage.

Schema.org

What really incited me to redo my resume was listening to a podcast on Schema.org & microformats, as well as reading the recent Code4Lib article on implementing Schema.org in a cultural heritage context. I wanted to try out this new means of smuggling metadata into Plain Ol' Semantic HTML (POSH, for those in the know ;). The hResume microformat would have been another choice & I don't mean to advocate for one approach to the exclusion of the other. For my purposes, the Person schema was obviously the best choice & it provides several fields that would be present in most any resume or CV.

Unlike Microformats, which operate by placing structured metadata inside HTML class attributes, Schema.org's microdata approach utilizes a couple of new attributes. First off, the root element that contains all the text being described by the schema gets a boolean itemscope attribute as well as an itemtype attribute whose value is the URI of its schema. Then, beneath that root element, one adds itemprop attributes to the various pieces of text that fall under a property of the chosen schema. So, for instance, the <dt>University of Illinois</dt> in my Education section becomes <dt itemprop="alumniOf">University of Illinois</dt>. Some of the Person properties I used include name, worksFor, jobTitle, alumniOf, memberOf (for professional organizations), & url. That's a lot of additional information exposed to machines. While I can imagine many more properties, perhaps the one glaring weakness in the schema is the lack of a "creatorOf" property. There's "performerIn" but not "creatorOf", which is silly. The fact that other Schema.org items such as books, movies, & music all have "author" or "creator" properties only highlights this myopia.
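To make those three attributes concrete, here is a toy Python reader that pulls one flat item out of some sample markup & prints it as JSON, loosely in the shape a microdata validator might show. Everything here is an assumption for illustration: the class & function names are mine, the sample markup is a cut-down stand-in for my real page (the example.com link & the <dd> text are placeholders), & the reader ignores nested items & void elements, which the actual microdata rules handle.

```python
import json
from html.parser import HTMLParser


class FlatMicrodata(HTMLParser):
    """Toy microdata reader: a single item, no nested itemscopes."""

    def __init__(self):
        super().__init__()
        self.item = {"type": None, "properties": {}}
        self._stack = []  # per open element: its itemprop name, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:  # boolean attribute, so its value is None
            self.item["type"] = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop and tag == "a":
            # For links, the property value is the href rather than the text.
            self.item["properties"].setdefault(prop, []).append(attrs.get("href"))
            prop = None
        elif prop:
            self.item["properties"].setdefault(prop, []).append("")
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Add text to the innermost open element that carries an itemprop.
        for prop in reversed(self._stack):
            if prop:
                self.item["properties"][prop][-1] += data
                break


def read_item(html_text):
    parser = FlatMicrodata()
    parser.feed(html_text)
    parser.close()
    return parser.item


sample = """
<section itemscope itemtype="http://schema.org/Person">
  <h3 itemprop="name">Eric Phetteplace</h3>
  <p><a itemprop="url" href="https://example.com/resume">Resume</a></p>
  <dl>
    <dt itemprop="alumniOf">University of Illinois</dt>
    <dd>(degree details go here)</dd>
  </dl>
</section>
"""

print(json.dumps(read_item(sample), indent=2))
```

Note that the <dd> has no itemprop, so its text never reaches the machine-readable output; only what you explicitly label gets exposed.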

Once I marked up my resume in the Person schema, I could view the results of the invisible metadata in a few ways. For one, Google has a Rich Snippets feature in their webmaster tools that shows what data the Googlebot sees, including what can be exposed to a Custom Search Engine. You can also see previews of what your page might look like in a search result, though there's no guarantee that Google will use the rich snippet. There's also the Live Microdata site, which validates your microdata & can show you what it looks like as JSON. Right now, the big selling point (that I'll get a customized search result) isn't panning out. It's actually a super ugly result for me when I'm signed into Google, with a citation as preview text & some Google+ spam below. But there are other reasons to embed metadata in HTML & I had a pleasant learning experience.