Tuesday, July 23, 2013

Adding LibGuides to Drupal's Search Results

This will be another super specific post about how to do something useful for libraries in Drupal. The tl;dr is that you can use LibGuides XML Export, the Feeds module, and the Feeds XPath Parser module to make LibGuides show up in your Drupal site search results. So when users search for "english composition" and you don't have any study guides on your Drupal site, something relevant from LibGuides might show up.

I was inspired to do this by the Drupal in Libraries book, though I haven't read it (I saw it mentioned in American Libraries). I didn't see specific details in the book's preview, and Michigan is putting the XML into their Solr search index, which is too sophisticated for my small college, so I thought a brief write-up might benefit other libraries that have LibGuides but don't use Solr. Libraries using other CMSs might still benefit from the general outline, though the specific details won't be useful. I'd be shocked if WordPress libraries couldn't do the same, using WP All Import or other plugins.

These directions are specific to Drupal 7; I bet the same can be achieved in 6 but I can't vouch for any of the settings or code being the same.

Set-up: LibGuides & Modules

In order to do this, you have to do a couple steps first to prepare both LibGuides and Drupal.

  • Purchase the Images and Backups Module from Springshare. In my experience, the pricing is very reasonable, and the "images" part of it means you can upload images to LibGuides which makes adding them to guides much, much easier for authors.
  • Install the Feeds module, a popular and well-maintained module for mass importing nodes from structured data (RSS/Atom feeds, CSV files, OPML files) into Drupal.
  • Install the Feeds XPath Parser module, which adds an extra parser to your Feeds installation, allowing you to import nodes from arbitrary XML documents.

Once you've done these three steps, download the XML export from LibGuides (Springshare will email you when it's ready) and enable both modules in Drupal.

Process the XML

I don't work with XML much (shame, librarian, shame!) but this is a step where you could edit the LibGuides export to make it more useful as an imported node. In my pre-processing, I only wanted to accomplish one thing: when I import the nodes, I don't want any unpublished or private guides to be published in Drupal. We have a few under-construction or private guides that shouldn't show up in search results.

To do so, there's just one Drupal quirk you have to know: later on, in configuring the way your data maps to Drupal nodes, you'll be able to map the contents of an XML element to a Drupal node's "publication status" field. 1 means published and 0 means unpublished.

Luckily, the LibGuides XML has a <STATUS> element underneath each <GUIDE> which you can easily map to either 0 or 1. To process the XML, I performed a simple pair of search-and-replace operations in Sublime Text:

  • Search for "<STATUS>Published</STATUS>" and replace with "<PUBLISH>1</PUBLISH>"
  • Search for "<STATUS>.*</STATUS>" and replace with "<PUBLISH>0</PUBLISH>"

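That second search and replace uses a teeny bit of regex, so turn on your editor's regular expression option for it: the period stands for "any character except a line-break" and the asterisk means "zero or more of the preceding character". So I'm searching for any string of text (even an empty one) inside of a <STATUS> element and turning it into <PUBLISH>0</PUBLISH>. The order matters: this works because none of my published guides still has a <STATUS> element after the first search-and-replace. One caveat is that .* is greedy, so if two <STATUS> elements ever sat on the same line the match would swallow everything between them; in that case "<STATUS>[^<]*</STATUS>" is the safer pattern.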
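If you would rather script this step than do it by hand in a text editor, here is a minimal Python sketch of the same idea. It folds the two replacements into one pass. The function name mark_publication_status and the sample status values "Unpublished" and "Private" are my own assumptions for illustration; check what your export actually contains.

```python
import re


def mark_publication_status(xml_text):
    """Swap each <STATUS> element for a <PUBLISH> element holding 1 or 0."""

    def to_publish(match):
        # Only the exact text "Published" counts as published;
        # anything else (including an empty element) becomes 0.
        flag = "1" if match.group(1) == "Published" else "0"
        return "<PUBLISH>" + flag + "</PUBLISH>"

    # [^<]* cannot run past the closing tag, so this stays inside a
    # single element even if several elements share one line.
    return re.sub(r"<STATUS>([^<]*)</STATUS>", to_publish, xml_text)


# Made-up sample data; the status values other than "Published" are guesses.
sample = (
    "<GUIDE><NAME>English Composition</NAME><STATUS>Published</STATUS></GUIDE>\n"
    "<GUIDE><NAME>Draft Guide</NAME><STATUS>Unpublished</STATUS></GUIDE>\n"
    "<GUIDE><NAME>Staff Only</NAME><STATUS>Private</STATUS></GUIDE>"
)
processed = mark_publication_status(sample)
print(processed)
```

To run it on a real export, you would read the file into a string, pass it through the function, and write the result to a new file that you then upload to the importer.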

Configure the Feeds Importer

Back inside Drupal, we need to create a new content type and set up the Feeds module to receive our XML file.

  • Under the "Structure" menu of the admin toolbar, select Content Types
  • Add content type and then give it a name and description, e.g. "Imported LibGuides"
  • Add fields to your new content type, which at the very least should contain two new fields: an "ugly URL" field for LibGuides that don't have a friendly URL, and a "friendly URL" field. You can make these Text field types with the standard settings.
  • Under the "Structure" menu of the admin toolbar, select Feeds importers (or visit {{drupal root}}/admin/structure/feeds)
  • Add importer and then give it a name and description, e.g. "LibGuides Importer"

There are a lot of settings here, which can seem intimidating but is actually a good thing. The Feeds module gives you control over how data is imported into Drupal, and everything is straightforward if you take the time to read through it. I'll walk through my basic settings, but just know that you could do whatever seems reasonable here and be OK; the only piece of this post you might need to reference is the XPath queries later on.

  • Basic Settings
    • Attach to content type: select your LibGuides content type here
    • Periodic import: off, periodic import is only for grabbing nodes from web feeds, e.g. RSS
    • Import on submission: check
  • Fetcher: File upload
    • Allowed file extensions: you can leave as is, but I put XML since I'll only be uploading XML files
    • Upload directory: leave as is
  • Parser: XPath XML parser (this option only appears if you installed Feeds XPath Parser)
    • Settings: see the section below on the XPath queries, but trust me this won't be that painful
  • Processor: Node processor
    • Bundle: select your LibGuides content type again
    • Update existing nodes: this is a bit of a judgment call, but you'll be fine with either "Replace existing nodes" or "Update existing nodes."
    • Skip hash check: I leave this unchecked but you'd be fine either way
    • Text format: your call, I leave as "Plain text" which is fine for search results
    • Author: anonymous, or your user if you want to brag about how many nodes you made
    • Authorize: probably should leave checked
    • Expire nodes: Never
    • Mapping for Node processor: make the Title, Body, Published status, Friendly URL, and Ugly URL fields all map to an "XPath Expression" source. The two URL fields are the ones we created with our Imported LibGuides content type, so if you chose different names for them back then, they will appear under those names in the Target drop-down options here.

Whew, we're done! I know that looks like a lot, but Feeds has a pretty nice UI for such a sophisticated and powerful module.

Parsing XML with XPath

Now for the fun part: we need to map XML elements in LibGuides to Drupal fields using XPath expressions. We also get to say things like that which only .01% of humans understand.

XPath is a query language for XML; if you know SQL or CSS selectors, it's kind of similar. It gives you a way of traversing the structure of an XML document to retrieve the contents of various elements. The LibGuides XML is structured in a pretty logical, simple manner, so writing our queries won't be tough. Back in the Feeds importer settings that we were just editing, select the Settings link under the Parser section. This gives us a menu where we can write our XPath queries. Here's the setup that I use, with some English translations:

Context: //GUIDE

We want our queries to run in the context of each <GUIDE> element. We could do without this, but it means we'd be prepending /LIBGUIDES/GUIDES/GUIDE/ to each query below, which is silly.

title: NAME

body: DESCRIPTION

Set the name of the LibGuide to the node's title and the body of the node to its description. The description is the brief sentence which shows up underneath the name of a LibGuide.

field_friendly_url: FRIENDLY_URL

field_ugly_url: URL

Each <GUIDE> element has two URLs, so we map both of those to the two custom fields we set up on our Imported LibGuides content type. Once again, if you named your fields something different, their machine-readable names (which is what you see in this menu, they're just lowercase with underscores instead of spaces) will be different.

status: PUBLISH

Remember when we edited the LibGuides XML to set up a <PUBLISH> element that's either 0 or 1? That's where this mapping comes into play, taking that Boolean value and using it as Drupal's publication status field.

You can leave all the "Select the queries you would like to return raw XML or HTML" options unchecked. Note that this could provide some interesting options if you were doing more sophisticated things with LibGuides, since the XML export contains all the raw HTML of the various boxes in each guide. Debug Options can also be left unchecked, although if you're testing this process I recommend turning them on. The debug options show you what Drupal found with each XPath query, which can help you configure the importer properly.

I leave "Allow source configuration override" unchecked as well. Since we just set up our XPath queries the way we wanted, there's no need to override them later. However, you could do something interesting where you set up a generic LibGuides importer in these settings, then have multiple different ways of mapping the XML into nodes.
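If you want to sanity-check the queries before touching Drupal at all, a rough preview is possible with Python's built-in ElementTree, which understands a small subset of XPath. This is only a sketch. The sample XML is invented to match the structure described above (the URLs and descriptions are made up), and preview_import is a hypothetical helper, not anything Feeds provides.

```python
import xml.etree.ElementTree as ET

# Invented sample that mimics the export structure described in this post.
SAMPLE_EXPORT = """<LIBGUIDES>
  <GUIDES>
    <GUIDE>
      <NAME>English Composition</NAME>
      <DESCRIPTION>Sources for first-year writing.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=1</URL>
      <FRIENDLY_URL>http://libguides.example.edu/english</FRIENDLY_URL>
      <PUBLISH>1</PUBLISH>
    </GUIDE>
    <GUIDE>
      <NAME>Draft Guide</NAME>
      <DESCRIPTION>Still under construction.</DESCRIPTION>
      <URL>http://libguides.example.edu/content.php?pid=2</URL>
      <FRIENDLY_URL></FRIENDLY_URL>
      <PUBLISH>0</PUBLISH>
    </GUIDE>
  </GUIDES>
</LIBGUIDES>"""

# The same target-to-query mapping entered in the parser settings.
QUERIES = {
    "title": "NAME",
    "body": "DESCRIPTION",
    "field_friendly_url": "FRIENDLY_URL",
    "field_ugly_url": "URL",
    "status": "PUBLISH",
}


def preview_import(xml_text):
    """Return one dict per <GUIDE>, keyed by the Drupal target name."""
    root = ET.fromstring(xml_text)
    rows = []
    # ".//GUIDE" is ElementTree's spelling of the //GUIDE context.
    for guide in root.findall(".//GUIDE"):
        # Each query runs relative to the current <GUIDE> element.
        row = dict(
            (target, guide.findtext(query, default=""))
            for target, query in QUERIES.items()
        )
        rows.append(row)
    return rows


rows = preview_import(SAMPLE_EXPORT)
for row in rows:
    print(row["status"], row["title"], row["field_friendly_url"] or row["field_ugly_url"])
```

Each dict corresponds to one node the importer would create, so a missing title or a status that isn't 0 or 1 shows up here before it shows up as a broken node.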

Redirecting Imported Nodes to LibGuides

Before we actually import our LibGuides, we want to make sure they're handled appropriately. That is, we don't want people clicking on their search results simply to see some lame text and URLs on the screen; we want them to be redirected straight to the LibGuide.

There are probably other ways to do this, for instance the Field Redirection module, but I use node templates, which are PHP templates that apply only to specific node types. Under the templates folder of your theme (which will likely be somewhere in sites/all/themes), create a file named "node--imported-libguides.tpl.php", where "imported-libguides" is the machine name of your LibGuides content type with hyphens in place of the underscores (that is, the name you gave it, lowercased, with hyphens replacing spaces). Inside that template, paste the following PHP:

<?php
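// redirect user to LibGuide rather than node if user is not signed in
// uid 0 means anonymous user
if ( $user->uid == 0 ) {
  // prefer friendly URL if available; LANGUAGE_NONE is Drupal's constant for 'und'
  // checking the value itself also guards against a field that exists but is blank
  if ( !empty( $node->field_friendly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] ) ) {
    drupal_goto( $node->field_friendly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] );
  } elseif ( !empty( $node->field_ugly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] ) ) {
    // ugly_url should always exist but just in case, use a conditional
    drupal_goto( $node->field_ugly_url[ LANGUAGE_NONE ][ 0 ][ 'value' ] );
  }
} else {
  // signed-in users are assumed to be editors, so show them the node's fields
  print render($content);
}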
?>

I've written comments in the code, but essentially here's the path this code steps through:

  • Is the user anonymous? If yes, redirect them. If not, we assume the user is some kind of editor, so we print out the lame text fields. This makes it easier for librarians to edit nodes after they've been imported, but assumes that your users don't have Drupal accounts. If they do, you'll need to consider the first if condition thoroughly to make sure only the right types of users are seeing the plain text.
  • Does the node have a friendly URL? If so, redirect anonymous users to it.
  • If not, the node must have an ugly URL, redirect anonymous users to that.

I noted it above, but because it's so important: if you allow users to create Drupal accounts, this template won't work well. It won't expose confidential data or anything, but it's definitely meant for Drupal sites where all non-editor traffic is anonymous.
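If your site does have non-editor accounts, one way to think through that first condition is to decide by role rather than by uid. Here is a small model of that decision in Python. It is not Drupal code, just the logic laid out so the edge cases are easy to see. The role names in EDITOR_ROLES and the function name redirect_target are made up for illustration.

```python
# Hypothetical role names; substitute whatever roles your editors really have.
EDITOR_ROLES = {"editor", "administrator"}


def redirect_target(user_roles, friendly_url, ugly_url):
    """Return the URL a visitor should be sent to, or None to show the node."""
    # Anyone holding an editor role sees the plain node so they can edit it.
    if EDITOR_ROLES & set(user_roles):
        return None
    # Everyone else is redirected, preferring the friendly URL when present.
    # If neither URL exists there is nowhere to send them, so return None.
    return friendly_url or ugly_url or None


print(redirect_target([], "http://example.edu/english", "http://example.edu/1"))
print(redirect_target(["authenticated user"], "", "http://example.edu/1"))
print(redirect_target(["editor"], "http://example.edu/english", "http://example.edu/1"))
```

In an actual template the same idea would mean testing the signed-in user's roles or permissions in the first if condition instead of comparing uid to 0.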

Your theme may also have a particular way of printing out nodes that you want to stick to; in that case, you'd be better off copying node.tpl.php or another node type template rather than using my code verbatim. You could put the logic piece of this code at the top of your node template, dropping the else clause at the end. That would work fine as long as it's named appropriately, e.g. "node--imported-libguides.tpl.php".

We're Almost There

Now that our template is set and our importer configured, we need to create an importer node, give it a file, and let it run wild. Go to {{drupal root}}/import to see a list of available importers, including the default ones that come with the Feeds module and your LibGuides Importer. Select LibGuides Importer and you're greeted with the usual node editing form, except this time there's a place to upload a file towards the top. Use that to browse to the processed LibGuides XML, then upload it. You can leave the body and other fields blank.

Once you've created this node, it will have an Import tab with an identically named button. Simply click that and your nodes should be created in Drupal, with whatever debug messages you chose in the importer displaying as well.

Totally screwed up the XPath queries, causing a bunch of broken and useless nodes to be imported? No worries, the importer node that you just created has a Delete items tab which can delete any of the nodes which it imported. This makes trying out a Feeds importer rather risk free; just keep trying until you get it right.

Final Steps

Drupal's internal search index will still need to index the new nodes before they show up in its results. You can run cron a few times depending on how many nodes you just added and they should show up. Try a search for the title of a LibGuide that wouldn't return any of your other pages, and make sure clicking on a LibGuide result from an anonymous session causes you to be redirected to the guide.

As LibGuides are added and removed, you'll have to sync them to their Drupal nodes again. However, once you've done the process once, it only takes a few minutes to grab a new XML export, upload it, and click the import button.

Monday, July 1, 2013

Foreign For-In, or Python as a First Language

...being a brief recap of my experience at the Python Preconference at ALA Annual. In general, the session was a smashing success and I was elated to see a diverse group of people picking up Python so quickly. Without going into details, which I think other attendees or organizers will cover elsewhere, here's one struggle and one pleasant surprise from the preconference.

Explain a For-In Loop

Describing how a for-in loop works was difficult and I repeatedly ran into attendees who just couldn't quite grok it. A Python for-in loop looks like:

for word in wordlist:
    print word

That would loop through the wordlist data structure, which we'll say is a list (similar to an array in other languages), printing each term to the screen. Simple, right? But it's actually pretty weird, because in the above example what exactly is word? It's a local variable that gets a new value each time through the loop. If for-in loops for lists didn't exist in Python, you might implement them like so:

i = 0
while i < len( wordlist ):
    # being super explicit here
    word = wordlist[ i ]
    print word
    i = i + 1

len( wordlist ) here returns the length of the wordlist list, for non-Python people. Otherwise, I assume the syntax is straightforward for anyone who knows a little code. The biggest disadvantage to this implementation is that you have to manage an extra counter variable, i, which isn't useful after the loop has run. (To be fair, word also lingers in scope after the loop, but that's equally true of a real for-in loop.)

I'm not sure my explicit for-in loop is more clear to a new programmer, but it's my conceptual model. Students struggled with understanding the for variable's name; where does word come from? In the lecture, Becky Yoose used this example:

for fruit in pies:
    print fruit

The reaction from attendees seemed to be "since pies is a list of different fruits, the variable name has to be 'fruit' here." As if Python were somehow doing natural language processing to figure out a good descriptive term for an individual item in a thematic list. It's a weird thing to grasp conceptually, perhaps the crux being that you're getting a variable without any visible assignment statement. That's a nice convenience for programmers coming from other languages, but it obscures what's going on for learners.
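One demonstration that might have helped is running the same loop twice with a sensible name and a nonsense name. This is my own sketch rather than anything from the preconference materials, and the list contents are invented. It uses print() with parentheses so it runs under both Python 2.7 and Python 3.

```python
pies = ["apple", "cherry", "peach"]

# A descriptive loop variable name...
with_sensible_name = []
for fruit in pies:
    with_sensible_name.append(fruit)

# ...and a nonsense one. Python does not care which we pick.
with_silly_name = []
for xyzzy in pies:
    with_silly_name.append(xyzzy)

# Identical results: the name is just a label the programmer chooses.
print(with_sensible_name == with_silly_name)

# The loop variable is an ordinary variable. It is assigned once per pass,
# and it still exists after the loop, holding the last item.
print(xyzzy)
```

The hidden assignment is the whole trick: each time around, Python does the equivalent of xyzzy = (the next item) before running the indented block.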

Nested Loops & First Languages

On the other hand, I found that a lot of our exercises and final projects involved nested loops, sometimes three to four layers deep. Everyone seemed to absorb this without conceptual difficulty. Maybe it's my own experience speaking here, but I get more and more anxious the deeper my indents go. A lot of this anxiety is based in JavaScript, where blocks wrapped in curly braces tend to take up more space and are harder to parse than in whitespace-happy Python. The ugliest code in the world is an immediately-invoked function expression which ends in a bunch of closed code blocks:

            }
        }
    }
}( 'this happens way too often in JavaScript' ) );

Python's conveniences, like range() and how the for-in loop works seamlessly across different data types (lists, dictionaries, even strings. Strings, people!), are a serious boon to beginners. I still think JavaScript makes a great first language for a few reasons: 1) everyone already has it installed via their web browser, so there's zero setup barrier, 2) the web is where data and applications live these days and JavaScript is the language of the web, and 3) a trivial amount of jQuery can make cool things happen. Other languages require more investment before the cool things go down.

But the setup process wasn't an issue for the preconference. We held a help session the night before and only two people came; one of them already had Python installed and on the Windows path, they just needed confirmation that they'd done it right. A number of factors contributed to the ease of setup: many attendees had Macs which typically come with a 2.6.x or 2.7.x version of Python, the Boston Python Workshop docs are great and cross-platform, and a fair portion of attendees were advanced computer users. So with an easy setup, Python (or Ruby) is a sensible choice for a first language.