This week I encountered a significant roadblock when trying to use OpenURL in a situation where it is a natural fit. Let me explain the scenario. A scientific researcher at the company where I work built an extensive bibliography of journal articles on a particular subject, and wants to publish that bibliography on the company intranet, complete with hypertext links to the full text. This person initially thought it’d be ok to simply mount the full text articles that he had downloaded in the same webspace as the bibliography, and simply link to the files. Of course, that idea was quickly shot down. Instead, we thought, why can’t we take this bibliography, check it against our SFX KnowledgeBase to see what articles we have available in full text, and then output the complete OpenURL for each of those articles for this researcher to use when marking up and publishing his bibliography?
The use case sounds straightforward, right? Turns out that it is anything but. I was provided with a text file of citations and was asked to come up with appropriate SFX links for each. Of course I could have manually rekeyed the citations one by one into a search form querying our SFX KB, but that would take quite a long time and quite a bit of effort. I tried to think of how this whole process could be automated.
On the advice of Dan Chudnov I downloaded an open source application written in Perl called Biblio-Citation-Parser, which on the face of it seemed to be exactly what I needed. I needed a way to automatically parse the whole list of citations into the necessary chunks of metadata, and then automatically generate an OpenURL for each citation. After trying unsuccessfully to get Biblio-Citation-Parser to work (this isn’t a limitation of the software but of my Perl expertise), I sent queries out to other SFX users as well as to the Code4Lib discussion list. There were several responses from members of the Code4Lib discussion list, some of whom mentioned the application that I already knew about. But it turns out that pretty much nobody in that community [at least among those who responded] had ever used it, and also, that nobody in that community had come up with a good solution to this parsing problem themselves.
Since the original citations were stored in Reference Manager, one of the more common citation management software applications, I wrote back to the colleague who first asked me to help with this situation, asking him if he could provide me with the Reference Manager files. He did, and I downloaded a free trial version of the software, imported the references, then exported them in RIS format. Next, I imported the RIS output file into Zotero, and then exported the whole bibliography from Zotero into a readymade HTML bibliography. Because of Zotero’s built-in COinS functionality, the readymade HTML bibliography is automatically populated with OpenURLs. But I wasn’t done yet. I had to go through each citation by hand and test whether we did indeed have the article in full text, and also, to edit the HTML coding to substitute our company’s specific SFX base URL in each link.
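For anyone repeating this, the RIS export already carries most of the fields an OpenURL needs, so part of the round trip could probably be scripted. What follows is only a minimal sketch in Python, not what I actually did: the resolver address `SFX_BASE_URL` is a made-up placeholder, the function names are hypothetical, the tag mapping covers just a handful of common RIS tags, and every record is assumed to be a journal article.

```python
import re
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your institution's SFX base URL.
SFX_BASE_URL = "https://sfx.example.com/sfx_local"

# A few common RIS tags mapped to OpenURL 1.0 KEV keys (journal format).
RIS_TO_KEV = {
    "TI": "rft.atitle", "T1": "rft.atitle",
    "JO": "rft.jtitle", "JF": "rft.jtitle", "T2": "rft.jtitle",
    "VL": "rft.volume", "IS": "rft.issue",
    "SP": "rft.spage", "EP": "rft.epage",
    "SN": "rft.issn",
    "PY": "rft.date", "Y1": "rft.date",
}

# An RIS line looks like "AU  - Smith, Jane": a two-character tag, a dash, a value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a list of (tag, value) pairs."""
    records, current = [], []
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # "ER" marks the end of a record
            if current:
                records.append(current)
            current = []
        else:
            current.append((tag, value))
    if current:
        records.append(current)
    return records


def record_to_openurl(record, base_url=SFX_BASE_URL):
    """Build an OpenURL for one record, assuming it is a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    seen = set()
    have_author = False
    for tag, value in record:
        if not value:
            continue
        if tag in ("AU", "A1"):
            if not have_author:  # only the first author is sent
                last, _, first = value.partition(",")
                params.append(("rft.aulast", last.strip()))
                if first.strip():
                    params.append(("rft.aufirst", first.strip()))
                have_author = True
            continue
        key = RIS_TO_KEV.get(tag)
        if key and key not in seen:
            if key == "rft.date":
                value = value[:4]  # keep the year; RIS dates can look like "2005///"
            params.append((key, value))
            seen.add(key)
    return base_url + "?" + urlencode(params)
```

Calling `record_to_openurl(r)` for each `r` in `parse_ris(ris_text)` would give one link per citation. It does not tell you whether the full text is actually available; each link still has to be checked against the resolver.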
In the end, I achieved what the user wanted — a list of bibliographic references with SFX links as the hypertext links. But it was a huge amount of work, and I kept asking myself, surely there is a better, easier way to do this?! Surely, someone, somewhere has already solved this problem of how to readily parse bibliographic citations in a text file and run them through a process to check for which articles are available in full text?
Maybe there is a much simpler solution and if you know of it, please comment on this post to let me know. I’m left thinking that this whole OpenURL stuff still has a ways to go in terms of ease of implementation for situations like the one I described.
Steve, I don’t have an answer to your specific question but some general comments on the not-quite-ready for prime-time-ness of all this.
As you know, I’m now trying to output COinS data from Zotero for use in my weekly reading lists, to add them to the citations of old papers of mine that I’m posting to the web, etc.
But even though it is a “shiny new thing” the process is archaic as all get out.
1) Have to get the citation into Zotero (understandable!).
2) Have to output as a bibliography in HTML.
3) Have to open the web page and get its source code (possibly a multi-step process).
4) Copy and paste the source code for each item into blog source code, or web page.
5) Rinse; repeat.
This is simply insane! And I’m not even attempting the level of local integration that you are. I only want them in and then people with Zotero or an OpenURL resolver will see them.
Without even addressing your issues, although this is a part of the overall process, I need Zotero to just output the code to perhaps the clipboard, or even better to paste it into whatever open application I have at the insertion point.
Once that becomes simple I might have the energy to try and do even more with it.
Question though … have you noticed any sort of specifics as to which items get COinS and which do not? Sometimes books and more often my articles don’t. I think it’s the ones I have entered by hand, but I haven’t actually done a real test to determine for sure. Any observations?
Hi Mark,
I don’t know exactly why some items get COinS and some don’t, but I do know that in order to get all of the COinS to work initially when I was setting up my bibliography, I had to manually input them one by one in the COinS generator form. I suspect that what you are experiencing might be due to incomplete metadata, but cannot be sure at all.
Anyway, I agree that all of this needs to be simplified further. Not taking anything away from Zotero’s achievement, nor from COinS, but still.
If you have to do this more than once (or with a lot of data), a web-based reference management system might make sense. refbase is free/open source & automatically creates a COinS entry and unAPI+MODS for every single reference & seems to be compatible with Zotero and LibX.
It can also automagically import from a number of formats (including RIS). It also lets you specify an SFX resolver for OpenURL links.
So, it might take the double-checking & manual generation of COinS out of the workflow.
Rick,
Thanks for the tip about refbase. I hadn’t heard of it before. I imported citations into that system but I can’t (yet) figure out how to specify an SFX resolver.
Steve, the current version of refbase allows the database admin to globally specify the base URL to an OpenURL resolver, but it doesn’t yet allow regular users to define user-specific OpenURL settings. So, at the moment, you’d need to set up your own refbase installation to provide for custom OpenURL resolvers.
refbase was built to allow individual researchers and/or scientific institutions to dynamically publish their bibliographies online, with links to full texts (if available or permitted). In that way, a dedicated refbase instance on your institution’s own server might be what you’re looking for.
Matthias, thanks for clarifying this feature.
Quick followup in case anyone has to repeat the process of turning COinS into links to a particular SFX resolver.
Instead of refbase (as I suggested) or a text editor/regular expressions (as Steve used), one could simply use a COinS resolver in their browser.
So dust off your institution’s version of LibX or one of the various other OpenURL extensions/bookmarklets. Then just re-save the resulting page!
This gives a static page that uses the resolver specified in LibX (or whatever) for everyone else. Depending on the method you used to auto-link COinS, the COinS entries will remain too.
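For the scripted route (the text editor/regular expressions approach mentioned above), a rough Python sketch might look like the following. It assumes the usual COinS convention of a `<span class="Z3988">` whose `title` attribute holds the citation as a query string; the resolver address `SFX_BASE_URL` and the names `CoinsCollector` and `coins_to_links` are made up for illustration. It uses the standard library's HTML parser rather than a regular expression, so the `&amp;` escapes in the attribute are decoded automatically.

```python
from html.parser import HTMLParser

# Hypothetical resolver address; substitute your institution's SFX base URL.
SFX_BASE_URL = "https://sfx.example.com/sfx_local"


class CoinsCollector(HTMLParser):
    """Collect the title attribute of every <span class="Z3988"> on a page."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # HTMLParser has already unescaped the attribute value at this point.
        if tag == "span" and "Z3988" in classes and attributes.get("title"):
            self.context_objects.append(attributes["title"])


def coins_to_links(html_text, base_url=SFX_BASE_URL):
    """Return one resolver link per COinS span found in the HTML."""
    collector = CoinsCollector()
    collector.feed(html_text)
    collector.close()
    return [base_url + "?" + kev for kev in collector.context_objects]
```

Feeding the exported bibliography page to `coins_to_links` should give one resolver URL per citation, in page order, which can then be pasted in as the hypertext links.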