Citation management

How a 21st century researcher can imply their ideas are important by presenting them as extravagantly as possible: through 20th century infrastructure in the style of a 19th century gentleman scholar

Usefulness: 🔧 🔧 🔧
Novelty: 💡 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧

The genealogy of evidence is important, and there are many important ideas about how we could track it, especially with advances in technology. However, this page is not about that propagation of certainty, but rather its shabby proxy: citations in actually-existing academic publishing.

In particular, here I answer for myself: How can I get my journal-ready citations in the 19th century-style format required by my journals with the greatest possible degree of modern convenience? Fast-forwarding citation technology itself all the way to the state-of-the-1940s art must fall to someone else with time.

After trying too many alternatives, at great cost of time, I have settled upon Zotero. This option is open source, powerful and hackable. It could be more user-friendly; but then the competitors set the bar so low that this is hardly a criticism. Slightly more user-friendly but substantially less hackable is Mendeley, a closed-source reference manager that I would not judge you for using. Since I have no patience for things that cannot be automated, Zotero is an easy winner for me; you may wish to try both.

All other options that I have tried are abysmal, and I can say nothing but “I told you so” if you try them and they give you grief.

Zotero

Zotero has, as presaged, an API with which you can both read and write data.

It has an active community around it, and I don’t feel that I am locking my data away with an untrusted party if I rely upon it. (Of course, you can always try to migrate data around from anything to anything with BibTeX or one of the XML formats etc, but if you have URLs in there, or consort with foreigners who dare to have diacritics in their names, this usually leads to trouble). I can use the API to make changes that I couldn’t make manually, without worrying about that parsing nonsense. Moreover some other apps, such as Mendeley, already use the Zotero API, so you know that it’s not going to be a community of one playing with it.
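To give a flavour of the read side of that API, here is a minimal sketch in Python that lists items from a personal library via the Zotero Web API (version 3). The user ID and API key in the usage note are placeholders, and the helper names `items_url` and `fetch_items` are my own invention for illustration, not part of any Zotero client library.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.zotero.org"


def items_url(user_id, limit=25, start=0):
    """Build the URL that lists items in a personal library as JSON."""
    query = urllib.parse.urlencode(
        {"format": "json", "limit": limit, "start": start}
    )
    return f"{API_ROOT}/users/{user_id}/items?{query}"


def fetch_items(user_id, api_key, limit=25):
    """Fetch one page of items; needs network access and a valid API key."""
    request = urllib.request.Request(
        items_url(user_id, limit=limit),
        headers={"Zotero-API-Version": "3", "Zotero-API-Key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_items("123456", "YOUR_API_KEY", limit=5)` with real credentials should return a list of item records, each with a `data` dictionary holding fields such as the title. Writing goes through the same endpoints with a key that has write permission, which I have left out of this sketch.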

Useful hacks

• Integrate with your existing folder of PDFs?

Richard Zach points out

… you keep your PDF directory synced across computers (e.g., if it lives in your Dropbox), linking the PDFs is just as good. If you add a PDF, Zotero will look up the metadata for you and add a reference to your database.

As a bonus, this means you don’t have to pay Zotero for storage. I pay them anyway because I want to support this project. Maybe I should just donate?

• For exporting references from Zotero to my plain text blog, I wrote a CSL file which renders my citations and bibliographies as Markdown and reStructuredText, which is good enough for the internet. This is a hack that works, but it could be better if you want to keep your citation keys consistent across internet and web pages.

• You could write a custom exporter for Zotero without lots of pain. However, writing code always produces some pain. Nonetheless, here are overview docs, detail docs, and all the code. One problem I have is writing to output formats which need a unique citekey reference to the items in the bibliography. Here is a simple example which deals with that citekey issue (albeit with an outdated version of the citekey system). Here is a soothing walk-through of the whole process.

• There is a JavaScript API which you can access manually for one-off tricks, e.g. batch editing.

To be evaluated, the currently active clients seem to be

or you can use zotfile to synchronise a folder full of attachments to your tablet. That’s what I do. It’s not perfect, but it’s easy and robust.

Installing Zotero

Easy with Windows and Mac; there are standard installers. A little more tedious with Linux. retorquere’s repo of deb installers is a simple way for Ubuntu. There are none of the snap-app packages one would expect for Ubuntu. However, there is a cross-platform flatpak of Zotero, which I have not tried yet.

flatpak install flathub org.zotero.Zotero
flatpak override --user --filesystem=/PATH/TO/ZOTEROFOLDER org.zotero.Zotero

tl;dr. There are many over-engineered solutions to get your Zotero citations into a blog. I use pandoc markdown. Anyway, if you want a more fragile but fancier solution, there are other options below.

Better BibTeX

Better BibTeX, a.k.a. BBT, makes all the BibTeX stuff in Zotero smoother, and since BibTeX also integrates with markdown via pandoc, this is a double win.

The biggest trick is that BBT makes sensible, user-accessible citation keys for your references, and if you use these consistently life will go well for you. Here are some citation key format strings for BibTeX. (Of course, there is no guarantee your colleagues will agree on a sensible standard for citation keys, but that problem is perennial.)

Classic:

[auth:fold:nopunct][Title:fold:nopunct:skipwords:select,1,1][year]

which I think is approximately the same as [zotero:clean]

I currently use this one:

[auth][veryshorttitle1_1][year]

I am fond of the following BibLaTeX export exclusions:

file,abstract,note,keywords,lccn

BBT has a Cite-as-you-write feature for integration with generic editors, including a GUI popup. I do not use this. It also has HTTP pull export support, meaning that fetching an up-to-date bibliography is a matter of an HTTP request that looks like http://127.0.0.1:23119/better-bibtex/collection?/1/citation_management.biblatex, which pulls all the records in the citation_management collection. This works from my local copy of Zotero, so it’s fast and reliable compared to the cloud Zotero server API, which depends upon my internet connection etc.

NB: the pull export works better if you set the hidden preference extensions.zotero.translators.better-bibtex.sorted to be true so that the reference sorting is consistent.
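If you want to script the pull export, for example to refresh a .bib file before each build, something like the following Python sketch should do. It assumes the URL shape quoted above, with the `1` being the library ID and the file extension selecting the export format. The function names and the `library_id` parameter are hypothetical conveniences of mine, not part of BBT.

```python
import urllib.request

# Local endpoint served by Zotero when Better BibTeX is installed.
BBT_ROOT = "http://127.0.0.1:23119/better-bibtex"


def pull_export_url(collection, fmt="biblatex", library_id=1):
    """Build a pull-export URL shaped like the one quoted in the text."""
    return f"{BBT_ROOT}/collection?/{library_id}/{collection}.{fmt}"


def pull_export(collection, destination, fmt="biblatex"):
    """Save the collection to `destination`; Zotero must be running locally."""
    with urllib.request.urlopen(pull_export_url(collection, fmt)) as response:
        data = response.read()
    with open(destination, "wb") as handle:
        handle.write(data)
    return len(data)  # number of bytes written
```

With Zotero open, `pull_export("citation_management", "refs.bib")` would write the current state of that collection to refs.bib. Collection names containing spaces or other unusual characters would need URL escaping, which this sketch does not attempt.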

citr provides an interactive citation finder for RStudio that works with BetterBibTeX.

install.packages("citr")

Render plain-text bibliographies using a CSL style file

CSL is a citation rendering mini-language used by modern journals and software to express house style. And you can use it too.

This is a slightly weird way to get plain-text citations out; CSL is a system for instructing your citation software how to render citations in your word processor, but it can be forced to pump out normal text. It’s designed for rich formatting etc, so the fit is awkward, but it is robust and simple. There is a CSL editor online so this is easy-ish. One catch is that you might, like me, wish to get your BibTeX citation keys out to refer to them. But BibTeX keys are not accessible to CSL, so this doesn’t quite work.

• Here is my docutils/ReST style file, restructuredtext.csl, which renders citations as plain text with ReST markup, including anchor links.

It’s ugly, because it has to battle with a grumpy rich-text XML infrastructure to render plain text, but it gets the job done without any coding, and is robust against software changes.

• Similarly, here is my Markdown style file, markdown.csl, which, likewise, renders citations as plain text with Markdown markup, including anchor links.

• Emiliano Heyns has a BibTeX key CSL, which renders Emiliano’s preferred citekey \{LeDT06} (sadly not quite matching mine.)

CSL has a citation-label variable, but it doesn’t correspond to the BibTeX keys generated by BBT, which is unsatisfactory.

Atom citation picker

For Atom+Zotero+Markdown, you could try zotero citation, which adds a citation picker to the Atom editor. References look like

[$$Heyns, 2014$$](#@heyns2014)
[$$Heyns, 2014$$]([email protected],heyns2015)

They are rendered in the output by an in-built pandoc filter, which is installed separately. One locates the output bibliography in the document in either pandoc-YAML or plaintext format at

[#bibliography]: #

I don’t bother; BetterBibTeX works fine for my needs and across many editors.

Dynamic bibliography file generation

Erik Hetzner’s zotxt can avoid the need to create bibfiles, rendering bibliographies directly by querying the Zotero app, rather than manually rendering an intermediate file. It feels less fragile to me to use Better BibTeX Pull Export.

Pandoc markdown

• Chris Krycho’s pandoc-based workflow.

• Caleb McDaniel talks through the process of getting citations into nearly anything using pandoc; I have lots of links like this because, whilst it works, I can’t believe how painful it is, but I hope it will seem less painful if I read a lot about it. The wisdom of the ancients etc.

The preferred pandoc-citeproc format seems to be something with an @ sign and/or occasional square brackets:

Blah blah [see @heyns_foo_2014, pp. 33-35; also @heyns_bar_2015, ch. 1].
But @heyns_baz_2016 says different things again.

This is how you output it.

# Using the CSL transform

pandoc -F pandoc-citeproc --csl="APA" --bibliography=bibliography.bib \
-o document.pdf document.md
# or using biblatex and the traditionalist workflow.

pandoc --biblatex --bibliography=bibliography.bib \
-o document.tex document.md

You will need to set up Preferences>export>Default format to be Better BibTeX citation key quick copy and set Better BibTeX to copy pandoc style.

See the pandoc manual and the pandoc-citeproc manual.
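Since a mistyped key tends to fail quietly, it can be worth checking that every @key in a markdown file exists in the bibliography before running pandoc. Here is a rough Python sketch. The regular expression is only my approximation of pandoc’s citation-key syntax, and both function names are made up for this example.

```python
import re

# Approximation of pandoc's citation-key syntax: '@', then runs of
# alphanumerics or underscores, optionally joined by single internal
# punctuation characters. The lookbehind skips e-mail addresses.
CITEKEY = re.compile(
    r"(?<![\w.])@([A-Za-z0-9_]+(?:[:.#$%&\-+?<>~/][A-Za-z0-9_]+)*)"
)


def cited_keys(markdown_text):
    """Return citation keys in order of first appearance, without duplicates."""
    seen = []
    for key in CITEKEY.findall(markdown_text):
        if key not in seen:
            seen.append(key)
    return seen


def missing_keys(markdown_text, known_keys):
    """Return keys cited in the text that are absent from the bibliography."""
    known = set(known_keys)
    return [key for key in cited_keys(markdown_text) if key not in known]
```

Run over the example text above, `cited_keys` picks out heyns_foo_2014, heyns_bar_2015 and heyns_baz_2016. Feeding `missing_keys` the keys from your exported .bib file then tells you which citations would fail to resolve.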

BibTeX

Oh, BibTeX

None of that faffing about is useful if you are working with academics, who don’t regard words on the internet as a real thing. Your words must be behind a paywall where no-one can read them to count as significant. Moreover, they must have been rendered harder to analyse by running them through LaTeX, to obfuscate them into a PDF, which probably also entails using BibTeX to do the citation stuff.

If you are starting from Zotero, you can use Better BibTeX to make this less painful.

To fix annoying problems in the BibTeX files themselves algorithmically, I use bibtool.

BibTeX is ancient and has accreted deep strata of fossilised rules, but it does work if you move very carefully and don’t touch anything. Imitate the rituals of your predecessors and the gods will reward you.

BibLaTeX

a.k.a. biber, because the biblatex system is factored into a couple of distinct packages but you more or less need to get the lot to reap benefits. It looks similar to BibTeX but is better in various ways, mostly to do with being more modern, e.g. having fewer baffling misfeatures than BibTeX, full unicode support etc. As such, it is a low-key upgrade from the winding steeplechase of character set errors that is BibTeX. It is notably not better in the ecological sense of being widely used by the various technologically moribund conferences/journals to which you might want to submit your paper, because many such organisations believe that these accursed foreigners will eventually despair of their disconcerting languages and their filthy habits of smearing diacritical marks all over their names, if only we wait long enough.

BibLaTeX can handle non-English names and URLs, bringing it up to speed with 1999. You often need to use alternate biblatex styles made by passionate bibliography rendering fans to approximate some journal or other, e.g. here is an IEEE-like one.

Note that, unlike BibTeX, BibLaTeX is supported in Beamer slides.

The configuration options are manifold. Here is one set I like.

\usepackage[
style=authoryear,   % author-year style in references
citestyle=authoryear-icomp, % compact author-year in cites
date=year,  % no, I don’t care what month
url=false,       % no urls in ref. (except @online entries)
uniquelist=false,
uniquename=false, % turn off auto-disambiguate
backref=true,     % auto backrefs in ref.
datezeros=true,   % dates with leading zeros
maxcitenames=1, % et al. with two or more authors
%indexing,       % to create an index of persons
sortcites=false,
sorting=nyt,
%defernumbers=true,   % numbers in any bibliography
backend=biber]      % use biber for compiling
{biblatex}
\addbibresource{refs.bib}

These are confusingly documented in the manual, but obvious from the cheat sheet.

The citation command has fancy options.

\cite[see][page 12]{latexcompanion}

More generally, there is no reason not to use the plural version \cites, and since \cite makes assumptions about formatting, one should in fact use \autocites.

\autocites[152-169]{Smith}[252]{Jones}

Compare and contrast with the BibTeX config:

\bibliographystyle{unsrt}
\bibliography{refs}

and plainer cite command:

\cite{latexcompanion}

CSL

Gotcha: This will render nice citations, except with butt-ugly naked URLs instead of hyperlinks, because hyperlinks are not possible, and not even in scope yet. Please vote for those GitHub issues.

Jabref, Bibdesk.

Past traumas

There have been various other options such as Papers (meh) and Sente (defunct due to being crappy) and (sigh) Endnote. I won’t link or refer to those further here, for the reason that I’ve already lost too much data that way, and I don’t intend to lose more. Since all citation software is, basically, awful, it is crucial that whichever application you choose, it is one that you can get your data out of when you find a less awful option or when it implodes from awfulness. The one that is best at letting you keep all your citations even if you ditch it is Zotero. Also it’s probably the least awful.

Still, if you’re unswervingly dedicated to trying other things for yourself, my advice for any closed-source tool in this domain would be the same: Try it and see how well you can import and export data, en masse, because that’s what you’ll have to do if the company goes bankrupt or gets bought by Google and shut down, or by Yahoo and accidentally set on fire, or by Facebook and you are only allowed to use it if you click on ads promoting sports shoes for 18 minutes out of every hour or whatever unpaid market research work they allocate to you.
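One cheap way to run that test, assuming the tool can export and re-import BibTeX: export everything, import it into a fresh library, export again, and compare the citation keys. The Python sketch below only compares keys, not field contents, and the names `entry_keys` and `round_trip_losses` are hypothetical helpers of mine.

```python
import re

# Matches the start of a BibTeX/BibLaTeX entry such as "@article{smith2020,".
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,")

# Pseudo-entries that do not describe references.
NON_ENTRIES = {"comment", "string", "preamble"}


def entry_keys(bib_text):
    """Return the set of citation keys found in BibTeX source text."""
    return {
        key
        for entry_type, key in ENTRY.findall(bib_text)
        if entry_type.lower() not in NON_ENTRIES
    }


def round_trip_losses(exported, reexported):
    """Return keys present in the first export but gone after the round trip."""
    return sorted(entry_keys(exported) - entry_keys(reexported))
```

An empty list from `round_trip_losses` is necessary but not sufficient: mangled diacritics and URLs can survive a key check, so it is still worth eyeballing a few of the nastier entries by hand.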

All the alternatives apart from Mendeley and Zotero have failed the test of preserving my precious data when I migrated to a different software package, so using those other packages is putting my work in the uncaring hands of an unaccountable third party. To actually extract my data from Sente, for example, I had to burn a whole working week turning their malformed markup into valid XML (which is a specialty that I don’t care about and no-one should ever need to care about), and I still couldn’t work out how to parse some of it. Then many other things went wrong. Also, Mendeley has started behaving suspiciously since they were bought by Elsevier.

If you mostly care about LaTeX output (which is frequently the lowest common denominator amongst academic collaborators), you might be able to survive on BibDesk or JabRef or just editing a plain Bib(La)TeX file, but I for one could not bear to give up the browser integration of Zotero, which has saved innumerable hours of painstaking pointless typing, and can output BibTeX just fine.

🚧 Complain about the entire structure of citations in the electronic age (keep it short though, because everyone is tired of complaining about it, and at least it’s better than the general howling void of unsourced internet media.)

🚧 Apologise for accidentally complaining at length despite my stated aim of keeping it short. 😏

docutils citations

a.k.a. Citations in ReST.

I no longer recommend this. For all the laudable design goals and extensibility of ReST, it’s not where the community is. They are all using Markdown.

But if you are keen, the docs say:

Standard ReST citations are supported, with the additional feature that they are “global”, i.e. all citations can be referenced from all files.