PDF is a terrible format, but it is the standard in academia, despite some perfunctory efforts to make like the rest of the world and get with ebook formats. That would be nice; but in the real world, ebook formats are the minority, and generally if you convert PDFs to ebooks, the equations turn into 💩, so this is only a solution for people who survive without equations or tables or graphs, which does not resemble my job description.
Now, how will I read all those PDFs and annotate them without losing track and going crazy? Bonus points if I can sync my annotations to my citation management software. More bonus points if I can also synchronise to a convenient e-reader so I don’t need my distracting laptop to read every sodding thing. Bonus points if the solution involves not putting all my notes in some obscure opaque commercial database with no guarantee of existing in perpetuity.
- Zotero can sync annotations and store PDFs with citation metadata conveniently. See citation management for the details. It knows how to capture and store journal article metadata really well. (open source)
- Calibre isn’t a general sync solution, but it does manage ebooks well, especially ones that are real books and have ISBNs etc. And it does synchronise with various ebook readers and convert to their local dialect of whatever. (open source)
If I only read books or only read papers, and I had time, I could probably hack one of these into being a general-purpose document annotation-and-metadata-and-ebook-reader-and-desktop-synchronisation system. As it is, I swap awkwardly between two systems depending on, basically, whether the PDF I am reading is short (Zotero) or long (Calibre).
E.g. Kindle Fire. A device for reading things.
How can I get my incoming articles onto it? Also, the built-in reader does not support PDF annotations. Sync to the device using a sync service of some kind, then read the PDFs using Acrobat? Maybe try Xodo?
Paper reading and discovery
In recent years, a highly interesting pattern has emerged: Computer scientists release new research findings on arXiv and just days later, developers release an open-source implementation on GitHub. This pattern is immensely powerful.[…]
GitXiv is a space to share links to open computer science projects. Countless Github and arXiv links are floating around the web. It’s hard to keep track of these gems. GitXiv attempts to solve this problem by offering a collaboratively curated feed of projects. Each project is conveniently presented as arXiv + Github + Links + Discussion. Members can submit their findings and let the community rank and discuss it. A regular newsletter makes it easy to stay up-to-date on recent advancements. It’s free and open.
In terms of things that I will actually use, this source-code requirement idea is good.
Arxiv Sanity Preserver
Built by @karpathy to accelerate research. Serving last 26179 papers from cs.[CV|CL|LG|AI|NE]/stat.ML
It includes Twitter-hype sorting and other such flawed but important baby steps towards web2.0-style peer review.
Keep track of arXiv papers and the tweet mini-commentaries that your friends are discussing on Twitter.
Because somehow some researchers have time for Twitter, and the opinions of such multitasking prodigies are probably worthy of note. I, however, will never contribute to such discourses. Anyway, great hack.
- Extract references and metadata from a given PDF
- Detect PDF, URL, arXiv and DOI references
- Fast, parallel download of all referenced PDFs
- Output as text or JSON (using the -j flag)
- Extract the PDF text (using the --text flag)
- Use as command-line tool or Python package
- Works with local and online PDFs
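
To make the reference-detection item in that list concrete, here is a minimal Python sketch of how one might pick arXiv ids, DOIs, PDF links and other URLs out of text that has already been extracted from a PDF. It is not the implementation of the tool described above: the regular expressions, the function name `find_references` and the sample string are all assumptions of mine, and real reference lists are far messier than these patterns allow for.

```python
import json
import re

# Illustrative patterns only (assumptions, not taken from any particular tool).
# New-style arXiv ids look like 1706.03762, optionally followed by a version (v2).
ARXIV_RE = re.compile(r"arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
# DOIs start with "10.", a registrant code, a slash, then a free-form suffix.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)

# Punctuation that tends to cling to the end of a reference in running text.
TRAILING = ".,;)"


def find_references(text):
    """Return de-duplicated arXiv ids, DOIs, PDF links and other URLs found in text."""
    urls = [u.rstrip(TRAILING) for u in URL_RE.findall(text)]
    pdfs = [u for u in urls if u.lower().endswith(".pdf")]
    others = [u for u in urls if u not in pdfs]
    return {
        "arxiv": sorted(set(ARXIV_RE.findall(text))),
        "doi": sorted({d.rstrip(TRAILING) for d in DOI_RE.findall(text)}),
        "pdf": sorted(set(pdfs)),
        "url": sorted(set(others)),
    }


if __name__ == "__main__":
    # Hypothetical sample text standing in for extracted PDF content.
    sample = (
        "See arXiv:1706.03762 and doi:10.1000/xyz123. "
        "Slides at https://example.org/talk.pdf, code at https://example.org/code."
    )
    # JSON output, in the spirit of the -j flag mentioned above.
    print(json.dumps(find_references(sample), indent=2))
```

Sorting the de-duplicated matches keeps the output stable between runs, which is handy if the JSON is going to be diffed or fed into a citation manager.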