Citations in Jupyter Notebook, Lab and output documents (various formats)


Post by Paul Hockett

Maybe I missed the obvious, but it seems to me that citations - at least citations from existing BibTeX bibliographies in standard LaTeX style - are a bit non-trivial in Jupyter. Here are a few methods that I've tried; I'm still not totally satisfied with any of them, and my testing has been pretty brief so far. It'd be interesting to hear what experience other people have had here... and I will update this doc as things progress.

Why would one want to do this...? There are probably multiple use cases, but I am mainly interested in

  1. including citations directly in notebooks,
  2. writing standard scientific papers using notebooks (or, perhaps more precisely, using contents from notebooks as the main source for papers, with the potential of some post-processing).

In both cases I'd like to have in-text citations and a bibliography section, as per a standard manuscript, ideally rendered directly in Jupyter Notebook/Lab (i.e. displayed without any post-processing). More on this at the end.

[Image from the ipypublish docs]

(To be clear, in this post Jupyter Notebook and Lab refer to the Jupyter environment or interface running in a web browser, while notebook, document or source, used generically, refers to a .ipynb computational notebook file, which can be rendered/viewed by multiple environments and exported to other formats. Since some extensions work within a Jupyter environment, they may work only in Notebook or only in Lab. Methods which post-process notebook files are environment agnostic. For more background and a general introduction, see the Project Jupyter website. Note that notebooks support Markdown as standard, and also process and display LaTeX maths via MathJax in the browser - other LaTeX (usually) requires post-processing.)

Unless otherwise stated, the test machine was running Ubuntu 18 LTS, with Firefox v76.0.1, Jupyter Notebook v6.0.3, Python v3.7.6 (in an Anaconda virtual env.).

Manual Markdown citations

The obvious answer for dropping a couple of items into a notebook document is to code the Markdown manually. For stand-alone notebooks with just a couple of citations, this is pretty easy and probably the quickest thing to do - but it is a pain if you want to pipe things in from a BibTeX file and/or automate any part of the citation process.

For some more detailed notes on this:

Pros

  • Quick and easy.
  • OK for small docs.

Cons

  • No BibTeX support.
  • No automation.
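
As a rough illustration of the manual approach, the sketch below builds the kind of Markdown one might otherwise type by hand: an in-text link such as [[1]](#ref-1), plus a reference list with matching HTML anchors. The helper names (cite, bibliography), the anchor scheme and the placeholder reference entries are all made up for this example, and whether the in-page links jump correctly may depend on the front end used to view the notebook.

```python
from typing import Dict

# Placeholder entries only; replace with your own hand-written references.
REFERENCES: Dict[str, str] = {
    "1": "First reference text, typed by hand.",
    "2": "Second reference text, typed by hand.",
}


def cite(label: str) -> str:
    """Return a Markdown link pointing at the anchor for `label`."""
    return f"[[{label}]](#ref-{label})"


def bibliography(refs: Dict[str, str]) -> str:
    """Return a Markdown reference list with one HTML anchor per entry."""
    lines = ["**References**", ""]
    for label, text in refs.items():
        lines.append(f'- <a id="ref-{label}"></a>[{label}] {text}')
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into a Markdown cell.
    print(f"Some claim {cite('1')} and another {cite('2')}.")
    print()
    print(bibliography(REFERENCES))
```

Everything here is still maintained by hand, so the cons listed above apply unchanged; the code only shows the pattern.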

Cite2c

Cite2c is a handy tool for pulling references from Zotero into a notebook. It supports in-line citations and a bibliography in notebooks. (This works in a similar manner to the nbconvert method below, with custom HTML tags included for the citations.) It is a notebook extension which requires installation, and provides cite and bibliography toolbar buttons once installed.

Pros

  • Quick and easy.
  • Looks good in Notebook (I haven't yet tested in Jupyter Lab - it is likely not supported, however).

Cons

  • No control over citation style (at least not obviously - there is likely a way if one digs deeper).
  • No references in output when processed to PDF. (EDIT: there are some notes for getting this working here, via a custom nbconvert template.)
  • Requires a Zotero account.

Nbconvert

Citations are supported directly for LaTeX + BibTeX via nbconvert. nbconvert is the standard tool for converting notebooks to other (static) formats, e.g. HTML or PDF, so this is very much a method for post-processing notebooks.

This uses HTML-style citations in the source notebook, e.g. <cite data-cite="citation">(manual label)</cite>, which are replaced in the nbconvert-processed output with standard LaTeX \cite{citation} commands. Note that a template file is also required for this processing to work.
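
To make the substitution concrete, here is a minimal sketch of the tag-to-command rewrite using a plain regular expression. This is not nbconvert's own implementation, just an illustration of the mapping described above; the function name cite_tags_to_latex is made up for the example.

```python
import re

# Matches <cite data-cite="KEY">any label</cite>; the label text is discarded.
CITE_TAG = re.compile(r'<cite\s+data-cite="([^"]+)"\s*>(.*?)</cite>', re.DOTALL)


def cite_tags_to_latex(text: str) -> str:
    """Replace each HTML-style cite tag with a LaTeX \\cite{KEY} command."""
    # A function is used as the replacement so the backslash is taken literally.
    return CITE_TAG.sub(lambda m: r"\cite{" + m.group(1) + "}", text)


if __name__ == "__main__":
    cell = 'As shown in <cite data-cite="citation">(manual label)</cite>, ...'
    print(cite_tags_to_latex(cell))
```

Note that the manual label is dropped entirely in the output, which is why the text shown in the notebook and the final rendered citation can differ.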

Once set up, the notebook is converted in the usual way, with the required settings passed explicitly via a config file:

jupyter nbconvert --config ipython_nbconvert_config.py

(See the GitHub repo for an example config file.)
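
For orientation, a config file for this workflow might look something like the sketch below. This is not the config from the repo mentioned above: the notebook and template file names are placeholders, and the trait names should be checked against the nbconvert docs for your release (the template system changed in nbconvert 6, so a .tplx template is an assumption tied to older versions).

```python
# ipython_nbconvert_config.py - read via `jupyter nbconvert --config ...`
# get_config() is injected by the Jupyter config loader; it is not importable,
# so this file is not meant to be run directly with Python.
c = get_config()  # noqa: F821

# Placeholder notebook name; replace with your own file.
c.NbConvertApp.notebooks = ["my_paper.ipynb"]

# Convert to LaTeX so that \cite{...} commands can be resolved by BibTeX.
c.NbConvertApp.export_format = "latex"

# Placeholder template name; the template is what adds the
# \bibliography{...} call (and hence the .bib file name) to the output.
c.TemplateExporter.template_file = "citations.tplx"
```

The file is then passed to nbconvert with the --config option, as in the command above.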

Pros

  • Works, and is agnostic to Notebook/Lab.
  • Fairly transparent.
  • Using the same method should also allow for direct use of \cite{} in raw LaTeX notebook cells (I didn't test this, however).
  • OK for all notebook formats that nbconvert supports.
  • For more control (with more effort), one can convert to .tex and then use the standard LaTeX tool-chain. For a half-and-half solution, with LaTeX control via nbconvert templates, see the article "Making publication ready Python Notebooks" and the nbconvert docs - this is likely more effort than most people will want to make, however!

Cons

  • In the original notebook, citations use only the manually set label text, so may not match the output format (e.g. if the output is set for automatic numerical refs).
  • No bibliography in source notebook (unless set up manually).
  • There's a bit of set-up required on the back end, specifically a template file plus configuration options, which include the path to the .bib files. (There is probably an easy way to streamline this, however.)
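
One small check that might ease the set-up: before converting, confirm that every data-cite key used in the notebook actually exists in the .bib file. The sketch below does this with the standard library only. The function names and the example data are invented for illustration, and the BibTeX regular expression is deliberately naive (it is not a full BibTeX parser).

```python
import json
import re

# Key inside <cite data-cite="KEY">...</cite> tags in Markdown cells.
CITE_KEY = re.compile(r'<cite\s+data-cite="([^"]+)"')
# Key of a BibTeX entry such as "@article{KEY," (naive, not a full parser).
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def notebook_cite_keys(nb_json: str) -> set:
    """Collect citation keys from the Markdown cells of a .ipynb JSON string."""
    notebook = json.loads(nb_json)
    keys = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # cell source may be a list of lines
            source = "".join(source)
        keys.update(CITE_KEY.findall(source))
    return keys


def bib_keys(bib_text: str) -> set:
    """Collect entry keys from the text of a .bib file."""
    return set(BIB_KEY.findall(bib_text))


def missing_keys(nb_json: str, bib_text: str) -> list:
    """Return the keys cited in the notebook but absent from the .bib text."""
    return sorted(notebook_cite_keys(nb_json) - bib_keys(bib_text))


if __name__ == "__main__":
    # Invented example data standing in for real files.
    nb = json.dumps({"cells": [{"cell_type": "markdown",
                                "source": ['<cite data-cite="keyA">(A)</cite> ',
                                           '<cite data-cite="keyB">(B)</cite>']}]})
    bib = "@article{keyA,\n  title = {Placeholder}\n}\n"
    print(missing_keys(nb, bib))
```

With real files, the two strings would simply be read from the .ipynb and .bib paths (e.g. with pathlib.Path.read_text) before calling missing_keys.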

(Some) latex envs in Jupyter notebooks

This is a very handy notebook extension which provides enhanced LaTeX support.

Pros

  • Full LaTeX-style citations directly in the notebook.
  • Easy to use (includes new menus in notebook).
  • Output LaTeX looks as advertised in brief testing.

Cons

  • Doesn't support Jupyter Lab.
  • In brief testing, I couldn't get the bibliography to render in a notebook, although citations appeared correctly in the notebook and the output LaTeX looked OK (but required manually adding the LaTeX bibliography calls). This might be a machine/browser issue (testing on Ubuntu 18 LTS, with Firefox v76.0.1, Jupyter Notebook v6.0.3, Python v3.7.6 - this last may be a known issue).

Other methods

A few other approaches I've stumbled across, but haven't yet tested or thought much about...

Other tool-chains/languages

Large docs/books/other docs from Jupyter notebook source

  • ipypublish:
    • "A package for creating and editing publication ready scientific reports and presentations, from Jupyter Notebooks."
    • "Combining features of the Jupyter Notebook, WYSIWYG editors, Latex document preparation system and Sphinx HTML creation"
    • This is a full tool-chain for post-processing notebooks, including multiple formats, via Pandoc.
    • Looks very powerful for use-case (2), but there is a little bit of custom stuff to learn.
    • Possibly replaced (or just supplemented?) by the Jupyter Book project? See here for discussion, and notes below.
  • Jupinx:
    • "a build system for lectures."
    • "Jupinx is an open source tool for converting ReStructuredText source files into a website via Jupyter Notebooks"

Jupyter Book, from the Executable Book Project (EBP), sounds generally very promising for any large project documentation, incorporating both computational and scientific manuscript-style content, although it might be overkill for stand-alone manuscripts. From the project's stated goals:

The goal of the EBP is to build tools that facilitate creating professional computational narratives (books, lecture series, articles, etc.) using open source tools. We want users in the scientific, academic, and data science communities to be able to do the following:

  • Write their content in either markdown text files, or Jupyter Notebooks. These files include rich content - outputs from running code, references and cross-references, equations, etc.

  • Execute content and cache the results. Intelligent caching means that only modified code cells are re-run.

  • Combine cached outputs with content files with a document model. Using the excellent Sphinx documentation stack, documents can include many features for publishing, such as equations, cross-references, and citations.

  • Build interactive HTML or publication-quality PDF outputs. Sometimes users wish to create rich and interactive websites, other times they want to send a high-quality PDF to a publisher. This system will treat both as equal citizens.

  • Control everything above with a simple command-line interface. Most users should not have to know anything about Sphinx, caching, etc. A simple user interface will hide most of the complexity of this process.

(Possibly) Deprecated Tools

Summary

There's still some work/testing for me to do here, but I think that the (some) latex envs notebook extension is the tool closest to what I want for the most part (use-cases as listed above) - it is user-friendly and has all the key features, apart from Jupyter Lab support. Hopefully a bit more testing will make clear whether this is the best tool. I'm also planning to test the ipypublish and/or Jupyter Book frameworks to see what they offer above and beyond the basics of including citations.

