An efficient and effective research environment
So, I would like to share the environment that I have created for the purposes of doing research. Specifically it is an environment that allows me to:
- Gather research papers,
- Comment on them in various ways, and review these comments at large,
- Store this information in source control for the purposes of sharing between my machines, and
- Write up and deal with ideas in a systematic fashion.
So, the perhaps the first component of this system is, what operating system? Happily, it doesn’t exactly matter. I use Ubuntu, but I also use Windows 7. The great thing about this scheme is that it is adaptable to any environment that runs the tools (well, obviously) and the tools all have multi-platform support.
I personally find editing in vim nicer on Ubuntu, and there is one or two arguably minor things that linux has that Windows does not (XMonad, for example), but I will elaborate on these later.
The general scheme
This approach is, obviously, tailored specifically for me, and given that I have a significant programming background, I am happy to solve some problems with actual programming. I also quite enjoy “systematic” approaches; i.e. things that may not neccessarily be the most user friendly or easy, but ones that follow a specific and clear pattern that makes logical sense.
This approach may not suit everyone, but hopefully there are at least interesting and useful ideas here that you could adapt to your own situation.
The beginning – reference management
So, of course in order to gather research papers it is neccessary to store them in a useful way. JabRef is free, and is a very nice option for this. I’ve described my custom JabRef set up on my own personal blog a few months ago, please read that for how to do this.
One thing on using JabRef is that sometimes you need to correct the format that bibtex exports give you. For example, one thing I am often changing is lists of authors like: “Billy Smith, Joey Anderson” to “Billy Smith and Joey Anderson”
It’s not immediately clear to me why the bibtex are generated the wrong way from some of these sites, but nevertheless. This simple correction is neccessary for the data to be stored properly, and the authors to be picked up correctly, etc.
Okay, now that you’ve read that, you understand that I save all my PDFs to a specific location, a folder like “…/research/reference library”. Where is this folder? It’s in the research repository.
The “research” repository
I keep all my research, on any topic, in one generic folder, called “research”. This is a private git repository hosted on BitBucket.org. I chose bitbucket over github because bitbucket has free unlimited-space private repositories, while githubs cost money. It is neccessary for the research repository to be private for two reasons, one obvious one is that it contains paywall-restricted PDFs, and the other is that it’s just not appropriate to have in-progress research notes be viewable by anyone.
So, the general structure of my research repository is as follows:
~/research /articles /conferences /diary /jabref /reference library /projects /semigroups /... /quantum-lunch /...
The contents of these folders are as follows:
~/research/articles – Articles
This contains folders that map directly to papers that I’m trying to write (or more correctly at the moment, scholarships that I’m applying for, and misc notes). These are all, unsurprisingly, LaTeX documents that I edit with Vim.
When I complete an article, I created a “submitted” folder under the specific article, and put the generated PDF in there, but up until that time I only add the non-generated files to source control (this is the generally correct practice for using any kind of source control; only “source” files are included, anything that is generated from the source files is not).
~/research/conferences – Notes on conferences
In here I have folders that map to the short conference name, for example “ACC” for Australian Control Conference. Under that, I have the year, and within that I have my notes, and also any agenda’s that I may have needed, to see what lectures I would attend. The notes should be in vimwiki format (I will describe this later) for easy/nice reading and editing.
~/research/dairy – Research diary and general ideas area
This is the main place I work on day-to-day. It contains all my notes, and minutes from various meetings and lectures I attend. It contains a somewhat-daily research diary, and a list of current research ideas, past research ideas (that were bad, and reasons why) and so on.
My preferred note taking form is vimwiki (to be described below), so in here are purely vimwiki files.
It’s not essential that you also use vim (and hence vimwiki), but it is appropriate that whatever mechanism you use, it is a format that is ameneable to source control (i.e. allows nice text-based diffs). Emacs or any plain-text editor will be sufficient here.
~/research/jabref – Bibtex files
This is perhaps not the most appropriately named folder, but nevertheless. It contains all my .bib databases. I actually only have 3. One is very inappropriately called “2010.bib”, with the view that I would store research by the year I gathered it. I’m not following this approach and I actually just keep all my research related to quantum computing (and more general subjects) in here.
I have two other bib files, one is related to a secondary field of that I am interested in researching. That is to suggest, in 2010.bib I have only documents related to quantum computing, theoretical physics and some theoretical computer science. I have a different .bib for research in completely seperate fields, say investment. The other is “lectures.bib”, and it is obvious what that contains.
It’s worth noting that I actually have two systems for storing lectures. One is the above, where the lecture set fits into a nice single PDF. The other is when the lecture series is split over several PDFs. These I store under a generic “University” repository that I use for my current studies (including assignments and so on). This component of my current setup needs work, and I’m open to suggestions here.
~/research/reference library – All the PDFs
Every PDF releated to my research is stored in here, prefixed with the bibtex key. So, for example I have “Datta2005, 0505213v1.pdf”. JabRef associates this with the relevant item in the .bib file by the prefix, and virtue of this link in the .bib I have a trivial way to programmatically (if I’m so inclined) get this PDF.
I don’t ever browse this folder directly, and currently it contains ~600 PDFs and is about 1 gig in size. Storing this much data in a git repository may offend some people, but essentially they are wrong to be offended. It is okay to store binary files as long as they are not constantly changing, and they are considered a key component of the repository; which in this case they are.
There are some downsides to this, though, and I think it’s plausible to consider alternative arrangements.
The viable alternatives are:
- Dropbox for PDFs,
- Secondary git repository for PDFs only, and include as submodule, or
- Some other non-local storage (say, the one offered by Zotero).
I dislike all of them because for me I prefer to have everything together. I could see dropbox being suitable, because technically it’s not neccessary to have verioned PDFs.
If you have any comments on this, please share them.
~/research/projects – Specific Projects
You’ll notice I have one folder here, “semigroups”. This relates directly to a research scholarship I completed at RMIT. This actually involved a python program, which I have in this directory, as well as some miscellaneous files. It may be appropriate to have nicer codenames for projects, or somehow related them directly to the scholarship details. I think the best approach here is to have a codename, which is detailed in the “diary” folder, and then there is no risk on confusion or duplicate names. The scholarship details could be held seperately in the folder, because perhaps the work could be continued across scholarships.
Anyway, it’s probably not neccessary to overwork this structure. It can always be changed, and it shouldn’t be prohibitively difficult.
~/research/quantum-lunch – Files related to my reading group
This folder is indicative of the other types of folders you may find in this directory. In here, I have some misc python scripts related to this group. There are no notes in here, they are kept in the “diary” folder.
Technically this should be a transition area, where scripts and programs that reach an appropriate level of maturity/usefulness are either published publically (in a different repository), or moved to an appropriate folder under project, but I’ve not yet gotten to that stage.
It’s worth noting that I do have a public github profile: silky, underwhich I will, and do, publish and tools that are worthwhile being public. If one of these projects reaches that stage, I’d essentially move it out of here (delete it) and continue it in that area.
The tools
So, with the repository layout described, let me know discuss the tools I use. We’ve already covered JabRef, for reference management, so we have:
- JabRef (as mentioned), for reference management,
- Vim + Vimwiki plugin, for taking notes, keeping ideas, and writing LaTeX,
- Okular, for reading PDFs, and annotating them [linux],
- Python, for programming small scripts,
- XMonad, for window management [linux], and
- pdflatex and bibtex, for compiling latex (from the TeXLive distribution).
So, almost all of these are available on any platform. Okular is worth commenting on, because it has an important feature that I make use of – it stores annotations not in the PDF but in a seperate file, that can then be programmatically investigated. If you can’t use okular, then you may find that your annotations to PDFs are written back into the PDF itself, and it will be difficult to extract this. You can decide whether or not this bothers you when I describe how I use my annotations.
I will now describe the usage pattern for the various tools, starting in order for easiest to hardest.
Tools – Okular
So, install okular via your favourite method, say “sudo apt-get install okoular”, and then open it. You will want to make it your default PDF editor, and I also choose to have it be very minimal in it’s display; setting the toolbar to text only, hiding the menu, and hiding the list of pages on the left. I also configured a shortcut for exiting, namely pressing “qq”.
For me this is indicative of an important principle – make small customisations that improve your life. It’s worth thinking about, as they can often be trivial, but provide a nice noticable benefit.
You will also want to enable the ‘Review’ toolbar. This allows you to highlight lines of interest, and also add comments. Your comments are saved in a location like:
~/.kde/share/apps/okular/docdata/[number].[pdf-name].xml
This is where it gets fun. I’ve written a program to capture these comments, as well as comments in the ‘Review’ field of the .bib file. This tool is available on my github: get-notes.
You may need to adjust the ‘main.conf’ to suit your needs, or even change the source in some fashion. The code is pretty trivial, but requires some python libraries that you can install with easy_install.
This file products vimwiki output (you can trivially change this however you like, if you program in python). I then symbolically link this generated file (“AutoGeneratedNotes.wiki”) to my “~/research/diary”. Of course, following the general strategy of not including generated files in the source code, I do not break this rule for this file. There is one perhaps obvious downside to this: The output might be different on different machines, because the ~/.kde/… folder is not under source control. I consider this acceptable, because this file is a “transitional” file, in that it is not supposed to be for long-term storage of ideas and comments.
The contents of this file should be reviewed, occasionally, and then moved into either a PDF of comments, or into the research diary for an idea to investiage, or removed because you’ve dealt with it.
For example, I have a comment in the “Review” field of the file “Arora2002”. It says: “Has a section on the Fourier Transform that might be interesting”. This should, eventually, be transitioned into a minor topic to investigate further, or a small writeup in a private “Comments on things” LaTeX document, where you write up, slightly more formally, and with maths, your thoughts on things you’ve learned. I have this document under my “articles” folder.
With this in mind, it is then not an issue that the generated output is different bewteen machines, because ideally there will be no output on any machine, one it has been sufficiently transitioned.
Tools – LaTeX
As indicated, I use LaTeX to write up maths and more detailed notes, proposals, applications, etc. You may wish to use some front-end for LaTeX authoring, for example LyX, but as I already do a lot of work in vim, I prefer to also do LaTeX in here. If I were to switch to another editor, it would probably be Emacs.
Tools – Python
As mentioned in the above comment, I use python to write small scripts. Because they’re in python, they are essentially directly runnable on any system (provided the associated packages can be installed).
I also like python because it provides various packages that are actually useful for my research (like, for example, numpy). You can get similar funcionality from free Math environments, though, such as Octave.
Tools – XMonad
XMonad is not particularly neccessary for this workflow, but I include it because I find it’s ease of use aids in efficient reading and editing. I don’t want to go into significant detail of XMonad configuration (but it’s a fun way to spend your time), you may simply review my XMonad configuration on github.
What I like about it is the concept of focus. You can simply and easily make a PDF full screen, for distraction-free reading, and then switch things around to have vim side-by-side for commenting with context.
Feel free to disregard this, if you are using Windows, as it’s equally possible to do fullscreen and side-by-side editing in Windows 7. XMonad also offers other benefits for general programming, which is the main reason I have it.
Tools – Vim + Vimwiki + LaTeX Box
Essentially the last item in my setup is Vim. It’s hard to express the level of obsession one has for Vim, after a while of using it. It is highly customisable, and includes and inbuilt help system, which I used all the time, when initially learning it.
Most people will find Vim initially difficult to use (I did, when I first learned it when starting work here), but if you dedicate a few days to using it correctly, and you make significant use of the “:help [topic]” command, you will get the hang of it.
You aren’t truly using Vim correctly (or, indeed, living a full life), if you don’t get various plugins. The neccessary ones for LaTeX + note taking are: Vimwiki and LaTeX Box or Vim-LaTeX.
You can find the current state of my Vim configuration, again on my github – .vim
I actually currently use Vim-LaTeX, but I am planning on changing to LaTeX Box because it is more lightweight, so I would recommend starting with LaTeX Box.
The nice thing about using okular is that you can recompile your LaTeX document with the PDF open, and it will refresh it, keeping the current page open. This is very useful when typing long formulas, and reviewing your work.
I have configured Vimwiki to open with “rw”, so I can type this at any time, in Vim, and be looking at the index to all my research notes. In this I have links to all my diaries, my storage spots for old search ideas, and a big list of topics to look into. I also make “TODO” notes in here, and review them with one of my other tools, “find-todo” (on the aforemention github, under /utils). This gives me a list inside Vim, and I can easily navigate to the appropriate file. Again, the TODO’s are items that should be transitioned.
Review
I have documented my reseach environment, as it stands currently. It allows me to make notes easily, transition them in an appropriate workflow, and access all my documents at any time, from any computer.
The proof of a good research environment obviously in the blogging, it’s in the producing of legitimately good research output, and of course that’s yet to be delivered (by myself personally), so it’s not possible to objectively rate this strategy for it’s actual effectiveness. Nevertheless, I do feel comfortable with this layout; I feel like I can take the appropriate amount of notes; I feel my notes are always tracked, and I feel that I have a nice and readable history of what I’ve done. I like that I can track bad ideas; I like that I can make comments “anywhere” (i.e. in Okular or in JabRef) and have them captured automatically for later review, and I like the feeling of having everything organised.
I hope this description has been useful, and I would love to hear about any adjustments you’d propose, or just your own research strategies.
— Noon
Great blog. Thank you very much for sharing this.