Automate GeneTrail with iMacros

GeneTrail is a comprehensive web-based application to perform Gene Ontology and pathway analysis, with a user-friendly graphical interface. It is not possible, however, to access it programmatically, which can be cumbersome when processing more than a few datasets. It is nevertheless possible to automate interactions with the graphical interface using iMacros, which works as an add-on to web browsers such as Firefox.

iMacros is very simple to use:

  1. Once installed, open iMacros in Firefox.
  2. Click on “Record” and perform all the tasks you normally do on GeneTrail, including upload and download. Stop recording when done.
  3. The macro is now saved and you should see it in the iMacros panel. Right click on it and edit as follows:
    • Locate the name of the file to be uploaded and replace it by a new file name.
    • Add “WAIT SECONDS=120” before the line starting with “ONDOWNLOAD”. If necessary, replace 120 with a number of seconds greater than the time GeneTrail takes to process results, so that iMacros waits long enough before trying to download.
    • Edit the line starting with “ONDOWNLOAD” to indicate the folder and file names (e.g. “ONDOWNLOAD FOLDER=/home/sebastien/imacrosresults WAIT=YES”). Delete other lines starting with “ONDOWNLOAD”, if any.
  4. Click on “Play”. To replay with different upload and download file names, edit the macro as indicated above.

iMacros can also be called from the command line, and it is possible to automate further, both on Linux and Windows.
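
As a rough sketch of such further automation, the upload file name can be substituted by a shell loop instead of by hand. This assumes a recorded macro saved as a template in which the upload file name was manually replaced with the placeholder “UPLOAD_FILE”. The placeholder, the function name and the file names are illustrative choices, not part of iMacros, and file names are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: write one macro per dataset from a template macro.
# Assumes the template contains the literal placeholder UPLOAD_FILE where
# the recorded macro had the upload file name (the placeholder is our own).
make_macros() {
    template=$1   # template macro, e.g. genetrail_template.iim (hypothetical name)
    outdir=$2     # directory receiving the generated macros
    shift 2
    mkdir -p -- "$outdir" || return 1
    for dataset in "$@"; do
        name=$(basename -- "$dataset")
        # Escape characters that are special in a sed replacement: \ & and |
        escaped=$(printf '%s\n' "$dataset" | sed 's/[\\&|]/\\&/g')
        # One macro per dataset, named after the dataset without its extension
        sed "s|UPLOAD_FILE|$escaped|g" "$template" > "$outdir/${name%.*}.iim"
    done
}
```

Calling “make_macros template.iim macros /data/a.txt /data/b.txt” would write “macros/a.iim” and “macros/b.iim”, which can then be played one after the other.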


Running David from R

David is an online resource to perform Gene Ontology analysis. Analysis can be done graphically on the website or programmatically, using the David Web Service. The following snippet illustrates how to query David from R, using the Bioconductor “RDAVIDWebService” package.

Get directory of running script

In shell scripts, it is often useful to know which directory the script is running from. Here is a way to get this information:

curdir=$(cd -P -- "$(dirname -- "$0")" && pwd -P)

“$0” is the script name, as invoked. “dirname” returns the corresponding directory. “cd” changes to that directory, and “pwd” prints it, so it can be assigned to the variable “curdir”. The option “-P” makes both commands use the physical directory structure, so that symbolic links are resolved and the result contains none. The “--” marks the end of options, so a path starting with a dash is not mistaken for one. Note that “cd” runs inside a “$(…)” command substitution, that is, in a subshell, so it does not change the working directory for the rest of the script.
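
A minimal sketch of how this can be used in practice: the same command is wrapped in a function, and the result is used to refer to a file sitting next to the script. The function name “script_dir” and the file name “settings.conf” are made up for illustration.

```shell
#!/bin/sh
# Print the physical directory containing the script path given as $1.
# The function name script_dir is ours, for illustration.
script_dir() {
    # The parentheses run cd in a subshell, so the caller's
    # working directory is left untouched.
    (cd -P -- "$(dirname -- "$1")" && pwd -P)
}

curdir=$(script_dir "$0")

# Refer to a file next to the script, wherever the script is called from.
config="$curdir/settings.conf"   # settings.conf is a hypothetical file name

echo "script directory:  $curdir"
echo "working directory: $(pwd)"
```

When the script is called from another directory, the two printed paths differ, which shows that the working directory was not changed.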

Bam to wig conversion

Conversion from bam to wiggle format can be done using the rsem-bam2wig utility, which takes a sorted bam file as input. The syntax is rather simple:

rsem-bam2wig sorted-bam-input-filename wig-output-filename wiggle-plot-name [--no-fractional-weight]

The option “--no-fractional-weight” should be set if the bam file was not generated by RSEM. It must be placed at the end of the command line.

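To sort a bam file by coordinate, use samtools. With samtools 1.3 and later, the output file is given with “-o”:

samtools sort -o sorted-bam-output-filename bam-input-filename

Older versions (0.1.x) instead expect the input file followed by an output prefix, to which “.bam” is appended:

samtools sort bam-input-filename sorted-bam-output-prefix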
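
The two steps can be chained in a small shell function, sketched here under a few assumptions: samtools 1.3 or later (for “sort -o”) and rsem-bam2wig are on the PATH, and the bam file was not generated by RSEM. The function names, the “DRY_RUN” variable and the output naming scheme are illustrative choices.

```shell
#!/bin/sh
# Sort a BAM file, then convert it to wiggle format.
# Assumes samtools (1.3 or later) and rsem-bam2wig are on the PATH.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bam_to_wig() {
    bam=$1                       # input BAM file
    base=${bam%.bam}             # path without the .bam extension
    sorted="$base.sorted.bam"    # output naming scheme is our own choice
    # --no-fractional-weight is passed because the BAM file is assumed
    # not to come from RSEM; drop it otherwise.
    run samtools sort -o "$sorted" "$bam" &&
    run rsem-bam2wig "$sorted" "$base.wig" "$(basename -- "$base")" --no-fractional-weight
}
```

For instance, “bam_to_wig sample.bam” would write “sample.sorted.bam” and “sample.wig”, using “sample” as the wiggle plot name.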