In shell scripts, it is often useful to know which directory the script is running from. Here is a way to get this information:
curdir=$(cd -P -- "$(dirname -- "$0")" && pwd -P)
“$0” is the script name. “dirname” returns the corresponding directory. “cd” changes to that directory, which becomes the working directory. “pwd” returns it, so it can be assigned to the variable “curdir”. The option “-P” tells to only use the physical directory structure and to not follow symbolic links. Note that “cd” is encapsulated using the “$(…)” syntax, so it does not change the working directory for the rest of the script.
This shell script (also on Github) is a wrapper to run a command on all files from a set of directories, and output to directories with matching names. I often use it in pipelines, to batch process files in a project directory structure similar to the one used by ProjectTemplate.
Conversion from bam to wiggle format can be done using the rsem-bam2wig utility, which takes a sorted bam file as input. The syntax is rather simple:
rsem-bam2wig sorted-bam-input-filename wig-output-filename wiggle-plot-name [--no-fractional-weight]
The option “–no-fractional-weight” should be set if the bam file has not been generated by rsem.
To sort a bam file, use samtools:
samtools sort bam-input-filename sorted-bam-output-filename
The script below (also on GitHub) converts a three-columns bedgraph-like tab-separated file to actual bedgraph format. The original three columns are “chr”, “start” and “value”. An “end” column is added by the script to make it a bedgraph format.
A common task I happen to do in R is merging tables (for instance, using gene symbol as a join) and renaming columns with a suffix to indicate their table of origin. Below is a function that automatizes most of the process, taking any number of dataframes as input. It is also posted on GitHub.
I am currently analyzing ChIP-seq data from ENCODE, starting from bigWig files, which I have to convert to wig. Unfortunately, in my case, the bigWigToWig program from UCSC converts to bedGraph format. The reason why this is happening is somehow explained in this thread. Briefly, it is likely because the bigWig files were generated from a bedGraph and not a wig file. To be noted, UCSC also has a bigWigToBedGraph conversion program. One difference between the two programs is that bigWigToWig outputs bedGraph files with uniform step size, whereas bigWigToBedGraph outputs bedGraph files with variable step size, by combining consecutive steps that have the same value.
Anyway, I had to convert from bedGraph to wig. A Perl script by Dave Tang does the job, but it outputs wig files with a step size equal to 1 bp. Because such files are unnecessarily big and for some other reasons, I wanted to be able to specify the step size. So, I wrote a new bedGraph to wig converter, inspired by Dave Tang’s.
The script is written below and can also be found on GitHub. Step size is specified on the command line, and it has an option to skip steps with null value, in order to save space. I hope this is useful, and I obviously welcome any feedback.
Ubuntu makes it easy to mount remote folders through samba or sftp, using the default file manager Nautilus. The location where these folders are mounted, however, is not obvious. I found an answer on this thread. On my system (Ubuntu 13.10), the location is: /run/user/1000/gvfs/. Much useful to know when working from the command line.