Wrapper shell script for pipelines

This shell script (also on GitHub) is a wrapper that runs a command on every file in a set of directories and writes the output to directories with matching names. I often use it in pipelines to batch-process files in a project directory structure similar to the one used by ProjectTemplate.
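To illustrate the idea, here is a minimal sketch of such a wrapper. It is not the script from GitHub: the function name run_on_dirs and its argument order are made up for this example, and it assumes the wrapped command takes an input file and an output file as its two arguments.

```shell
#!/bin/sh
# Sketch of a batch wrapper (run_on_dirs is a hypothetical name).
# Usage: run_on_dirs OUT_ROOT COMMAND DIR...
# COMMAND is assumed to be callable as: COMMAND input-file output-file
run_on_dirs() {
    out_root=$1
    cmd=$2
    shift 2
    for dir in "$@"; do
        # The output directory gets the same base name as the input directory.
        out_dir=$out_root/$(basename "$dir")
        mkdir -p "$out_dir" || return 1
        for f in "$dir"/*; do
            # Skip subdirectories and the literal pattern left by an empty directory.
            [ -f "$f" ] || continue
            "$cmd" "$f" "$out_dir/$(basename "$f")" || return 1
        done
    done
}
```

For example, run_on_dirs processed cp data/raw1 data/raw2 would copy every file of data/raw1 and data/raw2 into processed/raw1 and processed/raw2. In a real pipeline, cp would be replaced by a small script that takes the same two arguments.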


Bam to wig conversion

Conversion from bam to wiggle format can be done using the rsem-bam2wig utility, which takes a sorted bam file as input. The syntax is rather simple:

rsem-bam2wig sorted-bam-input-filename wig-output-filename wiggle-plot-name [--no-fractional-weight]

The option “--no-fractional-weight” should be set if the bam file was not generated by rsem.

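To sort a bam file, use samtools. The command below uses the legacy samtools 0.1.x syntax, in which the second argument is an output prefix to which .bam is appended; with samtools 1.3 and later, the output file is given with the -o option instead (samtools sort -o sorted-bam-output-filename bam-input-filename):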

samtools sort bam-input-filename sorted-bam-output-filename
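The two steps can be chained in a small shell function. This is only a sketch: bam_to_wig and the DRY_RUN variable are names invented for this example, and the sort command follows the legacy samtools syntax shown above.

```shell
#!/bin/sh
# Sketch: sort a bam file, then convert it to wig (bam_to_wig is a hypothetical name).
# Usage: bam_to_wig INPUT_BAM NAME [extra rsem-bam2wig options]
# If DRY_RUN is set to a non-empty value, the commands are printed instead of run.
bam_to_wig() {
    in_bam=$1
    name=$2
    shift 2
    run=${DRY_RUN:+echo}
    # Legacy samtools 0.1.x syntax: the second argument is a prefix,
    # so this writes "$name.sorted.bam".
    # With samtools 1.3 or later, use instead:
    #   samtools sort -o "$name.sorted.bam" "$in_bam"
    $run samtools sort "$in_bam" "$name.sorted" || return 1
    # Any remaining arguments (e.g. --no-fractional-weight) are passed through.
    $run rsem-bam2wig "$name.sorted.bam" "$name.wig" "$name" "$@"
}
```

For a bam file that does not come from rsem, the call would be bam_to_wig sample.bam sample --no-fractional-weight.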

bigWig to bedGraph to wig

I am currently analyzing ChIP-seq data from ENCODE, starting from bigWig files, which I have to convert to wig. Unfortunately, in my case, the bigWigToWig program from UCSC produces bedGraph output instead. The reason is partly explained in this thread: briefly, it is likely because the bigWig files were generated from a bedGraph file and not from a wig file. Note that UCSC also has a bigWigToBedGraph conversion program. One difference between the two programs is that bigWigToWig outputs bedGraph files with a uniform step size, whereas bigWigToBedGraph outputs bedGraph files with a variable step size, by merging consecutive steps that have the same value.
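The merging of consecutive steps can be illustrated with a few lines of awk. This is a sketch of the idea only, not the UCSC implementation; merge_bedgraph_runs is a hypothetical name, and the input is assumed to be a sorted, tab-separated bedGraph with four columns.

```shell
#!/bin/sh
# Sketch: collapse adjacent bedGraph intervals that carry the same value
# (merge_bedgraph_runs is a hypothetical name). Reads files or standard input.
merge_bedgraph_runs() {
    awk 'BEGIN { OFS = "\t" }
    # Header and comment lines: flush the pending interval, then copy the line.
    /^(track|browser|#)/ {
        if (have) print chrom, cstart, cend, cval
        have = 0
        print
        next
    }
    NF < 4 { next }
    {
        if (have && $1 == chrom && $2 == cend && $4 == cval) {
            # Same chromosome, contiguous, same value: extend the interval.
            cend = $3
        } else {
            if (have) print chrom, cstart, cend, cval
            chrom = $1; cstart = $2; cend = $3; cval = $4; have = 1
        }
    }
    END { if (have) print chrom, cstart, cend, cval }' "$@"
}
```

With this input, the two intervals chr1 0-10 and chr1 10-20, both of value 1, would come out as a single interval chr1 0-20.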

Anyway, I had to convert from bedGraph to wig. A Perl script by Dave Tang does the job, but it outputs wig files with a step size of 1 bp. Because such files are unnecessarily big, and for a few other reasons, I wanted to be able to specify the step size. So I wrote a new bedGraph to wig converter, inspired by Dave Tang’s.

The script is given below and can also be found on GitHub. The step size is specified on the command line, and an option skips steps with a null value in order to save space. I hope this is useful, and I welcome any feedback.
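For readers who only want the gist, the conversion can be sketched in shell with awk. This is not the script mentioned above: bedgraph_to_wig and its arguments are hypothetical, the output uses the variableStep wig format (my choice for this example, since it makes skipping steps straightforward), and interval lengths are assumed to be multiples of the step size.

```shell
#!/bin/sh
# Sketch: convert bedGraph (standard input) to variableStep wig
# (bedgraph_to_wig is a hypothetical name).
# Usage: bedgraph_to_wig STEP [SKIP_ZERO] < input.bedGraph
# SKIP_ZERO=1 drops intervals whose value is zero.
bedgraph_to_wig() {
    # Refuse a missing, non-numeric or non-positive step (it would loop forever).
    [ "${1:-0}" -gt 0 ] 2>/dev/null || { echo "step must be a positive integer" >&2; return 1; }
    awk -v step="$1" -v skip="${2:-0}" '
    # Ignore header, comment and malformed lines.
    /^(track|browser|#)/ || NF < 4 { next }
    # New chromosome: start a new variableStep block.
    $1 != chrom {
        chrom = $1
        print "variableStep chrom=" chrom " span=" step
    }
    {
        if (skip && $4 == 0) next
        # bedGraph starts are 0-based, wig positions are 1-based.
        for (pos = $2; pos < $3; pos += step)
            print pos + 1, $4
    }'
}
```

For example, bedgraph_to_wig 10 1 < input.bedGraph > output.wig would write one line per 10 bp step and leave out steps with a value of zero.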