Batch tsv to csv conversion

I recently had to convert tables from tsv to csv format and found several ways to do it in this thread from StackOverflow, including the tsv2csv.py Python script below:

import sys
import csv

tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tabin:
    commaout.writerow(row)

Here is a simple wrapper Bash script to run the conversion in batch:

for file in *.tsv
do
    python tsv2csv.py < "$file" > "${file%.*}.csv"
done
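
The same batch conversion can also be done entirely in Python, without the Bash wrapper. The following is only a minimal sketch: the function names tsv_to_csv and convert_directory are made up for illustration, and it assumes every *.tsv file in the chosen directory should get a .csv file written next to it.

```python
import csv
from pathlib import Path


def tsv_to_csv(tsv_path):
    """Convert one tab-separated file to a comma-separated file next to it."""
    tsv_path = Path(tsv_path)
    csv_path = tsv_path.with_suffix(".csv")
    # newline="" lets the csv module handle line endings itself.
    with tsv_path.open(newline="") as src, csv_path.open("w", newline="") as dst:
        reader = csv.reader(src, dialect=csv.excel_tab)
        writer = csv.writer(dst, dialect=csv.excel)
        writer.writerows(reader)
    return csv_path


def convert_directory(directory="."):
    """Convert every *.tsv file found directly in `directory` (hypothetical helper)."""
    return [tsv_to_csv(path) for path in sorted(Path(directory).glob("*.tsv"))]


if __name__ == "__main__":
    for written in convert_directory():
        print(written)
```

Unlike the shell loop, this version opens the files itself, so file names containing spaces need no special quoting.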

Bam to wig conversion

Conversion from bam to wiggle format can be done using the rsem-bam2wig utility, which takes a sorted bam file as input. The syntax is rather simple:

rsem-bam2wig sorted-bam-input-filename wig-output-filename wiggle-plot-name [--no-fractional-weight]

The option “--no-fractional-weight” should be set if the bam file has not been generated by rsem.

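To sort a bam file, use samtools. Releases 1.3 and later take the output filename through the -o option:

samtools sort -o sorted-bam-output-filename bam-input-filename

Older releases expect an output prefix as the second argument instead (samtools appends .bam):

samtools sort bam-input-filename sorted-bam-output-prefix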

bigWig to bedGraph to wig

I am currently analyzing ChIP-seq data from ENCODE, starting from bigWig files, which I have to convert to wig. Unfortunately, in my case, the bigWigToWig program from UCSC outputs bedGraph format instead. The reason is explained, to some extent, in this thread. Briefly, it is likely because the bigWig files were generated from bedGraph rather than wig files. Note that UCSC also has a bigWigToBedGraph conversion program. One difference between the two programs is that bigWigToWig outputs bedGraph files with a uniform step size, whereas bigWigToBedGraph outputs bedGraph files with a variable step size, by merging consecutive steps that have the same value.
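
To make the difference between the two outputs concrete, here is a small Python sketch of how consecutive steps with the same value collapse into a single variable-size step. It only illustrates the behavior described above and is not how the UCSC tools are implemented; the function name merge_equal_runs and the sample intervals are invented for the example.

```python
def merge_equal_runs(intervals):
    """Merge adjacent bedGraph intervals that share chromosome and value.

    `intervals` is an iterable of (chrom, start, end, value) tuples, sorted by
    position and using bedGraph's 0-based, half-open coordinates.
    """
    merged = []
    for chrom, start, end, value in intervals:
        if merged:
            p_chrom, p_start, p_end, p_value = merged[-1]
            # Extend the previous interval only if this one touches it
            # and carries the same value on the same chromosome.
            if p_chrom == chrom and p_end == start and p_value == value:
                merged[-1] = (chrom, p_start, end, value)
                continue
        merged.append((chrom, start, end, value))
    return merged


# Made-up uniform-step data (25 bp steps), as bigWigToWig might emit.
uniform = [
    ("chr1", 0, 25, 1.0),
    ("chr1", 25, 50, 1.0),
    ("chr1", 50, 75, 3.5),
    ("chr1", 75, 100, 1.0),
]

for row in merge_equal_runs(uniform):
    print(*row, sep="\t")
```

The first two 25 bp steps share the value 1.0 and are merged into one 50 bp step, while the last step stays separate because a different value sits between it and the first run.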

Anyway, I had to convert from bedGraph to wig. A Perl script by Dave Tang does the job, but it outputs wig files with a step size of 1 bp. Such files are unnecessarily big, and for this and other reasons I wanted to be able to specify the step size. So I wrote a new bedGraph to wig converter, inspired by Dave Tang’s.

The script is shown below and can also be found on GitHub. The step size is specified on the command line, and an option allows skipping steps with a null value in order to save space. I hope this is useful, and I welcome any feedback.