These scripts are written in the Ruby programming language. You can view more information here.
You must have the Ruby programming language installed to use these scripts. Type
ruby -v into the prompt to make sure you have Ruby installed.
Many of these scripts also require the BioRuby gem to be installed. You can find information about installing the BioRuby gem and view the documentation here.
Splits a fasta file into smaller fasta files.
Usage: split-fasta.rb [options] file1 file2 ...
-v, --verbose Print more info
-l, --log Log information to file
-r, --records [NUM] Number of records to get (defaults: all)
-n, --per-file [NUM] Number of records per file (default: 100)
-o, --offset [NUM] Number of records to skip (default: 0)
-d, --dir [DIR] Output dir (default: out)
-p, --print Print reads
-t, --test Test file lengths afterward
-c, --deep-test Test file contents afterward
-h, --help Display this screen
Example:
split-fasta.rb -n 1000 -d sample_split my_fasta_file.fasta
This example splits the fasta file into a series of fasta files containing 1000 reads each. The files will be in the directory 'sample_split'.
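The splitting step can be sketched in plain Ruby. This is a minimal illustration and not the actual split-fasta.rb code: the helper name split_fasta_records is hypothetical, and it assumes every record begins with a '>' header line.

```ruby
# Hypothetical helper, not the actual split-fasta.rb implementation.
# Splits FASTA text into chunks holding at most per_file records each.
def split_fasta_records(text, per_file)
  # Every record starts with a '>' header line, so split just before each one.
  records = text.split(/^(?=>)/).reject { |r| r.strip.empty? }
  records.each_slice(per_file).map(&:join)
end

fasta = ">read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n"
chunks = split_fasta_records(fasta, 2)
chunks.each_with_index do |chunk, i|
  puts "chunk #{i}: #{chunk.scan(/^>/).size} record(s)"
end
```

With -n 1000 the same idea applies, with each chunk written to its own file in the output directory.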
Use this script to submit multiple blast jobs to the PBS system.
Usage: multi-blast.rb [options]
-v, --verbose Print more info
-i, --input-dir [NAME] Input directory
-o, --output-dir [NAME] Output directory
--output-script Output script rather than submitting it
-e, --extension [NAME] Extension of fasta files [default: fna]
-p, --prefix [NAME] PBS job prefix [default: kartchner]
-g, --group [NAME] Group to use [default: standard]
-q, --queue [NAME] Queue to use [default: rmaier]
-n, --num-cpus [NUM] Number of CPUs to use [default: 12]
-r, --ram [NUM] Amount of RAM to use [default: 23gb]
-c, --cpu-time [NUM] Amount of CPU time to use [default: 50:0:0]
-w, --wall-time [NUM] Amount of Wall time to use [default: 7:0:0]
-s, --start [NUM] Start Index [default: 0]
-z, --end [NUM] End index [default: 50]
-d, --db-path [PATH] BLAST Database path [default: /genome/nr]
-b, --base-name [NAME] Base name of FASTA files
--blast-path [PATH] Path of blast executable
-t, --test-only Test instead of submitting jobs
Example:
multi-blast.rb -i /gsfs1/xdisk/bmf/datasets/main -o /gsfs1/xdisk/bmf/results -b my_fasta_file_ -q windfall {0..9}
This example submits a series of blast jobs to the PBS system. This script should be used with files created with the split-fasta.rb script. The 'base name' (my_fasta_file_) is found by looking at your fasta files. Take everything up to the number part of the filename, and that is the base name. E.g., if you have my_fasta_file_1, my_fasta_file_2, ..., your base name is 'my_fasta_file_'.
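The base-name rule described above can be sketched as a small Ruby helper. The name base_name is hypothetical and is not part of multi-blast.rb; the sketch assumes the number sits at the end of the filename, optionally followed by an extension.

```ruby
# Hypothetical helper: strip the trailing number (and an optional
# extension) from a split file's name to get the base name.
def base_name(filename)
  File.basename(filename).sub(/\d+(\.\w+)?\z/, "")
end

puts base_name("my_fasta_file_1")             # prints my_fasta_file_
puts base_name("/data/my_fasta_file_12.fna")  # prints my_fasta_file_
```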
Used to combine multiple blast output files into one file.
Usage: combine-blast.rb [options] file1 file2 ...
-v, --verbose Print more info
-l, --log Log information to file
-f, --output-file [FILE] Output file
-d, --dir [DIR] Output dir (default: out)
-h, --help Display this screen
Example:
combine-blast.rb -f output.blastx *.blastx
This example will copy all blastx files in the current directory into a single new file called 'output.blastx'. Make sure you have enough space to contain both the original and new copies of the data.
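As a rough sketch of what combining involves, the files can be streamed one after another into a single output file. The helper combine_files is hypothetical and is not the actual combine-blast.rb code; the file names in the demo are made up.

```ruby
require "tmpdir"

# Hypothetical helper, not the actual combine-blast.rb implementation.
# Appends the contents of each input file, in order, to one output file.
def combine_files(inputs, output)
  File.open(output, "w") do |out|
    inputs.each do |path|
      # Stream each file so large BLAST outputs are never held fully in memory.
      File.open(path, "r") { |src| IO.copy_stream(src, out) }
    end
  end
end

# Demo with two small temporary files.
Dir.mktmpdir do |dir|
  a = File.join(dir, "a.blastx")
  b = File.join(dir, "b.blastx")
  File.write(a, "hit1\n")
  File.write(b, "hit2\n")
  out = File.join(dir, "output.blastx")
  combine_files([a, b], out)
  puts File.read(out)
end
```

Because the inputs are copied rather than moved, the disk must hold both the originals and the combined file.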
Reads in a fasta file and then rewrites the file
Usage: fasta-rewrite.rb [options]
-v, --verbose Print more info
-l, --log Log information to file
-o, --out-file [FILE] Output file
-d, --dir [DIR] Output dir (default: out)
-i, --in-file [FILE] Input file
-h, --help Display this screen
Example:
fasta-rewrite.rb -i input.fasta -o output.fasta
This example reads in the fasta file 'input.fasta' and writes every read to the new file 'output.fasta'. If your fasta file seems to contain errors, try using this script to remove them.
Counts the reads in a fasta file
Usage: fasta-stat.rb [options] file1 file2 ...
-v, --verbose Print more info
-c, --count Count reads in file
-h, --help Display this screen
Example:
fasta-stat.rb -c my_fasta_file.fasta
This example counts the reads in a fasta file.
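Counting reads amounts to counting header lines. The sketch below is a plain-Ruby illustration with a hypothetical helper name (count_fasta_records); it assumes each read has exactly one header line starting with '>' and is not the actual fasta-stat.rb code.

```ruby
require "stringio"

# Hypothetical helper: count FASTA records by counting '>' header lines.
def count_fasta_records(io)
  io.each_line.count { |line| line.start_with?(">") }
end

sample = StringIO.new(">r1\nACGT\n>r2\nGG\nCC\n")
puts count_fasta_records(sample)  # prints 2
```

The same helper works on a real file, for example inside File.open("my_fasta_file.fasta") { |f| count_fasta_records(f) }.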
Calculates Mean, Median, Mode, and Standard Deviation for FASTQ data
Usage: fastq-stat.rb [options] file1 file2 ...
-v, --verbose Print more info
-n, --num-records [NUM] Number of records to get (defaults: all)
-h, --help Display this screen
Example:
fastq-stat.rb my_fastq_data.fastq
This example generates statistics for the file 'my_fastq_data.fastq'.
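The four statistics can be sketched in a few lines of Ruby. This README does not say which quantity fastq-stat.rb measures, so the sketch assumes read lengths; the helper summary_stats is hypothetical, and the standard deviation shown is the population form.

```ruby
# Hypothetical helper: mean, median, mode, and standard deviation
# for a non-empty list of numbers.
def summary_stats(values)
  sorted = values.sort
  n = sorted.size
  mean = sorted.sum.to_f / n
  mid = n / 2
  median = n.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
  # On ties, the smallest of the most frequent values is reported.
  mode = sorted.group_by(&:itself).max_by { |_, group| group.size }.first
  variance = sorted.sum { |v| (v - mean)**2 } / n  # population variance
  { mean: mean, median: median, mode: mode, stddev: Math.sqrt(variance) }
end

# Assumption: a FASTQ record is 4 lines and its 2nd line is the sequence.
fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nACGT\n+\nIIII\n"
lengths = fastq.lines.each_slice(4).map { |rec| rec[1].chomp.length }
p summary_stats(lengths)
```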
Use this to submit a job to the PBS system without creating a separate script
Important: You must use absolute paths in your command when using this script. If you use relative paths, the PBS system may not be able to find your files.
Usage: qsub.rb [options] command
-v, --verbose Print more info
-o, --output-script Output script to file and do not submit
-n, --num-cpus [NUM] Number of CPUs to use
-r, --ram [NUM] Amount of RAM to use
-c, --cpu-time [NUM] Amount of CPU time to use
-w, --wall-time [NUM] Amount of Wall time to use
-t, --test-only Test and print script to stdout
Example:
qsub.rb -v -n 12 /bin/blastx -num_threads 8 -db /genome/nr -query /datasets/main/organism.fna
This example submits the blastx command as a PBS job using 12 CPUs and prints verbose information about the operation of the script.
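If you build the command from another Ruby script, relative paths can be expanded to absolute ones before they are handed to qsub.rb. The sketch below uses Ruby's File.expand_path; the helper name absolutize and the /home/user base directory are hypothetical, and Unix-style paths are assumed.

```ruby
# Hypothetical helper: turn a path into an absolute path, resolving
# relative paths against base (the current directory by default).
def absolutize(path, base = Dir.pwd)
  File.expand_path(path, base)
end

puts absolutize("datasets/main/organism.fna", "/home/user")  # prints /home/user/datasets/main/organism.fna
puts absolutize("/genome/nr", "/home/user")                  # prints /genome/nr
```

Paths that are already absolute pass through unchanged, so the helper is safe to apply to every file argument.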
Filters FASTA data and provides detailed information about the process.
Usage: quality-filter.rb [options] file
-v, --verbose Print more info
-p, --print Print out sequence data in FastQ format
-l, --log Log the output
-b, --output-bad-seqs Output bad sequences
-d, --output-dir [DIR] Select output dir (default: out)
-o, --output [FILE] Select output file (default: output.fasta)
-r, --num-records [NUM] Number of records to get (default: all)
-h, --help Display this screen
Example:
quality-filter.rb -b -d output_dir -o quality-filtered.fasta
In this example, the -b switch causes the script to output the sequences that are excluded by the quality filter rather than discard them. The -d switch chooses the output directory, and the -o switch chooses the output file.
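The effect of the -b switch can be pictured as splitting the reads into a kept set and a rejected set instead of dropping the rejected ones. The sketch below is only an illustration: this README does not document the criteria quality-filter.rb applies, so the mean-quality cutoff, the Phred+33 encoding, and all helper names are assumptions.

```ruby
# Hypothetical sketch: the real quality-filter.rb criteria are not
# documented here. Assumes Phred+33 quality strings.
def mean_quality(quality_string)
  scores = quality_string.bytes.map { |b| b - 33 }
  scores.sum.to_f / scores.size
end

# Returns [kept, rejected]; the cutoff min_mean is an assumed parameter.
def partition_by_quality(records, min_mean)
  records.partition { |rec| mean_quality(rec[:quality]) >= min_mean }
end

records = [
  { id: "r1", sequence: "ACGT", quality: "IIII" },  # 'I' encodes Phred 40
  { id: "r2", sequence: "ACGT", quality: "!!!!" }   # '!' encodes Phred 0
]
good, bad = partition_by_quality(records, 20)
puts "kept: #{good.map { |r| r[:id] }.join(',')}"
puts "rejected: #{bad.map { |r| r[:id] }.join(',')}"
```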