Overview

The digitagCT distribution is part of CracTools, and its latest version can be downloaded here.

All files are self-documented using the POD format and tools.

Installation

To install this module, type the following:

perl Makefile.PL
make
make test
make install 

Usage

Once the installation is complete, the software binary 'digitagCT' should be available. However, one last step is required before you can use digitagCT: you need to obtain an annotation file in GFF3 format. CracTools-core provides a script, buildGFF3FromEnsembl.pl, that builds such a file using the Ensembl Perl API. You can also provide your own GFF3 file; the format is described here: http://www.sequenceontology.org/gff3.shtml.

Note that digitagCT requires a supplementary attribute 'type' on mRNA features. This attribute represents the type of the mRNA: either 'protein_coding' or any other string. If it is not 'protein_coding', the mRNA is considered 'non_coding'. A subtype can be specified using a ':' (colon) separator (example: protein_coding:pseudogene).
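
The classification rule above can be sketched in Python as follows. This is only an illustration of the rule as described, not the code digitagCT actually runs: the helper names `parse_gff3_attributes` and `classify_mrna` are hypothetical, the sample GFF3 line is made up, and treating a missing 'type' attribute as 'non_coding' is an assumption of this sketch. URL-escaped attribute values are not decoded.

```python
def parse_gff3_attributes(column9):
    """Parse the 9th GFF3 column ("key=value;key=value") into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def classify_mrna(gff3_line):
    """Return (category, subtype) for an mRNA line, or None for other lines."""
    fields = gff3_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "mRNA":
        return None
    # Assumption: a missing 'type' attribute is handled like any other
    # non-'protein_coding' value.
    type_value = parse_gff3_attributes(fields[8]).get("type", "")
    main_type, _, subtype = type_value.partition(":")
    category = "protein_coding" if main_type == "protein_coding" else "non_coding"
    return category, subtype or None


# Made-up example line, for illustration only.
line = "1\tensembl\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna1;Parent=gene1;type=protein_coding:pseudogene"
print(classify_mrna(line))  # ('protein_coding', 'pseudogene')
```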

Input file formats

  • --gff: a GFF3 file with annotation
  • --rna-seq: a SAM file from a mapper (preferably CRAC)
  • --sage: a TSV (tab-separated values) file with 4 required columns (from the TranscriRef/SAGE Genie DB)
  • --tar: a BED file with information about tiling arrays (built from UCSC)

More information about the 4 required columns of the TSV file:

  1. the tag sequence
  2. number of occurrences of the tag
  3. name of the library
  4. the total number of sequences in the library
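
As a rough illustration of this layout, the following Python sketch reads the four required columns into typed records. The names `SageTag` and `read_sage_tsv` are hypothetical, the sample row is invented, and skipping rows with non-numeric counts (for example a possible header line) is an assumption of this sketch rather than documented digitagCT behaviour.

```python
import csv
import io
from typing import NamedTuple


class SageTag(NamedTuple):
    sequence: str        # column 1: the tag sequence
    occurrences: int     # column 2: number of occurrences of the tag
    library: str         # column 3: name of the library
    library_total: int   # column 4: total number of sequences in the library


def read_sage_tsv(handle):
    """Yield one SageTag per data row, using the first 4 columns."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or incomplete rows
        try:
            yield SageTag(row[0], int(row[1]), row[2], int(row[3]))
        except ValueError:
            continue  # skip rows with non-numeric counts, such as a header line


# Invented sample row, for illustration only.
sample = "ACGTACGTACGTACGTA\t12\tlibA\t50000\n"
for tag in read_sage_tsv(io.StringIO(sample)):
    print(tag.sequence, tag.occurrences, tag.library, tag.library_total)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object.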

Examples

This software is distributed with some example data files (called 'toys') in the folder ./extra/, so that you can quickly try the software. Once you have installed the program following the instructions in the "Installation" section, you will be able to launch "digitagCT". For more information, run digitagCT --help or digitagCT --man.

  • Generate annotation for DGE tags:
    digitagCT extra/toyDGE.sam --gff file.gff --summary summary.txt
    (Add "ANNOTATION_GFF file.gff" to ~/CracTools.cfg to simplify digitagCT command lines.)

  • Cross DGE tags with RNA-Seq data:
    digitagCT extra/toyDGE.sam --rna-seq extra/toyRNASeq.sam

  • Cross DGE tags with a SAGE Genie file:
    digitagCT extra/toyDGE.sam --sage extra/toySageGenieFile.csv

  • Cross DGE tags with tiling arrays:
    digitagCT extra/toyDGE.sam --tar extra/toyTAR.bed.gz

Note that the three "crossing options" above can be combined as you wish.
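
For scripted runs, a combined command line could be assembled as in the Python sketch below. The helper `build_digitagct_command` is hypothetical and only uses the options shown in the examples; it builds the argument list without launching anything, and it assumes the annotation file is either passed via --gff or configured in ~/CracTools.cfg.

```python
import shlex


def build_digitagct_command(dge_sam, gff=None, rna_seq=None, sage=None, tar=None):
    """Build a digitagCT argument list; options left as None are omitted."""
    command = ["digitagCT", dge_sam]
    for flag, value in (("--gff", gff), ("--rna-seq", rna_seq),
                        ("--sage", sage), ("--tar", tar)):
        if value is not None:
            command += [flag, value]
    return command


# Combine the three crossing options on the toy data files.
command = build_digitagct_command(
    "extra/toyDGE.sam",
    rna_seq="extra/toyRNASeq.sam",
    sage="extra/toySageGenieFile.csv",
    tar="extra/toyTAR.bed.gz",
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once digitagCT is installed.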

Output file format

Based on two levels of annotation (process A for protein_coding and process B for non_coding), digitagCT generates a TSV file with the following columns:

  1. the tag sequence
  2. number of occurrences of the tag
  3. the annotation process A of the tag
  4. the gene name (HUGO Gene Nomenclature Committee)
  5. the gene type (protein_coding)
  6. Ensembl ID of the Gene
  7. chromosome of the tag sequence
  8. location of the tag relative to the chromosome
  9. strand of the tag
  10. the annotation process B of the tag
  11. the gene name (HUGO Gene Nomenclature Committee)
  12. the gene type (non_coding)

Other columns about RNA-Seq, TranscriRef and tiling-array features are added when the corresponding optional arguments are specified (--rna-seq, --sage and --tar, respectively).
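
A downstream script might load this output as in the following Python sketch. The field names in `COLUMNS` and the function `read_digitagct_output` are hypothetical labels for the 12 documented columns, the sample row is invented, and skipping short rows or rows starting with '#' is an assumption, since the exact header or comment conventions of the output file are not described here.

```python
import csv
import io

# Hypothetical field names for the 12 documented columns, in order.
COLUMNS = [
    "tag", "occurrences",
    "annotation_a", "gene_name_a", "gene_type_a", "ensembl_gene_id",
    "chromosome", "position", "strand",
    "annotation_b", "gene_name_b", "gene_type_b",
]


def read_digitagct_output(handle):
    """Yield one dict per row; columns beyond the first 12 go under 'extra'."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS) or row[0].startswith("#"):
            continue  # skip blank, short, or comment rows
        record = dict(zip(COLUMNS, row))
        # Optional columns added by --rna-seq, --sage or --tar.
        record["extra"] = row[len(COLUMNS):]
        yield record


# Invented sample row, for illustration only.
values = ["ACGTACGTACGTACGTA", "7", "annotA", "GENE1", "protein_coding",
          "ENSG00000000001", "1", "12345", "+", "annotB", "NA", "non_coding"]
for record in read_digitagct_output(io.StringIO("\t".join(values) + "\n")):
    print(record["tag"], record["gene_name_a"], record["extra"])
```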

Dependencies

This package requires the following programs, modules and libraries*:

  • CracTools-core
  • Perl 5.1 or later
  • strict
  • warnings
  • Carp

* Note that almost all of the required modules/libraries are standard.


Authors: Jérôme AUDOUX jerome.audoux@univ-montp2.fr, Alban MANCHERON alban.mancheron@lirmm.fr and Nicolas PHILIPPE nicolas.philippe@inserm.fr