AoE GSPEI DataHub > Jbrowse > Analysis Pipeline

Analysis Pipeline

RNA-seq data analysis

We downloaded the fastq format of raw data from the NCBI database according to the accession number.

Quality Control

Firstly, the quality control process was conducted by trim_galore (version 0.4.3)¹.

trim_galore --three_prime_clip_R1 5 \
            --three_prime_clip_R2 5 \
            --clip_R1 10 \
            --clip_R2 10 \
            --phred33 \
            --illumina --fastqc \
            --length 40 --paired --gzip \
            --max_n 10 --trim-n -r1 50 -r2 50 \
            --retain_unpaired ${sample}_1.fastq.gz ${sample}_2.fastq.gz

RNA-seq Mapping

Then the data were mapped to correspond genome by hisat2 (version 2.0.4)².

hisat2 --dta --rna-strandness RF -q \
       -x ${genome_file} \
       -1 ${sample}_1_val_1.fq.gz \
       -2 ${sample}_2_val_2.fq.gz \
       -S ${mapped_results}

After mapping, the concordantly aligned reads were extracted and sorted by samtools (version 1.5)³.

Finally, the bigwig files were generated by deepTools (version 3.1.1)⁴.

bamCoverage --bam ${sorted_mapped_file} \
            -o ${output_folder} \
            --binSize 1

ChIP-seq and ATAC-seq data analysis

We downloaded the fastq format of raw data from the NCBI database according to the accession number. The quality control step was conducted as mentioned above.

ChIP-seq Mapping

Mapping was performed by bowtie2 (version 2.2.7)⁵.

bowtie2 -k 2 -q -x ${genome_file} \
        -1 ${sample}_1_val_1.fq.gz \
        -2 ${sample}_2_val_2.fq.gz \
        -S ${mapped_results}

The mapping results were also processed by samtools and deepTools.

bamCoverage --bam ${sorted_mapped_file} \
            -o ${output_folder} \
            --binSize1 \
            --effectiveGenomeSize ${genome_size}

WGBS data analysis

QC performed the same as above.

WGBS Mapping

WGBS data were mapped to correspond genome by bismark (version 0.15.2)⁶.

## Mapping
bismark --genome ${genome_folder} \
        -1 ${sample}_1_val_1.fq.gz \
        -2 ${sample}_2_val_2.fq.gz \
        -o ${sample_mapped_results}

## Deduplication
deduplicate_bismark -bam ${sample_mapped_results}

## Extract
bismark_methylation_extractor --cutoff 5 --gzip \
                              --bedGraph \
                              --cytosine_report \
                              --CX \
                              --genome_folder ${genome_folder} ${deduplicated_mapped_results}

Mapping result were converted to bigwig format and visualized by DNA methylation jbrowse plugin ⁷.

python allc_to_bigwig_pe_v3.py -sort -L=${sample} \
                               -o=${output_folder} \
                               ${length_of_each_chromosome} ${Methylation_level_of_each_cytosine}

Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10 (2011). ↩︎
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced alignerwith low memory requirements. Nat. Methods 12, 357–360 (2015). ↩︎
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). ↩︎
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014). ↩︎
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). ↩︎
Krueger, F. & Andrews, S. R. Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011). ↩︎
Hofmeister, B. T. & Schmitz, R. J. Enhanced JBrowse plugins for epigenomics data visualization. BMC Bioinformatics 19, 159 (2018). ↩︎