Trim RNA-Seq Reads Based on FastQC

Taking appropriate QC measures for RRBS-type or other -Seq applications with Trim Galore!

Last update: 02/07/2019

Table of Contents

  • Introduction
  • Methodology
    1. Quality Trimming
    2. Adapter Trimming
      • Auto-detection
      • Manual adapter sequence specification
    3. Removing Short Sequences
    4. Specialised Trimming - hard- and Epigenetic Clock Trimming
  • Full list of options for Trim Galore!
    • RRBS-specific options
    • Paired-end specific options

Version 0.6.4

For all high throughput sequencing applications, we would recommend performing some quality control on the data, as it can often straight away point you towards the next steps that need to be taken (e.g. with FastQC). Thorough quality control and taking appropriate steps to remove problems is vital for the analysis of almost all sequencing applications. This is even more critical for the proper analysis of RRBS libraries since they are susceptible to a variety of errors or biases that one could probably get away with in other sequencing applications. In our brief guide to RRBS we discuss the following points:

  • poor qualities: affect mapping, may lead to incorrect methylation calls and/or mis-mapping
  • adapter contamination: may lead to low mapping efficiencies, or, if mapped, may result in incorrect methylation calls and/or mis-mapping
  • positions filled in during end-repair will infer the methylation state of the cytosine used for the fill-in reaction but not of the true genomic cytosine
  • paired-end RRBS libraries (especially with long read length) yield redundant methylation information if the read pairs overlap
  • RRBS libraries with long read lengths suffer more from all of the above due to the short size-selected fragment size

Poor base call qualities or adapter contamination are however just as relevant for 'normal', i.e. non-RRBS, libraries.

Adaptive quality and adapter trimming with Trim Galore

We have tried to implement a method to rid RRBS libraries (or other kinds of sequencing datasets) of potential problems in one convenient process. For this we have developed a wrapper script (trim_galore) that makes use of the publicly available adapter trimming tool Cutadapt and FastQC for optional quality control once the trimming process has completed.

Even though Trim Galore works for any (base space) high throughput dataset (e.g. downloaded from the SRA) this section describes its use mainly with respect to RRBS libraries.

Step 1: Quality Trimming

In the first step, low-quality base calls are trimmed off from the 3' end of the reads before adapter removal. This efficiently removes poor quality portions of the reads.

[Figure: Before Quality Trimming | After Quality Trimming]

Here is an example of a dataset downloaded from the SRA which was trimmed with a Phred score threshold of 20 (data set DRR001650_1 from Kobayashi et al., 2012).
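The BWA-style rule given for the -q option further down (subtract the cutoff from every quality, compute partial sums from each index to the end of the read, cut where the sum is minimal) can be sketched in Python as follows. This is an illustrative reading of that description, not the code that Trim Galore or Cutadapt actually runs, and Cutadapt's own implementation may differ in details. The function names are hypothetical, and Phred+33 encoding (the --phred33 default) is assumed.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string (Phred+33 by default) into integers."""
    return [ord(ch) - offset for ch in qual]


def bwa_trim_index(quals: list[int], cutoff: int = 20) -> int:
    """Return how many bases to keep after 3' quality trimming.

    Subtract the cutoff from every quality, compute the partial sum from
    each index to the end of the read, and cut where that sum is minimal.
    """
    keep = len(quals)  # cutting at the very end keeps the whole read (sum 0)
    best_sum = 0
    running = 0
    # Walk from the 3' end so `running` is the partial sum from i to the end.
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running < best_sum:  # strictly smaller: on ties keep the longer read
            best_sum = running
            keep = i
    return keep


def quality_trim(seq: str, qual: str, cutoff: int = 20) -> tuple[str, str]:
    """Trim one FastQ record; sequence and qualities are cut together."""
    keep = bwa_trim_index(phred_scores(qual), cutoff)
    return seq[:keep], qual[:keep]


print(quality_trim("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTACGT', 'IIIIIIII')
```

Scanning from the 3' end lets each partial sum be updated with a single addition per base.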

Step 2: Adapter Trimming

In the next step, Cutadapt finds and removes adapter sequences from the 3' end of reads.

Adapter auto-detection

If no sequence was supplied, Trim Galore will attempt to auto-detect the adapter which has been used. For this it will analyse the first 1 million sequences of the first specified file and attempt to find the first 12 or 13bp of the following standard adapters:

              Illumina:   AGATCGGAAGAGC
              Small RNA:  TGGAATTCTCGG
              Nextera:    CTGTCTCTTATA

If no adapter contamination can be detected within the first 1 million sequences, or in case of a tie between several different adapters, Trim Galore defaults to --illumina, as long as the Illumina adapter sequence was one of the options. If there was a tie between the Nextera and small RNA adapter, the default is --nextera. The auto-detection results are shown on screen and printed to the trimming report for future reference.
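A minimal sketch of the auto-detection rules just described, assuming that detection simply counts how many of the first 1 million reads contain each adapter prefix as an exact substring. The real counting in Trim Galore may differ, and ADAPTERS and detect_adapter are hypothetical names.

```python
from itertools import islice

# Adapter prefixes as listed in the auto-detection section.
ADAPTERS = {
    "illumina": "AGATCGGAAGAGC",
    "small_rna": "TGGAATTCTCGG",
    "nextera": "CTGTCTCTTATA",
}


def detect_adapter(sequences, limit: int = 1_000_000) -> str:
    """Hypothetical sketch of adapter auto-detection.

    Counts how many of the first `limit` reads contain each adapter prefix
    (exact substring match) and applies the documented tie-breaking rules.
    """
    counts = {name: 0 for name in ADAPTERS}
    for seq in islice(sequences, limit):
        for name, adapter in ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    best = max(counts.values())
    winners = {name for name, n in counts.items() if n == best}
    # Nothing found, or a tie that includes Illumina: fall back to --illumina.
    if best == 0 or (len(winners) > 1 and "illumina" in winners):
        return "illumina"
    # A tie between Nextera and small RNA only: fall back to --nextera.
    if winners == {"nextera", "small_rna"}:
        return "nextera"
    return winners.pop()  # a single clear winner


print(detect_adapter(["CTGTCTCTTATAAA", "TGGAATTCTCGG"]))  # nextera
```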

Manual adapter sequence specification

The auto-detection behaviour can be overruled by specifying an adapter sequence manually or by using --illumina, --nextera or --small_rna. Please note: the first 13 bp of the standard Illumina paired-end adapters (AGATCGGAAGAGC) recognise and remove adapter from most standard libraries, including the Illumina TruSeq and Sanger iTag adapters. This sequence is present on both sides of paired-end sequences, and is present in all adapters before the unique Index sequence occurs. So for any 'normal' kind of sequencing you do not need to specify anything but --illumina, or better yet just use the auto-detection.

To control the stringency of the adapter removal process one can specify the minimum required overlap with the adapter sequence (--stringency); otherwise it defaults to 1. This default setting is extremely stringent, i.e. an overlap with the adapter sequence of even a single bp is spotted and removed. This may appear unnecessarily harsh; however, as a reminder, adapter contamination may in a Bisulfite-Seq setting lead to mis-alignments and hence incorrect methylation calls, or result in the removal of the sequence as a whole because of too many mismatches in the alignment process.

Tolerating adapter contamination is most likely detrimental to the results, but we realize that this process may in some cases also remove some genuine genomic sequence. It is unlikely that the removed bits of sequence would have been involved in methylation calling anyway (since only the 4th and 5th adapter base would possibly be involved in methylation calls, for directional libraries). However, it is quite likely that true adapter contamination, irrespective of its length, would be detrimental for the alignment or methylation call process, or both.

[Figure: Before Adapter Trimming | After Adapter Trimming]

This example (same dataset as above) shows the dramatic effect of adapter contamination on the base composition of the analysed library, e.g. the C content rises from ~1% at the start of reads to around 22% (!) towards the end of reads. Adapter trimming with Cutadapt gets rid of most signs of adapter contamination efficiently. Note that the sharp decrease of A at the last position is a result of removing the adapter sequence very stringently, i.e. even a single trailing A at the end is removed.
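To illustrate how the minimum overlap (--stringency, default 1) behaves, here is a simplified Python sketch of 3' adapter removal. It assumes exact matching only: Cutadapt additionally tolerates mismatches and indels up to the error rate set with -e, which this sketch ignores, and trim_3prime_adapter is a hypothetical name. With stringency 1, a single trailing A matching the first base of the Illumina adapter is removed, which is the behaviour behind the sharp drop of A at the last position described above.

```python
def trim_3prime_adapter(read: str, adapter: str, stringency: int = 1) -> str:
    """Simplified sketch of 3' adapter removal with a minimum overlap.

    Exact matching only; Cutadapt also allows errors up to the -e rate.
    """
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < stringency:
            break  # any remaining overlap would be too short to count
        # Either the full adapter starts here, or the read ends partway
        # into the adapter (a partial overlap at the 3' end).
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


illumina = "AGATCGGAAGAGC"
print(trim_3prime_adapter("CCCCA", illumina))                # CCCC
print(trim_3prime_adapter("CCCCA", illumina, stringency=2))  # CCCCA
```

Scanning from the 5' end returns the leftmost adapter match, so everything from the first detected adapter base onwards is removed.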

RRBS Mode

Trim Galore! also has an --rrbs option for DNA material that was digested with the restriction enzyme MspI. In this mode, Trim Galore identifies sequences that were adapter-trimmed and removes another 2 bp from the 3' end of Read 1, and for paired-end libraries also the first 2 bp of Read 2 (which is equally affected by the fill-in procedure). This is to avoid that the filled-in cytosine position close to the second MspI site in a sequence is used for methylation calls. Sequences which were merely trimmed because of poor quality will not be shortened any further.
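The --rrbs rule for Read 1 (or single-end reads) can be sketched as follows, assuming the caller already knows whether adapter sequence was removed from the read. Here adapter_trimmed is a hypothetical flag, not a Trim Galore option, and the paired-end handling of Read 2 is not modelled.

```python
def rrbs_trim_read1(seq: str, qual: str, adapter_trimmed: bool) -> tuple[str, str]:
    """Sketch of the --rrbs rule for Read 1 (or single-end reads).

    If adapter sequence was removed, drop a further 2 bp from the 3' end so the
    filled-in cytosine next to the MspI site is not used for methylation calls.
    Reads that were only quality-trimmed (or not trimmed) are left unchanged.
    """
    if not adapter_trimmed:
        return seq, qual
    return seq[:-2], qual[:-2]  # keep sequence and qualities in sync


print(rrbs_trim_read1("CGGTTTACGTTCG", "I" * 13, adapter_trimmed=True)[0])  # CGGTTTACGTT
```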

Non-directional mode

Trim Galore! also has a --non_directional option, which will screen adapter-trimmed sequences for the presence of either CAA or CGA at the start of sequences and clip off the first 2 bases if found. If CAA or CGA are found at the start, no bases will be trimmed off from the 3' end even if the sequence had some contaminating adapter sequence removed (in this case the sequence read likely originated from either the CTOT or CTOB strand; refer to the RRBS guide for the meaning of CTOT and CTOB strands).
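One reading of the --non_directional behaviour, sketched in Python: a read starting with CAA or CGA loses its first 2 bp and keeps its 3' end; otherwise the usual --rrbs rule applies to adapter-trimmed reads. The function name and the adapter_trimmed flag are hypothetical, and the quality string would need the same slicing.

```python
def non_directional_trim(seq: str, adapter_trimmed: bool) -> str:
    """Sketch of --rrbs --non_directional handling for a single read."""
    if seq.startswith(("CAA", "CGA")):
        # Likely CTOT/CTOB-derived: clip the first 2 bp, leave the 3' end alone.
        return seq[2:]
    if adapter_trimmed:
        # Otherwise apply the usual --rrbs rule: drop 2 more bp from the 3' end.
        return seq[:-2]
    return seq


print(non_directional_trim("CAATTGCGTTA", adapter_trimmed=True))  # ATTGCGTTA
```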

Step 3: Removing Short Sequences

Lastly, since quality and/or adapter trimming may result in very short sequences (sometimes as short as 0 bp), Trim Galore! can filter trimmed reads based on their sequence length (default: 20 bp). This is to reduce the size of the output file and to avoid crashes of alignment programs which require sequences with a certain minimum length.
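Length filtering itself is simple. The sketch below reads plain (uncompressed) 4-line FastQ records and keeps reads of at least the cutoff length, mirroring the documented --length default of 20 bp. read_fastq and length_filter are hypothetical helpers, and the parser assumes sequences are not wrapped over several lines.

```python
import io
from typing import Iterator, TextIO

Record = tuple[str, str, str, str]  # (header, sequence, '+' line, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def length_filter(records, min_length: int = 20) -> Iterator[Record]:
    """Drop reads shorter than min_length bp; 0 keeps everything (--length 0)."""
    for record in records:
        if len(record[1]) >= min_length:
            yield record


if __name__ == "__main__":
    demo = io.StringIO(
        "@long\n" + "ACGT" * 6 + "\n+\n" + "I" * 24 + "\n"
        + "@short\nACGT\n+\nIIII\n"
    )
    for header, *_ in length_filter(read_fastq(demo)):
        print(header)  # only "@long" survives the 20 bp default
```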

Paired-End Data

Note that it is not recommended to remove too-short sequences if the analysed FastQ file is one of a pair of paired-end files, since this confuses the sequence-by-sequence order of paired-end reads which is again required by many aligners. For paired-end files, Trim Galore! has an option --paired which runs a paired-end validation on both trimmed _1 and _2 FastQ files once the trimming has completed. This step removes entire read pairs if at least one of the two sequences became shorter than a certain threshold. If only one of the two reads is longer than the set threshold, e.g. when one read has very poor qualities throughout, this singleton read can be written out to unpaired files (see option --retain_unpaired) which may be aligned in a single-end manner.
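The paired-end validation described above can be sketched as follows, assuming the two trimmed files have already been read into matching (Read 1, Read 2) sequence pairs. The defaults mirror the documented --length (20 bp) and -r1/-r2 (35 bp) values; validate_pairs is a hypothetical name.

```python
def validate_pairs(pairs, length: int = 20, retain_unpaired: bool = False,
                   length_1: int = 35, length_2: int = 35):
    """Sketch of --paired validation on already trimmed (seq1, seq2) pairs.

    Returns (validated_pairs, unpaired_1, unpaired_2). A pair is kept only if
    both reads are at least `length` bp long; with retain_unpaired, a read
    whose mate failed is rescued if it also meets its own -r1/-r2 cutoff.
    """
    validated, unpaired_1, unpaired_2 = [], [], []
    for seq1, seq2 in pairs:
        ok1, ok2 = len(seq1) >= length, len(seq2) >= length
        if ok1 and ok2:
            validated.append((seq1, seq2))  # pair order is preserved
        elif retain_unpaired:
            if ok1 and len(seq1) >= length_1:
                unpaired_1.append(seq1)
            if ok2 and len(seq2) >= length_2:
                unpaired_2.append(seq2)
    return validated, unpaired_1, unpaired_2
```

Because validated pairs are appended in input order, both output files stay in the same sequence-by-sequence order.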

Applying these steps to both self-generated and downloaded data can ensure that you really only use the high quality portion of the data for alignments and further downstream analyses and conclusions.

Step 4: Specialised Trimming

Hard-trimming to leave bases at the 5'-end

The option --hardtrim5 INT allows you to hard-clip sequences from their 3' end. This option processes one or more files (plain FastQ or gzip compressed files) and produces hard-trimmed FastQ files ending in .{INT}bp_5prime.fq(.gz). This is useful when you want to shorten reads to a certain read length. Here is an example:

              before:         CCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAAT
              --hardtrim5 20: CCTAAGGAAACAAGTACACT

Hard-trimming to leave bases at the 3'-end

The option --hardtrim3 INT allows you to hard-clip sequences from their 5' end. This option processes one or more files (plain FastQ or gzip compressed files) and produces hard-trimmed FastQ files ending in .{INT}bp_3prime.fq(.gz). We found this quite useful in a number of scenarios where we wanted to remove biased residues from the start of sequences. Here is an example:

              before:         CCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAAT
              --hardtrim3 20:                                                   TTTTTAAGAAAATGGAAAAT
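Both hard-trimming modes amount to a fixed slice of each record. In this sketch (function names hypothetical), sequence and quality strings are cut identically. Reads already shorter than INT are returned unchanged, which is an assumption since the guide does not say how such reads are handled. The output-name helper follows the documented .{INT}bp_5prime / .{INT}bp_3prime suffixes, with the stem being whatever prefix you choose.

```python
def hardtrim(seq: str, qual: str, n: int, keep: str) -> tuple[str, str]:
    """Sketch of --hardtrim5 / --hardtrim3 on one FastQ record.

    keep="5prime" mimics --hardtrim5 n: keep the first n bases (clip the 3' end).
    keep="3prime" mimics --hardtrim3 n: keep the last n bases (clip the 5' end).
    """
    if keep == "5prime":
        return seq[:n], qual[:n]
    if keep == "3prime":
        start = max(len(seq) - n, 0)
        return seq[start:], qual[start:]
    raise ValueError("keep must be '5prime' or '3prime'")


def hardtrim_output_name(stem: str, n: int, keep: str, gz: bool = False) -> str:
    """Build the documented output suffix, e.g. sample.20bp_5prime.fq.gz."""
    return f"{stem}.{n}bp_{keep}.fq" + (".gz" if gz else "")


read = "CCTAAGGAAACAAGTACACTCCACACATGCATAAAGGAAATCAAATGTTATTTTTAAGAAAATGGAAAAT"
print(hardtrim(read, "I" * len(read), 20, "3prime")[0])  # TTTTTAAGAAAATGGAAAAT
```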

Mouse Epigenetic Clock trimming

The option --clock trims reads in a specific way that is currently used for the Mouse Epigenetic Clock (see here: Multi-tissue DNA methylation age predictor in mouse, Stubbs et al., Genome Biology, 2017 18:68). Following the trimming, Trim Galore exits.

In its current implementation, the dual-UMI RRBS reads come in the following format:

              Read 1  5' UUUUUUUU CAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF TACTG UUUUUUUU 3'
              Read 2  3' UUUUUUUU GTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ATGAC UUUUUUUU 5'

Where UUUUUUUU is a random 8-mer unique molecular identifier (UMI), CAGTA is a constant region, and FFFFFFF... is the actual RRBS-Fragment to be sequenced. The UMIs for Read 1 (R1) and Read 2 (R2), as well as the fixed sequences (F1 or F2), are written into the read ID and removed from the actual sequence. Here is an example:

              R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT
                  ATCTAGTTCAGTACGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
              R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT
                  CAATTTTGCAGTACAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA

              R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
                               CGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
              R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
                               CAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA
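The transformation in this example can be reproduced with a short Python sketch. It is written only to match the example above and is not the --clock implementation: it assumes a fixed 8 bp UMI followed by the 5 bp constant region on both reads, records the first 4 bp of that region as F1/F2 (as the example does) and removes 13 bp from each sequence. clock_trim_pair is a hypothetical name, and quality strings would need the same 13 bp removed.

```python
UMI_LEN = 8        # random 8-mer UMI at the 5' end of each read
CONSTANT_LEN = 5   # constant region (CAGTA) directly after the UMI
FIXED_LEN = 4      # bases of the constant region recorded as F1/F2 in the example


def clock_trim_pair(header1: str, seq1: str, header2: str, seq2: str):
    """Hypothetical sketch: move UMIs and fixed sequences into both read IDs
    and strip UMI + constant region from the 5' end of both sequences."""
    umi1, umi2 = seq1[:UMI_LEN], seq2[:UMI_LEN]
    f1 = seq1[UMI_LEN:UMI_LEN + FIXED_LEN]
    f2 = seq2[UMI_LEN:UMI_LEN + FIXED_LEN]
    tag = f":R1:{umi1}:R2:{umi2}:F1:{f1}:F2:{f2}"
    cut = UMI_LEN + CONSTANT_LEN
    return header1 + tag, seq1[cut:], header2 + tag, seq2[cut:]
```

Run on the first pair of reads shown above, this yields the second pair of records.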

Following clock trimming, the resulting files (.clock_UMI.R1.fq(.gz) and .clock_UMI.R2.fq(.gz)) should be adapter- and quality trimmed with a second Trim Galore run. Even though the data is technically RRBS, it doesn't require the --rrbs option. Instead the reads need to be trimmed by 15bp from their 3' end to get rid of potential UMI and fixed sequences. All this is achieved with this additional trimming command:

trim_galore --paired --three_prime_clip_R1 15 --three_prime_clip_R2 15 *.clock_UMI.R1.fq.gz *.clock_UMI.R2.fq.gz

Following this, reads should be aligned with Bismark and deduplicated with UmiBam in --dual_index mode (see here: https://github.com/FelixKrueger/Umi-Grinder). UmiBam recognises the UMIs within this pattern: R1:(ATCTAGTT):R2:(CAATTTTG): as (UMI R1=ATCTAGTT) and (UMI R2=CAATTTTG).
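If you need the two UMIs in your own downstream scripts, they can be recovered from a clock-trimmed read ID with a regular expression mirroring the R1:(...):R2:(...): pattern above. This is a hypothetical helper for illustration, not part of UmiBam, and it assumes UMIs consist of A, C, G, T or N.

```python
import re

# Mirrors the R1:(UMI):R2:(UMI): pattern written into clock-trimmed read IDs.
UMI_PATTERN = re.compile(r"R1:([ACGTN]+):R2:([ACGTN]+):")


def extract_umis(read_id: str) -> tuple[str, str]:
    """Return (UMI R1, UMI R2) from a clock-trimmed read ID."""
    match = UMI_PATTERN.search(read_id)
    if match is None:
        raise ValueError(f"no dual UMI found in {read_id!r}")
    return match.group(1), match.group(2)
```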

Full list of options for Trim Galore!

USAGE: trim_galore [options] <filename(s)>

General options:

  • -h/--help

    • Print this help message and exit.
  • -v/--version

    • Print the version information and exit.
  • -q/--quality <INT>

    • Trim low-quality ends from reads in addition to adapter removal. For RRBS samples, quality trimming will be performed first, and adapter trimming is carried out in a second round. Other files are quality and adapter trimmed in a single pass. The algorithm is the same as the one used by BWA (Subtract INT from all qualities; compute partial sums from all indices to the end of the sequence; cut sequence at the index at which the sum is minimal).
    • Default Phred score: 20
  • --phred33

    • Instructs Cutadapt to use ASCII+33 quality scores as Phred scores (Sanger/Illumina 1.9+ encoding) for quality trimming.
    • Default: ON
  • --phred64

    • Instructs Cutadapt to use ASCII+64 quality scores as Phred scores (Illumina 1.5 encoding) for quality trimming.
  • --fastqc

    • Run FastQC in the default mode on the FastQ file once trimming is complete.
  • --fastqc_args "<ARGS>"

    • Passes extra arguments to FastQC. If more than one argument is to be passed to FastQC they must be in the form "arg1 arg2 [..]".
    • An example would be: --fastqc_args "--nogroup --outdir /home/".
    • Passing extra arguments will automatically invoke FastQC, so --fastqc does not have to be specified separately.
  • -a/--adapter <STRING>

    • Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will try to auto-detect whether the Illumina universal, Nextera transposase or Illumina small RNA adapter sequence was used. Also see --illumina, --nextera and --small_rna.
    • If no adapter can be detected within the first 1 million sequences of the first file specified, Trim Galore defaults to --illumina. A single base may also be given as e.g. -a A{10}, to be expanded to -a AAAAAAAAAA.
  • -a2/--adapter2 <STRING>

    • Optional adapter sequence to be trimmed off read 2 of paired-end files. This option requires --paired to be specified as well. If the libraries to be trimmed are smallRNA then a2 will be set to the Illumina small RNA 5' adapter automatically (GATCGTCGGACT). A single base may also be given as e.g. -a2 A{10}, to be expanded to -a2 AAAAAAAAAA.
  • --illumina

    • Adapter sequence to be trimmed is the first 13bp of the Illumina universal adapter AGATCGGAAGAGC instead of the default auto-detection of adapter sequence.
  • --nextera

    • Adapter sequence to be trimmed is the first 12bp of the Nextera adapter CTGTCTCTTATA instead of the default auto-detection of adapter sequence.
  • --small_rna

    • Adapter sequence to be trimmed is the first 12bp of the Illumina Small RNA 3' Adapter TGGAATTCTCGG instead of the default auto-detection of adapter sequence.
    • Selecting to trim smallRNA adapters will also lower the --length value to 18bp. If the smallRNA libraries are paired-end then -a2 will be set to the Illumina small RNA 5' adapter automatically (GATCGTCGGACT) unless -a2 had been defined explicitly.
  • --max_length <INT>

    • Discard reads that are longer than <INT> bp after trimming. This is only advised for smallRNA sequencing to remove non-small RNA sequences.
  • --stringency <INT>

    • Overlap with adapter sequence required to trim a sequence.
    • Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3' end of any read.
  • -e <ERROR RATE>

    • Maximum allowed error rate (no. of errors divided by the length of the matching region)
    • Default: 0.1
  • --gzip

    • Compress the output file with gzip.
    • If the input files are gzip-compressed the output files will be automatically gzip compressed as well.
  • --dont_gzip

    • Output files won't be compressed with gzip. This overrides --gzip.
  • --length <INT>

    • Discard reads that became shorter than length INT because of either quality or adapter trimming. A value of 0 effectively disables this behaviour.
    • Default: 20 bp.
    • For paired-end files, both reads of a read-pair need to be longer than <INT> bp to be printed out to validated paired-end files (see option --paired). If only one read became too short there is the possibility of keeping such unpaired single-end reads (see --retain_unpaired).
    • Default pair-cutoff: 20 bp.
  • --max_n COUNT

    • The total number of Ns (as integer) a read may contain before it will be removed altogether.
    • In a paired-end setting, either read exceeding this limit will result in the entire pair being removed from the trimmed output files.
  • --trim-n

    • Removes Ns from either side of the read.
    • This option currently does not work in RRBS mode.
  • -o/--output_dir <DIR>

    • If specified all output will be written to this directory instead of the current directory. If the directory doesn't exist it will be created for you.
  • --no_report_file

    • If specified no report file will be generated.
  • --suppress_warn

    • If specified any output to STDOUT or STDERR will be suppressed.
  • --clip_R1 <int>

    • Instructs Trim Galore to remove <int> bp from the 5' end of read 1 (or single-end reads). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end.
    • Default: OFF
  • --clip_R2 <int>

    • Instructs Trim Galore to remove <int> bp from the 5' end of read 2 (paired-end reads only). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end.
    • For paired-end BS-Seq, it is recommended to remove the first few bp because the end-repair reaction may introduce a bias towards low methylation. Please refer to the M-bias plot section in the Bismark User Guide for some examples.
    • Default: OFF
  • --three_prime_clip_R1 <int>

    • Instructs Trim Galore to remove <int> bp from the 3' end of read 1 (or single-end reads) AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.
    • Default: OFF
  • --three_prime_clip_R2 <int>

    • Instructs Trim Galore to remove <int> bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.
    • Default: OFF
  • --2colour/--nextseq INT

    • This enables the option --nextseq-trim=3'CUTOFF within Cutadapt, which will set a quality cutoff (that is normally given with -q instead), but qualities of G bases are ignored. This kind of trimming is commonly needed for the NextSeq- and NovaSeq-platforms, where basecalls without any signal are called as high-quality G bases. More on the issue of G-overcalling may be found here: https://sequencing.qcfail.com/articles/illumina-two-colour-chemistry-can-overcall-high-confidence-g-bases/. This is mutually exclusive with -q INT.
  • --path_to_cutadapt </path/to/cutadapt>

    • You may use this option to specify a path to the Cutadapt executable, e.g. /my/home/cutadapt-1.7.1/bin/cutadapt. Else it is assumed that Cutadapt is in the PATH.
  • --basename <PREFERRED_NAME>

    • Use PREFERRED_NAME as the basename for output files, instead of deriving the filenames from the input files. Single-end data would be called PREFERRED_NAME_trimmed.fq(.gz), or PREFERRED_NAME_val_1.fq(.gz) and PREFERRED_NAME_val_2.fq(.gz) for paired-end data. --basename only works when 1 file (single-end) or 2 files (paired-end) are specified, but not for longer lists.
  • -j/--cores INT

    • Number of cores to be used for trimming [default: 1]. For Cutadapt to work with multiple cores, it requires Python 3 as well as parallel gzip (pigz) installed on the system. Trim Galore attempts to detect the version of Python used by calling Cutadapt. If Python 2 is detected, --cores is set to 1. If the Python version cannot be detected, Python 3 is assumed and we let Cutadapt handle potential issues itself.

    • If pigz cannot be detected on your system, Trim Galore reverts to using gzip compression. Please note that gzip compression will slow down multi-core processes so much that it is hardly worthwhile (please see https://github.com/FelixKrueger/TrimGalore/issues/16#issuecomment-458557103 for more info).

    • Actual core usage: It should be mentioned that the actual number of cores used is a little convoluted. Assuming that Python 3 is used and pigz is installed, --cores 2 would use 2 cores to read the input (probably not at a high usage though), 2 cores to write to the output (at moderately high usage), and 2 cores for Cutadapt itself + 2 additional cores for Cutadapt (not sure what they are used for) + 1 core for Trim Galore itself. So this can be up to 9 cores, even though most of them won't be used at 100% for most of the time. Paired-end processing uses twice as many cores for the validation (= writing out) step. --cores 4 would then be: 4 (read) + 4 (write) + 4 (Cutadapt) + 2 (extra Cutadapt) + 1 (Trim Galore) = 15, and so forth.

    • It seems that --cores 4 could be a sweet spot; anything above has diminishing returns.
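The --2colour/--nextseq option described above can be illustrated with a Python sketch. It assumes, as we understand Cutadapt's --nextseq-trim, that every G is scored as just below the cutoff whatever its reported quality, after which the 3' trim point is chosen where the partial sum of (quality - cutoff) is minimal. two_colour_trim_index is a hypothetical name, and Cutadapt's own implementation may differ in details.

```python
def two_colour_trim_index(seq: str, quals: list[int], cutoff: int = 20) -> int:
    """Return how many bases to keep after two-colour-aware 3' trimming.

    Assumption: every G is scored as cutoff - 1 (just below the threshold),
    so trailing runs of "high-quality" G calls are trimmed like poor bases.
    """
    keep, best_sum, running = len(seq), 0, 0
    # Walk from the 3' end, accumulating the partial sum of (quality - cutoff).
    for i in range(len(seq) - 1, -1, -1):
        q = cutoff - 1 if seq[i] == "G" else quals[i]
        running += q - cutoff
        if running < best_sum:  # strictly smaller: on ties keep the longer read
            best_sum, keep = running, i
    return keep


read = "ACGTACGTGGGGGG"  # hypothetical read ending in a poly-G run
print(read[:two_colour_trim_index(read, [40] * len(read))])  # ACGTACGT
```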

SPECIFIC TRIMMING - without adapter/quality trimming

  • --hardtrim5 <int>

    • Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to <int> bp from the 3'-end. Once hard-trimming of files is complete, Trim Galore will exit. Hard-trimmed output files will end in .<int>bp_5prime.fq(.gz).
  • --hardtrim3 <int>

    • Instead of performing adapter-/quality trimming, this option will simply hard-trim sequences to <int> bp from the 5'-end. Once hard-trimming of files is complete, Trim Galore will exit. Hard-trimmed output files will end in .<int>bp_3prime.fq(.gz).
  • --clock

    • In this mode, reads are trimmed in a specific way that is currently used for the Mouse Epigenetic Clock (see here: Multi-tissue DNA methylation age predictor in mouse, Stubbs et al., Genome Biology, 2017 18:68 https://doi.org/10.1186/s13059-017-1203-5). Following this, Trim Galore will exit.

    In its current implementation, the dual-UMI RRBS reads come in the following format:

                      Read 1  5' UUUUUUUU CAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF TACTG UUUUUUUU 3'
                      Read 2  3' UUUUUUUU GTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ATGAC UUUUUUUU 5'

    Where UUUUUUUU is a random 8-mer unique molecular identifier (UMI), CAGTA is a constant region and FFFFFFF... is the actual RRBS-Fragment to be sequenced. The UMIs for Read 1 (R1) and Read 2 (R2), as well as the fixed sequences (F1 or F2), are written into the read ID and removed from the actual sequence. Here is an example:

                      R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT
                          ATCTAGTTCAGTACGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
                      R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT
                          CAATTTTGCAGTACAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA

                      R1: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 1:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
                                       CGGTGTTTTCGAATTAGAAAAATATGTATAGAGGAAATAGATATAAAGGCGTATTCGTTATTG
                      R2: @HWI-D00436:407:CCAETANXX:1:1101:4105:1905 3:N:0: CGATGTTT:R1:ATCTAGTT:R2:CAATTTTG:F1:CAGT:F2:CAGT
                                       CAAAAATAATACCTCCTCTATTTATCCAAAATCACAAAAAACCACCCACTTAACTTTCCCTAA

    Following clock trimming, the resulting files (.clock_UMI.R1.fq(.gz) and .clock_UMI.R2.fq(.gz)) should be adapter- and quality trimmed with Trim Galore as usual. In addition, reads need to be trimmed by 15bp from their 3' end to get rid of potential UMI and fixed sequences. The control is:

                      trim_galore --paired --three_prime_clip_R1 15 --three_prime_clip_R2 15 *.clock_UMI.R1.fq.gz *.clock_UMI.R2.fq.gz                                  

    Following this, reads should be aligned with Bismark and deduplicated with UmiBam in --dual_index mode (see here: https://github.com/FelixKrueger/Umi-Grinder). UmiBam recognises the UMIs within this pattern: R1:(ATCTAGTT):R2:(CAATTTTG): as (UMI R1=ATCTAGTT) and (UMI R2=CAATTTTG).

RRBS-specific options (MspI digested material):

  • --rrbs

    • Specifies that the input file was an MspI digested RRBS sample (recognition site: CCGG). Sequences which were adapter-trimmed will have a further 2 bp removed from their 3' end. This is to avoid that the filled-in C close to the second MspI site in a sequence is used for methylation calls. Sequences which were merely trimmed because of poor quality will not be shortened further.
  • --non_directional

    • Selecting this option for non-directional RRBS libraries will screen quality-trimmed sequences for CAA or CGA at the start of the read and, if found, removes the first two base pairs. Like with the option --rrbs this avoids using cytosine positions that were filled-in during the end-repair step. --non_directional requires --rrbs to be specified as well.
  • --keep

    • Keep the quality trimmed intermediate file. If not specified the temporary file will be deleted after adapter trimming. Only has an effect for RRBS samples since other FastQ files are not trimmed for poor qualities separately.
    • Default: OFF
Note for RRBS using MseI:

If your DNA material was digested with MseI (recognition motif: TTAA) instead of MspI it is NOT necessary to specify --rrbs or --non_directional since nearly all reads should start with the sequence TAA, and this holds true for both directional and non-directional libraries. As the end-repair of TAA restricted sites does not involve any cytosines it does not need to be treated especially. Instead, simply run Trim Galore! in the standard, i.e. non-RRBS, mode.

Paired-end specific options:

  • --paired
    • This option performs length trimming of quality/adapter/RRBS trimmed reads for paired-end files. To pass the validation test, both sequences of a sequence pair are required to have a certain minimum length which is governed by the option --length (see above). If only one read passes this length threshold the other read can be rescued (see option --retain_unpaired).
    • Using this option lets you discard too short read pairs without disturbing the sequence-by-sequence order of FastQ files which is required by many aligners. Trim Galore! expects paired-end files to be supplied in a pairwise fashion, e.g. file1_1.fq file1_2.fq SRR2_1.fq.gz SRR2_2.fq.gz ... .
  • -t/--trim1
    • Trims 1 bp off every read from its 3' end.
    • This may be needed for FastQ files that are to be aligned as paired-end data with Bowtie 1. This is because Bowtie (1) regards alignments like this as invalid (whenever a start/end coordinate is contained within the other read):

      R1 --------------------------->
      R2 <---------------------------
      # or this:
      R1 ----------------------->
      R2 <-----------------
  • --retain_unpaired
    • If only one of the two paired-end reads became too short, the longer read will be written to either .unpaired_1.fq or .unpaired_2.fq output files. The length cutoff for unpaired single-end reads is governed by the parameters -r1/--length_1 and -r2/--length_2.
    • Default: OFF.
  • -r1/--length_1 <INT>
    • Unpaired single-end read length cutoff needed for read 1 to be written to .unpaired_1.fq output file. These reads may be mapped in single-end mode.
    • Default: 35 bp
  • -r2/--length_2 <INT>
    • Unpaired single-end read length cutoff needed for read 2 to be written to .unpaired_2.fq output file. These reads may be mapped in single-end mode.
    • Default: 35 bp


Source: https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md
