

Welcome to the full tutorial. The website is designed to be used in a particular sequence, and each section begins with an introduction. Be sure to read the introduction to each section before diving straight into pressing buttons. It explains what you are doing and why, which makes the steps easier to learn. Following the introduction, there is a button linked to a video. The video is an overview to watch before beginning the more detailed portion of the tutorial. It shows what the screen will look like and gives a general sense of the process. After watching the video, proceed through the step-by-step portion of the section.



Uploading Data Section

This section of the tutorial makes sure you can get your data into Galaxy to begin the manipulation process. This step should not be overlooked. The best upload method depends on a few factors, such as where the data is stored, and the method used can determine the format of the data when it arrives in the Galaxy workbench.



Uploading Data Video



  • Begin by selecting the Get Data heading in the Tool Bar on the left of the screen. There are various other headings, but this is the one that contains the Upload File tool.


  • A list of tools will then drop down under the Get Data heading. Select the Upload File tool. There are a variety of other ways to acquire data. For example, the EBI SRA tool takes you to the Sequence Read Archive (SRA), an archive of freely available sequencing data originally set up by NCBI.


  • This window will appear. As previously mentioned, there are a few options for uploading data. If you already know which method you want to use, click on that method. Otherwise, keep scrolling to learn more. The methods are as follows:
  • Choose local file
  • Choose FTP file
  • Paste/Fetch Data





  • Choose local file: This option is for data that is saved on a machine's hard drive or an external hard drive.
  • For example: the data is in a file on your computer's desktop. This is the option for bringing that data into Galaxy.




  • Choose FTP (File Transfer Protocol) file: This option is for files that you have already transferred to the Galaxy server with an FTP client. Files placed in your FTP directory on the server are listed here and can be imported into your history.
  • For example: if a file is too large to upload comfortably through the web browser, it can be sent to the Galaxy server by FTP and then selected with this option.



  • Paste/Fetch data: This option is for data that is accessible online. A link (URL) can be pasted into the box, and Galaxy will download the file from it. You can also paste the contents of a file directly into the box.
  • For example: if a file is saved in Dropbox, the link can be pasted and Galaxy will acquire that file from the link. This is how we will obtain our sample data.


  • When the Paste/Fetch data option is selected, this entry will appear in the window. The link can now be pasted into the highlighted box.


  • Copy & Paste the following link into the highlighted box to access sample data from Dropbox:
  • https://www.dropbox.com/s/080805035bjx8oy/SampleBurmeseIntestine.fastq?dl=1
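The `?dl=1` at the end of the sample link is what lets Galaxy fetch the file. A Dropbox share link normally ends in `?dl=0`, which opens a preview web page. Changing it to `?dl=1` asks Dropbox to send the raw file instead. If you share your own data from Dropbox, you can adjust the link by hand or with a short Python sketch like the one below. The helper `direct_dropbox_link` is hypothetical and is not part of Galaxy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def direct_dropbox_link(share_url: str) -> str:
    """Return a Dropbox share link rewritten to end in dl=1 (direct download).

    Hypothetical helper for illustration; Galaxy itself does not need it.
    """
    parts = urlsplit(share_url)
    # Keep any other query parameters, but force dl=1 so Dropbox sends the file itself.
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


preview = "https://www.dropbox.com/s/080805035bjx8oy/SampleBurmeseIntestine.fastq?dl=0"
print(direct_dropbox_link(preview))
# prints https://www.dropbox.com/s/080805035bjx8oy/SampleBurmeseIntestine.fastq?dl=1
```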

Continue here once your file is in the upload window.


  • Once the link is pasted (or your file is in the window), there are a few options. The Upload File menu shows the Size of the file, the Type of file, the Genome of the organism, additional Settings, and the Status of the upload/download.


  • The Type menu is set to Auto-detect by default, which is generally reliable, but the user can also set the file format manually. Example formats include fasta, fastq, and bam. The format matters for the tools used in later steps, because some tools only accept input files in a particular format. In some cases the format of a file can be changed, and Galaxy also includes tools to help with format conversions.


  • The Genome menu allows you to include species information if your organism is in the list provided. (Note: This is not vital, but is helpful if you are using a model organism.)


  • The Settings menu allows additional modifications to be made to the uploaded file.


  • When everything appears correctly configured and ready for uploading, press the Start button in the lower right corner.
  • The green progress bar to the right of your file will then begin to grow. When the upload is complete, the file's entry will turn green in the upload window.


  • Back at the home screen, the uploaded file should now be visible in the History bar. It is shown in green if the upload succeeded.
  • If it is red, an error has occurred, and we recommend starting the upload process over.


  • Clicking the pencil icon opens a menu for editing the file's attributes.


  • Changes to file attributes and datatypes can be made using the tabs at the top of this menu. For example, you can correct the name, the data type, or the species information if any of these is wrong.
  • The file format can also be converted from this menu, via the top tabs.


  • Note: In this example, the file has been renamed SampleBurmeseIntestine using the edit menu.
  • The contents of the file can be viewed with the View Data command (eye icon).


  • The file can be deleted using the delete command (X).
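The Auto-detect option in the Type menu works by inspecting the file's contents. Galaxy's real detection checks far more than this, but the core idea can be illustrated with a rough Python sketch that tells apart the two text formats used in this tutorial. A FASTA file starts with a `>` header line. A FASTQ record is four lines: it begins with `@`, has a `+` separator line, and ends with a quality line as long as the sequence. The function name `guess_sequence_format` is hypothetical.

```python
def guess_sequence_format(text: str) -> str:
    """Rough, hypothetical format guess from the start of a text file.

    Real Galaxy datatype detection checks much more than this.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    # FASTA records start with a '>' header line.
    if lines[0].startswith(">"):
        return "fasta"
    # A FASTQ record is four lines: @id, sequence, '+' separator, and a quality
    # string of the same length as the sequence.
    if (
        len(lines) >= 4
        and lines[0].startswith("@")
        and lines[2].startswith("+")
        and len(lines[1]) == len(lines[3])
    ):
        return "fastq"
    # BAM is binary (compressed), so it would never be recognized from text like this.
    return "unknown"


print(guess_sequence_format(">tx1\nACGTACGT\n"))        # fasta
print(guess_sequence_format("@read1\nACGT\n+\nIIII\n"))  # fastq
```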






Trimmomatic Read Processing:


Trimmomatic can be found in the Tools bar on the left of the screen. The quickest way to locate it is to type Trimmomatic into the search bar at the top of the tools column.

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The goal of using Trimmomatic is to improve the overall quality of the reads (sequences) in your sample. Cleaner reads make the later read alignment more accurate, which in turn produces a more reliable list of differentially expressed genes. Each trimming step is set up as a separate Trimmomatic operation with its own parameters. The parameters provided here are generic and work well for most samples.

A helpful way to think about Trimmomatic's quality trimming is as a window that slides along each read and checks the average quality of the bases inside it. When the average quality in the window falls below a set threshold, the read is cut at that point. This removes the low-quality remainder and improves the overall quality. If a read ends up too short after trimming, or its overall quality is too low, the entire read is removed. For more information about Trimmomatic, see the documentation below.

Documentation and downloads are available here:
http://www.usadellab.org/cms/index.php?page=trimmomatic
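The sliding-window idea can be sketched in a few lines of Python. This is a simplified illustration, not Trimmomatic's exact algorithm. It cuts the read at the start of the first window whose average Phred quality falls below the threshold. It then drops the read if fewer than `min_len` bases remain. The default values (window 4, threshold 15, minimum length 36) match the example command on the Trimmomatic website. They are illustrative assumptions here and not necessarily the settings used in this tutorial.

```python
from typing import List, Optional, Tuple


def sliding_window_trim(
    seq: str, quals: List[int], window: int = 4, threshold: float = 15, min_len: int = 36
) -> Optional[Tuple[str, List[int]]]:
    """Simplified sliding-window trim; returns None if the read should be discarded."""
    keep = len(quals)
    # Slide the window from the 5' end; cut at the first window with low average quality.
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            keep = start
            break
    # Discard reads that are too short after trimming (like a minimum-length step).
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


# 40 high-quality bases followed by 10 very low-quality bases.
read = "A" * 50
scores = [30] * 40 + [2] * 10
trimmed = sliding_window_trim(read, scores)
print(len(trimmed[0]))  # 39
```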



Trimmomatic Video



  • When Trimmomatic is selected, a screen appears that outlines the specifications for your particular job.


  • Note: To use Trimmomatic, the file must be in a FASTQ format (this includes fastqsanger). If the data is improperly formatted, you will know, because Galaxy will not list the file as an option in the drop-down menu.
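For reference, a FASTQ file stores each read as four lines: an `@` identifier line, the sequence, a `+` separator line, and a quality string with one character per base. In the fastqsanger type, each quality character encodes a Phred score as its ASCII code minus 33. Below is a minimal Python reader that assumes well-formed four-line records. The function name `read_fastq` is our own.

```python
import io
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record.

    Assumes Sanger / Phred+33 quality encoding, as in Galaxy's fastqsanger type.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record starting at {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ for {header!r}")
        # Phred+33: each quality character's ASCII code minus 33 is the Phred score.
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


sample = io.StringIO("@read1\nACGT\n+\nII#5\n")
for read_id, seq, scores in read_fastq(sample):
    print(read_id, seq, scores)  # read1 ACGT [40, 40, 2, 20]
```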


Below are the settings our group used with this particular sample. Again, these are generally good settings, but you are welcome to modify them to suit your needs and data. Only the settings we changed from the defaults are shown. Only one Trimmomatic Operation appears initially. To add another, click the Insert Trimmomatic Operation button.






  • A description of each operation can be found beneath the Trimmomatic Operation menu. For more detailed questions, refer to the documentation linked above.


  • Trimmomatic is now ready to run. Click the Execute button to proceed.


  • Trimmomatic will run, and this green message will appear in the window.
  • The output file will appear grey at first in the History bar (or yellow while the job is running, if you refresh). It turns green upon completion, and the data then becomes accessible.




RSEM Prepare Reference:



This tutorial assumes that you have a source of reference sequences for alignment. This can be a reference genome or a set of transcripts, but you need a reference of some sort. De novo (reference-free) assembly is another application of RNA-Seq data, but it is not discussed in this tutorial.


Once your reference data is uploaded to the Galaxy history and accessible, it must be prepared so that RSEM can use it for alignment and expression calculation. Alignment in this context means matching the sequencing reads to the transcripts in the reference. RSEM then uses those alignments to estimate expression levels in the next phase of the process. For more information, see the RSEM documentation.


RSEM Prep Video



  • Return to the Get Data heading and click on the Upload File tool.

  • Repeat the same process with the file you will use as your reference for alignment.
  • Note: The file should be in FASTA format.


  • In the Tools selection bar, choose the Transcript Quantification heading and select the RSEM prepare reference tool.


  • This window will appear. The settings used in our example are outlined, but yours may vary.


Note: RSEM can add poly(A) tails to the reference transcripts at this step. Check how this option is set for your data.
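The reference uploaded for this step should be in FASTA format: each sequence starts with a `>` header line, followed by one or more lines of sequence. The sketch below parses such a file and appends a run of `A` bases to every transcript, to show what adding poly(A) tails means. `read_fasta` and `add_polya` are hypothetical helpers, and the tail length in the example is arbitrary. RSEM performs this step itself when its poly(A) option is enabled.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}.

    The ID is the first word after '>'; sequence lines are joined together.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is None:
            raise ValueError("sequence found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def add_polya(records: Dict[str, str], tail_length: int) -> Dict[str, str]:
    """Hypothetical illustration: append a poly(A) tail of tail_length bases to each transcript."""
    return {name: seq + "A" * tail_length for name, seq in records.items()}


reference = read_fasta(">tx1 example transcript\nACGTAC\nGGT\n>tx2\nTTGA\n".splitlines())
print({name: len(seq) for name, seq in reference.items()})  # {'tx1': 9, 'tx2': 4}
print(add_polya(reference, 5)["tx2"])  # TTGAAAAAA
```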

  • The Execute command is now ready. Running it produces a reference that the RSEM calculate expression tool can use in the next phase of the analysis.


  • RSEM will run, and this message will appear. The output will appear grey at first in the History bar but, again, will turn green upon completion.






RSEM Calculate Expression:


This portion of the tutorial covers aligning the sequencing reads to the reference prepared in the previous section and calculating expression levels. The results of this section are used for the differential gene expression analysis. We use default settings, but you can modify them to suit your own data.

Additional documentation can be found here:
http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html
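RSEM reports expression both as expected read counts and as normalized values such as TPM (transcripts per million). A common definition of TPM for gene i is TPM_i = 10^6 * (c_i / l_i) / sum over j of (c_j / l_j), where c is the expected count and l is the effective length. The sketch below applies that definition. RSEM computes its own TPM values from its statistical model, so treat this as an illustration of what the column means, not a way to reproduce RSEM's numbers exactly.

```python
from typing import List, Sequence


def tpm(expected_counts: Sequence[float], effective_lengths: Sequence[float]) -> List[float]:
    """Transcripts per million from counts and effective lengths (standard definition)."""
    # Step 1: reads per base of effective length; zero-length entries get a rate of 0.
    rates = [
        count / length if length > 0 else 0.0
        for count, length in zip(expected_counts, effective_lengths)
    ]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so that all values sum to one million.
    return [1e6 * rate / total for rate in rates]


values = tpm([100, 100], [1000, 500])
print([round(v) for v in values])  # [333333, 666667]
```

The same number of reads on a shorter gene gives a higher TPM, because TPM corrects for length before comparing genes.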


RSEM Calc Video



  • Return to the Tool selection bar. Under the Transcript Quantification heading, there is an RSEM calculate expression tool.


  • This window will appear. The settings used in our example are outlined, but yours may vary.
  • Note: Initially, the reference file will not appear.


  • From the drop-down menu, change the RSEM Reference Source from locally cached to From your history. The tool can then use the reference prepared in your history, which should appear in the Reference field.


  • The specifications will vary depending on the job you are running, but the specifications for this example are outlined.


  • The Execute command is now ready. Running it produces files that can be used for the DESeq2 analysis.


  • RSEM will run, and this message will appear. The five output files will appear grey at first in the History bar and will turn green upon completion.
  • Note: The number of output files varies depending on the setting of the Create BAM Results field. This job takes about 20 minutes to run.
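When RSEM is run from the command line, its gene-level results file is named `genes.results`. It is a tab-separated table whose columns include `gene_id`, `expected_count`, `TPM` and `FPKM`. Count-based tools such as DESeq2 work from the expected counts. Below is a minimal sketch of pulling those counts out. It assumes the command-line column names (Galaxy may label its output datasets differently) and uses a made-up two-gene table whose numbers are purely illustrative.

```python
import csv
import io
from typing import Dict


def expected_counts(results_text: str) -> Dict[str, int]:
    """Map gene_id -> expected_count (rounded) from an RSEM gene-level results table."""
    reader = csv.DictReader(io.StringIO(results_text), delimiter="\t")
    counts: Dict[str, int] = {}
    for row in reader:
        # expected_count can be fractional because RSEM shares multi-mapping reads
        # between genes; round it for tools that require whole-number counts.
        counts[row["gene_id"]] = int(round(float(row["expected_count"])))
    return counts


# Made-up two-gene table in RSEM's genes.results column layout.
demo = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "geneA\ttxA\t1500.00\t1350.00\t12.00\t5.10\t4.20\n"
    "geneB\ttxB\t800.00\t650.00\t3.70\t3.30\t2.70\n"
)
print(expected_counts(demo))  # {'geneA': 12, 'geneB': 4}
```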






DESeq2 Differential Expression Analysis:


In this portion of the tutorial, we review differential gene expression analysis using the DESeq2 tool. DESeq2 is a statistical package written in the R programming language. You provide the software with a group of samples for analysis. Statistical testing requires multiple samples (replicates) for each treatment group. There can be more than two treatment groups, but at a minimum there must be a control group and an experimental group. DESeq2 quantifies the differences in expression between treatment groups and determines which differences are statistically significant.
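The design requirement above (at least two groups, each with replicates) can be checked before running DESeq2. The sketch below assumes a hypothetical mapping from dataset names to factor levels. The threshold of two samples per level is also an assumption, reflecting the need for replicates to estimate variability within each group.

```python
from collections import Counter
from typing import Dict, List


def check_design(sample_groups: Dict[str, str], min_replicates: int = 2) -> List[str]:
    """Return a list of design problems (an empty list means the design looks usable).

    sample_groups maps each sample (dataset) name to its factor level, e.g. "Unfed" or "Fed".
    """
    per_level = Counter(sample_groups.values())
    problems = []
    if len(per_level) < 2:
        problems.append("need at least two factor levels, e.g. a control and an experimental group")
    for level, n in sorted(per_level.items()):
        if n < min_replicates:
            problems.append(f"level {level!r} has {n} sample(s); need at least {min_replicates}")
    return problems


# Hypothetical dataset names; in Galaxy these are the RSEM outputs you select per factor level.
design = {"unfed_1": "Unfed", "unfed_2": "Unfed", "fed_1": "Fed", "fed_2": "Fed"}
print(check_design(design))  # []
```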

The original paper can be found here

Additional documentation can be found here


DESeq2 Video



  • Return to the Tool selection bar and search for DESeq2 using the search bar at the top of the tools column. Select the DESeq2 tool from the results.


  • Upon clicking the tool, this window will appear. The analysis will be conducted from within this window.


  • To analyze data, a factor must be specified based on the treatment groups. In our example, we used unfed and fed Burmese pythons, so our factor name is feeding level, because feeding is what differs between the two treatment groups.
  • Note: The order of the treatment groups affects the sign of the fold changes in the results. For more information, see the Results section of the documentation linked above.


  • Next, specify the name of the first factor level (treatment group). The name mainly helps you read the generated output; the program itself does not rely on it. In our example, the control group comes first, so we name this factor level Unfed.


  • Select the files that belong to the treatment group just specified.


  • The second treatment group is specified in the next factor level. In our example, this is the experimental, or fed, group.


  • Again, select the files that match this treatment group.
  • Note: The Insert Factor Level button lets you add another group for analysis. Continuing our example, this might be an excessively fed group. An intermediate treatment is hard to devise here, because the data are essentially binary. The same can be done for additional experiments.


  • If you need more explanation of factors and DESeq2, see the explanation at the bottom of the tool page.


  • Then execute the job. When the job is complete, the generated output will turn green in the History bar and can then be examined.


  • Two files will be generated. One is a table of genes with their differential expression values.
  • Note: For more information on how to interpret the log2 fold change (positive or negative), see the supplementary material here or the package documentation linked above.


  • The other file is a set of plots generated by DESeq2 that help you analyze the differential expression levels.


  • This is a key to each of the column headings in the output file containing the list of differentially expressed genes.
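In DESeq2, the log2 fold change compares one factor level against another, so swapping the order of the levels flips its sign. The sketch below ignores DESeq2's actual model, which includes normalization, dispersion estimation and shrinkage. It only illustrates the sign convention using plain group means with a pseudocount. It also splits genes into higher and lower groups using the adjusted p-value. DESeq2 reports padj as NA for genes it filters out, and 0.1 is the default `alpha` of DESeq2's `results()` function in R. The dictionary keys and gene values are hypothetical and made up, not the exact Galaxy column names.

```python
import math
from typing import Dict, List, Optional, Tuple


def log2_fold_change(mean_a: float, mean_b: float, pseudocount: float = 1.0) -> float:
    """log2 of group A relative to group B; the pseudocount avoids division by zero."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def split_significant(rows: List[Dict], alpha: float = 0.1) -> Tuple[List[str], List[str]]:
    """Split genes with padj < alpha into (higher in A, lower in A) by the sign of log2fc.

    Each row is a dict with hypothetical keys 'gene', 'log2fc' and 'padj'
    (padj is None where DESeq2 reports NA).
    """
    up, down = [], []
    for row in rows:
        padj: Optional[float] = row["padj"]
        if padj is None or padj >= alpha:
            continue
        if row["log2fc"] > 0:
            up.append(row["gene"])
        elif row["log2fc"] < 0:
            down.append(row["gene"])
    return up, down


# Swapping which group is compared against which flips the sign (made-up means).
print(log2_fold_change(39, 9))  # 2.0  (fed vs unfed)
print(log2_fold_change(9, 39))  # -2.0 (unfed vs fed)

rows = [
    {"gene": "gene1", "log2fc": 2.0, "padj": 0.01},
    {"gene": "gene2", "log2fc": -1.5, "padj": 0.04},
    {"gene": "gene3", "log2fc": 3.0, "padj": 0.5},
    {"gene": "gene4", "log2fc": 1.0, "padj": None},
]
print(split_significant(rows))  # (['gene1'], ['gene2'])
```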