Welcome to the full tutorial. We have designed the website to be used in a particular
sequence. Each section begins with an introduction. Be sure to read the introduction to each section before diving straight
into pressing buttons, because it explains what you are doing and why. Following the introduction, there is a button linked to a video. This video is an overview to
be watched before beginning the more detailed portion of the tutorial. The videos will familiarize you with what the screen will look like
and give you a general sense of the process. After watching the video, proceed
through the step-by-step portion of the section.
Uploading Data Section
This section of the tutorial is dedicated to ensuring that you can get your data into
Galaxy to begin the manipulation process. This step should not be overlooked, because the
way you upload your data depends on a few variables and can determine the
format of the data when it arrives at the Galaxy workbench.
Uploading Data Video
- Begin by selecting the Get Data heading in the
Selection Tool Bar on the left of the screen. There
are various other headings, but this is the one
that contains the Upload File tool.
- A list of tools will then drop down under the Get Data heading.
Select the Upload File tool. There are a variety of other ways
to acquire data. For example, the EBI SRA tool takes you to an
archive of freely available sequencing data hosted by the
European Bioinformatics Institute (EBI).
- This window will appear. As previously mentioned,
there are a few options for uploading data. If you already know
the method you wish to use, click on that method; otherwise keep
scrolling to learn more. The methods are as follows:
- Choose local file
- Choose FTP file
- Paste/Fetch data
- Choose local file:
This option is for data that is saved on a machine's
hard drive or an external hard drive.
- For example: if the data
is contained in a file on the computer's desktop, this is
the option to use to bring that data into Galaxy.
- Choose FTP (File Transfer Protocol) file: This option is for data that you have saved on a computer network. It
allows the Galaxy server to access files on a particular network.
- For example: if a file
is saved on a university account on the university's network, this is the desired option.
- Paste/Fetch data: This option is for data that is accessible online.
A link can be pasted into the box and Galaxy will download the file from it. It is also possible
to paste the content of a file directly with this option.
- For example:
if a file is saved in Dropbox, the link can be pasted and Galaxy will acquire that file from the link.
This is how we will obtain our sample data.
- When the Paste/Fetch data option is selected, this entry will appear
in the window. The link can now be pasted into the highlighted box.
- Copy & Paste the following link into the highlighted box to access sample data from Dropbox:
CONTINUE HERE when file is in upload window
- Once the link is pasted (or your file is in the window), there are a few options.
The Upload File menu shows the Size of the file, the Type of file, the Genome of the organism,
additional Settings, and the Status of the upload/download.
- The Type menu defaults to Auto-detect, which seems reliable;
however, the file format can also be set by the user. Some example
formats include fasta,
bam, etc. The format is very
important for the tools used in later steps. Some tools will only accept
input files in a particular format. The type of the file can be changed in some instances, and tools
also exist within Galaxy to assist with format conversions.
- The Genome menu allows you to include species information if your organism is in the list provided.
(Note: This is not vital, but is helpful if you are using a model organism.)
- The Settings menu allows additional modifications to be made to the uploaded file.
- When everything is configured and ready for uploading,
press the Start button in the lower right corner.
- The green status bar to the right of your file will then begin to grow,
and when the upload is complete, the file will turn green in the upload window.
- At the home screen, the uploaded file should now be visible and green if it was successfully uploaded.
- If it is red, then an error has occurred and we recommend starting the process over.
- Clicking the pencil icon opens a menu for
editing the file's attributes.
- Changes in file formatting and datatypes can be made using the top tabs. Examples of
the types of changes that can be made include: name, data type, and species data if something is
- It is worth noting that the file format can be converted from this menu, via the top tabs.
- Note: Filename is now SampleBurmeseIntestine due to editing
- The contents of the file can be viewed by using the View Data command (Eye).
- The file can be deleted using the delete command (X).
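The Auto-detect setting in the Type menu, described in the upload steps above, can be pictured with a small sketch. This is not how Galaxy detects formats internally; it is a simplified illustration, assuming only that FASTA records begin with ">" and FASTQ records begin with "@". The function name guess_sequence_format is our own and is not part of Galaxy.

```python
def guess_sequence_format(text: str) -> str:
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-blank line.

    Illustrative only: a real format detector checks far more than
    the first character of the file.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"  # empty input


if __name__ == "__main__":
    print(guess_sequence_format(">transcript_1\nACGTACGT\n"))
    print(guess_sequence_format("@read_1\nACGT\n+\nIIII\n"))
```

A check like this is a quick way to confirm, before uploading, that a file is in the format a later tool expects.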
Trimmomatic Read Processing:
*Returning to the same menu where Upload File was located, there is also a Trimmomatic tool (not correct).
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end
and single-end data. The goal of using Trimmomatic is to improve the overall
read (sequence) quality of your sample, so that the later read alignment is more accurate,
which in turn produces a more reliable list of differentially expressed genes.
The trimming steps you select, and their associated parameters, correspond to
particular Trimmomatic operations. The parameters provided here are generic and generally work well
for most samples.
The best way to think about Trimmomatic is as a sliding window (a box that moves along the read) that reviews the quality of the bases within the box.
If the quality of the bases within the box falls below a certain threshold, then that part of the sequence read
is removed, trimming the sequence read and improving the overall quality.
Also, if the overall sequence read quality is too low, the entire sequence read is removed.
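The sliding box idea can be sketched in a few lines of Python. This is a simplified illustration and not Trimmomatic's actual implementation: it assumes the box moves from the start of the read, the read is cut at the first box whose average quality falls below the threshold, and a read that ends up shorter than a minimum length is dropped entirely. The function name and the default values (window=4, min_avg_quality=20, min_length=3) are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def sliding_window_trim(
    seq: str,
    quals: List[int],
    window: int = 4,
    min_avg_quality: float = 20.0,
    min_length: int = 3,
) -> Optional[Tuple[str, List[int]]]:
    """Trim a read at the first window whose mean quality is too low.

    Returns the trimmed (sequence, qualities), or None if the read
    is shorter than min_length after trimming.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")

    keep = len(seq)  # number of bases to keep; default is the whole read
    for start in range(max(len(seq) - window + 1, 0)):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg_quality:
            keep = start  # cut the read where the low-quality window begins
            break

    if keep < min_length:
        return None  # the whole read is discarded
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    scores = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
    print(sliding_window_trim(read, scores))
```

In this example the quality drops toward the end of the read, so only the leading high-quality bases are kept.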
For more information about Trimmomatic click here.
Documentation and download is provided here:
- When Trimmomatic is selected, a screen appears that outlines the specifications
for your particular job.
- Note: In order to use Trimmomatic, the file must be in a
fastq format (this includes fastqsanger). You will know if the
data is improperly formatted, because Galaxy will not present the file as
an option in the drop-down menu.
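To make the fastq requirement more concrete, here is a minimal sketch of what a fastq record looks like to a program. It assumes the common four-line layout (header starting with "@", sequence, a line starting with "+", and a quality string) and the Sanger quality encoding used by fastqsanger, where each quality character's score is its ASCII code minus 33. The function name parse_fastq is our own, and this is not the check Galaxy itself performs.

```python
from typing import List, Tuple


def parse_fastq(text: str) -> List[Tuple[str, str, List[int]]]:
    """Parse four-line fastq records into (name, sequence, quality scores).

    Raises ValueError if the text does not follow the four-line layout.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("fastq records must have four lines each")

    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record header must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"third line must start with '+': {plus!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # Sanger (Phred+33) encoding: score = ASCII code - 33
        scores = [ord(char) - 33 for char in qual]
        records.append((header[1:], seq, scores))
    return records


if __name__ == "__main__":
    example = "@read_1\nACGT\n+\nII5!\n"
    for name, seq, scores in parse_fastq(example):
        print(name, seq, scores)
```

The decoded scores are the per-base quality values that a trimming tool evaluates.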
Below are the specifications that our group used with this particular sample.
Again, these are just generally good specifications to use, but you're welcome
to modify them to suit your needs and data. Only the settings that we changed from the
defaults have been included. To add a new Trimmomatic Operation
(because only one will appear initially), click the Insert Trimmomatic Operation button.
- The description of each operation can be found beneath the Trimmomatic Operation menu.
If you have further questions, refer to the documentation provided above.
- Trimmomatic is now ready to run. Click the Execute button to proceed.
- Trimmomatic will run and this green message will appear in the window.
- The file will appear as grey (or yellow if you refresh) at first in the History bar,
but will turn to green upon completion. When the job is completed the data becomes accessible.
RSEM Prepare Reference:
This tutorial assumes that you have a source of reference transcripts for alignment.
This can be a reference genome or a combination of transcripts, but there must be a reference of some sort.
Non-reference-based assembly, or de novo assembly, is another capability of RNA-Seq technology
that can be explored in further depth by clicking here, but will not
be discussed in this tutorial.
Once your reference data is uploaded to the Galaxy history and accessible, it
must be configured so that RSEM can use it for alignment and expression calculation.
Alignment in this context
is the process of matching the sequenced reads to the transcripts in the reference, so that expression levels
can be determined in the next phase of the process. For more information
about RSEM click here.
RSEM Prep Video
- Return to the Get Data heading and click on the Upload File tool.
- Repeat the same process with the file you will use as your reference for alignment.
- Note: The file should be in FASTA format.
- Choose the Transcript Quantification heading in the Tools selection bar.
- This window will appear. The specifications we used in our example are outlined, but yours may vary.
- The Execute command is now ready. This should produce a file that can
be used for RSEM calculate expression in the next phase of expression analysis.
- RSEM will run and this message will appear. The file will appear as
grey at first in the History bar, but again, will turn green upon completion.
DESeq2 Differential Expression Analysis:
In this portion of the tutorial, we will review differential gene expression analysis using the DESeq2 tool.
The tool is a statistical package based on the R programming language. You provide the software with a group of samples for analysis.
It is important to remember that if statistics are to be done, then multiple samples are needed for each treatment group. There
can be multiple treatment groups, but there must be at least a control and an experimental group. This tool attempts to quantify the differences
in expression between treatment groups and determine which differences are significant.
The original paper can be found here.
Additional documentation can be found here.
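Differences in expression between treatment groups are commonly reported as a log2 fold change, and its sign depends on which group is treated as the baseline. The sketch below is only an illustration of that idea and is not how DESeq2 computes its results (DESeq2 normalizes the counts and fits a statistical model). The counts, the pseudocount of 1, and the function name log2_fold_change are assumptions made up for this example.

```python
import math
from typing import Sequence


def log2_fold_change(
    numerator_counts: Sequence[float],
    denominator_counts: Sequence[float],
    pseudocount: float = 1.0,
) -> float:
    """Log2 ratio of the mean counts of two groups for a single gene.

    The pseudocount avoids division by zero when a gene has no reads.
    """
    mean_numerator = sum(numerator_counts) / len(numerator_counts)
    mean_denominator = sum(denominator_counts) / len(denominator_counts)
    return math.log2(
        (mean_numerator + pseudocount) / (mean_denominator + pseudocount)
    )


if __name__ == "__main__":
    # Hypothetical read counts for one gene, three samples per group.
    unfed = [7, 8, 6]
    fed = [30, 34, 29]
    print("fed vs. unfed:", log2_fold_change(fed, unfed))
    print("unfed vs. fed:", log2_fold_change(unfed, fed))
```

Swapping the two groups flips the sign of the result, which is why the order in which treatment groups are entered matters when reading the output.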
- Return to the Tool selection bar and search for DESeq2 in the
search tools search bar at the top of the tools column.
Select the DESeq2 tool from the results.
- Upon clicking the tool, this window will appear.
The analysis will be conducted from within this window.
- To analyze data, a factor must be specified based on treatment groups.
For our example we are using unfed and fed Burmese pythons, so our factor name
will be feeding level (because this is the nature of the difference between the two
treatment groups).
- Note: The ordering of the treatments does affect the signs of the results. For more information
see the Results section in the documentation linked above.
- This is where it is necessary to specify the name of a particular treatment. This is more
for the benefit of the user when reading the generated output than for the program itself.
In our example, our control group is first, so we will name this factor level Unfed.
- The files that are related to the specified treatment are then selected.
- The second treatment group is specified in the next
factor level. In our example, this will be the experimental, or fed, group.
- Again, select the files that match this particular treatment.
- Note: The Insert Factor level button allows you to add another group for analysis. In
our example this might be an excessively fed group, although an intermediate
treatment is hard to define here because our data is essentially binary. The same can be done for multiple experiments.
- If more explanation of factors and DESeq2 is needed, there is an explanation provided at the bottom of the page.
- Then execute the job. When the job is complete, the generated output will turn green in the History bar and can then be examined.
- Two files will be generated. One is a table of genes with differential expression values.
- Note: for more information on how to interpret the log change (positive or negative), see the supplementary
material here or the package details provided above.
- The other file is a set of visualizations and graphics generated by DESeq2 that support the analysis of the differential expression levels.
- This is a key for each of the column headings of the output file containing the list of differentially expressed genes.