TUTORIAL

Preparing Input Files

Create one or two folders, in Google Drive or on a personal Globus endpoint, to hold all sequencing data files, with one folder per group. Input files must be in FASTQ or FASTA format. Files that are empty or incorrectly formatted will not be processed. To create a Globus account, follow the instructions here. To create a Globus personal endpoint, follow the instructions here.
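
Because empty or incorrectly formatted files are skipped, it can help to check a folder locally before uploading. The following is a minimal Python sketch, not CCMP's own validation (which is not described here). It assumes uncompressed text files and only looks at the first non-blank line: "@" suggests FASTQ and ">" suggests FASTA. The function names are hypothetical.

```python
from pathlib import Path


def sniff_sequence_format(path):
    """Return 'fastq', 'fasta', or None for an empty or unrecognized file.

    Assumption: the file is uncompressed text. Only the first non-blank
    line is inspected, so this is a quick sanity check, not a full parser.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            return None  # first real line is neither a FASTQ nor a FASTA header
    return None  # file is empty or contains only blank lines


def check_folder(folder):
    """Map each file name in `folder` to its detected format (or None)."""
    return {
        p.name: sniff_sequence_format(p)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }
```

Any file reported as None is worth inspecting by hand before it is placed in the upload folder.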

Accessing CCMP

Access CCMP by entering https://ccmp.usc.edu into a web browser. Sign in using Google or Globus, and follow the on-screen prompts to reach the data upload page. For Globus, enter your endpoint ID and the email address to which the CCMP report will be sent. To find your endpoint ID, log in to your Globus account, click “Endpoints” in the upper right, select the endpoint you want to use, and copy the UUID.
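
Since the endpoint ID is a UUID, a copied value can be checked for typos before it is pasted into CCMP. This is a small Python sketch using the standard uuid module. The function name and the example ID are placeholders, and the check only confirms that the text has the hyphenated UUID shape, not that the endpoint exists.

```python
import uuid


def looks_like_endpoint_id(text):
    """Return True if `text` is a UUID in the usual hyphenated form."""
    text = text.strip()
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts other spellings (no hyphens, braces), so
    # compare against the canonical form to require 8-4-4-4-12 layout.
    return str(parsed) == text.lower()


# Placeholder value for illustration only, not a real endpoint.
print(looks_like_endpoint_id("01234567-89ab-cdef-0123-456789abcdef"))
```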

Selecting a File Type

Select the file type that matches the format of the input data. FASTQ is the default; if the input data are in FASTA format, click FASTA.

Reference Database

Select the reference database. Silva is the default. To use GreenGenes or EzBio instead, click the corresponding option.

Selecting Additional Properties

If the data are paired-end forward and reverse reads in FASTQ format, click PAIRED-ENDS and type the forward-read and reverse-read indicators into INDICATORS. Apart from the indicators, the file names of the two reads in each pair must be identical for CCMP to recognize them as a pair.
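
The naming rule above can be checked locally before upload. The Python sketch below pairs file names that become identical once the indicator is removed and lists any file left without a partner. It is an illustration of the rule as stated, not CCMP's actual matching code. The indicators "_R1" and "_R2" and the function name are assumptions chosen for the example.

```python
def pair_by_indicator(file_names, forward, reverse):
    """Pair names that are identical once the indicator is removed.

    Returns (pairs, unpaired): `pairs` maps the indicator-free name to a
    (forward_file, reverse_file) tuple; `unpaired` lists everything else.
    """
    forward_keys = {}
    reverse_keys = {}
    for name in file_names:
        if forward in name:
            # Remove only the first occurrence of the indicator.
            forward_keys[name.replace(forward, "", 1)] = name
        elif reverse in name:
            reverse_keys[name.replace(reverse, "", 1)] = name

    pairs = {
        key: (forward_keys[key], reverse_keys[key])
        for key in forward_keys
        if key in reverse_keys
    }
    matched = {name for pair in pairs.values() for name in pair}
    unpaired = [name for name in file_names if name not in matched]
    return pairs, unpaired


# Hypothetical file names and indicators for illustration.
names = ["s1_R1.fastq", "s1_R2.fastq", "s2_R1.fastq", "s3_R2.fq"]
pairs, unpaired = pair_by_indicator(names, "_R1", "_R2")
print(pairs)     # {'s1.fastq': ('s1_R1.fastq', 's1_R2.fastq')}
print(unpaired)  # ['s2_R1.fastq', 's3_R2.fq']
```

A non-empty unpaired list usually points to a missing file or a name that differs by more than the indicator. The same check applies to paired samples, with the pair indicators in place of the read indicators.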

If the data are paired samples (for example, data from the same individuals at different time points, pre- and post-treatment data, or twin data), click PAIRED SAMPLES and type the pair indicators into INDICATORS. Samples will be processed in two sub-groups based on the specified indicators. Apart from the indicators, the file names of the two samples in each pair must be identical for CCMP to recognize them as a pair.

If the data have already been processed by CCMP and are in the annotated FASTA format from that prior processing, select ANNOTATED. Use this option only with annotated FASTA files produced by CCMP.

Uploading Data

Google Drive: Select DATA UPLOAD and follow the on-screen prompt to re-authenticate CCMP for Google Drive. Then press SELECT FOLDER, which opens Google Drive, and pick the folder where the input files are stored. If the data are organized into two groups, press ADD GROUP and pick the second folder from Google Drive.

Globus: Select ADD GROUP and type the folder path on your Globus endpoint where the input files are stored. You can find the folder path information here. If the data are organized into two groups, press ADD GROUP again and type the second folder path.

Submitting and Accessing Results

Google Drive: Select SUBMIT. When the pipeline is complete, you will receive an email, and the CCMP results will be uploaded directly to Google Drive as a single zip file named with the request ID given in the email.

Globus: Select SUBMIT. After submission, keep your personal Globus endpoint and Globus Personal Hotspot on until you are notified by email that the submitted task is complete. CCMP queues the submitted task, and if the computer hosting your personal Globus endpoint or the Globus Personal Hotspot is turned off, CCMP will not be able to access your data files when the task reaches the top of the queue.
When the pipeline is complete, the results are uploaded as a single zip file directly to a shared endpoint. Once you are notified by email, visit the Globus Transfer Files page here, click Endpoint, and select the shared endpoint with the request ID given in the email. You can then transfer the zip file from the shared endpoint to your personal endpoint.