User dataset import to PandaOmics is a two-step process:
Upload a file containing -omics data, e.g., gene or protein expression matrix
Upload a file describing the sample metadata attributes
Both -omics data and metadata should be formatted as either comma-separated or tab-separated text files (also referred to as CSV/TSV).
The Data Upload button is located in the main menu at the top of every page in PandaOmics.
You can import a new dataset on the "Data Upload" page by dragging and dropping your files. Omics data files should be arranged as a matrix with samples in columns and genes/proteins in rows (see the example below).
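The expected orientation (one row per gene/protein, one column per sample) can be illustrated with a minimal sketch that writes such a matrix as TSV using Python's standard `csv` module. The gene names, sample names, values, and the "Gene" label in the first header cell are hypothetical; the documentation does not specify what the first header cell must contain.

```python
import csv
import io

# Hypothetical example values: rows are genes, columns are samples.
samples = ["Sample_1", "Sample_2", "Sample_3"]
expression = {
    "TP53": [120, 98, 143],
    "EGFR": [15, 22, 9],
    "GAPDH": [2048, 1987, 2210],
}

def to_tsv(samples, expression):
    """Serialize a gene-by-sample matrix as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header row: an identifier column (label is an assumption) followed by one column per sample.
    writer.writerow(["Gene"] + samples)
    for gene, values in expression.items():
        # Every gene row must have exactly one value per sample column.
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        writer.writerow([gene] + values)
    return buffer.getvalue()

tsv_text = to_tsv(samples, expression)
print(tsv_text)
```

Swapping `delimiter="\t"` for `","` produces the CSV variant of the same matrix.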
The following -omics types are supported:
Transcriptomics: both microarray and RNA-Seq data
Proteomics: absolute quantification of protein levels in samples
Methylomics: methylation levels of CpG sites should be summarized at the gene level (the TSS1500 region is recommended). Both M-values and Beta-values are supported.
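Beta-values and M-values are related by the standard logit transform M = log2(Beta / (1 - Beta)), with the inverse Beta = 2^M / (2^M + 1). The sketch below implements both directions. Clipping Beta to [eps, 1 - eps] is an assumption added here so that values of exactly 0 or 1 do not yield infinities; how PandaOmics handles such edge values is not documented.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta-value (0..1) to an M-value: log2(beta / (1 - beta)).

    Values are clipped to [eps, 1 - eps] (an assumption) so that fully
    unmethylated (0) or fully methylated (1) sites do not produce infinities.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))

def m_to_beta(m):
    """Inverse transform: beta = 2**m / (2**m + 1)."""
    return 2 ** m / (2 ** m + 1)

print(beta_to_m(0.5))  # 0.0
print(beta_to_m(0.8))  # approximately 2.0
print(m_to_beta(2.0))  # 0.8
```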
After the upload of raw omics data and metadata attributes is complete, the following steps are performed to ensure that the dataset will be analyzed smoothly:
1. Gene/protein identifiers are converted to HGNC gene symbols. Unrecognized genes are filtered out of the analysis.
2. Samples and/or genes consisting mostly of missing values (NAs) are filtered out of the analysis.
3. Duplicated genes/proteins are eliminated.
4. The data are log-transformed and normalized as necessary.
5. For methylome datasets, Beta-values are automatically converted to M-values, as the latter are more suitable for differential analysis.
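Steps 2 to 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not PandaOmics' actual pipeline: the 50% missing-value threshold, the keep-first rule for duplicated genes, and the log2(x + 1) transform for non-negative counts are all assumptions, and identifier conversion and normalization are not shown. Missing values are represented as `None`.

```python
import math

def drop_sparse_samples(samples, matrix, max_na_fraction=0.5):
    """Drop sample columns whose fraction of missing values (None) exceeds the threshold."""
    if not matrix:
        return samples, matrix
    keep = [
        j for j in range(len(samples))
        if sum(values[j] is None for _, values in matrix) / len(matrix) <= max_na_fraction
    ]
    kept_samples = [samples[j] for j in keep]
    kept_matrix = [(gene, [values[j] for j in keep]) for gene, values in matrix]
    return kept_samples, kept_matrix

def preprocess_genes(matrix, max_na_fraction=0.5):
    """Drop mostly-missing genes, keep the first copy of duplicated genes, apply log2(x + 1)."""
    seen = set()
    cleaned = []
    for gene, values in matrix:
        if not values:
            continue
        if sum(v is None for v in values) / len(values) > max_na_fraction:
            continue  # gene is mostly missing
        if gene in seen:
            continue  # duplicated gene: keep only the first occurrence (assumed rule)
        seen.add(gene)
        # log2(x + 1) is a common transform for non-negative counts (assumed here).
        cleaned.append((gene, [None if v is None else math.log2(v + 1) for v in values]))
    return cleaned

samples = ["S1", "S2", "S3"]
matrix = [
    ("TP53", [7, 15, None]),
    ("EGFR", [None, None, None]),  # all missing: will be dropped
    ("TP53", [1, 1, 1]),           # duplicate of TP53: will be dropped
    ("MYC", [0, 3, None]),
]
samples, matrix = drop_sparse_samples(samples, matrix)  # S3 is 75% missing and is dropped
result = preprocess_genes(matrix)
print(samples)
print(result)
```

Dropping sparse samples before sparse genes is a design choice here: removing an almost-empty sample first prevents it from pushing otherwise well-measured genes over the missing-value threshold.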
The following gene identifiers are supported:
Gene Symbol ID
RefSeq ID
Ensembl Gene ID
UCSC Gene ID
UniProt ID
Entrez Gene ID
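To check which identifier type a file uses before uploading, a rough pattern-based guess is possible. The patterns below are hypothetical heuristics written for illustration, not the rules PandaOmics applies; the UniProt pattern follows the accession format published by UniProt, and anything that matches no pattern falls back to "Gene Symbol ID".

```python
import re

# Heuristic patterns (assumptions for illustration, not PandaOmics' own rules).
ID_PATTERNS = [
    ("Ensembl Gene ID", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("RefSeq ID", re.compile(r"^[NX][MRP]_\d+(\.\d+)?$")),
    ("UCSC Gene ID", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),
    ("UniProt ID", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("Entrez Gene ID", re.compile(r"^\d+$")),
]

def guess_id_type(identifier):
    """Return the first identifier type whose pattern matches, else assume a gene symbol."""
    for name, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return name
    return "Gene Symbol ID"  # fallback: treat anything else as a symbol

for example in ["ENSG00000141510", "NM_000546.6", "uc002gim.5", "P04637", "7157", "TP53"]:
    print(example, "->", guess_id_type(example))
```

Because some gene symbols could coincidentally match another pattern, a real check would look at many identifiers in the column rather than one.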
Data value formats are recognized automatically and handled accordingly:
Positive integers: gene counts
Positive decimals: normalized gene expression signal
Positive and negative decimals: normalized, log-transformed gene expression values. Please note that proteomics data should not be log-transformed.
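The three value formats above can be told apart with a simple heuristic, sketched below. This is an assumption about how such recognition might work, not PandaOmics' documented logic; note that a decimal column whose values all happen to be whole numbers (e.g. `12.0`) would be classified as integers.

```python
def classify_values(values):
    """Heuristically classify numeric values into the three formats listed above.

    Missing values (None) are ignored.
    """
    observed = [v for v in values if v is not None]
    if any(v < 0 for v in observed):
        return "positive and negative decimal"  # likely normalized and log-transformed
    if all(float(v).is_integer() for v in observed):
        return "positive integer"  # likely raw gene counts
    return "positive decimal"  # likely normalized but not log-transformed

print(classify_values([0, 12, 340]))
print(classify_values([0.5, 12.3, None]))
print(classify_values([-1.2, 3.4]))
```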
Example data matrix
File formatting options are recognized automatically. You can also set them manually in the advanced settings:
1. File encoding
2. Value delimiter: e.g., a comma, a tab, or a space
3. Missing value indicator: the following symbols are handled as missing values: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
4. Decimal separator: a dot or a comma
5. Handling missing values: detected missing values can be imputed with "Zero", "Geometric mean", or "Arithmetic mean", depending on the data distribution
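A minimal sketch of how these options interact when reading one row of a matrix: cells matching the listed missing-value symbols become `None`, a comma decimal separator is converted to a dot, and gaps are then filled with zero, the geometric mean, or the arithmetic mean. Imputing per gene across samples is an assumption; the documentation does not say along which axis the means are computed. The function names are hypothetical.

```python
import statistics

# Missing-value symbols as listed in the documentation above.
NA_TOKENS = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null",
}

def parse_cell(text, decimal="."):
    """Return None for a missing-value symbol, otherwise a float."""
    text = text.strip()
    if text in NA_TOKENS:
        return None
    if decimal == ",":
        text = text.replace(",", ".")  # e.g. "1,5" -> "1.5"
    return float(text)

def parse_row(line, delimiter="\t", decimal="."):
    """Split one matrix row into its identifier and parsed values."""
    gene, *cells = line.rstrip("\n").split(delimiter)
    return gene, [parse_cell(cell, decimal) for cell in cells]

def impute(values, method="arithmetic"):
    """Fill missing values per gene (assumed axis) with zero or a mean of the observed values."""
    if method not in ("zero", "geometric", "arithmetic"):
        raise ValueError(f"unknown imputation method: {method!r}")
    observed = [v for v in values if v is not None]
    if method == "zero" or not observed:
        fill = 0.0
    elif method == "geometric":
        fill = statistics.geometric_mean(observed)  # requires all observed values > 0
    else:
        fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

gene, values = parse_row("TP53\t1,5\tNA\t4,5", delimiter="\t", decimal=",")
print(gene, values)                  # TP53 [1.5, None, 4.5]
print(impute(values, "arithmetic"))  # [1.5, 3.0, 4.5]
print(impute(values, "zero"))        # [1.5, 0.0, 4.5]
```

A comma cannot serve as both the value delimiter and the decimal separator, which is one reason tab-separated files are convenient for data with comma decimals.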
To complete the data import, enter the dataset name, specify the omics technology (Microarray, RNA-Seq, Proteomics, or Methylation), and click the upload button. The upload may take some time, depending on the file size; meanwhile, you can continue using PandaOmics and return to the data upload later. Only one dataset can be uploaded at a time.
Once a dataset is uploaded, clicking the 'Confirm' button redirects you to the 'Dataset' page, which shows general information about the dataset, including the total number of samples and genes/proteins. Here you can also click the 'Add Metadata' button to open the 'Sample Metadata' upload page, where you can drag and drop a file containing sample metadata.
A metadata file can be associated with a dataset, and its annotations can be used to set up sample groups. Below is an example of a metadata table. The first column must always contain sample identifiers that match the sample IDs in the original dataset matrix.
Once the metadata file is uploaded, you can browse the 'Metadata Statistics', which include:
The total number of detected sample attributes
The number of samples from the -omics data file matching sample names from the metadata file
The list of metadata attributes, including attribute type.
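The statistics listed above can be approximated locally before uploading, for example to catch sample IDs in the first metadata column that do not match the omics matrix. In this sketch the sample names and attributes are hypothetical, and labelling attributes as "numeric" or "categorical" is an assumption; PandaOmics may use different attribute types.

```python
import csv
import io

def infer_type(values):
    """Label an attribute 'numeric' if every non-empty value parses as a number, else 'categorical'."""
    for v in values:
        if v == "":
            continue
        try:
            float(v)
        except ValueError:
            return "categorical"
    return "numeric"

def metadata_statistics(metadata_text, omics_sample_ids, delimiter=","):
    """Summarize a metadata table whose first column holds sample IDs."""
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter=delimiter))
    header, records = rows[0], rows[1:]
    attributes = header[1:]  # every column after the sample-ID column
    metadata_ids = {record[0] for record in records if record}
    matched = [s for s in omics_sample_ids if s in metadata_ids]
    types = {
        name: infer_type([r[i] if i < len(r) else "" for r in records])
        for i, name in enumerate(attributes, start=1)
    }
    return {
        "n_attributes": len(attributes),
        "n_matched_samples": len(matched),
        "attribute_types": types,
    }

metadata = "Sample,Tissue,Age\nS1,lung,54\nS2,lung,61\nS4,liver,47\n"
stats = metadata_statistics(metadata, ["S1", "S2", "S3"])
print(stats)
```

Here only S1 and S2 appear in both files, so two of the three omics samples would be available for building sample groups.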
Clicking the 'Confirm' button redirects you to the Dataset entity page, where you can start creating experimental sample groups and run case/control comparisons.
Once the dataset is uploaded, it can be found in the Data Manager.