Experimental Group Setup

On the dataset page, you can access the study description, the available sample groups, and the sample group comparisons.

Selection of datasets
Datasets flagged with check marks have passed manual quality control covering metadata annotation, relevance of the study design to the disease, and data quality.

To define the experimental group, choose a Sample Attribute of interest. Different datasets can be associated with more or less detailed sample annotation, so the corresponding choice options can vary.

When a disease is highly heterogeneous or a dataset contains a large number of samples, cluster analysis can be used to identify groups of similar samples based on their overall gene expression profiles.

Clustering Analysis

Clustering analysis in PandaOmics has three important features:

Gene subset selection

To contrast the potential groups of samples, the system identifies the set of genes whose expression varies the most across all selected samples. By default, PandaOmics selects the top 25% of all genes by expression variability. You can change this percentage manually.
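
The selection step described above can be sketched roughly as follows. This is only an illustration, not the PandaOmics implementation: it assumes that "expression variability" means the variance of a gene across samples, and the function name `top_variable_genes`, the gene names, and the values are all hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, fraction=0.25):
    """Return the names of the most variable genes.

    expression: dict mapping gene name -> list of expression values,
                one value per selected sample.
    fraction:   share of genes to keep (0.25 mirrors the default above).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank genes from most to least variable across the samples.
    # Variance is an assumed stand-in for the variability measure.
    ranked = sorted(
        expression,
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: four genes measured in four samples.
expression = {
    "GENE_A": [1.0, 9.0, 1.0, 9.0],
    "GENE_B": [5.0, 5.1, 4.9, 5.0],
    "GENE_C": [2.0, 4.0, 6.0, 8.0],
    "GENE_D": [3.0, 3.0, 3.0, 3.0],
}
selected = top_variable_genes(expression)
print(selected)
```

Raising `fraction` keeps more genes in the analysis, which corresponds to changing the percentage manually.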

Samples visualization chart

To show how similar the samples are in terms of gene expression, PandaOmics plots them on a chart. Because each sample is a 20,000-dimensional object (if all 20,000 protein-coding genes are taken into account), an additional computational step is required to project the samples onto a two-dimensional space (a chart). PandaOmics provides three machine learning approaches for this visualization:

- UMAP − Uniform Manifold Approximation and Projection for Dimension Reduction;
- t-SNE − t-distributed Stochastic Neighbor Embedding;
- PCA − Principal Component Analysis.

The default visualization method is UMAP, but you can switch to another one at any time. You can overlay metadata (sample annotation) on this chart to explore how characteristics are distributed across the samples and to compare them with the clustering suggestions.
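
To make the idea of projecting samples onto two dimensions concrete, here is a minimal sketch of PCA, the simplest of the three methods listed above (UMAP, the default, is considerably more involved). It is not how PandaOmics computes its charts: the function names and the toy data are hypothetical, and power iteration with deflation is just one simple way to obtain the first two principal components using only the standard library.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-gene mean so every column averages to zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=200, seed=0):
    """Power iteration for the dominant eigenvector of X^T X."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [dot(row, v) for row in rows]        # X v
        w = [dot(scores, col) for col in zip(*rows)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_2d(samples):
    """Return one (pc1, pc2) coordinate pair per sample."""
    rows = center(samples)
    axes = []
    for _ in range(2):
        v = leading_direction(rows)
        scores = [dot(row, v) for row in rows]
        axes.append(scores)
        # Deflate: remove the component just found before the next pass.
        rows = [[x - s * c for x, c in zip(row, v)]
                for row, s in zip(rows, scores)]
    return list(zip(*axes))


# Hypothetical toy data: four samples, three genes each.
samples = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.3],
    [3.0, 6.0, -0.3],
    [4.0, 8.0, 0.0],
]
coords = pca_2d(samples)
for sample_index, (x, y) in enumerate(coords):
    print(sample_index, round(x, 3), round(y, 3))
```

Each sample ends up as a single point with two coordinates, which is what a two-dimensional chart needs. The sign of each axis is arbitrary, so a mirrored chart carries the same information.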

Clustering methods

Clustering methods propose sample groups based on the similarity of expression profiles. PandaOmics provides three machine learning approaches:
- Spectral clustering;
- HDBSCAN − Hierarchical Density-Based Spatial Clustering of Applications with Noise;
- k-means clustering

The default clustering method is Spectral, but you can switch to another one at any time.
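
As an illustration of how a clustering method proposes sample groups, the sketch below implements plain k-means (Lloyd's algorithm), the simplest of the three approaches listed above. It is not the default Spectral method and not the PandaOmics implementation. The function names, the parameter `k`, and the toy coordinates are hypothetical, and the samples are assumed to be already reduced to a few numeric coordinates.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct samples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: six samples forming two obvious groups.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The resulting labels play the role of suggested sample groups. Changing `k` or the starting seed shows how the suggestions depend on the chosen parameters.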

PandaOmics supports the clustering analysis by providing default parameters. We encourage you to modify the default settings to explore the relationships between the samples.

Once a group of samples with the desired attributes is selected, you can either keep the default group name generated by the system (based on the selected sample attributes) or rename it.

You can define multiple groups on the same page. Click on Save and Close to create experimental sample groups and start comparing them.
