ImmuCo  Version 1.0     Welcome to the ImmuCo database
Frequently Asked Questions (FAQ)

What does the ImmuCo database do?

ImmuCo is a database of gene co-expression and correlation in immune cells. It provides information on transcriptional co-expression and correlation between any gene pair in immune cells. Currently, 20,283 human and 20,963 mouse genes can be queried. A scatter plot of signal values calculated using the MAS 5.0 method illustrates the extent of correlation between the queried gene pair. In addition, the Pearson correlation coefficient (r value) and the genes most correlated with each queried gene (ranked by r value) are provided.
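
As an illustration of the r value mentioned above, the following minimal sketch computes a Pearson correlation coefficient for one gene pair. The signal values are hypothetical, and whether ImmuCo correlates raw or transformed signal values is not stated here, so the raw-value input is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical signal values for gene A and gene B in five samples.
gene_a = [120.5, 340.2, 98.7, 410.0, 255.3]
gene_b = [80.1, 210.4, 75.0, 260.8, 150.9]
r = pearson_r(gene_a, gene_b)
print(f"r = {r:.3f}")
```

Each position in the two lists must refer to the same sample, which is what each point in the scatter plot represents.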

To see a detailed list of the genes and probe sets covered by ImmuCo, click here: human and mouse.
Where do the data come from?

In the current version, expression data for a total of 8,926 human and 3,682 mouse samples from the Affymetrix Human Genome U133 Plus 2.0 and Mouse Genome 430 2.0 microarrays, respectively, are provided for the correlation analysis. All of these microarray data sets were obtained from the Gene Expression Omnibus (GEO). Arrays related to immune cells, including T cells, B cells, plasma cells (mainly derived from patients with multiple myeloma), natural killer (NK) cells, monocytes, macrophages, dendritic cells (DCs), and neutrophils/PMNs (polymorphonuclear leukocytes), were retrieved by text mining of the annotated sample information and confirmed manually. Certain cell types are further divided into groups according to their pathological states. For example, the "B cell (ALL)" group includes B cells from patients with acute lymphoblastic leukaemia, while the "B cell (CLL)" group includes B cells from patients with chronic lymphocytic leukaemia.
How are the signal values calculated?

Affymetrix array analysis was performed with the "affy" package ( /html/affy.html) in Bioconductor using the MAS 5.0 method. All default parameters were retained. A signal intensity value, a detection p value, and a detection call were generated for each probe set.
How is quality control performed?

For the current analysis, the following human QC markers were used: B cells, CD19 (206398_s_at); CD4+ T cells, CD4 (203547_at); CD8+ T cells, CD8A (205758_at); macrophages and PMNs, CD11b (205786_s_at); monocytes, CD14 (201743_at); NK cells, CD56 (212843_at); plasma cells, CD138 (201286_at); DCs, CD11c (210184_at); total T cells, CD3 (213539_at, 205456_at, 206804_at, 210031_at), CD4 (203547_at), and CD8A (205758_at); haematopoietic stem cells, CD34 (209543_s_at); BMMCs (bone marrow mononuclear cells), MPO (203948_s_at). For PBMCs (peripheral blood mononuclear cells), CD3, CD19, CD14 and CD56 should all be expressed. Unlike classic or conventional DCs (cDCs), plasmacytoid DCs (pDCs) do not express CD11c (12,13); arrays from pDC samples are nevertheless retained as part of the DC samples.
For the current analysis, the following mouse QC markers were used: B cells, CD19 (1450570_a_at); CD4+ T cells, CD4 (1419696_at); CD8+ T cells, CD8A (1425335_at); macrophages, CD11b (1422046_at); DCs, CD11c (1419128_at); total T cells, CD3 (1426396_at, 1422828_at, 1422105_at, 1419178_at), CD4 (1419696_at) and CD8A (1425335_at); haematopoietic stem cells, CD34 (1416072_at); regulatory T cells (Tregs), CD4 (1419696_at) and CD25 (1420692_at); thymocytes, Rag1 (1450680_at) and Dntt (1449757_x_at); splenocytes, CD3 (1426396_at, 1422828_at, 1422105_at, 1419178_at) and CD19 (1450570_a_at).
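
The marker lists above do not spell out the exact pass/fail rule. The sketch below shows one plausible reading, stated here purely as an assumption: a sample is retained only if every marker probe set for its annotated cell type has a MAS 5.0 detection call of "P" (present). The probe set IDs are copied from the human list above; the function name, data layout and sample calls are hypothetical.

```python
# Marker probe sets copied from the human QC list above (a subset,
# restricted to cell types with a single marker probe set).
HUMAN_QC_MARKERS = {
    "B cell": ["206398_s_at"],     # CD19
    "CD4+ T cell": ["203547_at"],  # CD4
    "CD8+ T cell": ["205758_at"],  # CD8A
    "monocyte": ["201743_at"],     # CD14
    "NK cell": ["212843_at"],      # CD56
    "plasma cell": ["201286_at"],  # CD138
}

def passes_qc(cell_type, detection_calls):
    """ASSUMED rule: every marker probe set for the cell type is called 'P'.

    detection_calls maps probe set ID -> detection call ('P', 'M' or 'A').
    A probe set missing from the mapping counts as a failure.
    """
    markers = HUMAN_QC_MARKERS[cell_type]
    return all(detection_calls.get(probe) == "P" for probe in markers)

# Hypothetical detection calls for two arrays annotated as B cells.
good_array = {"206398_s_at": "P", "203547_at": "A"}
bad_array = {"206398_s_at": "A", "203547_at": "P"}

print(passes_qc("B cell", good_array))  # CD19 present
print(passes_qc("B cell", bad_array))   # CD19 absent
```

Cell types with several marker probe sets, such as total T cells, would need a rule for combining the calls, which is left open here.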
Why is the calculated correlation cell type-specific?

Genes are generally expressed in a cell-specific manner, and gene expression is highly dynamic and plastic under different environmental or experimental conditions, even within the same cell types. For wet lab experiments, transcriptional correlation or co-expression in the same cells is most relevant. Thus, correlation analysis within the same cell type provides more precise and reliable results for guiding experiments.
What are the input and output of the initial search?

A gene symbol or alias, a gene ID, or a probe set ID (if known) can be used as input (see below). The output shows a scatter plot of signal values for the queried gene pair, together with the probe set IDs, gene IDs, and HUGO gene names and descriptions. The probe sets most relevant to the queried genes are also provided. Moreover, users can download an Excel table containing the signal values for the queried genes in the cell type of interest.

To see the look-up tables, click here: human and mouse.
What do the data points in the scatter plot mean?

The signal values of the queried genes are graphed on a scatter plot. Each point represents a different GEO sample (GSM). The x-axis shows the signal values of queried gene A, and the y-axis shows the signal values of queried gene B. Each data point therefore gives the values of gene A and gene B measured in the same sample of the queried immune cell type.
How can the scatter plot be downloaded?

Users can download the scatter plot and save it as a figure file. Move the mouse cursor over the scatter plot, click the right mouse button, and choose "save as" or "copy" to store the image on your computer.
How can the GEO samples behind outlying data points be traced?

All signal values shown in the scatter plot can be downloaded as a CSV file via the corresponding link on the result page. Users can read the value range of any outlying data point from the scatter plot and then find the matching GEO samples by filtering or sorting the file in Excel. Alternatively, users can recreate the scatter plot in Excel, where the one-to-one relationship between sample and signal value is easy to see.
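
The same lookup can be done with a short script instead of Excel. The sketch below filters a CSV file for samples whose signal value falls within a chosen range. The column names (`sample`, `gene_a`, `gene_b`) and the sample IDs are placeholders; the layout of the actual ImmuCo download may differ.

```python
import csv
import io

# Hypothetical CSV content with placeholder sample IDs and column names.
CSV_TEXT = """sample,gene_a,gene_b
GSM0000001,120.5,80.1
GSM0000002,340.2,210.4
GSM0000003,98.7,2950.0
GSM0000004,410.0,260.8
"""

def samples_in_range(csv_text, column, low, high):
    """Return the sample IDs whose value in `column` lies within [low, high]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["sample"] for row in reader if low <= float(row[column]) <= high]

# An outlying point with a gene B signal between 2000 and 4000.
print(samples_in_range(CSV_TEXT, "gene_b", 2000, 4000))
```

For a downloaded file, the text can be read with `open(path, newline="").read()` and passed to the same function.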
What do the PP rate and AA rate mean?

Detection calls let users trace the co-existence state of the queried gene pair, that is, whether both genes are called present (present-present, PP) or both absent (absent-absent, AA) in the same sample. The PP (or AA) rate is the number of samples in the PP (or AA) state divided by the total number of samples in the queried cell group. The total co-existence rate is the sum of the PP rate and the AA rate.
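
The calculation described above can be sketched as follows. The detection calls are hypothetical; MAS 5.0 calls are "P" (present), "M" (marginal) or "A" (absent), and only the P/P and A/A pairs count towards the two rates.

```python
def coexistence_rates(calls_a, calls_b):
    """Return (PP rate, AA rate, total co-existence rate) for two genes.

    calls_a and calls_b hold the detection calls of gene A and gene B,
    one entry per sample, in the same sample order.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    total = len(calls_a)
    pairs = list(zip(calls_a, calls_b))
    pp = sum(1 for a, b in pairs if a == "P" and b == "P")
    aa = sum(1 for a, b in pairs if a == "A" and b == "A")
    return pp / total, aa / total, (pp + aa) / total

# Hypothetical detection calls for eight samples of one cell group.
calls_gene_a = ["P", "P", "A", "A", "M", "P", "A", "P"]
calls_gene_b = ["P", "A", "A", "A", "P", "P", "P", "P"]

pp_rate, aa_rate, total_rate = coexistence_rates(calls_gene_a, calls_gene_b)
print(pp_rate, aa_rate, total_rate)  # 0.375 0.25 0.625
```

Here three of the eight samples are PP and two are AA, so the total co-existence rate is 5/8.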
What do the probe sets most relevant to gene A (or B) mean?

Currently, the twenty probe sets with the highest correlation (based on r values) with the queried gene A or gene B are shown on the result page. The Pearson correlation measures the linear dependence between two variables. A high value does not by itself establish causality, but it is consistent with one variable influencing the other or with both being driven by a third factor. Biologically, highly correlated genes may share a common pattern of gene regulation, participate in the same signalling pathways, or encode components of the same protein complex. Highly correlated genes therefore point users to potential functional associations with the queried genes and can guide further experimental validation.
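
A ranking of this kind can be sketched as below. The probe set names and signal values are hypothetical, and sorting by signed r in descending order is an assumption; the text does not say whether strongly negative correlations are also listed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def most_correlated(query_id, signals, top_n=20):
    """Rank all other probe sets by their r value against the query probe set.

    signals maps probe set ID -> list of signal values (same sample order).
    ASSUMPTION: ranking uses signed r, highest first.
    """
    query = signals[query_id]
    scored = [
        (probe, pearson_r(query, values))
        for probe, values in signals.items()
        if probe != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Hypothetical signal values across five samples.
signals = {
    "query": [1.0, 2.0, 3.0, 4.0, 5.0],
    "up": [2.0, 4.0, 6.0, 8.0, 10.5],
    "noisy": [2.0, 1.0, 4.0, 3.0, 5.0],
    "down": [5.0, 4.0, 3.0, 2.0, 1.0],
}

for probe, r in most_correlated("query", signals, top_n=2):
    print(f"{probe}: r = {r:.3f}")
```

With `top_n=20` and a full signal matrix, the same function would return a twenty-entry list of the kind shown on the result page.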
