G4X-output structure

Sample directory tree


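Directory structure depends on run type. The first tree below shows a multiomics run (transcript and protein outputs); the second shows a run without protein data, which omits the protein-specific files.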

<sample_root>
├── diagnostics
│   └── transcript_table.parquet
├── g4x_viewer 
│   ├── <sample_1>.bin
│   ├── <sample_1>.ome.tiff              
│   ├── <sample_1>.tar
│   ├── <sample_1>_HE.ome.tiff
│   ├── <sample_1>_nuclear.ome.tiff
│   └── <sample_1>_run_metadata.json
├── h_and_e 
│   ├── eosin.jp2
│   ├── eosin_thumbnail.png
│   ├── h_and_e.jp2
│   ├── h_and_e_thumbnail.jpg
│   ├── nuclear.jp2
│   └── nuclear_thumbnail.png
├── metrics 
│   ├── core_metrics.csv
│   ├── protein_core_metrics.csv
│   └── per_area_metrics.csv
├── protein                             
│   ├── <protein_1>.jp2
│   ├── <protein_1>_thumbnail.png
│   ├── <protein_2>.jp2
│   ├── <protein_2>_thumbnail.png
│   └── …
├── protein_panel.csv                   
├── rna
│   └── transcript_table.csv.gz
├── run_meta.json
├── samplesheet.csv
├── segmentation
│   └── segmentation_mask.npz
├── single_cell_data
│   ├── cell_by_protein.csv.gz          
│   ├── cell_by_transcript.csv.gz
│   ├── cell_metadata.csv.gz
│   ├── clustering_umap.csv.gz
│   ├── dgex.csv.gz
│   └── feature_matrix.h5
├── summary_<sample_1>.html
└── transcript_panel.csv

<sample_root>
├── diagnostics
│   └── transcript_table.parquet
├── g4x_viewer
│   ├── <sample_1>.bin
│   ├── <sample_1>.tar
│   ├── <sample_1>_HE.ome.tiff
│   ├── <sample_1>_nuclear.ome.tiff
│   └── <sample_1>_run_metadata.json
├── h_and_e
│   ├── eosin.jp2
│   ├── eosin_thumbnail.png
│   ├── h_and_e.jp2
│   ├── h_and_e_thumbnail.jpg
│   ├── nuclear.jp2
│   └── nuclear_thumbnail.png
├── metrics
│   ├── core_metrics.csv
│   └── per_area_metrics.csv
├── rna
│   └── transcript_table.csv.gz
├── run_meta.json
├── samplesheet.csv
├── segmentation
│   └── segmentation_mask.npz
├── single_cell_data
│   ├── cell_by_transcript.csv.gz
│   ├── cell_metadata.csv.gz
│   ├── clustering_umap.csv.gz
│   ├── dgex.csv.gz
│   └── feature_matrix.h5
├── summary_<sample_1>.html
└── transcript_panel.csv
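
Because the two layouts differ only in their protein-specific outputs, the run type can be inferred from which paths exist. The following is a minimal sketch that uses only the paths shown in the trees above; the function name and the returned labels (`multiomics`, `transcript_only`) are hypothetical names chosen for this example, not official identifiers.

```python
from pathlib import Path

# Outputs that the trees above show only for multiomics runs
# (paths are relative to <sample_root>).
MULTIOMICS_ONLY = (
    "protein",
    "protein_panel.csv",
    "metrics/protein_core_metrics.csv",
    "single_cell_data/cell_by_protein.csv.gz",
)


def detect_run_type(sample_root):
    """Return 'multiomics' if any protein-specific output exists, else 'transcript_only'.

    Both labels are names chosen for this sketch, not official run-type identifiers.
    """
    root = Path(sample_root)
    has_protein = any((root / rel).exists() for rel in MULTIOMICS_ONLY)
    return "multiomics" if has_protein else "transcript_only"
```

Checking several paths rather than one keeps the check usable when only part of the output directory has been copied.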


Sample sub-directory reference


root of sample_folder

run_meta.json: JSON file containing versioning information for the panels used and analysis pipelines as well as the ID for the sequencer on which the experiment was run.

samplesheet.csv: CSV file containing detailed run information. Details the experimental design, flow cell layout, tissue type, panel utilized, etc. This file is useful for analysis, as it designates where each tissue section is positioned on the flow cell.

summary_<sample_id>.html: HTML file which gives a high-level overview of the experiment outputs, data quality, and performance for the selected tissue block.

transcript_panel.csv: CSV file containing a full list of all targeted genes in this experiment and the panel(s) they originate from.

protein_panel.csv: CSV file containing a full list of all targeted proteins in this experiment and the panel(s) they originate from. Multiomics runs only.


diagnostics/

transcript_table.parquet: Parquet file containing all decoded and non-decoded transcripts and associated metadata (e.g. spatial coordinate, gene identity, cell identity (if assigned to a cell), quality score, sequence). Parquet files can be loaded in Python using the polars, fastparquet, pandas, and pyarrow packages.

Column descriptions:

| column | type | description |
| --- | --- | --- |
| x_coord_shift | float | The x coordinate for the transcript (shifted to global coordinates) |
| y_coord_shift | float | The y coordinate for the transcript (shifted to global coordinates) |
| z | int | z layer of identified transcript |
| demuxed | bool | Whether or not the transcript was demultiplexed |
| transcript_condensed | str | Shortened name of transcript |
| meanQS | float | Mean quality score for the transcript |
| cell_id | uint | Cell ID |
| sequence_to_demux | str | Sequence identified that will be demultiplexed |
| transcript | str | Long form transcript name (specific to single probe) |
| TXUID | str | Unique identifier for the transcript |
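
As a rough illustration of working with this table, the sketch below loads the Parquet file with pandas and keeps only demultiplexed transcripts above a quality cutoff. The function names and the default threshold of 20 are assumptions made for the example; only the column names `demuxed` and `meanQS` come from the table above.

```python
import pandas as pd


def load_diagnostic_transcripts(path):
    """Read diagnostics/transcript_table.parquet.

    pandas delegates to pyarrow or fastparquet, so one of them must be installed.
    """
    return pd.read_parquet(path)


def high_quality_demuxed(df, min_mean_qs=20.0):
    """Keep demultiplexed transcripts whose meanQS is at least min_mean_qs.

    The default cutoff of 20 is an arbitrary value for illustration only.
    """
    keep = df["demuxed"].astype(bool) & (df["meanQS"] >= min_mean_qs)
    return df.loc[keep]
```

A typical call would be `high_quality_demuxed(load_diagnostic_transcripts("diagnostics/transcript_table.parquet"))`, with the cutoff adjusted to suit the data.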


g4x_viewer/

Tip

The G4X-viewer is a web-based tool for visualizing and exploring G4X data. All files in this directory are designed to be loaded into and explored with the G4X-viewer. For more information on how to use the G4X-viewer, see G4X-viewer.

<sample_id>.bin: Binary file containing the segmentation mask for the stitched image. Can be easily read in Python with numpy.

<sample_id>.ome.tiff: Multidimensional OME-TIFF image file. On Windows, this may appear as <sample_id>.ome. This image contains aggregated images for all protein targets as well as the nuclear stain. Can be loaded into any standard OME-TIFF reader, including our G4X-viewer and napari. Multiomics runs only.

<sample_id>_HE.ome.tiff: OME-TIFF image file containing the fH&E stain images. Can be loaded into any standard OME-TIFF reader, including our G4X-viewer and napari.

<sample_id>_nuclear.ome.tiff: OME-TIFF image file containing the nuclear stain images. Can be loaded into any standard OME-TIFF reader, including our G4X-viewer and napari.

<sample_id>_run_metadata.json: JSON file containing much of the same information as run_meta.json, along with extra core metrics (such as tissue area and total transcripts).

<sample_id>.tar: Tarball containing all other files from this directory bundled into one file. This can be loaded into the G4X-viewer directly with the “single file upload” option to avoid dragging each file individually. It may take longer to load than the individual files because the components must be extracted before they are displayed in the viewer.
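
To check what a bundle contains before uploading it, the tarball can be inspected with Python's standard `tarfile` module. This is a small sketch; the function name is hypothetical and nothing here is specific to the G4X-viewer.

```python
import tarfile


def list_viewer_bundle(tar_path):
    """Return the sorted names of the regular files stored in a tarball."""
    with tarfile.open(tar_path, "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

Listing members reads only the archive headers, so it is much cheaper than extracting the images.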


metrics/

core_metrics.csv: CSV file containing a set of core metrics for the tissue block including total transcripts, total area, number of cells and more.

protein_core_metrics.csv: CSV file containing a set of core protein metrics for the tissue block including SNR, background intensity, and Fisher's exact scores for the co-occurrence of the protein signal with its associated transcript signal (<protein>_fisher_score) and a random background (<protein>_fisher_score_background). These scores indicate the likelihood of the signal being true signal compared to the measured background. Multiomics runs only.

per_area_metrics.csv: CSV file containing a set of per-area metrics for the tissue block (coordinate location, number of transcripts, and number of cells), reported for each individual image area before the images were stitched together into one whole block.


h_and_e/

Tip

The .jp2 images in this folder and the protein/ folder are suitable for both nuclear and cytoplasmic segmentation. For more information on how you might do this, see segment data.

eosin.jp2: Full-sized eosin-stained JPEG 2000 image used for analysis purposes for selected tissue block.

eosin_thumbnail.png: Downsampled PNG image from the .jp2 file for easier viewing of the eosin stain for selected tissue block.

h_and_e.jp2: Full-sized fH&E JPEG 2000 image used for analysis purposes for selected tissue block.

h_and_e_thumbnail.jpg: Downsampled JPEG image from the .jp2 file for easier viewing of the fH&E stain for selected tissue block.

nuclear.jp2: Full-sized nuclear-stained JPEG 2000 image used for analysis purposes for selected tissue block.

nuclear_thumbnail.png: Downsampled PNG image from the .jp2 file for easier viewing of the nuclear stain for selected tissue block.


protein/ (protein runs only)

<protein_name>.jp2: Full-sized JPEG 2000 image used for analysis purposes. Shows the <protein_name> stain for selected tissue block.

<protein_name>_thumbnail.png: Downsampled PNG image of the .jp2 file for easier viewing. Shows the <protein_name> stain for selected tissue block.


rna/

transcript_table.csv.gz: Gzipped CSV file containing a table of all demuxed transcripts on the whole tissue block. Contains coordinate information, z-layer, gene identity, and cell_id fields. All transcripts here are high-confidence transcripts that remain after filtering and processing.
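
Since the exact column names of this file are not listed here, it can help to look at the header and a few rows before loading the whole table. The sketch below does that with the standard library only; the function name is hypothetical.

```python
import csv
import gzip


def peek_csv_gz(path, n_rows=5):
    """Return (header, first n_rows data rows) of a gzipped CSV without reading it all."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # zip stops once range(n_rows) is exhausted, so only n_rows rows are parsed.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

Once the column names are known, the full table can be loaded with a dataframe library such as pandas or polars.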


segmentation/

segmentation_mask.npz: Compressed numpy array file containing the segmentation mask. This can be easily read with the numpy.load() function.
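
A minimal sketch of reading the archive with `numpy.load()` follows. The names of the arrays stored inside the file are not documented here, so the loader returns every array it finds. The cell count assumes an integer label mask in which 0 marks background; that encoding is an assumption of this example, not something stated above.

```python
import numpy as np


def load_segmentation_masks(npz_path):
    """Return {array_name: array} for every array stored in the .npz archive."""
    with np.load(npz_path) as archive:
        return {name: archive[name] for name in archive.files}


def count_labelled_cells(mask, background=0):
    """Count distinct non-background labels, assuming an integer label mask."""
    labels = np.unique(mask)
    return int((labels != background).sum())
```

Printing the keys of the returned dictionary is a quick way to see which masks a given file provides.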


single_cell_data/

cell_by_protein.csv.gz: Gzipped CSV file in a cell x protein intensity format. Each entry in the table is the average protein intensity for a given protein in a given cell. Multiomics runs only.

cell_by_transcript.csv.gz: Gzipped CSV file in a cell x transcript format. Each entry in the table is the counts for a given transcript in a given cell.

cell_metadata.csv.gz: Gzipped CSV file containing the metadata associated with each cell, including cell_id, protein mean intensity, and transcript counts per cell, per transcript species. This is needed to create a Seurat object and perform downstream analyses. For more information, see data import.

Column descriptions:

| name | type | description |
| --- | --- | --- |
| label | str | Cell ID |
| <protein>_intensity_mean | float | Fluorescence intensity mean for a given protein |
| cell_id | str | Cell ID for a given cell |
| cell_x/y | float | Spatial X/Y coordinate for the nuclear segmentation centroid |
| expanded_cell_x/y | float | Spatial X/Y coordinate for the expanded nuclear segmentation centroid |
| log1p_n_genes_by_counts | float | Log number of unique genes detected |
| log1p_total_counts | float | Log number of total transcripts |
| n_genes_by_counts | int | Number of unique genes detected |
| nuclei_area | int | Area of the nuclear segmentation |
| nuclei_expanded_area | int | Area of the expanded nuclear segmentation |
| total_counts | int | Total transcript counts |
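
The columns above are enough for a simple per-cell quality filter. The following is a sketch in pandas; the function names and both default thresholds are illustrative assumptions, not recommended values.

```python
import pandas as pd


def load_cell_metadata(path):
    """Read cell_metadata.csv.gz; pandas infers gzip compression from the .gz suffix."""
    return pd.read_csv(path)


def filter_cells(meta, min_counts=10, min_genes=5):
    """Keep cells with at least min_counts transcripts and min_genes unique genes.

    Both thresholds are placeholders for illustration.
    """
    keep = (meta["total_counts"] >= min_counts) & (meta["n_genes_by_counts"] >= min_genes)
    return meta.loc[keep]
```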

clustering_umap.csv.gz: Gzipped CSV file containing the matrix of cell cluster annotations and UMAP coordinates for each cell. This is used to visualize the clustering of cells in 2D for a given leiden resolution or UMAP embedding setting.

Column descriptions:

| column | type | description |
| --- | --- | --- |
| label | str | Cell ID |
| leiden_<resolution> | int | Leiden cluster identity for the cell at the specified resolution (0.2-1.0) |
| X_umap_<min_dist>_<spread>_<axis> | float | UMAP coordinate for the given cell with the given min_dist and spread for a given axis (1 is typically x/UMAP1, 2 is typically y/UMAP2) |
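
Because the column names encode the clustering and embedding parameters, they can be parsed to find out which settings a file contains. The sketch below follows the `leiden_<resolution>` and `X_umap_<min_dist>_<spread>_<axis>` patterns from the table. It assumes the parameter values themselves contain no underscores, and the function name is hypothetical.

```python
import re

LEIDEN_COL = re.compile(r"^leiden_(?P<resolution>.+)$")
# Assumes min_dist, spread, and axis contain no underscores.
UMAP_COL = re.compile(r"^X_umap_(?P<min_dist>[^_]+)_(?P<spread>[^_]+)_(?P<axis>[^_]+)$")


def describe_columns(columns):
    """Split column names into Leiden and UMAP groups with their parsed parameters.

    Returns (leiden, umap) where leiden maps column -> resolution string and
    umap maps column -> (min_dist, spread, axis) strings.
    """
    leiden, umap = {}, {}
    for col in columns:
        match = LEIDEN_COL.match(col)
        if match:
            leiden[col] = match["resolution"]
            continue
        match = UMAP_COL.match(col)
        if match:
            umap[col] = (match["min_dist"], match["spread"], match["axis"])
    return leiden, umap
```

The parsed values are kept as strings so that they can be matched back to the original column names without formatting differences.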

dgex.csv.gz: Gzipped CSV file containing the differential gene expression (DGEx) results for the selected tissue block. Columns are detailed below.

Column descriptions:

| column | type | description |
| --- | --- | --- |
| names | str | Gene symbol |
| scores | float | Z-score from Wilcoxon rank-sum test |
| logfoldchanges | float | LogFoldChange for the given cluster compared to all other clusters combined |
| pvals | float | P-value from Wilcoxon rank-sum test |
| pvals_adj | float | Adjusted P-value |
| pct_nz_group | float | Percentage of non-zero values in the given cluster. |
| pct_nz_reference | float | Percentage of non-zero values in all cells outside the given cluster. |
| group | int | Leiden cluster identity |
| leiden_res | str | Leiden clustering resolution that this entry is derived from (1 per gene per cluster) |
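
One common use of this table is to pull out the top marker genes per cluster at a single resolution. The sketch below does this in pandas using the columns listed above. The function name and the default cutoffs are assumptions for illustration, and `leiden_res` must be passed exactly as it is written in the file, since its formatting is not specified here.

```python
import pandas as pd


def top_markers(dgex, leiden_res, n=5, max_padj=0.05, min_lfc=1.0):
    """Return up to n significant, up-regulated genes per cluster at one resolution.

    The cutoffs are placeholders; leiden_res is compared as a string.
    """
    sub = dgex[dgex["leiden_res"].astype(str) == str(leiden_res)]
    sub = sub[(sub["pvals_adj"] <= max_padj) & (sub["logfoldchanges"] >= min_lfc)]
    # Highest Wilcoxon scores first within each cluster.
    sub = sub.sort_values(["group", "scores"], ascending=[True, False])
    return sub.groupby("group", sort=True).head(n)
```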

feature_matrix.h5: H5ad file containing the full cell by gene matrix as well as a wide array of metadata and annotations you might want to use for downstream analysis. This file can be loaded into Python and run through scanpy or a number of other pipelines. See data import. The annotations in this file are detailed below.

Annotation layer descriptions:

| name | type | layer | dimensions | description |
| --- | --- | --- | --- | --- |
| <protein>_intensity_mean | float | obs | ncell x 1 | Fluorescence intensity mean for a given protein per cell |
| cell_id | str | obs | ncell x 1 | Cell ID for a given cell |
| cell_x/y | float | obs | ncell x 1 | Spatial X/Y coordinate for the nuclear segmentation centroid |
| expanded_cell_x/y | float | obs | ncell x 1 | Spatial X/Y coordinate for the expanded nuclear segmentation centroid per cell |
| log1p_n_genes_by_counts | float | obs | ncell x 1 | Log number of unique genes detected per cell |
| log1p_total_counts | float | obs | ncell x 1 | Log number of total transcripts per cell |
| n_genes_by_counts | int | obs | ncell x 1 | Number of unique genes detected per cell |
| nuclei_area | float | obs | ncell x 1 | Area of the nuclear segmentation per cell |
| nuclei_expanded_area | float | obs | ncell x 1 | Area of the expanded nuclear segmentation per cell |
| total_counts | int | obs | ncell x 1 | Total transcript counts per cell |
| gene_id | str | var | ngenes x 1 | Gene symbol |
| log1p_mean_counts | float | var | ngenes x 1 | Log mean transcript counts across all cells |
| log1p_total_counts | float | var | ngenes x 1 | Log total transcript counts across all cells |
| mean_counts | float | var | ngenes x 1 | Mean counts of each transcript across all cells |
| modality | str | var | ngenes x 1 | G4X modality (transcript or protein) |
| n_cells_by_counts | int | var | ngenes x 1 | Number of cells with counts of each transcript |
| pct_dropout_by_counts | float | var | ngenes x 1 | Percentage of zero-count cells for each gene |
| probe_type | str | var | ngenes x 1 | Type of probe: Negative control probe/sequence (NCP/NCS) or transcript targeting (targeting) |
| total_counts | int | var | ngenes x 1 | Total transcript counts per gene |
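
As a rough example of using these annotations, the sketch below reads the file with the anndata package and selects the features that are not negative controls. It assumes the `probe_type` column stores the literal value `targeting` for targeting probes, as the description above suggests; check the actual values in your file. The function names are hypothetical.

```python
import pandas as pd


def load_feature_matrix(path):
    """Read feature_matrix.h5 as an AnnData object (requires the anndata package)."""
    import anndata as ad  # imported lazily so the helper below works without it

    return ad.read_h5ad(path)


def targeting_features(var: pd.DataFrame, targeting_label="targeting"):
    """Return the index of var rows whose probe_type equals targeting_label.

    The label 'targeting' is assumed from the column description.
    """
    return var.index[var["probe_type"] == targeting_label]
```

With an object loaded as `adata = load_feature_matrix("single_cell_data/feature_matrix.h5")`, the control probes could then be dropped with `adata[:, targeting_features(adata.var)].copy()` before passing the data to scanpy.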