G4X data import

The multi-modal output of the G4X spatial sequencer comprises images, tables and annotated data matrices, which allow deep exploration of your sample. The single_cell_data folder in the G4X output contains the final processed form of the data, after transcript and image signals have been aggregated for each each segmented cell. There are several excellent open-source tools available that enable the full stack of analytical needs to gain biological insight from this data.

Below we illustrate data import strategies for Python and R users:

if you are working in Python

Some excellent tools are:

scanpy: full-featured single-cell analysis suite
rapids-singlecell: expands scanpy with GPU-support
squidpy: functionality for the analysis of spatial data
spatial-data: analysis and management of spatial data (requires data conversion)

All of these tools accept, or incorporate anndata objects as the basic representation of your single-cell data. There are several ways to load your data as an anndata object into your Python session. Below we will illustrate a few methods that will produce equivalent outputs.

1. G4X-helpers

If you have G4X-helpers installed, you can use the G4Xoutput() class to access the anndata of your sample via the load_adata() method.

import g4x_helpers as g4x

run_base = '/path/to/g4x_output/sample_x1'

sample = g4x.G4Xoutput(run_base=run_base)
adata = sample.load_adata(remove_nontargeting=False, load_clustering=False)

Note: load_adata() options

Two options are provided that will impact what will be loaded. In the above example we are overriding the default so that the output matches the other methods.
remove_nontargeting: bool (default=True)
load_clustering: bool (default=True)

2. scanpy

You can achieve the same result by pointing scanpy's read_h5ad() function the feature_matrix.h5 in your G4X output directory

from pathlib import Path
import scanpy as sc

run_base = Path('/path/to/g4x_output/sample_x1')
ad_file = run_base / 'single_cell_data' / 'feature_matrix.h5'

adata = sc.read_h5ad(ad_file)

Note

If you have G4X-helpers installed, then scanpy will be available as one of its dependencies. If not, please refer to the scanpy documentation for installation guides.

3. building from raw data

If you do not wish to use the pre-generated feature_matrix.h5 you can replicate a similar object by reading the counts and cell metadata into an anndata object.

from pathlib import Path
import anndata as ad
import pandas as pd
from scipy import sparse

run_base = Path('/path/to/g4x_output/sample_x1')

txcounts_path = run_base / 'single_cell_data' / 'cell_by_transcript.csv.gz'
metadata_path = run_base / 'single_cell_data' / 'cell_metadata.csv.gz'

counts = pd.read_csv(txcounts_path, index_col='label')
X = sparse.csr_matrix(counts.values)

metadata = pd.read_csv(metadata_path, index_col='label')

adata = ad.AnnData(X=X, obs=metadata)
adata.var_names = counts.columns

Note

If you have G4X-helpers installed, then anndata, pandas and scipy will be available as dependencies.

if you are working in R

The most feature rich package for single-cell analysis in R is:

Seurat: full-featured single-cell analysis suite with spatial analysis capabilities

To work with your data in Seurat, it needs to be loaded into a SeuratObject, which is an annotated data structure.

library('Seurat')

run_base = c('/path/to/g4x_output/sample_x1')

txcounts_path = file.path(run_base, "single_cell_data/cell_by_transcript.csv.gz")
metadata_path = file.path(run_base, "single_cell_data/cell_metadata.csv.gz")

counts <- read.csv(txcounts_path, row.names = 1) 
counts <- t(counts) # transpose to match Seurat input

metadata <- read.csv(metadata_path, row.names = 1) 

sobj <- CreateSeuratObject(counts = counts, meta.data = metadata)

⸻