
Flow cytometry

Flow cytometry is a technique used to detect and measure physical and chemical characteristics of a population of cells or particles (Wikipedia).

Here, we’ll walk through how to

  1. iteratively ingest datasets

  2. query, search, integrate & analyze datasets

Setup

!lamin init --storage ./test-flow --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-22 18:45:27)
✅ saved: Storage(id='6o0SZiBV', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-flow', type='local', updated_at=2023-09-22 18:45:27, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-flow
💡 did not register local instance on hub (if you want, call `lamin register`)

import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"
💡 loaded instance: testuser1/test-flow (lamindb 0.54.1)
ln.track()
💡 notebook imports: lamindb==0.54.1 lnschema_bionty==0.31.2 pytometry==0.1.4 readfcs==1.1.6 scanpy==1.9.5
💡 Transform(id='OWuTtS4SAponz8', name='Flow cytometry', short_name='facs', version='0', type=notebook, updated_at=2023-09-22 18:45:29, created_by_id='DzTjkKse')
💡 Run(id='ZigRiMxLUv91mJc88S2a', run_at=2023-09-22 18:45:29, transform_id='OWuTtS4SAponz8', created_by_id='DzTjkKse')

Ingest a first file

Access

We start with a flow cytometry file from Alpert et al., Nat. Med. (2019).

Calling the following function downloads the file and pre-populates a few relevant registries:

ln.dev.datasets.file_fcs_alpert19(populate_registries=True)
PosixPath('Alpert19.fcs')

We use readfcs to read the raw FCS file into memory as an AnnData object:

adata = readfcs.read("Alpert19.fcs")
adata
AnnData object with n_obs × n_vars = 166537 × 40
    var: 'n', 'channel', 'marker', '$PnB', '$PnE', '$PnR'
    uns: 'meta'

Transform: normalize

In this use case, we want to ingest & store curated data. We therefore first split the signal and then normalize it using the pytometry package.

import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
'area' is not in adata.var['signal_type']. Return all.
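split_signal reported that it found no area signal, so it kept every channel. The sketch below mimics that behavior. It assumes that area, height, and width channels are marked by FCS-style name suffixes (-A, -H, -W). That is an assumption, and pytometry's own rules may differ. The helpers signal_type and split_by_signal_type are hypothetical.

```python
# Hypothetical sketch, not pytometry's implementation: it assumes area/height/width
# channels are identified by the FCS-style suffixes "-A", "-H" and "-W".
SUFFIXES = {"-A": "area", "-H": "height", "-W": "width"}


def signal_type(channel: str) -> str:
    """Classify a channel by its name suffix; unmatched names become 'other'."""
    for suffix, kind in SUFFIXES.items():
        if channel.endswith(suffix):
            return kind
    return "other"


def split_by_signal_type(channels: list[str], keep: str = "area") -> list[str]:
    """Keep channels of one signal type; if there are none, return all channels."""
    types = [signal_type(c) for c in channels]
    if keep not in types:
        print(f"'{keep}' is not among the signal types. Return all.")
        return list(channels)
    return [c for c, t in zip(channels, types) if t == keep]


# Conventional flow cytometry channel names carry suffixes...
print(split_by_signal_type(["FSC-A", "FSC-H", "CD4-A"]))  # ['FSC-A', 'CD4-A']
# ...whereas mass cytometry channels like the ones in this dataset do not.
print(split_by_signal_type(["Time", "CD4", "CD19"]))
```

With mass cytometry data, none of the channel names match a suffix. The function therefore falls back to returning every channel, mirroring the message printed above.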

pm.tl.normalize_arcsinh(adata, cofactor=150)
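The cofactor sets the scale at which the transform switches from roughly linear to roughly logarithmic. Below is a minimal sketch that assumes normalize_arcsinh applies asinh(x / cofactor) to each value. arcsinh_normalize is a hypothetical helper that uses only the standard library.

```python
import math


def arcsinh_normalize(values, cofactor=150.0):
    """Apply asinh(x / cofactor) element-wise (assumed to match pytometry's transform)."""
    return [math.asinh(v / cofactor) for v in values]


raw = [-150.0, 0.0, 150.0, 15000.0]
print([round(v, 3) for v in arcsinh_normalize(raw)])  # [-0.881, 0.0, 0.881, 5.298]
```

Unlike a plain logarithm, asinh is defined for zero and negative intensities, and it is symmetric around zero. For large inputs, asinh(x / cofactor) ≈ ln(2x / cofactor), so 15000 maps to about ln(200) ≈ 5.298.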

Validate: cell markers

First, we validate the feature names in .var against the CellMarker registry:

validated = lb.CellMarker.validate(adata.var.index)
13 terms (32.50%) are not validated for name: Time, Cell_length, Dead, (Ba138)Dd, Bead, CD19, CD4, IgD, CD11b, CD14, CCR6, CCR7, PD-1

Many features fail validation because their names aren't standardized: for instance, the registry stores CD19 under the name Cd19.

Hence, let’s standardize feature names & validate again:

adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)
5 terms (12.50%) are not validated for name: Time, Cell_length, Dead, (Ba138)Dd, Bead
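Standardizing maps known synonyms onto the registry's canonical names. Validating then checks each name for an exact match against those canonical names. The sketch below illustrates this with a hypothetical three-entry registry and case-insensitive synonym matching. Both are assumptions for illustration; the real CellMarker registry is much larger and its matching rules may differ.

```python
# Hypothetical mini-registry: canonical marker name -> set of synonyms.
REGISTRY = {
    "Cd19": {"CD19"},
    "Ccr7": {"CCR7"},
    "HLADR": {"HLA-DR", "HLA DR"},
}


def build_synonym_map(registry):
    """Map lower-cased names and synonyms to their canonical name."""
    mapping = {}
    for name, synonyms in registry.items():
        mapping[name.lower()] = name
        for synonym in synonyms:
            mapping[synonym.lower()] = name
    return mapping


SYNONYM_MAP = build_synonym_map(REGISTRY)


def standardize(terms):
    """Replace known synonyms with canonical names; leave unknown terms unchanged."""
    return [SYNONYM_MAP.get(t.lower(), t) for t in terms]


def validate(terms):
    """Return a boolean mask: True where a term is a canonical registry name."""
    return [t in REGISTRY for t in terms]


terms = ["CD19", "HLA-DR", "Time"]
print(validate(terms))     # [False, False, False]
std = standardize(terms)
print(std)                 # ['Cd19', 'HLADR', 'Time']
print(validate(std))       # [True, True, False]
```

Terms with no registry entry, such as Time, pass through standardization unchanged and still fail validation. That matches the five metadata-like features left over in the output above.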

The remaining non-validated features don’t appear to be cell markers but rather metadata features.

Let’s move them into adata.obs:

adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
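The two lines above split the columns by a boolean mask. The non-validated columns go into .obs, and only the validated markers remain in the matrix. The following sketch shows the same split on a plain dict of columns, which stands in for the AnnData matrix. split_columns is a hypothetical helper.

```python
def split_columns(columns, mask):
    """Return (kept, moved): the columns where mask is True vs. False."""
    names = list(columns)
    kept = {n: columns[n] for n, ok in zip(names, mask) if ok}
    moved = {n: columns[n] for n, ok in zip(names, mask) if not ok}
    return kept, moved


data = {"Cd19": [1.2, 0.3], "Time": [0.0, 1.0], "HLADR": [2.1, 0.9]}
markers, metadata = split_columns(data, [True, False, True])
print(list(markers))   # ['Cd19', 'HLADR']
print(list(metadata))  # ['Time']
```

Two points about the notebook code itself:

- ~validated relies on element-wise negation. A NumPy boolean array supports this, but a plain Python list does not.
- Slicing an AnnData object returns a view. The trailing .copy() turns the view into an independent object.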

Now we have a clean panel of 35 validated cell markers:

validated = lb.CellMarker.validate(adata.var.index)
assert all(validated)  # all markers are validated

Register: metadata

Next, let’s register the metadata features we moved to .obs.

For this, we create one feature record for each column in the .obs dataframe:

features = ln.Feature.from_df(adata.obs)
ln.save(features)
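Creating one feature record per column means inferring a name and a type from each column. The features_from_columns sketch below assigns "number" to all-numeric columns and "category" to everything else. The FeatureRecord class, that mapping, and the "category" label are assumptions for illustration, not lamindb's internals.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class FeatureRecord:
    """Hypothetical stand-in for a feature registry entry."""
    name: str
    type: str


def features_from_columns(columns):
    """Create one record per column: 'number' if all values are numeric, else 'category'."""
    records = []
    for name, values in columns.items():
        # bool is a subclass of int, so exclude it explicitly from "number".
        is_numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
        records.append(FeatureRecord(name, "number" if is_numeric else "category"))
    return records


obs = {"Dead": [0.1, 2.3], "Time": [1, 2], "batch": ["a", "b"]}
for feature in features_from_columns(obs):
    print(feature)
```

All five .obs columns in this notebook are numeric. This is consistent with the "(number)" annotations that file.features shows for them later.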

We use the Experimental Factor Ontology through Bionty to create a “FACS” label for the dataset:

lb.ExperimentalFactor.bionty().search("FACS").head(2)  # search the public ontology
ontology_id definition synonyms parents molecule instrument measurement __ratio__
name
fluorescence-activated cell sorting EFO:0009108 A Flow Cytometry Assay That Provides A Method ... FACS|FAC sorting [] None None None 100.0
BALB/c EFO:0000602 Balb/C Is A Mouse Strain Of Albion Mice. BALB/cJ|C|BALBc [] None None None 90.0
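The search ranks ontology terms by fuzzy similarity to the query and matches against both names and synonyms. This is why "FACS" finds fluorescence-activated cell sorting. The sketch below substitutes difflib's SequenceMatcher ratio (scaled to 0–100) as the scorer, applied to a hypothetical two-row excerpt of the ontology. Bionty's own scorer is different, so its __ratio__ values (such as 90.0 for BALB/c) will not be reproduced.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of an ontology table: (ontology_id, name, synonyms).
TERMS = [
    ("EFO:0009108", "fluorescence-activated cell sorting", ["FACS", "FAC sorting"]),
    ("EFO:0000602", "BALB/c", ["BALB/cJ", "C", "BALBc"]),
]


def score(query, text):
    """Case-insensitive similarity in [0, 100]; a stand-in for Bionty's scorer."""
    return 100.0 * SequenceMatcher(None, query.lower(), text.lower()).ratio()


def search(query, terms):
    """Score each term by its best-matching name or synonym; best matches first."""
    ranked = []
    for ontology_id, name, synonyms in terms:
        best = max(score(query, s) for s in [name, *synonyms])
        ranked.append((round(best, 1), ontology_id, name))
    return sorted(ranked, reverse=True)


for row in search("FACS", TERMS):
    print(row)
```

Scoring synonyms as well as names is what lets a common abbreviation reach the top with a perfect score, even though it shares few characters with the full term name.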
# import the record from the public ontology and save it to the registry
lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108").save()

# show the content of the registry
lb.ExperimentalFactor.filter().df()
name ontology_id abbr synonyms description molecule instrument measurement bionty_source_id updated_at created_by_id
id
lh5Cxy8w fluorescence-activated cell sorting EFO:0009108 None FACS|FAC sorting A Flow Cytometry Assay That Provides A Method ... None None None JpME 2023-09-22 18:45:35 DzTjkKse

Register: data & annotate with metadata

modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
file = ln.File.from_anndata(
    adata, description="Alpert19", field=lb.CellMarker.name, modality=modalities.protein
)
... storing '$PnE' as categorical
... storing '$PnR' as categorical
file.save()
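Each feature set linked to the file carries a hash field. A hash computed from the set's member ids, independent of their order, lets an identical marker panel be recognized again when a later file is ingested. The scheme below is an assumption for illustration: MD5 over the sorted ids, URL-safe base64, truncated to 20 characters. It resembles the hash format that lamindb displays, but lamindb's exact algorithm is not confirmed here. hash_feature_set is a hypothetical helper.

```python
import base64
import hashlib


def hash_feature_set(record_ids):
    """Order-independent hash of a set of record ids (assumed scheme, for illustration)."""
    joined = ":".join(sorted(set(record_ids)))
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:20]


a = hash_feature_set(["k0zGbSgZEX3q", "sYcK7uoWCtco", "fpPkjlGv15C9"])
b = hash_feature_set(["fpPkjlGv15C9", "k0zGbSgZEX3q", "sYcK7uoWCtco"])
print(a == b, len(a))  # True 20
```

The ids are sorted before hashing, so the order in which markers appear in .var does not change the result.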

Annotate by linking FACS & species labels:

file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)
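Passing a feature along with each label records which feature the label annotates. This is what allows the labels to be displayed grouped by feature (assay, species) under external in file.features. Below is a minimal sketch of that grouping over a hypothetical in-memory table of (feature, registry, label) links. The link table and group_labels are illustrative assumptions, not lamindb's storage model.

```python
from collections import defaultdict

# Hypothetical link table for one file: (feature, registry, label) triples.
links = [
    ("assay", "bionty.ExperimentalFactor", "fluorescence-activated cell sorting"),
    ("species", "bionty.Species", "human"),
]


def group_labels(links):
    """Group label names by the (feature, registry) pair they were linked through."""
    grouped = defaultdict(list)
    for feature, registry, label in links:
        grouped[(feature, registry)].append(label)
    return dict(grouped)


for (feature, registry), labels in group_labels(links).items():
    print(f"{feature} ({len(labels)}, {registry}): {labels}")
```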

Inspect the registered file

Inspect features on a high level:

file.features
Features:
  var: FeatureSet(id='whWYSbxEQMkDDLWfzRqW', n=35, type='number', registry='bionty.CellMarker', hash='ldY9_GmptHLCcT7Nrpgo', updated_at=2023-09-22 18:45:36, modality_id='QazFJwIU', created_by_id='DzTjkKse')
    'HLADR', 'Ccr7', 'Ccr6', 'ICOS', 'CXCR5', 'CD94', 'CD45RA', 'TCRgd', 'CD85j', 'Cd19', ...
  obs: FeatureSet(id='z1EbAOOPj9ikQIZZ8FOU', n=5, registry='core.Feature', hash='gvBjKhBZpGxfAQtdR05j', updated_at=2023-09-22 18:45:36, modality_id='MCGNJ0dW', created_by_id='DzTjkKse')
    Dead (number)
    Time (number)
    Cell_length (number)
    (Ba138)Dd (number)
    Bead (number)
  external: FeatureSet(id='sJyqRvhd6aNGeCQ5zlQa', n=2, registry='core.Feature', hash='uQDOjKt06ucK0_YIQPAV', updated_at=2023-09-22 18:45:36, modality_id='MCGNJ0dW', created_by_id='DzTjkKse')
    🔗 assay (1, bionty.ExperimentalFactor): 'fluorescence-activated cell sorting'
    🔗 species (1, bionty.Species): 'human'

Inspect low-level features in .var:

file.features["var"].df().head()
name synonyms gene_symbol ncbi_gene_id uniprotkb_id species_id bionty_source_id updated_at created_by_id
id
k0zGbSgZEX3q HLADR HLA‐DR|HLA-DR|HLA DR None None None uHJU vwab 2023-09-22 18:45:32 DzTjkKse
sYcK7uoWCtco Ccr7 CCR7 1236 P32248 uHJU vwab 2023-09-22 18:45:32 DzTjkKse
fpPkjlGv15C9 Ccr6 CCR6 1235 P51684 uHJU vwab 2023-09-22 18:45:32 DzTjkKse
0vAls2cmLKWq ICOS ICOS 29851 Q53QY6 uHJU vwab 2023-09-22 18:45:32 DzTjkKse
4uiPHmCPV5i1 CXCR5 CXCR5 643 A0N0R2 uHJU vwab 2023-09-22 18:45:32 DzTjkKse

Use auto-complete for marker names:

markers = file.features["var"].lookup()
import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)
[Figure: PCA of the cells, colored by CD14 expression]
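Attributes like markers.cd14 and efs.fluorescence_activated_cell_sorting in this notebook are auto-complete-friendly versions of record names. The sketch below builds such keys by lower-casing each name and replacing runs of non-word characters with underscores. That conversion rule is an assumption, and to_attr and make_lookup are hypothetical helpers. The sketch also returns plain name strings, whereas the notebook's lookups return records that have a .name attribute.

```python
import re
from types import SimpleNamespace


def to_attr(name: str) -> str:
    """Turn a record name into a Python identifier, e.g. 'CD45RA' -> 'cd45ra'."""
    key = re.sub(r"\W+", "_", name.lower()).strip("_")
    # Identifiers cannot start with a digit, so prefix an underscore in that case.
    return f"_{key}" if key[:1].isdigit() else key


def make_lookup(names):
    """Expose each name as an attribute whose key is its identifier form."""
    return SimpleNamespace(**{to_attr(n): n for n in names})


markers = make_lookup(["Cd14", "HLADR", "CD45RA"])
efs = make_lookup(["fluorescence-activated cell sorting"])
print(markers.cd14, efs.fluorescence_activated_cell_sorting)
```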