scRNA-seq#
scRNA-seq measures gene expression in individual cells.
Its analysis is typically based on data objects such as AnnData, SingleCellExperiment, and Seurat objects.
These objects often contain non-validated metadata, which makes data integration and interpretation hard.
In this notebook, LaminDB is used to turn AnnData objects into validated and queryable assets.
Setup#
!lamin init --storage ./test-scrna --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-22 18:43:43)
✅ saved: Storage(id='yNdwkjSP', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-22 18:43:43, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-scrna
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
import pandas as pd
ln.track()
💡 loaded instance: testuser1/test-scrna (lamindb 0.54.1)
💡 notebook imports: lamindb==0.54.1 lnschema_bionty==0.31.2 pandas==1.5.3
💡 Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-22 18:43:45, created_by_id='DzTjkKse')
💡 Run(id='Nv39dIk0xeRfAOZAwfvB', run_at=2023-09-22 18:43:45, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
Human immune cells: Conde22#
lb.settings.species = "human"
Access #
Let's look at an scRNA-seq count matrix in the form of an AnnData object that we'd like to ingest into LaminDB:
adata = ln.dev.datasets.anndata_human_immune_cells(
    populate_registries=True  # this pre-populates registries
)
adata
AnnData object with n_obs × n_vars = 1648 × 36503
obs: 'donor', 'tissue', 'cell_type', 'assay'
var: 'feature_is_filtered', 'feature_reference', 'feature_biotype'
uns: 'cell_type_ontology_term_id_colors', 'default_embedding', 'schema_version', 'title'
obsm: 'X_umap'
This AnnData object does not require filtering, normalizing or formatting, so no preprocessing step is needed.
Validate #
Validate genes in .var#
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
148 gene identifiers can't be validated because they are not currently in the Gene registry. Let's inspect them to see what to do:
inspector = lb.Gene.inspect(adata.var.index, lb.Gene.ensembl_gene_id)
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
detected 35 Gene terms in Bionty for ensembl_gene_id: 'ENSG00000198786', 'ENSG00000276760', 'ENSG00000274175', 'ENSG00000198840', 'ENSG00000198712', 'ENSG00000273554', 'ENSG00000198899', 'ENSG00000278384', 'ENSG00000276017', 'ENSG00000276345', 'ENSG00000198886', 'ENSG00000276256', 'ENSG00000273748', 'ENSG00000278633', 'ENSG00000198727', 'ENSG00000198938', 'ENSG00000275063', 'ENSG00000271254', 'ENSG00000268674', 'ENSG00000228253', ...
→ add records from Bionty to your Gene registry via .from_values()
couldn't validate 113 terms: 'ENSG00000237133', 'ENSG00000277352', 'ENSG00000261737', 'ENSG00000224745', 'ENSG00000273923', 'ENSG00000261068', 'ENSG00000203812', 'ENSG00000236996', 'ENSG00000287388', 'ENSG00000228139', 'ENSG00000273496', 'ENSG00000226377', 'ENSG00000270672', 'ENSG00000236886', 'ENSG00000232295', 'ENSG00000272551', 'ENSG00000278927', 'ENSG00000273301', 'ENSG00000227220', 'ENSG00000239665', ...
→ if you are sure, create new records via ln.Gene() and save to your registry
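The inspection log sorts the 148 non-validated identifiers into two groups: 35 that exist in the public reference and 113 that are unknown to it (35 + 113 = 148). The following is a minimal sketch of that triage in plain Python. It is not LaminDB's implementation; the function `triage`, the toy identifiers, and the two sets are hypothetical and only illustrate the logic.

```python
def triage(ids, registry, public_reference):
    """Sort identifiers into three buckets, mirroring the inspection log.

    registry: identifiers already present in the local registry
    public_reference: identifiers known to the public reference
    """
    validated, in_reference, unknown = [], [], []
    for identifier in ids:
        if identifier in registry:
            validated.append(identifier)      # passes validation as is
        elif identifier in public_reference:
            in_reference.append(identifier)   # can be registered from the reference
        else:
            unknown.append(identifier)        # needs a manual decision
    return validated, in_reference, unknown


# Toy data: these identifiers are made up for illustration.
registry = {"ID_A"}
public_reference = {"ID_A", "ID_B"}
ids = ["ID_A", "ID_B", "ID_C"]

validated, in_reference, unknown = triage(ids, registry, public_reference)
print(len(validated), len(in_reference), len(unknown))
```

The second bucket corresponds to the terms that can be added via .from_values(), and the third to the terms that require creating new records by hand.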
The log shows that 35 of the non-validated IDs can be found in the Bionty reference. Let's register them:
records = lb.Gene.from_values(inspector.non_validated, lb.Gene.ensembl_gene_id)
ln.save(records)
❗ did not create Gene records for 113 non-validated ensembl_gene_ids: 'ENSG00000112096', 'ENSG00000182230', 'ENSG00000203812', 'ENSG00000204092', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000224739', 'ENSG00000224745', 'ENSG00000225932', 'ENSG00000226377', 'ENSG00000226380', 'ENSG00000226403', 'ENSG00000227021', 'ENSG00000227220', 'ENSG00000227902', 'ENSG00000228139', 'ENSG00000228906', 'ENSG00000229352', 'ENSG00000231575', 'ENSG00000232196', ...
The remaining 113 are legacy IDs that are not present in the current Ensembl assembly (e.g. ENSG00000112096).
We'd still like to register them:
validated = lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id)
records = [lb.Gene(ensembl_gene_id=gene_id) for gene_id in adata.var.index[~validated]]
ln.save(records)
❗ 113 terms (0.30%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
Now all genes pass validation:
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
Validate metadata in .obs#
adata.obs.columns
Index(['donor', 'tissue', 'cell_type', 'assay'], dtype='object')
ln.Feature.validate(adata.obs.columns);
❗ 1 term (25.00%) is not validated for name: donor
1 feature is not validated: "donor". Let's register it:
Tip: use features = ln.Feature.from_df(df) to bulk create features with types.
feature = ln.Feature(name="donor", type="category", registries=[ln.ULabel])
ln.save(feature)
All metadata columns are now validated as features:
ln.Feature.validate(adata.obs.columns);
Next, let's validate the labels of each feature.
Some of the metadata labels can be typed using dedicated registries like CellType:
validated = lb.CellType.validate(adata.obs.cell_type)
❗ received 32 unique terms, 1616 empty/duplicated terms are ignored
❗ 2 terms (6.20%) are not validated for name: germinal center B cell, megakaryocyte
Register non-validated cell types - they can all be loaded from a public ontology through Bionty:
records = lb.CellType.from_values(adata.obs.cell_type[~validated], "name")
ln.save(records)
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
lb.ExperimentalFactor.validate(adata.obs.assay)
lb.Tissue.validate(adata.obs.tissue);
Because we didn't mount a custom schema that contains a Donor registry, we use the ULabel registry to track donor IDs:
ln.ULabel.validate(adata.obs.donor);
❗ received 12 unique terms, 1636 empty/duplicated terms are ignored
❗ 12 terms (100.00%) are not validated for name: D496, 621B, A29, A36, A35, 637C, A52, A37, D503, 640C, A31, 582C
The donor labels are not validated, so let's register them:
donors = [ln.ULabel(name=name) for name in adata.obs.donor.unique()]
ln.save(donors)
ln.ULabel.validate(adata.obs.donor);
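The "received N unique terms, M empty/duplicated terms are ignored" lines in the logs above are consistent with M being the number of observations minus the number of unique terms: 1648 - 32 = 1616 for cell types and 1648 - 12 = 1636 for donors. The sketch below reproduces that bookkeeping on a toy column. It is inferred from the logged numbers rather than taken from LaminDB's source, and `donor_column` is a hypothetical stand-in for adata.obs.donor.

```python
# Toy per-cell column with repeats and empty entries (hypothetical values).
donor_column = ["A29", "A29", "A36", "", "A36", "A29", None]

# Only distinct, non-empty labels need to be checked against a registry.
unique_terms = {value for value in donor_column if value}

# Everything else is either a repeat of a unique term or empty.
ignored = len(donor_column) - len(unique_terms)

print(f"received {len(unique_terms)} unique terms, "
      f"{ignored} empty/duplicated terms are ignored")
```

This is also why registering the donors via adata.obs.donor.unique() creates one label per donor instead of one per cell.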
Register #
modalities = ln.Modality.lookup()
experimental_factors = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
features = ln.Feature.lookup()
Register data#
When we create a File object from an AnnData object, its feature sets are linked automatically and we get information about unmapped categories:
file = ln.File.from_anndata(
    adata, description="Conde22", field=lb.Gene.ensembl_gene_id, modality=modalities.rna
)
file.save()
The file has the following 2 linked feature sets:
file.features
Features:
var: FeatureSet(id='2gQIre5ht93RP9Br7AxJ', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-22 18:44:16, modality_id='YVd1fHWO', created_by_id='DzTjkKse')
'LINC01088', 'AP2S1', 'ADSL', 'USP16', 'None', 'None', 'SCAT2', 'ZNF45-AS1', 'LINC02132', 'XIRP2-AS1', ...
obs: FeatureSet(id='ACQDyVarceSpQOe20uFE', n=4, registry='core.Feature', hash='Pku8H0niKZ8uYnQMyx1J', updated_at=2023-09-22 18:44:21, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 tissue (0, bionty.Tissue):
🔗 donor (0, core.ULabel):
🔗 cell_type (0, bionty.CellType):
🔗 assay (0, bionty.ExperimentalFactor):
Register metadata links#
Let us first link external labels for the entire file:
file.labels.add(species.human, feature=features.species)
file.labels.add(experimental_factors.single_cell_rna_sequencing, feature=features.assay)
Next, we parse the columns of adata.obs for additional metadata:
file.labels.add(adata.obs.cell_type, feature=features.cell_type)
file.labels.add(adata.obs.assay, feature=features.assay)
file.labels.add(adata.obs.tissue, feature=features.tissue)
file.labels.add(adata.obs.donor, feature=features.donor)
file.features
Features:
var: FeatureSet(id='2gQIre5ht93RP9Br7AxJ', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-22 18:44:16, modality_id='YVd1fHWO', created_by_id='DzTjkKse')
'LINC01088', 'AP2S1', 'ADSL', 'USP16', 'None', 'None', 'SCAT2', 'ZNF45-AS1', 'LINC02132', 'XIRP2-AS1', ...
obs: FeatureSet(id='ACQDyVarceSpQOe20uFE', n=4, registry='core.Feature', hash='Pku8H0niKZ8uYnQMyx1J', updated_at=2023-09-22 18:44:21, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 tissue (17, bionty.Tissue): 'caecum', 'bone marrow', 'lung', 'thymus', 'liver', 'mesenteric lymph node', 'lamina propria', 'jejunal epithelium', 'duodenum', 'thoracic lymph node', ...
🔗 donor (12, core.ULabel): '582C', 'A35', 'D503', 'A29', 'A52', '640C', 'A31', 'D496', '621B', 'A36', ...
🔗 cell_type (32, bionty.CellType): 'gamma-delta T cell', 'mast cell', 'non-classical monocyte', 'plasmablast', 'megakaryocyte', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'mucosal invariant T cell', 'plasmacytoid dendritic cell', 'progenitor cell', 'CD16-positive, CD56-dim natural killer cell, human', ...
🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 5' v2', '10x 3' v3'
external: FeatureSet(id='3z0VKYNNBxCC5evkC2px', n=1, registry='core.Feature', hash='0GejzMerWce6UTpUqz6i', updated_at=2023-09-22 18:44:22, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 species (1, bionty.Species): 'human'
The file is now queryable by everything we linked:
file.describe()
File(id='GzA3KMdHzowOYsClkbvy', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-09-22 18:44:21)
Provenance:
🗃️ storage: Storage(id='yNdwkjSP', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-22 18:43:43, created_by_id='DzTjkKse')
💫 transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-22 18:44:16, created_by_id='DzTjkKse')
👣 run: Run(id='Nv39dIk0xeRfAOZAwfvB', run_at=2023-09-22 18:43:45, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-22 18:43:43)
Features:
var: FeatureSet(id='2gQIre5ht93RP9Br7AxJ', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-22 18:44:16, modality_id='YVd1fHWO', created_by_id='DzTjkKse')
'LINC01088', 'AP2S1', 'ADSL', 'USP16', 'None', 'None', 'SCAT2', 'ZNF45-AS1', 'LINC02132', 'XIRP2-AS1', ...
obs: FeatureSet(id='ACQDyVarceSpQOe20uFE', n=4, registry='core.Feature', hash='Pku8H0niKZ8uYnQMyx1J', updated_at=2023-09-22 18:44:21, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 tissue (17, bionty.Tissue): 'caecum', 'bone marrow', 'lung', 'thymus', 'liver', 'mesenteric lymph node', 'lamina propria', 'jejunal epithelium', 'duodenum', 'thoracic lymph node', ...
🔗 donor (12, core.ULabel): '582C', 'A35', 'D503', 'A29', 'A52', '640C', 'A31', 'D496', '621B', 'A36', ...
🔗 cell_type (32, bionty.CellType): 'gamma-delta T cell', 'mast cell', 'non-classical monocyte', 'plasmablast', 'megakaryocyte', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'mucosal invariant T cell', 'plasmacytoid dendritic cell', 'progenitor cell', 'CD16-positive, CD56-dim natural killer cell, human', ...
🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 5' v2', '10x 3' v3'
external: FeatureSet(id='3z0VKYNNBxCC5evkC2px', n=1, registry='core.Feature', hash='0GejzMerWce6UTpUqz6i', updated_at=2023-09-22 18:44:22, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 species (1, bionty.Species): 'human'
Labels:
🏷️ species (1, bionty.Species): 'human'
🏷️ tissues (17, bionty.Tissue): 'caecum', 'bone marrow', 'lung', 'thymus', 'liver', 'mesenteric lymph node', 'lamina propria', 'jejunal epithelium', 'duodenum', 'thoracic lymph node', ...
🏷️ cell_types (32, bionty.CellType): 'gamma-delta T cell', 'mast cell', 'non-classical monocyte', 'plasmablast', 'megakaryocyte', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'mucosal invariant T cell', 'plasmacytoid dendritic cell', 'progenitor cell', 'CD16-positive, CD56-dim natural killer cell, human', ...
🏷️ experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 5' v2', '10x 3' v3'
🏷️ ulabels (12, core.ULabel): '582C', 'A35', 'D503', 'A29', 'A52', '640C', 'A31', 'D496', '621B', 'A36', ...
A less well curated dataset#
Access #
Let's now consider a dataset with less well-curated features:
pbmc68k = ln.dev.datasets.anndata_pbmc68k_reduced()
pbmc68k
AnnData object with n_obs × n_vars = 70 × 765
obs: 'cell_type', 'n_genes', 'percent_mito', 'louvain'
var: 'n_counts', 'highly_variable'
uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'connectivities', 'distances'
We see that this dataset is indexed by gene symbols:
pbmc68k.var.head()
index | n_counts | highly_variable
---|---|---
HES4 | 1153.387451 | True
TNFRSF4 | 304.358154 | True
SSU72 | 2530.272705 | False
PARK7 | 7451.664062 | False
RBP7 | 272.811035 | True
Validate #
lb.Gene.validate(pbmc68k.var.index, lb.Gene.symbol);
❗ 70 terms (9.20%) are not validated for symbol: ATPIF1, C1orf228, CCBL2, RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, AC079767.4, GPX1, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, ...
lb.Gene.inspect(pbmc68k.var.index, lb.Gene.symbol);
❗ 70 terms (9.20%) are not validated for symbol: ATPIF1, C1orf228, CCBL2, RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, AC079767.4, GPX1, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, ...
detected 54 terms with synonyms: ATPIF1, C1orf228, CCBL2, AC079767.4, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, SEPT7, WBSCR22, RSBN1L-AS1, CCDC132, ...
→ standardize terms via .standardize()
detected 5 Gene terms in Bionty for symbol: 'RN7SL1', 'SOD2', 'IGLL5', 'SNORD3B-2', 'GPX1'
→ add records from Bionty to your Gene registry via .from_values()
couldn't validate 11 terms: 'RP11-291B21.2', 'RP11-277L2.3', 'CTD-3138B18.5', 'RP11-620J15.3', 'RP11-782C8.1', 'TMBIM4-1', 'RP11-390E23.6', 'AC084018.1', 'RP3-467N11.1', 'RP11-489E7.4', 'RP11-156E8.1'
→ if you are sure, create new records via ln.Gene() and save to your registry
Standardize symbols and register additional symbols from Bionty:
pbmc68k.var.index = lb.Gene.standardize(pbmc68k.var.index, lb.Gene.symbol)
gene_records = lb.Gene.from_values(pbmc68k.var.index, lb.Gene.symbol)
ln.save(gene_records)
❗ did not create Gene records for 11 non-validated symbols: 'AC084018.1', 'CTD-3138B18.5', 'RP11-156E8.1', 'RP11-277L2.3', 'RP11-291B21.2', 'RP11-390E23.6', 'RP11-489E7.4', 'RP11-620J15.3', 'RP11-782C8.1', 'RP3-467N11.1', 'TMBIM4-1'
In this case, we only want to register data with validated genes:
validated = lb.Gene.validate(pbmc68k.var.index, lb.Gene.symbol)
pbmc68k_validated = pbmc68k[:, validated].copy()
❗ 11 terms (1.40%) are not validated for symbol: RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, RP3-467N11.1, RP11-390E23.6, RP11-489E7.4, RP11-291B21.2, RP11-620J15.3, TMBIM4-1, AC084018.1, CTD-3138B18.5
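Subsetting with the boolean mask keeps only the gene columns that passed validation, so 765 - 11 = 754 genes remain, which matches the n=754 reported for the var feature set further down. As a minimal sketch of what such a column mask does, here is the same operation on a toy cells-by-genes matrix in plain Python. The matrix values and the mask are made up for illustration; AnnData performs the equivalent slicing on .X and .var together.

```python
# Toy data: 2 cells x 3 genes (values are made up).
symbols = ["HES4", "RP11-782C8.1", "PARK7"]
validated = [True, False, True]          # hypothetical validation mask
matrix = [
    [1.0, 0.0, 3.0],
    [0.0, 2.0, 5.0],
]

# Keep a gene only if its mask entry is True.
kept_symbols = [symbol for symbol, ok in zip(symbols, validated) if ok]

# Apply the same mask to every row so columns stay aligned with the symbols.
kept_matrix = [
    [value for value, ok in zip(row, validated) if ok]
    for row in matrix
]

print(kept_symbols)
print(kept_matrix)
```

Dropping genes this way discards their counts, so it only suits cases where the unvalidated features are not needed downstream.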
Convert gene symbols into Ensembl gene IDs:
records = lb.Gene.filter(id__in=[record.id for record in gene_records])
mapper = pd.DataFrame(records.values_list("symbol", "ensembl_gene_id")).set_index(0)[1]
pbmc68k_validated.var.insert(0, "gene_symbol", pbmc68k_validated.var.index)
pbmc68k_validated.var.rename(index=mapper, inplace=True)
pbmc68k_validated.var.head()
 | gene_symbol | n_counts | highly_variable
---|---|---|---
ENSG00000188290 | HES4 | 1153.387451 | True
ENSG00000186827 | TNFRSF4 | 304.358154 | True
ENSG00000160075 | SSU72 | 2530.272705 | False
ENSG00000116288 | PARK7 | 7451.664062 | False
ENSG00000162444 | RBP7 | 272.811035 | True
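Renaming an index through a symbol-to-ID mapper has two edge cases worth keeping in mind: labels missing from the mapper are left unchanged by pandas' rename, and a symbol can in principle be associated with more than one identifier. The sketch below illustrates both with plain Python. The helper `build_mapper`, the label NOT_A_GENE, and the TOY_* entries are hypothetical; the three real pairs are taken from the table above.

```python
def build_mapper(pairs):
    """Build a symbol -> ID dict and report symbols that map to several IDs."""
    mapper, ambiguous = {}, set()
    for symbol, ensembl_id in pairs:
        if symbol in mapper and mapper[symbol] != ensembl_id:
            ambiguous.add(symbol)             # conflicting second ID for this symbol
        mapper.setdefault(symbol, ensembl_id)  # keep the first ID seen
    return mapper, ambiguous


pairs = [
    ("HES4", "ENSG00000188290"),
    ("TNFRSF4", "ENSG00000186827"),
    ("SSU72", "ENSG00000160075"),
    ("TOY_SYMBOL", "TOY_ID_1"),   # hypothetical duplicate, for illustration only
    ("TOY_SYMBOL", "TOY_ID_2"),
]
index = ["HES4", "TNFRSF4", "SSU72", "NOT_A_GENE"]

mapper, ambiguous = build_mapper(pairs)

# Labels without a mapping are kept as they are.
renamed = [mapper.get(symbol, symbol) for symbol in index]

print(renamed)
print(ambiguous)
```

Checking for ambiguous symbols before renaming helps avoid silently picking one of several possible identifiers.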
Validate cell types:
# inspect shows none of the terms are mappable
lb.CellType.inspect(pbmc68k_validated.obs.cell_type)
# search each cell type name in the public ontology and grab the top match,
# then add the name used in pbmc68k as a synonym of the matched record
celltype_bt = lb.CellType.bionty()
mapper = {}
for ct in pbmc68k_validated.obs.cell_type.unique():
    ontology_id = celltype_bt.search(ct).iloc[0].ontology_id
    record = lb.CellType.from_bionty(ontology_id=ontology_id)
    mapper[ct] = record.name
    record.save()
    record.add_synonym(ct)
# standardize cell type names in the dataset
pbmc68k_validated.obs.cell_type = pbmc68k_validated.obs.cell_type.map(mapper)
❗ received 9 unique terms, 61 empty/duplicated terms are ignored
❗ 9 terms (100.00%) are not validated for name: Dendritic cells, CD19+ B, CD4+/CD45RO+ Memory, CD8+ Cytotoxic T, CD4+/CD25 T Reg, CD14+ Monocytes, CD56+ NK, CD8+/CD45RA+ Naive Cytotoxic, CD34+
couldn't validate 9 terms: 'CD8+/CD45RA+ Naive Cytotoxic', 'CD4+/CD45RO+ Memory', 'CD19+ B', 'CD4+/CD25 T Reg', 'Dendritic cells', 'CD56+ NK', 'CD14+ Monocytes', 'CD34+', 'CD8+ Cytotoxic T'
→ if you are sure, create new records via ln.CellType() and save to your registry
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
Now, all cell types are validated:
lb.CellType.validate(pbmc68k_validated.obs.cell_type);
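The mapping step above takes the top search hit for each free-text label and stores the original label as a synonym. The idea can be sketched with the standard library, using difflib string similarity as a stand-in for the ontology search. This is an assumption made for illustration: it is not how Bionty ranks results, and `ontology_names` is a tiny hypothetical vocabulary.

```python
import difflib

# Hypothetical, tiny vocabulary standing in for a public cell type ontology.
ontology_names = ["dendritic cell", "monocyte", "B cell", "natural killer cell"]
labels = ["Dendritic cells", "CD14+ Monocytes"]


def top_match(label, names):
    """Return the vocabulary name most similar to the label (case-insensitive)."""
    def similarity(name):
        return difflib.SequenceMatcher(None, label.lower(), name.lower()).ratio()
    return max(names, key=similarity)


mapper = {}     # dataset label -> standardized name
synonyms = {}   # standardized name -> dataset labels recorded as synonyms
for label in labels:
    name = top_match(label, ontology_names)
    mapper[label] = name
    synonyms.setdefault(name, []).append(label)

print(mapper)
```

A top hit is only the best-scoring candidate, not necessarily the right term, so it is worth reviewing the resulting mapper before overwriting the labels in the dataset.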
Register #
file = ln.File.from_anndata(
    pbmc68k_validated,
    description="10x reference pbmc68k",
    field=lb.Gene.ensembl_gene_id,
    modality=modalities.rna,
)
❗ 3 terms (75.00%) are not validated for name: n_genes, percent_mito, louvain
file.save()
file.labels.add(pbmc68k_validated.obs.cell_type, features.cell_type)
file.labels.add(species.human, feature=features.species)
file.labels.add(experimental_factors.single_cell_rna_sequencing, feature=features.assay)
file.features
Features:
var: FeatureSet(id='GglELLiZwTYIyev6GwOp', n=754, type='number', registry='bionty.Gene', hash='WMDxN7253SdzGwmznV5d', updated_at=2023-09-22 18:44:45, modality_id='YVd1fHWO', created_by_id='DzTjkKse')
'CYTL1', 'PSMC3', 'AP2S1', 'RHOC', 'PDAP1', 'TAGLN2', 'LBH', 'ADSL', 'CCL4', 'PLAC8', ...
obs: FeatureSet(id='tfrfeotun53IO4o0g2Pj', n=1, registry='core.Feature', hash='k3ON0Ea-SwSaTVbRu7kE', updated_at=2023-09-22 18:44:45, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 cell_type (9, bionty.CellType): 'gamma-delta T cell', 'cytotoxic T cell', 'CD4-positive, alpha-beta T cell', 'CD24-positive, CD4 single-positive thymocyte', 'B cell, CD19-positive', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'dendritic cell', 'CD16-positive, CD56-dim natural killer cell, human', 'monocyte'
external: FeatureSet(id='l8GZYinuhuSSFpV55ch4', n=2, registry='core.Feature', hash='2DlkyLpMca3LGwfc7E2N', updated_at=2023-09-22 18:44:46, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 species (1, bionty.Species): 'human'
🔗 assay (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
file.describe()
File(id='D4Soc2iFauHfymG956ss', suffix='.h5ad', accessor='AnnData', description='10x reference pbmc68k', size=660792, hash='a2V0IgOjMRHsCeZH169UOQ', hash_type='md5', updated_at=2023-09-22 18:44:45)
Provenance:
🗃️ storage: Storage(id='yNdwkjSP', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-22 18:43:43, created_by_id='DzTjkKse')
💫 transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-09-22 18:44:45, created_by_id='DzTjkKse')
👣 run: Run(id='Nv39dIk0xeRfAOZAwfvB', run_at=2023-09-22 18:43:45, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-22 18:43:43)
Features:
var: FeatureSet(id='GglELLiZwTYIyev6GwOp', n=754, type='number', registry='bionty.Gene', hash='WMDxN7253SdzGwmznV5d', updated_at=2023-09-22 18:44:45, modality_id='YVd1fHWO', created_by_id='DzTjkKse')
'CYTL1', 'PSMC3', 'AP2S1', 'RHOC', 'PDAP1', 'TAGLN2', 'LBH', 'ADSL', 'CCL4', 'PLAC8', ...
obs: FeatureSet(id='tfrfeotun53IO4o0g2Pj', n=1, registry='core.Feature', hash='k3ON0Ea-SwSaTVbRu7kE', updated_at=2023-09-22 18:44:45, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 cell_type (9, bionty.CellType): 'gamma-delta T cell', 'cytotoxic T cell', 'CD4-positive, alpha-beta T cell', 'CD24-positive, CD4 single-positive thymocyte', 'B cell, CD19-positive', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'dendritic cell', 'CD16-positive, CD56-dim natural killer cell, human', 'monocyte'
external: FeatureSet(id='l8GZYinuhuSSFpV55ch4', n=2, registry='core.Feature', hash='2DlkyLpMca3LGwfc7E2N', updated_at=2023-09-22 18:44:46, modality_id='zaCpJM7g', created_by_id='DzTjkKse')
🔗 species (1, bionty.Species): 'human'
🔗 assay (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
Labels:
🏷️ species (1, bionty.Species): 'human'
🏷️ cell_types (9, bionty.CellType): 'gamma-delta T cell', 'cytotoxic T cell', 'CD4-positive, alpha-beta T cell', 'CD24-positive, CD4 single-positive thymocyte', 'B cell, CD19-positive', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'dendritic cell', 'CD16-positive, CD56-dim natural killer cell, human', 'monocyte'
🏷️ experimental_factors (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
file.view_flow()
Now let's continue with data integration: Integrate scRNA-seq datasets