FAQs#

What format should my query data be in to upload to ArchMap?#

Your query data should be in .h5ad format to upload to ArchMap. The .h5ad file should contain the raw count data in the .X attribute and the gene names or Ensembl IDs in the .var_names attribute. In addition, your batch information should be labelled as “batch” in the .obs dataframe of your .h5ad file. To check whether your data meets the upload requirements, you can use the Google Colab notebook to check and convert your data.
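As a quick sanity check of the raw-count requirement, a sample of values from .X can be tested before uploading. The sketch below is a minimal illustration and not ArchMap's own validation: the helper name looks_like_raw_counts is hypothetical, and it assumes that raw counts are non-negative whole numbers.

```python
def looks_like_raw_counts(values):
    """Return True if every value is a non-negative whole number.

    Normalised or log-transformed matrices usually contain fractional
    values, so they fail this check. Hypothetical helper, not part of ArchMap.
    """
    return all(v >= 0 and float(v).is_integer() for v in values)


# Plain Python numbers standing in for entries sampled from .X
raw_like = looks_like_raw_counts([0, 3, 12.0, 1])
normalised_like = looks_like_raw_counts([0.0, 1.386, 2.71])
print(raw_like, normalised_like)
```

With anndata, the values could come from a flattened sample of adata.X (for example the first few hundred rows, densified if the matrix is sparse). Looping over a whole large matrix this way would be slow, so a sample is usually enough for a first check.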

I am getting an error “Batch key information not specified…”#

If you are getting the error shown in the screenshot below when processing your query, your query is either missing batch information or its batch information is not labelled as “batch” in your .h5ad file. If your data comes from a single sample and contains no batch information, or if you are not sure how to relabel the batch information in your .h5ad file, make a copy of the following notebook and run it. The notebook either labels your data as coming from a single sample/batch or relabels the batch information based on the name you enter for “custom_batch_name”. After running the notebook, you can upload the new .h5ad file to ArchMap for mapping.
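The relabelling logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's actual code: the function name batch_labels, the placeholder label “batch_1” and the example column “sample_id” are all hypothetical. A plain dict stands in for the .obs dataframe, since a pandas DataFrame supports the same column lookups.

```python
def batch_labels(obs, n_cells, custom_batch_name=None, single_label="batch_1"):
    """Build the per-cell values for a "batch" column.

    obs: mapping from column name to per-cell values (a dict here; a pandas
         DataFrame such as adata.obs supports the same lookups).
    custom_batch_name: existing column holding the batch info, or None when
         all cells come from a single sample.
    single_label: placeholder label for single-sample data (an assumption).
    """
    if custom_batch_name is None:
        return [single_label] * n_cells
    if custom_batch_name not in obs:
        raise KeyError(f"column {custom_batch_name!r} not found in obs")
    return [str(value) for value in obs[custom_batch_name]]


obs = {"sample_id": ["donor_A", "donor_A", "donor_B"]}
relabelled = batch_labels(obs, n_cells=3, custom_batch_name="sample_id")
single = batch_labels(obs, n_cells=3)
print(relabelled, single)
```

With an AnnData object, the result could be assigned as adata.obs["batch"] = batch_labels(adata.obs, adata.n_obs, "sample_id") and the object saved with adata.write_h5ad before uploading.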

I am getting an error “Error message: ‘_index’ is a reserved name for dataframe columns.”#

If you are getting this error, the .obs or .var dataframe of your anndata object contains a column named ‘_index’. Please remove this column or rename it.
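One way to rename the offending column is to build a rename mapping first and apply it to the dataframe afterwards. The sketch below is a minimal example; the helper name reserved_column_renames and the “original” prefix are arbitrary choices for illustration.

```python
def reserved_column_renames(columns, reserved=("_index",), prefix="original"):
    """Map each reserved column name to a safe replacement.

    Returns a dict such as {"_index": "original_index"}; the replacement
    naming scheme is an arbitrary choice for this sketch.
    """
    taken = set(columns)
    renames = {}
    for name in columns:
        if name in reserved:
            new_name = prefix + name      # "_index" becomes "original_index"
            while new_name in taken:      # avoid clobbering an existing column
                new_name += "_"
            taken.add(new_name)
            renames[name] = new_name
    return renames


obs_columns = ["cell_type", "_index", "batch"]
print(reserved_column_renames(obs_columns))
```

The mapping can then be applied with pandas, for example adata.obs = adata.obs.rename(columns=reserved_column_renames(adata.obs.columns)), and likewise for adata.var.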

My query data has more than the limit of 200 000 cells. What can I do?#

Even if your query data has more than 200 000 cells, you can still map your full dataset with ArchMap by splitting it into batches and creating a separate project for each mapping. Once you have obtained the mapping results for each project, you can download them on the “Your Mappings” page and concatenate them. If you are not sure how to do this, you can follow the linked notebooks, which show how to split your query correctly before mapping and concatenate your results after mapping. Please make sure to copy the notebooks in order to make any needed edits.
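The splitting step amounts to cutting the cells into consecutive ranges of at most 200 000 cells. The sketch below shows that arithmetic only; the function name chunk_bounds is hypothetical, and the linked notebooks remain the reference for the full workflow.

```python
def chunk_bounds(n_cells, max_cells=200_000):
    """Return (start, stop) row ranges that each cover at most max_cells cells."""
    if n_cells < 0 or max_cells <= 0:
        raise ValueError("n_cells must be >= 0 and max_cells must be > 0")
    return [(start, min(start + max_cells, n_cells))
            for start in range(0, n_cells, max_cells)]


# A query of 450 000 cells would need three separate projects
for start, stop in chunk_bounds(450_000):
    print(f"cells {start}..{stop - 1} -> one ArchMap project")
```

With anndata, each range could be saved as its own file via adata[start:stop].copy().write_h5ad with a file name of your choice, and the downloaded results could later be combined with anndata.concat. Note that consecutive ranges ignore batch boundaries, so a split along your “batch” labels may be preferable when the batches themselves are small enough.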

Which classifier should I use?#

We recommend using the KNN classifier, as it performs best across all atlases, especially when transferring higher-resolution cell type labels. However, we recommend that you check the evaluation metric “Percentage query cells unknown” for the respective mapping on the “Your Mappings” page to evaluate how well a classifier labels your data. A cell is labelled as “Unknown” if its uncertainty score is larger than 0.5. If a large percentage of your query cells are labelled “Unknown” with your chosen classifier, we recommend comparing the label transfer with the other available classifiers to determine whether the uncertainty can be reduced. Note that the number of query cells labelled “Unknown” also depends on the type of data you are mapping. For example, a disease dataset, mouse data or cell line data mapped to the human lung cell atlas will have a higher number of “Unknown” cell type classifications than healthy human data.
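The thresholding rule and the resulting metric can be illustrated with a short sketch. It assumes only what is stated above (a cell becomes “Unknown” when its uncertainty score is larger than 0.5); the function names and the example labels are illustrative, and this is not ArchMap's internal implementation.

```python
def filter_by_uncertainty(labels, uncertainties, threshold=0.5, unknown="Unknown"):
    """Replace a predicted label with "Unknown" when its uncertainty exceeds the threshold."""
    if len(labels) != len(uncertainties):
        raise ValueError("labels and uncertainties must have the same length")
    return [unknown if u > threshold else label
            for label, u in zip(labels, uncertainties)]


def percent_unknown(filtered_labels, unknown="Unknown"):
    """Percentage of cells whose filtered label is "Unknown"."""
    if not filtered_labels:
        return 0.0
    n_unknown = sum(label == unknown for label in filtered_labels)
    return 100.0 * n_unknown / len(filtered_labels)


# Illustrative predictions and uncertainty scores for four query cells
predicted = ["AT1", "AT2", "Basal", "AT2"]
uncertainty = [0.10, 0.65, 0.50, 0.90]

filtered = filter_by_uncertainty(predicted, uncertainty)
print(filtered, percent_unknown(filtered))
```

A score of exactly 0.5 keeps its predicted label here, because the rule uses a strict “larger than” comparison.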

Where can I see the evaluation metrics?#

You can see the evaluation metrics of a mapping on the “Your Mappings” page by clicking on the information icon to the left of the CellxGene launch button.

Can I download my results?#

You can download your mapping results by clicking the download icon to the right of the CellxGene launch button on the “Your Mappings” page.

Where can I see the label transfer results?#

You can see the label transfer results by launching the CellxGene instance on the “Your Mappings” page. For example, if you chose the KNN classifier for label transfer when mapping to the human lung cell atlas, you will obtain cell type label transfer predictions for three levels of cell type annotation (ann_level_3, ann_level_4, ann_level_5). For ann_level_5, the label transfer predictions are found under the categories ann_level_5_prediction_knn and ann_level_5_prediction_knn_filtered_by_uncert>0.5 (where cells with an uncertainty score greater than 0.5 are labelled as “Unknown”).

How can I interpret my label transfer results?#

Note that the label transfer results at the finest level for the fetal brain and hypomap atlases (that is, subregion_class and Author_CellType) may not be as accurate as the predictions at coarser levels.

How can I visualize my downloaded results myself in cellxgene?#

To visualize your downloaded results yourself in cellxgene, you first need to install cellxgene locally. You can do so by following the steps here. ArchMap’s built-in visualization includes only a subset of the original reference to allow for faster computation. Hence, the neighbourhood graph of the downloaded file, which contains the full mapping, must be recomputed before it can be visualized in cellxgene. You can use the Colab notebook here to recalculate the neighbourhood graph of your mapping. Please make sure to copy the notebook in order to make any needed edits. Once you have run the notebook, you can visualize the output file by launching cellxgene in your terminal, as shown here.

Why can I not submit more than 40 projects at once?#

A limit of 40 projects per hour is set for each user. Thus, if you try to submit further projects within an hour, the newly submitted project will not show up on your project dashboard. This limit will reset after an hour.

How can I map my data to an older version of a model on scvi-hub?#

To map to an older version of a model on scvi-hub, you can follow this tutorial to download your desired scvi-hub model (with the version specified) and upload it to ArchMap to map your query to.

How do I upload a scPoli model to ArchMap?#

To upload a scPoli model to ArchMap, please follow the tutorial provided here. As the scPoli output after integration generates three separate files, it is necessary to combine these files to upload to ArchMap. The necessary steps are outlined in the linked tutorial.

How can I download atlas files from ArchMap?#

You can download any published atlas in ArchMap. If you are not logged in, go to the References header, which can be accessed from the home page; if you are logged in, click on the Search icon on the side bar. You will then see a list of all reference atlases. Hover over your desired atlas and click on “Learn More”. This will take you to a new page where you will see a “Download” button. Clicking this button downloads the atlas files. These are three separate files, namely data.h5ad (containing all atlas metadata and the reference embedding), model.pt (the deep learning model), and data_only_count.h5ad (containing the reference count data).

How can I convert an Rds file to h5ad format?#

To convert your data from Rds format to h5ad, you can run this Google Colab notebook. This notebook checks that your data meets the requirements for ArchMap and converts the data to h5ad format. Once you have run this notebook your data is ready to be uploaded to ArchMap!

I am receiving the error: “Less than 5% of genes in your query overlap with the reference data.” What does this mean?#

This error means that there is insufficient overlap between the genes in your query dataset and the genes in the reference atlas you are trying to map to. Either you are not using an appropriate atlas for your query data, or the gene symbols or Ensembl IDs are not saved in the .var_names attribute of your h5ad file. Please make sure that either gene symbols or Ensembl IDs are used in your h5ad file. The choice between gene symbols and Ensembl IDs is reference-agnostic: the format is converted automatically to match the reference in the ArchMap pipeline. You can use the following Google Colab notebook to check and modify your h5ad file accordingly.
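To get a rough idea of the overlap before uploading, the gene names can be compared as sets. The sketch below makes two assumptions that may differ from ArchMap's pipeline: the overlap is computed as the share of query genes that also occur in the reference, and Ensembl gene IDs are recognised with a simple pattern (with an optional version suffix). Both function names are hypothetical.

```python
import re

# Matches Ensembl gene IDs such as ENSG00000141510 or ENSMUSG00000059552.3
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def looks_like_ensembl_ids(gene_names):
    """True if the list is non-empty and every name matches the Ensembl gene ID pattern."""
    names = list(gene_names)
    return bool(names) and all(ENSEMBL_GENE_ID.match(name) for name in names)


def query_overlap_percent(query_genes, reference_genes):
    """Percentage of distinct query genes that also appear in the reference."""
    query = set(query_genes)
    if not query:
        return 0.0
    return 100.0 * len(query & set(reference_genes)) / len(query)


reference = ["TP53", "BRCA1", "EGFR"]
good_query = ["TP53", "BRCA1", "FOO", "BAR"]
bad_query = ["0", "1", "2", "3"]   # e.g. row numbers stored instead of gene names

print(query_overlap_percent(good_query, reference))
print(query_overlap_percent(bad_query, reference))
print(looks_like_ensembl_ids(["ENSG00000141510", "ENSG00000012048.3"]))
```

With anndata, the query genes would come from adata.var_names. Because the comparison is a plain set intersection, it only makes sense when the query and the reference use the same identifier type; ArchMap converts between symbols and Ensembl IDs itself, which this sketch does not attempt.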