
Generating a single cell matrix using Alevin

Abstract

This tutorial will take you from raw FASTQ files to a cell x gene data matrix in AnnData format. What's a data matrix, and what's AnnData format? Well, you'll find out! Importantly, this is the first step in processing single cell data so that you can start analysing it. Right now, your sequencing files hold a bunch of strings such as ATGGGCTT, and what you need to know is how many cells you have and which genes appear in those cells. These steps are the most computationally heavy in the single cell world, because you're starting with hundreds of millions of reads, each taking up four lines of text. Later in the analysis, this data becomes simple gene counts such as 'Cell A has 4 GAPDHs', which is a lot easier to store! Because of this data overload, we have downsampled the FASTQ files to speed up the analysis a bit. That said, you still have to map loads of reads to the massive murine genome, so get yourself a cup of coffee and prepare to analyse!
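
To make the two ideas above concrete, here is a minimal Python sketch, assuming toy data. It counts reads in a FASTQ-formatted string (four lines per read) and tallies a tiny cell x gene matrix. The read names, cell labels, gene names and helper functions are all hypothetical, and this is not how Alevin itself quantifies reads (real tools also handle barcodes, UMIs and mapping).

```python
from collections import Counter
from io import StringIO

# Hypothetical toy data: two reads in FASTQ format (four lines per read:
# header, sequence, separator, quality string).
FASTQ_TEXT = """@read1
ATGGGCTT
+
IIIIIIII
@read2
GGCTTATG
+
IIIIIIII
"""


def count_reads(handle):
    """Count reads in a FASTQ stream by dividing the line count by four."""
    n_lines = sum(1 for _ in handle)
    return n_lines // 4


def build_matrix(assignments):
    """Turn (cell, gene) pairs into a dense cell x gene count matrix.

    Rows follow the sorted cell labels, columns the sorted gene names.
    """
    counts = Counter(assignments)
    cells = sorted({cell for cell, _ in assignments})
    genes = sorted({gene for _, gene in assignments})
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


# Hypothetical read-to-gene assignments, as if mapping had already been done.
assignments = (
    [("Cell A", "Gapdh")] * 4
    + [("Cell A", "Actb")]
    + [("Cell B", "Actb")] * 2
)

n_reads = count_reads(StringIO(FASTQ_TEXT))
cells, genes, matrix = build_matrix(assignments)

print(f"reads: {n_reads}")
for cell, row in zip(cells, matrix):
    print(cell, dict(zip(genes, row)))
```

The matrix has one row per cell and one column per gene, which is the shape that the AnnData format stores alongside cell and gene metadata.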

About This Material

This is a Hands-on Tutorial from the Galaxy Training Network (GTN), usable either for individual self-study or as teaching material in a classroom.

Questions this will address

  • I have some single cell FASTQ files I want to analyse. Where do I start?

Learning Objectives

  • Generate a cell x gene matrix for droplet-based single cell sequencing data
  • Interpret quality control (QC) plots to make informed decisions on cell thresholds
  • Find relevant information in GTF files for the particulars of your study, and include this in data matrix metadata

Licence: Creative Commons Attribution 4.0 International

Keywords: 10x, MIGHTS, Single Cell, paper-replication

Target audience: Students

Resource type: e-learning

Version: 19

Status: Active

Prerequisites:

  • An introduction to scRNA-seq data analysis
  • Introduction to Galaxy Analyses
  • Understanding Barcodes


Date modified: 2024-10-28

Date published: 2021-03-03

Authors: Jonathan Manning, Wendi Bacon

Contributors: Helena Rasche, Julia Jakiela

Scientific topics: Transcriptomics

