Meanings of Fields in Downloaded Customer Data Files

We are occasionally asked what each field means in the customer data files (.tsv) downloaded from the immunoSEQ website. The simplest way to explain them is a table listing each field name alongside its definition.

Column title          Definition
sequenceID            Flow cell lane, sample ID, and primer versions, joined by underscores (lane_sampleID_primerVersions)
container             Project ID associated with the samples
nucleotide            Reconstructed nucleotide sequence
aminoAcid             Amino acid sequence inferred from the nucleotide sequence
normalizedFrequency   Frequency calculated from the normalized copy number*
normalizedCopy        Raw copy count multiplied by a factor specific to the V-J pair, to correct for V and J primer bias*
rawFrequency          Frequency calculated from the raw copy number
copy                  Raw copy count
cdr3Length            Length of the CDR3 region, including indels
VFamilyName           V family number, if known (reported even when the V gene call is tied)
VGeneName             V family number and specific V gene number; if the sequence maps to either of two related genes, both are given; if more than two, unknown
VTies                 Names of all candidate V genes when the V call is tied
DGeneName             D gene name (unknown if the D gene cannot be identified)
JGeneName             J gene name; if the J call is tied, the alphabetically first candidate is reported
JTies                 Names of all candidate J genes when the J call is tied
VDeletion             Number of nucleotides deleted from the 3′ end of the V gene, inferred after V identification
d5Deletion            Number of nucleotides deleted from the 5′ end of the D segment (the end facing the V segment)
d3Deletion            Number of nucleotides deleted from the 3′ end of the D segment (the end facing the J segment)
JDeletion             Number of nucleotides deleted from the 5′ end of the J gene, inferred after J identification
n2Insertion           Number of nucleotides inserted in the N2 region (between D and J)
n1Insertion           Number of nucleotides inserted in the N1 region (between V and D)
sequenceStatus        Whether the nucleotide sequence encodes a functional (productive) amino acid sequence
VIndex                Position of a conserved V motif, measured from the start of the V gene (position 0)
n1Index               Position of the start of the N1 region, measured from position 0
n2Index               Position of the start of the N2 region, measured from position 0
DIndex                Position of the start of the D region, measured from position 0
JIndex                Position of the 3′ end of the J region, measured from position 0

* PCR normalization is currently considered reliable only for human TCR beta.
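
If you want to work with an exported file programmatically, here is a minimal sketch that reads a .tsv with Python's standard csv module and recomputes a raw frequency for each row. It assumes that rawFrequency is a row's copy count divided by the total copy count across the whole file; the helper names and the three-row sample file are hypothetical and only include a couple of the columns above.

```python
import csv
import io


def load_rows(handle):
    """Read a tab-separated export into a list of dicts keyed by column title."""
    return list(csv.DictReader(handle, delimiter="\t"))


def raw_frequencies(rows):
    """Recompute each row's share of the total raw copy count.

    Assumption: rawFrequency = copy / sum(copy over all rows in the file).
    """
    total = sum(int(row["copy"]) for row in rows)
    return [int(row["copy"]) / total for row in rows]


# Hypothetical three-row file with only two of the real columns.
sample_tsv = (
    "nucleotide\tcopy\n"
    "ACGTAC\t6\n"
    "TTGACA\t3\n"
    "GGCATT\t1\n"
)
rows = load_rows(io.StringIO(sample_tsv))
print(raw_frequencies(rows))  # [0.6, 0.3, 0.1]
```

With a real export you would pass an open file handle (opened with newline="") instead of io.StringIO, and could compare the result against the file's own rawFrequency column.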

Below is a cartoon illustrating the V, D, J, and N index positions.
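
To make the index fields concrete, the sketch below cuts a nucleotide sequence into its V, N1, D, and N2 pieces. It rests on assumptions the table does not state: that each index is a 0-based character offset into the nucleotide field, that n1Insertion and n2Insertion give the lengths of the N1 and N2 regions, that the D region runs from DIndex up to the start of N2, and that a negative index marks a region that was not identified. The split_regions helper and the example row are hypothetical.

```python
def split_regions(row):
    """Slice row["nucleotide"] into V, N1, D, and N2 pieces (hypothetical helper).

    Assumes 0-based offsets, N1/N2 lengths taken from n1Insertion/n2Insertion,
    and negative index values for regions that were not identified.
    """
    seq = row["nucleotide"]
    n1_start = int(row["n1Index"])
    d_start = int(row["DIndex"])
    n2_start = int(row["n2Index"])
    regions = {}
    if n1_start >= 0:
        regions["V"] = seq[:n1_start]  # V gene is taken to start at position 0
        regions["N1"] = seq[n1_start:n1_start + int(row["n1Insertion"])]
    if d_start >= 0 and n2_start >= 0:
        regions["D"] = seq[d_start:n2_start]  # D assumed to end where N2 begins
    if n2_start >= 0:
        regions["N2"] = seq[n2_start:n2_start + int(row["n2Insertion"])]
    return regions


# Hypothetical row laid out as V=AAAAA, N1=CC, D=GGG, N2=TA, J=TTTT
example_row = {
    "nucleotide": "AAAAACCGGGTATTTT",
    "n1Index": "5", "n1Insertion": "2",
    "DIndex": "7",
    "n2Index": "10", "n2Insertion": "2",
}
print(split_regions(example_row))
# {'V': 'AAAAA', 'N1': 'CC', 'D': 'GGG', 'N2': 'TA'}
```

JIndex is left out here because, as defined above, it marks the 3′ end of the J region rather than its start.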

 

As for the difference between “raw” and “normalized”: the normalization factors currently used by our online tools were derived by comparing sequencing data with the results of a Betamark flow cytometry assay, which measures the relative amount of each V. For most V’s the two methods agreed closely; the remaining differences are corrected by multiplying each copy count by a V-J-specific normalization factor.
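
The arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not our production code: the normalize helper, the gene names, and the factor values are all hypothetical, V-J pairs without a factor are assumed to keep a factor of 1.0, and normalizedFrequency is assumed to be each normalized copy divided by the total of all normalized copies. Remember that this normalization is currently considered reliable only for human TCR beta.

```python
def normalize(rows, factors):
    """Return (normalizedCopy, normalizedFrequency) for each row.

    Assumptions: each raw copy count is multiplied by the factor for the row's
    (VGeneName, JGeneName) pair, pairs missing from `factors` keep a factor of
    1.0, and frequencies are normalized copies divided by their total.
    """
    norm_copies = [
        int(r["copy"]) * factors.get((r["VGeneName"], r["JGeneName"]), 1.0)
        for r in rows
    ]
    total = sum(norm_copies)
    return [(c, c / total) for c in norm_copies]


# Hypothetical gene names and factor values, for illustration only.
factors = {("V05-01", "J02-01"): 2.0, ("V07-02", "J01-02"): 0.5}
example_rows = [
    {"copy": "10", "VGeneName": "V05-01", "JGeneName": "J02-01"},
    {"copy": "30", "VGeneName": "V07-02", "JGeneName": "J01-02"},
    {"copy": "5", "VGeneName": "V05-01", "JGeneName": "J01-02"},
]
print(normalize(example_rows, factors))
# [(20.0, 0.5), (15.0, 0.375), (5.0, 0.125)]
```

In this made-up example the second clone has the most raw copies (30 of 45), but after normalization the first clone has the highest frequency, which is exactly the kind of shift a primer-bias correction can produce.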

This entry was posted in Software Tips.
