htsjdk库VariantContext类介绍
创始人
2024-11-11 18:36:34
0

HTSJDK是一个用于处理高通量测序数据的 Java 库,其中VariantContext是这个库中的一个重要类,用于表示变异数据。VariantContext类是htsjdk.variant包的一部分,主要用于处理和描述变异(如 SNPs、插入和删除等)。

VariantContext类介绍

功能

VariantContext类用于存储和操作与变异相关的信息,包括:

  • 变异的位置(位置坐标)
  • 变异的类型(SNP、插入、删除等)
  • 变异的等位基因
  • 变异的质量分数
  • 变异的过滤信息
  • 样本级别的信息(如基因型)
主要属性和方法
  1. 位置

    • getContig(): 返回变异所在的染色体或 contig 名称。
    • getStart(): 返回变异的起始位置(1-based)。
    • getEnd(): 返回变异的结束位置(1-based)。
  2. 变异类型

    • isSNP(): 检查变异是否是一个单核苷酸多态性(SNP)。
    • isIndel(): 检查变异是否是插入或删除(Indel)。
  3. 等位基因

    • getAlleles(): 返回变异的等位基因列表。
    • getReference(): 返回参考等位基因。
    • getAlternateAlleles(): 返回变异等位基因列表。
  4. 基因型

    • getGenotypes(): 返回所有样本的基因型信息。
    • getGenotype(String sampleName): 获取特定样本的基因型。
  5. 过滤和质量

    • isFiltered(): 检查变异是否被过滤掉。
    • getPhredScaledQual(): 返回变异的质量分数。
  6. 附加信息

    • getAttributes(): 获取变异的附加信息(例如注释、额外的标记等)。

源代码:

/* * Copyright (c) 2012 The Broad Institute *  * Permission is hereby granted, free of charge, to any person * obtaining a copy of this software and associated documentation * files (the "Software"), to deal in the Software without * restriction, including without limitation the rights to use, * copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following * conditions: *  * The above copyright notice and this permission notice shall be * included in all copies or substantial portions of the Software. *  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR * THE USE OR OTHER DEALINGS IN THE SOFTWARE. */  package htsjdk.variant.variantcontext;  import htsjdk.beta.plugin.HtsRecord; import htsjdk.tribble.Feature; import htsjdk.tribble.TribbleException; import htsjdk.tribble.util.ParsingUtils; import htsjdk.variant.utils.GeneralUtils; import htsjdk.variant.vcf.*;  import java.io.Serializable; import java.util.ArrayList; import java.util.Collection; import java.util.Collections; import java.util.Comparator; import java.util.EnumSet; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; import java.util.regex.Pattern; import java.util.stream.Collectors;  /**  *   * 

High-level overview

* * The VariantContext object is a single general class system for representing genetic variation data composed of: *
    *
  • Allele: representing single genetic haplotypes (A, T, ATC, -) (note that null alleles are used here for illustration; see the Allele class for how to represent indels)
  • *
  • Genotype: an assignment of alleles for each chromosome of a single named sample at a particular locus
  • *
  • VariantContext: an abstract class holding all segregating alleles at a locus as well as genotypes * for multiple individuals containing alleles at that locus
  • *
*

* The class system works by defining segregating alleles, creating a variant context representing the segregating * information at a locus, and potentially creating and associating genotypes with individuals in the context. *

*

* All of the classes are highly validating -- call validate() if you modify them -- so you can rely on the * self-consistency of the data once you have a VariantContext in hand. The system has a rich set of assessor * and manipulator routines, as well as more complex static support routines in VariantContextUtils. *

*

* The VariantContext (and Genotype) objects are attributed (supporting addition of arbitrary key/value pairs) and * filtered (can represent a variation that is viewed as suspect). *

*

*VariantContexts are dynamically typed, so whether a VariantContext is a SNP, Indel, or NoVariant depends * on the properties of the alleles in the context. See the detailed documentation on the Type parameter below. *

*

* It's also easy to create subcontexts based on selected genotypes. *

*

Working with Variant Contexts

* By default, VariantContexts are immutable. In order to access (in the rare circumstances where you need them) * setter routines, you need to create MutableVariantContexts and MutableGenotypes. * *

Some example data

*
  * Allele A, Aref, T, Tref;  * Allele del, delRef, ATC, ATCref;  *
*

* A [ref] / T at 10 *

*
   * GenomeLoc snpLoc = GenomeLocParser.createGenomeLoc("chr1", 10, 10);  *
*

* A / ATC [ref] from 20-23 *

*
  * GenomeLoc delLoc = GenomeLocParser.createGenomeLoc("chr1", 20, 22);  *
*

* // A [ref] / ATC immediately after 20 *

*
  * GenomeLoc insLoc = GenomeLocParser.createGenomeLoc("chr1", 20, 20);  *
*

Alleles

* * See the documentation in the Allele class itself * *

What are they?

* *

Alleles can be either reference or non-reference

* *

Examples of alleles used here:

*
  *   A = new Allele("A");  *   Aref = new Allele("A", true);  *   T = new Allele("T");  *   ATC = new Allele("ATC");  *
*

Creating variant contexts

* *

By hand

* * Here's an example of a A/T polymorphism with the A being reference: * *
  * VariantContext vc = new VariantContext(name, snpLoc, Arrays.asList(Aref, T));  * 
* * If you want to create a non-variant site, just put in a single reference allele * *
  * VariantContext vc = new VariantContext(name, snpLoc, Arrays.asList(Aref));  * 
* * A deletion is just as easy: * *
  * VariantContext vc = new VariantContext(name, delLoc, Arrays.asList(ATCref, del));  * 
* * The only thing that distinguishes between an insertion and deletion is which is the reference allele. * An insertion has a reference allele that is smaller than the non-reference allele, and vice versa for deletions. * *
  * VariantContext vc = new VariantContext("name", insLoc, Arrays.asList(delRef, ATC));  * 
* *

Converting rods and other data structures to VariantContexts

* * You can convert many common types into VariantContexts using the general function: * *
  * VariantContextAdaptors.convertToVariantContext(name, myObject)  * 
* * dbSNP and VCFs, for example, can be passed in as myObject and a VariantContext corresponding to that * object will be returned. A null return value indicates that the type isn't yet supported. This is the best * and easiest way to create contexts using RODs. * * *

Working with genotypes

* *
  * List<Allele> alleles = Arrays.asList(Aref, T);  * Genotype g1 = new Genotype(Arrays.asList(Aref, Aref), "g1", 10);  * Genotype g2 = new Genotype(Arrays.asList(Aref, T), "g2", 10);  * Genotype g3 = new Genotype(Arrays.asList(T, T), "g3", 10);  * VariantContext vc = new VariantContext(snpLoc, alleles, Arrays.asList(g1, g2, g3));  * 
* * At this point we have 3 genotypes in our context, g1-g3. * * You can assess a good deal of information about the genotypes through the VariantContext: * *
  * vc.hasGenotypes()  * vc.isMonomorphicInSamples()  * vc.isPolymorphicInSamples()  * vc.getSamples().size()  *  * vc.getGenotypes()  * vc.getGenotypes().get("g1")  * vc.hasGenotype("g1")  *  * vc.getCalledChrCount()  * vc.getCalledChrCount(Aref)  * vc.getCalledChrCount(T)  * 
* *

NO_CALL alleles

* * The system allows one to create Genotypes carrying special NO_CALL alleles that aren't present in the * set of context alleles and that represent undetermined alleles in a genotype: *
  * Genotype g4 = new Genotype(Arrays.asList(Allele.NO_CALL, Allele.NO_CALL), "NO_DATA_FOR_SAMPLE", 10);  *
* *

subcontexts

* It's also very easy get subcontext based only the data in a subset of the genotypes: * *
  * VariantContext vc12 = vc.subContextFromGenotypes(Arrays.asList(g1,g2));  * VariantContext vc1 = vc.subContextFromGenotypes(Arrays.asList(g1));  * 
* * * *

Fully decoding.

* Currently VariantContexts support some fields, particularly those * stored as generic attributes, to be of any type. For example, a field AB might * be naturally a floating point number, 0.51, but when it's read into a VC its * not decoded into the Java presentation but left as a string "0.51". A fully * decoded VariantContext is one where all values have been converted to their * corresponding Java object types, based on the types declared in a VCFHeader. * * The fullyDecode(...) method takes a header object and creates a new fully decoded VariantContext * where all fields are converted to their true java representation. The VCBuilder * can be told that all fields are fully decoded, in which case no work is done when * asking for a fully decoded version of the VC. * * */ public class VariantContext implements HtsRecord, Feature, Serializable { public static final long serialVersionUID = 1L; private static final boolean WARN_ABOUT_BAD_END = true; private static final int MAX_ALLELE_SIZE_FOR_NON_SV = 150; private boolean fullyDecoded = false; protected CommonInfo commonInfo = null; public static final double NO_LOG10_PERROR = CommonInfo.NO_LOG10_PERROR; public static final Set PASSES_FILTERS = Collections.emptySet(); /** The location of this VariantContext */ protected final String contig; protected final long start; protected final long stop; private final String ID; /** The type (cached for performance reasons) of this context */ protected Type type = null; /** The type of this context, cached separately if ignoreNonRef is true */ protected Type typeIgnoringNonRef = null; /** A set of the alleles segregating in this context */ protected final List alleles; /** A mapping from sampleName -> genotype objects for all genotypes associated with this context */ protected GenotypesContext genotypes = null; /** Counts for each of the possible Genotype types in this context */ protected int[] genotypeCounts = null; public static final GenotypesContext NO_GENOTYPES = GenotypesContext.NO_GENOTYPES; // a fast cached access point to the ref / alt alleles for biallelic case private Allele REF = null; // set to the alt allele when biallelic, otherwise == null private Allele ALT = null; /* cached monomorphic value: null -> not yet computed, False, True */ private Boolean monomorphic = null; /* * Determine which genotype fields are in use in the genotypes in VC * @return an ordered list of genotype fields in use in VC. If vc has genotypes this will always include GT first */ public List calcVCFGenotypeKeys(final VCFHeader header) { final Set keys = new HashSet<>(); boolean sawGoodGT = false; boolean sawGoodQual = false; boolean sawGenotypeFilter = false; boolean sawDP = false; boolean sawAD = false; boolean sawPL = false; for (final Genotype g : this.getGenotypes()) { keys.addAll(g.getExtendedAttributes().keySet()); if ( g.isAvailable() ) sawGoodGT = true; if ( g.hasGQ() ) sawGoodQual = true; if ( g.hasDP() ) sawDP = true; if ( g.hasAD() ) sawAD = true; if ( g.hasPL() ) sawPL = true; if (g.isFiltered()) sawGenotypeFilter = true; } if ( sawGoodQual ) keys.add(VCFConstants.GENOTYPE_QUALITY_KEY); if ( sawDP ) keys.add(VCFConstants.DEPTH_KEY); if ( sawAD ) keys.add(VCFConstants.GENOTYPE_ALLELE_DEPTHS); if ( sawPL ) keys.add(VCFConstants.GENOTYPE_PL_KEY); if ( sawGenotypeFilter ) keys.add(VCFConstants.GENOTYPE_FILTER_KEY); List sortedList = ParsingUtils.sortList(new ArrayList<>(keys)); // make sure the GT is first if (sawGoodGT) { final List newList = new ArrayList<>(sortedList.size() + 1); newList.add(VCFConstants.GENOTYPE_KEY); newList.addAll(sortedList); sortedList = newList; } if (sortedList.isEmpty() && header.hasGenotypingData()) { // this needs to be done in case all samples are no-calls return Collections.singletonList(VCFConstants.GENOTYPE_KEY); } else { return sortedList; } } // --------------------------------------------------------------------------------------------------------- // // validation mode // // --------------------------------------------------------------------------------------------------------- //no controls and white-spaces characters, no semicolon. public static final Pattern VALID_FILTER = Pattern.compile("^[!-:<-~]+$"); public enum Validation { ALLELES() { void validate(VariantContext variantContext) { validateAlleles(variantContext); } }, GENOTYPES() { void validate(VariantContext variantContext) { validateGenotypes(variantContext); } }, FILTERS { void validate(VariantContext variantContext) { validateFilters(variantContext); } }; abstract void validate(VariantContext variantContext); private static void validateAlleles(fina

相关内容

热门资讯

规律教程!xpoker辅助助手... 规律教程!xpoker辅助助手德州辅助app官方最新版本介绍下载(果真真的有挂)1、每一步都需要思考...
详细教程!!wpk私人辅助德扑... 详细教程!!wpk私人辅助德扑之星透视软件(一直真的有挂)1、下载好wpk私人辅助辅助软件之后点击打...
教你教程!cloudpoker... 教你教程!cloudpoker外挂WePOKer有没有透视方法(一般真的有挂)1、cloudpoke...
专业教程!wejoker黑侠辅... 专业教程!wejoker黑侠辅助器HHpoker可以开挂吗(一贯真的有挂);进入游戏-大厅左侧-新手...
透牌教程!哈糖大菠萝辅助器we... 透牌教程!哈糖大菠萝辅助器wepoker到底能不能透视(都是真的有挂);1、首先打开哈糖大菠萝辅助器...
2025新版技巧!wpk模拟器... 2025新版技巧!wpk模拟器是什么天天畅玩德州有挂吗(好像真的有挂);1、首先打开wpk模拟器是什...
新版2025教程!wepoke... 新版2025教程!wepoker有没有透视方法wpk德州辅助(的确是有挂的)1、点击下载安装,wep...
扑克教程!!impoker辅助... 扑克教程!!impoker辅助wepoker私人局作弊开挂方法(其实真的有挂);一、impoker辅...
wpk教程!pokemmo脚本... wpk教程!pokemmo脚本手机版wepoker德州开挂辅助方法(的确是有挂的)1、构建自己的po...
攻略教程!!菠萝辅助器免费版的... 攻略教程!!菠萝辅助器免费版的功能介绍德州大菠萝摆牌攻略(一贯真的有挂)1)菠萝辅助器免费版的功能介...