matchPWM {Biostrings}    R Documentation

PWM creating, matching, and related utilities

Description

Utilities for creating Position Weight Matrices (PWMs), matching them against DNA sequences, and related tasks. (PWMs for amino acid sequences are not supported.)

Usage

  PWM(x, type = c("log2probratio", "prob"),
      prior.params = c("A"=0.25, "C"=0.25, "G"=0.25, "T"=0.25))

  matchPWM(pwm, subject, min.score="80%", ...)
  countPWM(pwm, subject, min.score="80%", ...)
  PWMscoreStartingAt(pwm, subject, starting.at=1)

  ## Utility functions for basic manipulation of the Position Weight Matrix
  maxWeights(x)
  minWeights(x)
  maxScore(x)
  minScore(x)
  unitScale(x)
  ## S4 method for signature 'matrix'
  reverseComplement(x, ...)

Arguments

x For PWM, a character vector or DNAStringSet object whose elements all have the same number of characters.

For maxWeights, minWeights, maxScore, minScore, unitScale, and reverseComplement, a numeric matrix with row names A, C, G and T representing a Position Weight Matrix.

type The type of position weight matrix, either "log2probratio" or "prob". See Details section for more information.
prior.params A positive numeric vector, which represents the parameters of the Dirichlet conjugate prior, with names A, C, G, and T. See Details section for more information.
pwm A numeric matrix with row names A, C, G and T representing a Position Weight Matrix.
subject For matchPWM and countPWM, a DNAString, XStringViews or MaskedDNAString object.

For PWMscoreStartingAt, a DNAString object containing the subject sequence.

min.score The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
starting.at An integer vector specifying the starting positions of the Position Weight Matrix relative to the subject.
... Additional arguments for methods.
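
The roles of starting.at and min.score can be illustrated with a small base R sketch. It is not the Biostrings implementation: the matrix values, the subject string, and the helper score_at are hypothetical, and the sketch assumes that a window's score is the sum of the weights of its bases and that "80%" means 80% of the highest possible score, as described above.

```r
# Hypothetical 4 x 3 weight matrix; the values are illustrative only.
pwm <- matrix(c(0.30, 0.05, 0.10,
                0.02, 0.25, 0.05,
                0.01, 0.03, 0.20,
                0.00, 0.00, 0.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
subject <- "ACGTACG"

# Hypothetical helper: score of the window that begins at position `start`,
# computed as the sum of pwm[base, column] over the window.
score_at <- function(pwm, subject, start) {
  window <- substring(subject, start, start + ncol(pwm) - 1L)
  bases <- strsplit(window, "")[[1]]
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

# Every starting position at which the matrix fits inside the subject.
starts <- seq_len(nchar(subject) - ncol(pwm) + 1L)
scores <- vapply(starts, function(s) score_at(pwm, subject, s), numeric(1))

# Assumed reading of min.score = "80%": 80% of the highest possible score.
max_score <- sum(apply(pwm, 2, max))
threshold <- max_score * 80 / 100

# Starting positions whose score reaches the threshold.
hits <- starts[scores >= threshold]
```

Under these assumptions, passing a single number as min.score would skip the percentage step and use that number directly as the threshold.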

Details

The PWM function uses a multinomial model with a Dirichlet conjugate prior to estimate the probability of base b at position i. The prior.params argument supplies the Dirichlet parameters for the DNA bases A, C, G, and T. These give a position-independent prior estimate of the base probabilities,

  priorProbs = prior.params/sum(prior.params)

and a posterior (data-informed) estimate of the base probabilities at each position,

  postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params))

When type = "log2probratio", the result is unitScale(log2(postProbs/priorProbs)). When type = "prob", the result is unitScale(postProbs).
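
The formulas above can be traced by hand in base R. The following is a minimal sketch, not the package code: the toy alignment sites, the count matrix built with tabulate (a stand-in for consensusMatrix), and the helper unit_scale are all assumptions made for illustration. unit_scale follows the expression given for unitScale in the Value section.

```r
# Hypothetical toy alignment: four sites of equal width (not real data).
sites <- c("ACGT", "ACGA", "ACTT", "GCGT")
prior.params <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
bases <- names(prior.params)

# Base R stand-in for consensusMatrix(): a 4 x width matrix of base counts.
chars <- do.call(rbind, strsplit(sites, ""))  # one row per site
counts <- vapply(seq_len(ncol(chars)), function(i) {
  tabulate(match(chars[, i], bases), nbins = length(bases))
}, integer(length(bases)))
rownames(counts) <- bases

# Prior and posterior base probabilities, as in the formulas above.
# The length-4 vectors recycle down each column, so row A gets the A prior, etc.
priorProbs <- prior.params / sum(prior.params)
postProbs <- (counts + prior.params) / (length(sites) + sum(prior.params))

# Hypothetical helper mirroring the expression documented for unitScale.
unit_scale <- function(x) {
  min_score <- sum(apply(x, 2, min))
  max_score <- sum(apply(x, 2, max))
  (x - min_score / ncol(x)) / (max_score - min_score)
}

pwm_log2 <- unit_scale(log2(postProbs / priorProbs))  # type = "log2probratio"
pwm_prob <- unit_scale(postProbs)                     # type = "prob"
```

With this scaling, the column maxima of the result sum to 1 and the column minima sum to 0, so every possible score falls between 0 and 1.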

Value

A numeric matrix representing the Position Weight Matrix for PWM.

A numeric vector containing the Position Weight Matrix-based scores for PWMscoreStartingAt.

An XStringViews object for matchPWM.

A single integer for countPWM.

A vector containing the max weight for each position in pwm for maxWeights.

A vector containing the min weight for each position in pwm for minWeights.

The highest possible score for a given Position Weight Matrix for maxScore.

The lowest possible score for a given Position Weight Matrix for minScore.

The modified numeric matrix given by (x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x)) for unitScale.

A PWM obtained by reversing the column order in PWM x and by reassigning each row to its complementary nucleotide for reverseComplement.
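
The matrix utilities described above can be mimicked in base R to see what each one returns. This is a sketch under the descriptions given here, not the package source: the matrix values and the helper rc_pwm are hypothetical.

```r
# Hypothetical 4 x 3 weight matrix; the values are illustrative only.
pwm <- matrix(c(0.30, 0.05, 0.10,
                0.02, 0.25, 0.05,
                0.01, 0.03, 0.20,
                0.00, 0.00, 0.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Per-position extremes (what maxWeights and minWeights are described to give).
max_weights <- apply(pwm, 2, max)
min_weights <- apply(pwm, 2, min)

# Highest and lowest possible scores: sums of the per-position extremes.
max_score <- sum(max_weights)
min_score <- sum(min_weights)

# Hypothetical helper: reverse the column order and give each row the
# weights of its complementary nucleotide (A <-> T, C <-> G).
rc_pwm <- function(x) {
  out <- x[c("T", "G", "C", "A"), rev(seq_len(ncol(x))), drop = FALSE]
  rownames(out) <- c("A", "C", "G", "T")
  out
}

rc <- rc_pwm(pwm)
```

In this sketch the weight of A in the first column of rc equals the weight of T in the last column of pwm, and applying rc_pwm twice returns the original matrix.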

Author(s)

H. Pages and P. Aboyoun

References

Wasserman, W.W., Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet., 5(4):276-87.

See Also

matchPattern, reverseComplement, DNAString-class, XStringViews-class

Examples

  ## Data setup
  data(HNF4alpha)
  library(BSgenome.Dmelanogaster.UCSC.dm3)
  chr3R <- Dmelanogaster$chr3R
  chr3R

  ## Create a PWM and perform some general routines
  pwm <- PWM(HNF4alpha)
  round(pwm, 2)
  maxWeights(pwm)
  maxScore(pwm)
  reverseComplement(pwm)

  ## Score the first 5 positions
  PWMscoreStartingAt(pwm, unmasked(chr3R), starting.at=1:5)

  ## Match the plus strand
  matchPWM(pwm, chr3R)
  countPWM(pwm, chr3R)

  ## Match the minus strand
  matchPWM(reverseComplement(pwm), chr3R)

[Package Biostrings version 2.18.2 Index]