XDataFrame-class {IRanges}                                R Documentation

External Data Frame

Description

The XDataFrame emulates the interface of data.frame, but it can store any type of object as a column, as long as the length and [ methods are implemented for that object. The “X” in its name indicates that it attempts to coerce its columns to external XSequence objects in a way that is completely transparent to the user. This helps to avoid unnecessary copying.

Details

On the whole, the XDataFrame behaves very similarly to data.frame, in terms of construction, subsetting, splitting, combining, etc. The most notable exception is that the row names are optional. This means calling rownames(x) will return NULL if there are no row names. Of course, it could return seq_len(nrow(x)), but returning NULL informs, for example, combination functions that no row names are desired (they are often a luxury when dealing with large data).
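For comparison, the following base R sketch shows the data.frame behavior that XDataFrame departs from: a data.frame always reports row names, even when none were supplied. Only base R is used here; XDataFrame itself is not called, and the variable names are illustrative.

```r
# Base R only: a data.frame built without row names still reports them.
df <- data.frame(score = c(1L, 3L, NA))

rn <- rownames(df)

# The automatic row names are the row indices as character strings,
# so a caller cannot tell from rownames() alone whether names were wanted.
same_as_index <- identical(rn, as.character(seq_len(nrow(df))))
no_names <- is.null(rn)
```

Here same_as_index is TRUE and no_names is FALSE, whereas rownames(x) on an XDataFrame without row names is documented above to return NULL.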

As XDataFrame derives from AnnotatedList, it is possible to set an annotation string. In addition, another XDataFrame can hold metadata on the columns.

Accessors

In the following code snippets, x is an XDataFrame.

dim(x): Get an integer vector of length two whose first and second elements are the number of rows and columns, respectively.
dimnames(x), dimnames(x) <- value: Get and set the two-element list containing the row names (a character vector of length nrow(x), or NULL) and the column names (a character vector of length ncol(x)).

Subsetting

In the following code snippets, x is an XDataFrame.

x[i,j,drop]: Behaves very similarly to the [.data.frame method, except that i can be a logical Rle object and subsetting by matrix indices is not supported. Due to limitations in the subsetting of XSequence objects, indices containing NAs are not supported.
x[[i]]: Behaves very similarly to the [[.data.frame method, except that the arguments j and exact are not supported. Column name matching is always exact. Subsetting by matrices is not supported.
x[[i]] <- value: Behaves very similarly to the [[<-.data.frame method, except that the argument j is not supported. An attempt is made to coerce value to an XSequence object.
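To make the note on exact matching concrete, the sketch below shows what the exact argument does for a base R data.frame, which is the behavior XDataFrame does not offer. It uses only base R and a small made-up data.frame; XDataFrame itself is not called.

```r
# Base R only: a small illustrative data.frame (values are made up).
df <- data.frame(Fertility = c(80.2, 83.1), Education = c(12L, 9L))

# Exact matching is the default for [[ on a data.frame.
exact_hit  <- df[["Fertility"]]
exact_miss <- df[["Fert"]]                  # no column is named "Fert"

# With exact = FALSE, a unique prefix is enough to find the column.
partial_hit <- df[["Fert", exact = FALSE]]
```

Since XDataFrame always matches exactly, only the first two calls have a counterpart there, which is consistent with sw[["Fert"]] returning NULL in the Examples section.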

Constructor

XDataFrame(..., row.names = NULL): Constructs an XDataFrame in a similar fashion to data.frame. Each argument in ... is coerced to an XDataFrame and combined column-wise. No special effort is expended to automatically determine the row names from the arguments. The row names should be given in row.names; otherwise, there are no row names. This is by design, as row names are normally undesirable when data is large.

Splitting and Combining

In the following code snippets, x is an XDataFrame.

split(x, f, drop = FALSE): Splits x into a SplitXDataFrameList, according to f, dropping elements corresponding to unrepresented levels if drop is TRUE.
rbind(...): Creates a new XDataFrame by combining the rows of the XDataFrame objects in .... Very similar to rbind.data.frame, except in the handling of row names. If all elements have row names, they are concatenated and made unique. Otherwise, the result does not have row names. Currently, factors are not handled well (their levels are dropped). This is not a high priority until there is an XFactor class.
cbind(...): Creates a new XDataFrame by combining the columns of the XDataFrame objects in .... Very similar to cbind.data.frame, except row names, if any, are dropped. Consider the XDataFrame constructor as an alternative that allows one to specify row names.
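The row name rule described for rbind can be sketched in base R as follows. The helper combine_row_names is hypothetical and not part of IRanges, and the use of make.unique is an assumption: the text above only says that the names are "made unique", not how.

```r
# Hypothetical helper, for illustration only; not part of IRanges.
# Each argument is the row names of one object: a character vector or NULL.
combine_row_names <- function(...) {
  parts <- list(...)
  has_names <- vapply(parts, function(p) !is.null(p), logical(1))
  # If any object lacks row names, the combined result has none.
  if (!all(has_names)) return(NULL)
  # Otherwise concatenate and de-duplicate (make.unique is an assumption).
  make.unique(unlist(parts))
}

all_named    <- combine_row_names(c("a", "b"), c("b", "c"))
some_unnamed <- combine_row_names(c("a", "b"), NULL)
```

In this sketch all_named is c("a", "b", "b.1", "c") and some_unnamed is NULL, which is how a NULL from rownames(x) can signal that no row names are wanted in the result.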

Coercion

as(from, "XDataFrame"): By default, constructs a new XDataFrame with from as its only column. If from is a matrix or data.frame, all of its columns become columns in the new XDataFrame. In any case, there is an attempt to coerce columns to XSequence before inserting them into the XDataFrame. If from is a list, its elements become columns in the same way. Note that for the XDataFrame to behave correctly, each column object must support element-wise subsetting via the [ method and return the number of elements with length. It is recommended to use the XDataFrame constructor, rather than this interface.
as.list(x): Coerces x, an XDataFrame, to a list, converting any XSequence objects to vectors along the way.
as.data.frame(x, row.names=NULL, optional=FALSE): Coerces x, an XDataFrame, to a data.frame. Each column is coerced to a vector and stored as a column in the data.frame. If row.names is NULL, the row names are taken from x if it has any; if it has none, they are inferred by the data.frame constructor.
as(from, "data.frame"): Coerces an XDataFrame to a data.frame by calling as.data.frame(from).
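The column requirement stated for as(from, "XDataFrame"), namely a length method and element-wise subsetting via [, can be illustrated with a minimal S4 class. The class Celsius and its slot are hypothetical and exist only for this sketch; whether such an object would be accepted as a column without further methods is not confirmed here.

```r
library(methods)

# Hypothetical class, for illustration only; not part of IRanges.
setClass("Celsius", representation(values = "numeric"))

# length() reports the number of elements.
setMethod("length", "Celsius", function(x) length(x@values))

# "[" subsets element-wise and returns an object of the same class.
setMethod("[", "Celsius", function(x, i, j, ..., drop = TRUE) {
  new("Celsius", values = x@values[i])
})

temps <- new("Celsius", values = c(21.5, 19.0, 23.1))
n     <- length(temps)
sub   <- temps[2:3]
```

After this, n is 3 and sub is a Celsius object holding the second and third values, which are the two operations the text above requires of a column object.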

Note

In the future, the general data frame functionality will probably be moved to a DataFrame class. XDataFrame will derive from DataFrame and encapsulate the behavior of attempting to coerce or even requiring columns to be XSequence.

Author(s)

Michael Lawrence

See Also

RangedData, which makes heavy use of this class.

Examples

  score <- c(1L, 3L, NA)
  counts <- c(10L, 2L, NA)
  row.names <- c("one", "two", "three")
  
  xdf <- XDataFrame(score) # single column
  xdf[["score"]]
  xdf <- XDataFrame(score, row.names = row.names) # with row names
  rownames(xdf)
  
  xdf <- XDataFrame(vals = score) # explicit naming
  xdf[["vals"]]
  
  # a data.frame
  sw <- XDataFrame(swiss)
  as.data.frame(sw) # swiss, without row names
  # now with row names
  sw <- XDataFrame(swiss, row.names = rownames(swiss))
  as.data.frame(sw) # swiss

  # subsetting
    
  sw[] # identity subset
  sw[,] # same

  sw[NULL] # no columns
  sw[,NULL] # no columns
  sw[NULL,] # no rows

  ## select columns
  sw[1:3]
  sw[,1:3] # same as above
  sw[,"Fertility"]
  sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)]

  ## select rows and columns
  sw[4:5, 1:3]
  
  sw[1] # one-column XDataFrame
  ## the same
  sw[, 1, drop = FALSE]
  sw[, 1] # an (unnamed) vector
  sw[[1]] # the same
  sw[["Fertility"]]

  sw[["Fert"]] # should return 'NULL'
  
  sw[1,] # a one-row XDataFrame
  sw[1,, drop=TRUE] # a list

  ## duplicate row, unique row names are created
  sw[c(1, 1:2),]

  ## indexing by row names  
  sw["Courtelary",]
  subsw <- sw[1:5,1:4]
  subsw["C",] # partially matches

  ## row and column names
  cn <- paste("X", seq_len(ncol(swiss)), sep = ".")
  colnames(sw) <- cn
  colnames(sw)
  rn <- seq(nrow(sw))
  rownames(sw) <- rn
  rownames(sw)

  ## column replacement

  xdf[["counts"]] <- counts
  xdf[["counts"]]
  xdf[[3]] <- score
  xdf[["X"]]
  xdf[[3]] <- NULL # deletion

  ## split

  sw <- XDataFrame(swiss)
  swsplit <- split(sw, sw[["Education"]])
  
  ## rbind

  do.call(rbind, as.list(swsplit))

  ## cbind

  cbind(XDataFrame(score), XDataFrame(counts))

[Package IRanges version 1.2.3 Index]