public class AvroParquetInputFormat extends ParquetInputFormat<org.apache.avro.generic.IndexedRecord>
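A Hadoop InputFormat for Parquet files that returns records as Avro IndexedRecord values.
Fields inherited from class ParquetInputFormat: READ_SUPPORT_CLASS, UNBOUND_RECORD_FILTER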
Constructor and Description |
---|
AvroParquetInputFormat() |
Modifier and Type | Method and Description |
---|---|
static void | setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema): Override the Avro schema to use for reading. |
static void | setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection): Set the subset of columns to read (projection pushdown). |
Methods inherited from class ParquetInputFormat: createRecordReader, getFooters, getFooters, getGlobalMetaData, getReadSupport, getReadSupportClass, getSplits, getSplits, getUnboundRecordFilter, listStatus, setReadSupportClass, setReadSupportClass, setUnboundRecordFilter
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
public static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)
Set the subset of columns to read (projection pushdown). This is useful when the full schema is large and only a few columns are needed: columns outside the projection are never read, which saves I/O and decoding time.
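To make the saving concrete, here is a toy, in-memory sketch of projection pushdown over a column-oriented layout. `ToyColumnarFile`, `addColumn`, `scan` and `columnsRead` are hypothetical names invented for illustration; this is not the Parquet or parquet-avro API, and real Parquet files keep column chunks on disk per row group rather than in Java lists. What it shows is the mechanism: because each column is stored separately, a reader given a projection can skip unrequested columns entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of a columnar file, used only to illustrate
 * projection pushdown. Not the Parquet or parquet-avro API.
 */
final class ToyColumnarFile {
    // Each column is stored on its own, keyed by column name, in insertion order.
    private final Map<String, List<?>> columns = new LinkedHashMap<>();
    // Number of columns the most recent scan actually read.
    private int columnsRead = 0;

    void addColumn(String name, List<?> values) {
        columns.put(name, values);
    }

    /** Reads only the projected columns and reassembles them into rows. */
    List<Map<String, Object>> scan(Set<String> projection) {
        columnsRead = 0;
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map.Entry<String, List<?>> column : columns.entrySet()) {
            if (!projection.contains(column.getKey())) {
                continue; // unrequested column: skipped, never read
            }
            columnsRead++;
            List<?> values = column.getValue();
            for (int i = 0; i < values.size(); i++) {
                if (rows.size() <= i) {
                    rows.add(new LinkedHashMap<>());
                }
                rows.get(i).put(column.getKey(), values.get(i));
            }
        }
        return rows;
    }

    int columnsRead() {
        return columnsRead;
    }
}
```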
If a requested projection is set, the Avro schema used for reading must be compatible with it: a column that is not included in the projection must either be absent from the read schema or be optional in it. Use setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema) to set a read schema, if needed.
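The compatibility rule can be expressed as a simple check. The sketch below is a hypothetical model, not part of parquet-avro: a read-schema field is reduced to a name plus an `optional` flag (in Avro terms, optional usually means a union with `null`), and the illustrative method `ProjectionCheck.violations` reports every field that is missing from the projection yet required in the read schema. Fields absent from the read schema are never examined, so they always pass, which matches the "not included" half of the rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the projection / read-schema compatibility rule.
 * Not the parquet-avro API.
 */
final class ProjectionCheck {
    /** A read-schema field, reduced to its name and whether it is optional. */
    static final class ReadField {
        final String name;
        final boolean optional;

        ReadField(String name, boolean optional) {
            this.name = name;
            this.optional = optional;
        }
    }

    /**
     * Returns the names of read-schema fields that break the rule:
     * they are not in the projection and are required in the read schema.
     */
    static List<String> violations(List<ReadField> readSchema, Set<String> projection) {
        List<String> bad = new ArrayList<>();
        for (ReadField field : readSchema) {
            if (!projection.contains(field.name) && !field.optional) {
                bad.add(field.name);
            }
        }
        return bad;
    }
}
```

An empty result means the read schema and projection are compatible under this simplified model.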
Parameters:
job - the job to configure
requestedProjection - the Avro schema describing the columns to read
See Also:
setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
public static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)
Override the Avro schema to use for reading. Differences between the read schema and the schema the data was written with are resolved using Avro's schema resolution rules.
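For records, Avro's resolution rules match fields by name: a field present only in the writer's schema is ignored, and a field present only in the reader's schema takes the reader's default value, or resolution fails if it has none. The following is a deliberately simplified, hypothetical model of just those record rules; `RecordResolution`, `ReaderField`, `required`, `withDefault` and `resolve` are illustrative names, not the Avro API, and type promotion, aliases and nested types are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified, hypothetical model of Avro's record resolution rules.
 * Not the Avro API: types, promotion, aliases and nesting are ignored.
 */
final class RecordResolution {
    /** A field of the reader's schema: a name and an optional default. */
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue; // may be null, e.g. for a null-union default

        private ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        static ReaderField required(String name) {
            return new ReaderField(name, false, null);
        }

        static ReaderField withDefault(String name, Object defaultValue) {
            return new ReaderField(name, true, defaultValue);
        }
    }

    /**
     * Resolves one written record against the reader's fields: matching names
     * copy the written value, reader-only fields take their default (or fail),
     * and writer-only fields are dropped.
     */
    static Map<String, Object> resolve(Map<String, ?> written, List<ReaderField> reader) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : reader) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalArgumentException(
                        "field '" + field.name + "' is missing from the written data and has no default");
            }
        }
        return result; // written fields the reader does not declare are ignored
    }
}
```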
Parameters:
job - the job to configure
avroReadSchema - the Avro schema to read records with
See Also:
setRequestedProjection(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
Copyright © 2015. All rights reserved.