Hive Tables
Internal Table
- metadata is managed by Hive
- data is also managed by Hive and stored in the Hive warehouse directory
External Table
- metadata is managed by Hive
- data stays at its source location; the table only points to it
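The difference between the two table types shows up in the DDL used to create them. The following is a minimal sketch that builds the two HiveQL statements as strings; the helper `create_table_ddl`, the table names, the columns, and the path `/data/landing/sales` are all hypothetical and only serve as an illustration.

```python
def create_table_ddl(name, columns, file_format="ORC", external_location=None):
    """Build a HiveQL CREATE TABLE statement as a string.

    The helper itself and every name passed to it below are hypothetical.
    """
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    if external_location is None:
        # Internal table: Hive keeps the data in its warehouse directory.
        return f"CREATE TABLE {name} ({cols}) STORED AS {file_format}"
    # External table: Hive records metadata and points at data that already exists.
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols}) "
        f"STORED AS {file_format} LOCATION '{external_location}'"
    )


columns = [("id", "INT"), ("name", "STRING")]
managed_ddl = create_table_ddl("sales_managed", columns)
external_ddl = create_table_ddl(
    "sales_external", columns, "PARQUET", "/data/landing/sales"
)
print(managed_ddl)
print(external_ddl)
```

The only structural differences are the `EXTERNAL` keyword and the `LOCATION` clause, which tells Hive where the existing data lives.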
File Formats
Parquet File
Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient compression and encoding schemes that perform well when handling complex data in bulk.
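To see why a column-oriented layout helps compression, here is a toy sketch in plain Python. It is not Parquet's actual on-disk format; the sample rows and the `run_length_encode` helper are assumptions made for illustration, showing only the general idea that values from one column sit together and repeated neighbours collapse well.

```python
from itertools import groupby

# Hypothetical sample rows: (order_id, country, status)
rows = [
    (1, "US", "shipped"),
    (2, "US", "shipped"),
    (3, "US", "pending"),
    (4, "DE", "pending"),
]

# A row-oriented layout keeps each record together; a column-oriented
# layout stores all values of one column next to each other.
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("order_id", "country", "status"))
}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Reading one column touches only that column's values.
print(columns["country"])
# Repeated neighbouring values shrink once they are stored side by side.
print(run_length_encode(columns["country"]))
```

A query that needs only `country` can skip the other columns entirely, which is the retrieval benefit a columnar format aims for.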
ORC File
[The Optimized Row Columnar (ORC)](cwiki.apache.org/confluence/display/hive/la..) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
Avro File
Avro files include markers that can be used to split large data sets into subsets suitable for Apache MapReduce processing. Some data exchange services use a code generator to interpret the data definition and produce code to access the data. Avro doesn't require this step, which makes it a good fit for scripting languages.
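Avro schemas are written in JSON, which is why a scripting language can work with them at runtime without generated classes. The sketch below uses only Python's standard library; the `User` schema and the `matches_schema` helper are hypothetical, and the check is deliberately loose rather than a real Avro validator.

```python
import json

# Hypothetical Avro record schema, written as plain JSON.
schema_json = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""

# The schema is parsed at runtime; no generated classes are involved.
schema = json.loads(schema_json)
field_names = [field["name"] for field in schema["fields"]]


def matches_schema(record):
    """Loose check: the record has exactly the fields the schema names."""
    return sorted(record) == sorted(field_names)


print(field_names)
print(matches_schema({"id": 1, "name": "Ada"}))
```

A real Avro reader would also check field types and decode the binary encoding; this only shows that the data definition is ordinary data a script can inspect.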
CSV File
A CSV (comma-separated values) file is a plain-text file that stores tabular data: each line is one row, and commas separate the column values within it.
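As a small illustration, the following sketch writes a table to CSV text and reads it back with Python's standard `csv` module. The sample rows are made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical table: a header row followed by two data rows.
table = [
    ["id", "name", "city"],
    [1, "Ada", "London"],
    [2, "Grace", "New York, NY"],
]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back: every value comes back as a string, keyed by the header row.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[1]["city"])
```

Note that the value containing a comma is quoted on output so it still reads back as a single column, and that CSV carries no type information, so `id` returns as a string.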