Hive Architecture

What is Hive?

Hive is built on top of Apache Hadoop, an open-source framework for efficiently storing and processing large datasets. As a result, Hive is closely integrated with Hadoop and is designed to scale to petabytes of data. What makes Hive unique is that it lets users query these large datasets through a SQL-like interface (HiveQL), which Hive translates into MapReduce jobs.
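
To make the MapReduce translation concrete, here is a toy, pure-Python simulation of the three phases that a query such as `SELECT word, COUNT(*) FROM words GROUP BY word` conceptually goes through. It is not the code Hive generates; the `words` table and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(rows: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (key, 1) for every row, keyed on the GROUP BY column."""
    for word in rows:
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: bring all values that share a key together, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: aggregate each group, playing the role of COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


# Conceptually similar to: SELECT word, COUNT(*) FROM words GROUP BY word;
words = ["hive", "hadoop", "hive", "hdfs", "hive"]
result = reduce_phase(shuffle(map_phase(words)))
print(result)  # {'hadoop': 1, 'hdfs': 1, 'hive': 3}
```

The user writes only the one-line query; Hive is responsible for producing the equivalent of the map, shuffle, and reduce steps.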

Hive Client

JDBC client

A JDBC driver connects to Hive using the Thrift framework. Java applications use the JDBC driver to communicate with the Hive Server.
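
A Java application typically loads `org.apache.hive.jdbc.HiveDriver` and passes a HiveServer2 URL to `DriverManager.getConnection`. The Python sketch below only assembles such a URL and does not open a connection. The helper name `build_hive_jdbc_url` is hypothetical, and it assumes the `jdbc:hive2://<host>:<port>/<database>;key=value` layout with HiveServer2's default ports (10000 for the binary Thrift transport, 10001 for HTTP mode).

```python
from typing import Optional


def build_hive_jdbc_url(host: str, port: int = 10000, database: str = "default",
                        session_vars: Optional[dict[str, str]] = None) -> str:
    """Hypothetical helper: build jdbc:hive2://<host>:<port>/<database>;k1=v1;k2=v2."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # Session variables follow the database name, separated by semicolons.
    for key, value in (session_vars or {}).items():
        url += f";{key}={value}"
    return url


# Default binary Thrift transport on port 10000.
print(build_hive_jdbc_url("hive-server.example.com"))
# jdbc:hive2://hive-server.example.com:10000/default

# HTTP transport mode, which listens on port 10001 by default.
print(build_hive_jdbc_url("localhost", port=10001, database="sales",
                          session_vars={"transportMode": "http", "httpPath": "cliservice"}))
# jdbc:hive2://localhost:10001/sales;transportMode=http;httpPath=cliservice
```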

ODBC client

The Hive ODBC driver allows applications that support the ODBC protocol to connect to Hive. Like the JDBC driver, it uses Thrift to communicate with the Hive Server.

Thrift Clients

The Hive Server uses Apache Thrift to handle requests from clients written in any language that Thrift supports.

CLI

The Hive CLI (Command Line Interface) is a shell in which we can execute Hive queries and commands.

Hive Web Interface

The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.

Driver

Acts as the controller for HQL statements
Creates a session for each query
Manages the lifecycle of each HQL statement
Maintains the metadata needed during execution
Collects the output and displays it
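
The driver's responsibilities listed above can be sketched as a small orchestrating class. This is an illustrative model under stated assumptions, not Hive's actual `Driver`: the names `ToyDriver` and `QueryState` are hypothetical, and the compile, optimize, and execute steps are injected as plain functions so the lifecycle is easy to follow.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class QueryState:
    """Hypothetical per-query record kept by the toy driver."""
    sql: str
    session_id: str
    plan: list = field(default_factory=list)  # metadata needed for execution
    lifecycle: list[str] = field(default_factory=list)
    output: list[tuple] = field(default_factory=list)


class ToyDriver:
    """Illustrative model of the driver's duties, not Hive's real Driver class."""

    def __init__(self,
                 compile_fn: Callable[[str], list],
                 optimize_fn: Callable[[list], list],
                 execute_fn: Callable[[list], Iterable[tuple]]) -> None:
        self.compile_fn = compile_fn
        self.optimize_fn = optimize_fn
        self.execute_fn = execute_fn

    def run(self, sql: str) -> QueryState:
        # Create a session for this query.
        state = QueryState(sql=sql, session_id=str(uuid.uuid4()))
        state.lifecycle.append("CREATED")
        # Hand the statement to the compiler and then the optimizer,
        # keeping the resulting plan as execution metadata.
        state.plan = self.compile_fn(sql)
        state.lifecycle.append("COMPILED")
        state.plan = self.optimize_fn(state.plan)
        state.lifecycle.append("OPTIMIZED")
        # Execute the plan, then collect the output for display.
        state.output = list(self.execute_fn(state.plan))
        state.lifecycle.append("FINISHED")
        return state


driver = ToyDriver(
    compile_fn=lambda sql: ["SCAN word_counts", "PROJECT word, cnt"],
    optimize_fn=lambda plan: plan,  # no-op optimizer for the example
    execute_fn=lambda plan: [("hive", 3)],
)
q = driver.run("SELECT word, cnt FROM word_counts")
print(q.lifecycle)  # ['CREATED', 'COMPILED', 'OPTIMIZED', 'FINISHED']
print(q.output)     # [('hive', 3)]
```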

Parsing / Compilation

Syntax check
Builds the execution plan
Prepares the steps needed to produce the output
Raises compile-time errors
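
A toy compiler can illustrate these steps: a syntax check, validation against a schema, and an ordered list of steps as the plan. The sketch assumes a tiny hypothetical grammar (`SELECT c1, c2 FROM t [WHERE c = 'v']`, with no `*`, joins, or functions) and a schema passed in as a dict; real Hive uses a full ANTLR-based parser and reads the schema from the Metastore.

```python
import re


class CompileError(Exception):
    """Raised for errors detected before any job runs."""


# Hypothetical grammar covering only: SELECT c1, c2 FROM t [WHERE c = 'v']
QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<wcol>\w+)\s*=\s*'(?P<wval>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def compile_query(sql: str, schema: dict[str, list[str]]) -> list[str]:
    match = QUERY_RE.match(sql)
    if match is None:
        # Syntax check failed: report it at compile time.
        raise CompileError(f"syntax error in: {sql!r}")
    table = match["table"]
    if table not in schema:
        raise CompileError(f"table not found: {table}")
    cols = [c.strip() for c in match["cols"].split(",")]
    referenced = cols + ([match["wcol"]] if match["wcol"] else [])
    for col in referenced:
        if col not in schema[table]:
            raise CompileError(f"invalid column: {col}")
    # Execution plan: the ordered steps needed to produce the output.
    steps = [f"SCAN {table}"]
    if match["wcol"]:
        steps.append(f"FILTER {match['wcol']} = '{match['wval']}'")
    steps.append(f"PROJECT {', '.join(cols)}")
    return steps


schema = {"word_counts": ["word", "cnt"]}
print(compile_query("SELECT word, cnt FROM word_counts WHERE word = 'hive'", schema))
# ['SCAN word_counts', "FILTER word = 'hive'", 'PROJECT word, cnt']

try:
    compile_query("SELEC word FROM word_counts", schema)
except CompileError as err:
    print(err)  # syntax error in: 'SELEC word FROM word_counts'
```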

Optimizer

Compares candidate execution plans
Calculates the cost of each plan
Produces the execution plan as a DAG
Tries to reorder or combine transformations to reduce the work done
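
The sketch below mimics cost-based plan selection with a made-up cost model (rows entering each operator times a per-row cost) and hypothetical operator costs and selectivities. It also groups consecutive row-by-row operators into one pipeline, with a shuffle (such as a GROUP BY) starting the next one. Hive's actual optimizer uses its own rules and table statistics; this only shows the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    cost_per_row: float          # hypothetical relative cost of one row
    selectivity: float = 1.0     # fraction of input rows that survive the op
    needs_shuffle: bool = False  # True for ops that need a reduce, e.g. GROUP BY


def plan_cost(plan: list[Op], input_rows: int) -> float:
    """Toy cost: sum over ops of (rows entering the op * per-row cost)."""
    rows, total = float(input_rows), 0.0
    for op in plan:
        total += rows * op.cost_per_row
        rows *= op.selectivity
    return total


def choose_plan(candidates: list[list[Op]], input_rows: int) -> list[Op]:
    """Compare candidate plans and keep the cheapest one."""
    return min(candidates, key=lambda plan: plan_cost(plan, input_rows))


def combine_stages(plan: list[Op]) -> list[list[str]]:
    """Fuse consecutive ops into one pipeline; a shuffle op starts a new one."""
    stages: list[list[str]] = []
    for op in plan:
        if not stages or op.needs_shuffle:
            stages.append([])
        stages[-1].append(op.name)
    return stages


scan = Op("scan", 1.0)
udf = Op("expensive_udf", 5.0)
filt = Op("filter", 1.0, selectivity=0.1)
agg = Op("group_by", 2.0, selectivity=0.01, needs_shuffle=True)

# Moving the filter earlier is only valid if it does not use the UDF's output.
late_filter = [scan, udf, filt, agg]
early_filter = [scan, filt, udf, agg]
best = choose_plan([late_filter, early_filter], input_rows=1_000_000)
print([op.name for op in best])  # ['scan', 'filter', 'expensive_udf', 'group_by']
print(combine_stages(best))      # [['scan', 'filter', 'expensive_udf'], ['group_by']]
```

Filtering early wins here because the expensive operator then sees only a tenth of the rows.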

Execution

The optimizer produces the plan as a DAG of MapReduce tasks and HDFS tasks. Finally, the execution engine executes these tasks in the order defined by their dependencies.
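
Executing a DAG amounts to running each task only after all the tasks it depends on have finished, that is, in topological order. The sketch uses Python's `graphlib.TopologicalSorter` (Python 3.9+). The stage names imitate the `Stage-N` labels in Hive's `EXPLAIN` output, but the tasks themselves are hypothetical stand-ins that only log a message.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(tasks: dict[str, Callable[[], None]],
            deps: dict[str, set[str]]) -> list[str]:
    """Run every task after all of its dependencies; return the order used."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order


log: list[str] = []
tasks = {
    "Stage-1": lambda: log.append("MapReduce: aggregate rows"),
    "Stage-2": lambda: log.append("MapReduce: sort aggregated rows"),
    "Stage-0": lambda: log.append("HDFS: move result into table directory"),
}
# Each task maps to the set of tasks that must finish before it starts.
deps = {
    "Stage-2": {"Stage-1"},
    "Stage-0": {"Stage-2"},
}
order = run_dag(tasks, deps)
print(order)  # ['Stage-1', 'Stage-2', 'Stage-0']
```

`TopologicalSorter` also raises `graphlib.CycleError` if the dependencies contain a cycle, which a valid query plan never should.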

Metastore

The Metastore stores metadata about tables and partitions, including column names, column types, and the location of the underlying data. The compiler reads this metadata when it validates and plans a query.
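
A minimal in-memory stand-in for the Metastore, using hypothetical class and method names, shows what kind of metadata it keeps and how a query on a partition column can be limited to a single directory. The `/user/hive/warehouse` path mirrors Hive's default warehouse directory, and partitions are modelled as `key=value` subdirectories, which is how Hive lays them out on HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    name: str
    columns: dict[str, str]   # column name -> column type
    location: str             # directory holding the table's data files
    partition_keys: list[str] = field(default_factory=list)
    partitions: dict[str, str] = field(default_factory=dict)  # "key=value" -> directory


class ToyMetastore:
    """Hypothetical in-memory Metastore; Hive's real one sits on Derby or an external RDBMS."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMeta] = {}

    def create_table(self, meta: TableMeta) -> None:
        self._tables[meta.name] = meta

    def get_table(self, name: str) -> TableMeta:
        # The compiler would call this to resolve table and column names.
        if name not in self._tables:
            raise KeyError(f"table not found: {name}")
        return self._tables[name]

    def add_partition(self, table: str, spec: str) -> str:
        # Record a partition such as "dt=2022-11-19" as a subdirectory of the table.
        meta = self.get_table(table)
        path = f"{meta.location}/{spec}"
        meta.partitions[spec] = path
        return path

    def partitions_matching(self, table: str, spec: str) -> list[str]:
        # Partition pruning: only directories whose spec matches need to be read.
        meta = self.get_table(table)
        return [path for key, path in meta.partitions.items() if key == spec]


ms = ToyMetastore()
ms.create_table(TableMeta(
    name="page_views",
    columns={"user_id": "BIGINT", "url": "STRING"},
    location="/user/hive/warehouse/page_views",
    partition_keys=["dt"],
))
ms.add_partition("page_views", "dt=2022-11-18")
ms.add_partition("page_views", "dt=2022-11-19")
print(ms.get_table("page_views").columns["url"])  # STRING
print(ms.partitions_matching("page_views", "dt=2022-11-19"))
# ['/user/hive/warehouse/page_views/dt=2022-11-19']
```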

The Metastore can be backed by two types of database:

Internal database (embedded Derby)	External database (e.g., MySQL)
No metadata backup	Supports metadata backup
Only one connection at a time	Multiple concurrent connections
Suitable only for internal use (single user, testing)	Can be exposed to external use cases (shared by many users)

Benefits of using Hive

Simple to use
Built on top of Hadoop
Provides a SQL-like query language (HiveQL)
Converts query logic into MapReduce jobs automatically

Hive Introduction And Architecture

Table of contents

What is Hive?