Dibyanshu
May 12, 2021

Hive Optimization

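1. Partitioning

Partitioning divides a table's data into smaller logical segments, one directory per value of the partition column. When a query filters on that column, Hive scans only the matching partitions instead of the whole table (partition pruning).

Partition on columns with low cardinality: each distinct value becomes a separate partition, so a high-cardinality column would produce a very large number of small partitions.

For example, a customer table can be partitioned on the state column.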
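A minimal HiveQL sketch of partitioning by state, assuming a hypothetical `customers` table loaded from a hypothetical `customers_staging` table (the table and column names are illustrative, not from the original post):

```sql
-- Hypothetical table: each distinct state value is stored in its own directory.
CREATE TABLE customers (
  member_id BIGINT,
  name      STRING,
  city      STRING
)
PARTITIONED BY (state STRING);

-- Dynamic partitioning: Hive takes the partition value from the last SELECT column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE customers PARTITION (state)
SELECT member_id, name, city, state
FROM customers_staging;  -- hypothetical source table

-- Filtering on the partition column lets Hive skip every other partition.
SELECT COUNT(*) FROM customers WHERE state = 'CA';
```

Prefixing the last query with `EXPLAIN DEPENDENCY` is one way to check which partitions Hive actually reads.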

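2. Bucketing

Bucketing also divides the data into smaller segments, but each segment (bucket) is a file defined by a hash rather than a logical group of values.

Rows are assigned to buckets by a system-defined hash function: Hive hashes the value of the bucketing column and takes the result modulo the number of buckets.

Because a given value always lands in the same bucket, a query or join on the bucketing column can read only the matching bucket instead of the whole table.

Bucketing suits columns with high cardinality, where partitioning would create too many partitions.

For example, bucketing can be done on the member_id column.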
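A minimal sketch of a table bucketed on member_id, assuming a hypothetical `members` table loaded from a hypothetical `members_staging` table; the bucket count of 32 is an arbitrary illustrative choice:

```sql
-- Hypothetical table: rows are hashed on member_id into 32 bucket files.
CREATE TABLE members (
  member_id BIGINT,
  name      STRING,
  join_date DATE
)
CLUSTERED BY (member_id) INTO 32 BUCKETS;

-- Hive versions before 2.0 only (the property no longer exists in 2.0+,
-- where bucketing is always enforced on insert):
-- SET hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE members
SELECT member_id, name, join_date
FROM members_staging;  -- hypothetical source table

-- Sampling on the bucketing column reads a single bucket.
SELECT * FROM members TABLESAMPLE(BUCKET 1 OUT OF 32 ON member_id);
```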

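3. Join optimization techniques

Hive offers map-side joins, bucket map joins, and sort-merge-bucket joins (also called SMB joins).

All of them minimize shuffling by doing the join in the map phase: a map-side join loads the small table into memory on every mapper; a bucket map join, used when both tables are bucketed on the join key, loads only the matching bucket of the smaller table; and an SMB join merges matching buckets of two tables that are bucketed and sorted on the join key.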
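The session settings below are one way to turn these join strategies on. The property names come from the Hive configuration documentation, but defaults differ between Hive versions, and the `hive.input.format` line follows the older MapReduce-based join-optimization docs and may not be needed on Tez. The `orders` and `members` tables are hypothetical and assumed to be bucketed and sorted on `member_id` with the same number of buckets:

```sql
-- Map-side join: convert joins automatically when one side is small enough.
SET hive.auto.convert.join=true;
-- Small-table threshold in bytes (25 MB, the usual default).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Bucket map join: both tables bucketed on the join key.
SET hive.optimize.bucketmapjoin=true;

-- SMB join: both tables bucketed and sorted on the join key.
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Hypothetical tables, both CLUSTERED BY (member_id) SORTED BY (member_id)
-- into the same number of buckets.
SELECT m.name, o.amount
FROM orders o
JOIN members m ON o.member_id = m.member_id;
```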

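4. Use the ORC file format with a compression codec such as Snappy

ORC can reduce storage by up to 75% compared with the original data.

ORC improves query performance through columnar storage, compression, and lightweight indexes (min/max statistics per stripe and row group) that let predicate push-down skip data that cannot match a filter.

Snappy compresses and decompresses quickly, trading some compression ratio for speed compared with ZLIB, ORC's default codec.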
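A minimal sketch of an ORC table compressed with Snappy, assuming a hypothetical `sales_orc` table converted from a hypothetical text-format `sales_text` table:

```sql
-- Hypothetical table stored as ORC with Snappy instead of the default ZLIB codec.
CREATE TABLE sales_orc (
  sale_id   BIGINT,
  member_id BIGINT,
  amount    DECIMAL(10,2)
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Push WHERE predicates down and use ORC's min/max indexes to skip data.
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- Convert existing data from the hypothetical text-format table.
INSERT OVERWRITE TABLE sales_orc
SELECT sale_id, member_id, amount FROM sales_text;

SELECT SUM(amount) FROM sales_orc WHERE member_id = 42;
```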

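5. UDFs are not well optimized

Filter conditions are evaluated from left to right.

For best performance, put UDFs on the right in an ANDed list of expressions in the WHERE clause, so that cheap built-in comparisons run first and the UDF runs only on rows that have already passed them.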
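A sketch of the ordering advice, assuming a hypothetical UDF `my_expensive_udf` and a hypothetical `customers` table with `name` and `city` columns. Depending on the Hive version and optimizer settings, the planner may reorder predicates itself, so treat this as a habit rather than a guaranteed speedup:

```sql
-- my_expensive_udf is a hypothetical UDF; customers is a hypothetical table.
-- Less efficient: the UDF comes first in the AND list.
SELECT * FROM customers
WHERE my_expensive_udf(name) = 'gold' AND city = 'Austin';

-- Preferred: the cheap built-in comparison comes first, the UDF last.
SELECT * FROM customers
WHERE city = 'Austin' AND my_expensive_udf(name) = 'gold';
```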

6. Execution engine
Hive can run queries on different execution engines: MapReduce, Tez, or Spark.
Prefer Tez or Spark over MapReduce wherever possible: MapReduce writes intermediate results to disk between stages, whereas Tez and Spark run a query as a single DAG of tasks. Hive on MapReduce is deprecated since Hive 2.
To change the execution engine, set the hive.execution.engine property:
set hive.execution.engine=tez;
or
set hive.execution.engine=spark;

7. Vectorization
With vectorization, Hive processes a batch of rows together (1,024 rows at a time) instead of one row at a time, which reduces per-row overhead in scans, filters, and aggregations.
To enable vectorization, set this configuration parameter:
set hive.vectorized.execution.enabled=true;
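Two related steps that may help, assuming a hypothetical `customers` table with a `city` column: enabling vectorization on the reduce side as well, and checking the plan. `EXPLAIN VECTORIZATION` requires Hive 2.3 or later:

```sql
-- Also vectorize the reduce side of query execution.
SET hive.vectorized.execution.reduce.enabled=true;

-- Hive 2.3+: report which parts of the plan run vectorized
-- (customers is a hypothetical table).
EXPLAIN VECTORIZATION SUMMARY
SELECT city, COUNT(*) FROM customers GROUP BY city;
```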

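8. Cost-based optimization (CBO)
CBO, which Hive builds on Apache Calcite, produces an efficient execution plan by examining the tables and the query, for example choosing the join order and join type from table and column statistics. This can cut query execution time and reduce resource usage. CBO relies on up-to-date statistics, which are collected with:

ANALYZE TABLE tbl_name COMPUTE STATISTICS;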
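A slightly fuller sketch, assuming a hypothetical `customers` table partitioned by `state`. The `PARTITION (state)` clause without a value gathers statistics for every partition, and the `FOR COLUMNS` variant adds the column-level statistics CBO uses for cardinality estimates:

```sql
-- Enable CBO and let the planner read stored statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;
-- Answer simple aggregates such as COUNT(*) directly from statistics.
SET hive.compute.query.using.stats=true;

-- customers is a hypothetical table partitioned by state.
-- Table/partition statistics (row count, data size) for all partitions:
ANALYZE TABLE customers PARTITION (state) COMPUTE STATISTICS;
-- Column statistics (distinct values, nulls, min/max):
ANALYZE TABLE customers PARTITION (state) COMPUTE STATISTICS FOR COLUMNS;
```

`DESCRIBE FORMATTED customers PARTITION (state='CA');` shows the collected statistics for one partition.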
