Impala VS Spark

news/2024/5/12 5:57:03

Impala vs spark性能测试

impala版本chd2.3;4个server节点

spark版本1.6.2;5个 Executors,每个10cores,20G memory

1.8亿的表

sqlsparkSql(取三次平均)impala(取三次平均)
select * from dadddcb8179f4345a53819a7f75abdfb where fk87561dc6 > 131097 limit 10001s0.9s
select fkeefe9a40, fkeb26f85b, count(1) from dadddcb8179f4345a53819a7f75abdfb group by fkeefe9a40, fkeb26f85b limit 10003.5s5.1s
select fkeefe9a40, fkeb26f85b, count(1) from dadddcb8179f4345a53819a7f75abdfb where fkeefe9a40 > ‘2015-03-06’ and fkeefe9a40 < ‘2015-12-06’ group by fkeefe9a40, fkeb26f85b limit 10003.5s5.2s
select fkeefe9a40, fkca691145, sum(cast(fk87561dc6 as int)) as total from dadddcb8179f4345a53819a7f75abdfb where fkeb26f85b in (‘4490’,‘3588’) group by fkeefe9a40, fkca691145 order by total desc limit 10003.9s3.2s
select fkeefe9a40, fk1412e410, max(fk87561dc6), min(fk87561dc6) from dadddcb8179f4345a53819a7f75abdfb where fkeefe9a40 > ‘2015-03-06’ group by fkeefe9a40, fk1412e410 limit 10009.5s13.5s

1千万的表

sqlsparkSql(取三次平均)impala(取三次平均)
select * from 6e0643254e9848f5b71c676649e336cb where fk153ddc37 > 1000 limit 10000.58s0.6s
select fkfe37137d,fk49466f92,count(1) from 6e0643254e9848f5b71c676649e336cb group by fkfe37137d,fk49466f92 limit 10003.2s3.4s
select fkfe37137d,fk49466f92,count(1) from 6e0643254e9848f5b71c676649e336cb where fkfe37137d > ‘2017-01-10’ and fkfe37137d < ‘2017-03-20’ group by fkfe37137d,fk49466f92 limit 10001.7s1.6s
select fkfe37137d, fk49466f92, sum(cast(fk153ddc37 as int)) as total from 6e0643254e9848f5b71c676649e336cb where fkd2f91355 in(‘e339058d8ddb7bd89a3e939e54372f38’, ‘eab700943c25bd7f522cb9c7a187e344’) group by fkfe37137d, fk49466f92 order by total desc limit 10002.9s1.8s
select fkfe37137d, fk1dba2b17, max(fk153ddc37), min(fk153ddc37) from 6e0643254e9848f5b71c676649e336cb where fkfe37137d > ‘2017-01-10’ group by fkfe37137d, fk1dba2b17 limit 10002.3s1.7s

1百万的表

sqlsparkSql(取三次平均)impala(取三次平均)
select * from o49ed7e906424d90a0ea2f11bea5e1db where fk0770b65e > 1000 limit 10000.35s0.14
select fk28a55b60,fk418dc234,count(1) from o49ed7e906424d90a0ea2f11bea5e1db group by fk28a55b60,fk418dc234 limit 10001.4s1.6s
select fk28a55b60,fk418dc234,count(1) from o49ed7e906424d90a0ea2f11bea5e1db where fk28a55b60 > ‘2017-02-10’ and fk28a55b60 < ‘2017-03-20’ group by fk28a55b60,fk418dc234 limit 10000.6s1.1s
select fk28a55b60, fk8b32095b, sum(cast(fk0770b65e as int)) as total from o49ed7e906424d90a0ea2f11bea5e1db where fk815f8898 in (‘0’,‘3’) group by fk28a55b60, fk8b32095b order by total desc limit 10000.9s1.2s
select fk28a55b60, fk418dc234, max(fk0770b65e), min(fk0770b65e) from o49ed7e906424d90a0ea2f11bea5e1db where fk28a55b60 > ‘2017-02-10’ group by fk28a55b60, fk418dc234 limit 10001.7s1.6s

结论:

impala2.3和spark1.6查询性能并没有相差太大,impala2.6和spark2.0的话按照官方的宣传的话最后也差不多。

Spark 2.0 improved its large query performance by an average of 2.4X over Spark 1.6 (so upgrade!). Small query performance was already good and remained roughly the same.

Impala 2.6 is 2.8X as fast for large queries as version 2.3. Small query performance was already good and remained roughly the same.

与 Spark 1.6 相比,Spark 2.0 将其大型查询性能平均提高了 2.4 倍(所以升级!)。 小型查询性能已经很好并且大致保持不变。
Impala 2.6 的大型查询速度是 2.3 版的 2.8 倍。 小型查询性能已经很好并且大致保持不变。


http://wed.xjx100/news/112822.html

相关文章

java_集合统计

1.代码实现: import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.</

从永远到永远-电吉他综合效果器

电吉他综合效果器 1.名词释义1.GLOBAL全局菜单1.左侧窗口1.PAGE 12.PAGE 2 2.中间窗口PAGE 1PAGE 2 3.右侧窗口1.USB AUDIO2.VERSION 2.PATCH&#xff08;效果&#xff09;1.窗口释义2.调整PATCH1.三级目录 1.名词释义 参考 SIG:信号PATCH&#xff1a;补丁&#xff0c;音色补…

计网—TCP与UDP的区别

一、TCP与UDP TCP 是面向连接的、可靠的、基于字节流的传输层通信协议。 UDP是数据报传输协议&#xff0c;利用 IP 提供面向无连接的通信服务。 二、TCP与UDP的区别 首先TCP与UDP的头部报文格式不同 TCP多了序列号、确认号、标志位、窗口大小、紧急指针等 UDP多了报文长度字段…

Java题目训练——反转部分单向链表和猴子分桃

一、反转部分单向链表 题目描述&#xff1a; 给定一个单链表&#xff0c;在链表中把第 L 个节点到第 R 个节点这一部分进行反转。 输入描述&#xff1a; n 表示单链表的长度。 val 表示单链表各个节点的值。 L 表示翻转区间的左端点。 R 表示翻转区间的右端点。 输出描述&…

【Python】【进阶篇】1、Django是什么?

目录 1、Django是什么&#xff1f;1. Django的由来2. Django的命名3. Django的版本发布1) 功能版2) 补丁版3) LTS 版本 4. Django框架的特点 1、Django是什么&#xff1f; Django 是使用 Python 语言开发的一款免费而且开源的 Web 应用框架。由于 Python 语言的跨平台性&#…

为什么需要内网穿透技术?

随着互联网技术的快速发展&#xff0c;企业和个人越来越依赖于网络资源&#xff0c;而内网穿透技术正是解决远程访问内网资源的关键。本文将详细介绍内网穿透的概念及其重要性&#xff0c;以帮助您了解为什么我们需要使用内网穿透技术。 目录 一、内网穿透技术简介 二、为什…

Linux 文件系统全面解析:从基本原理到实际应用

目录标题 引言&#xff1a;Linux文件系统概述 | Introduction: Overview of Linux File Systemsa. 什么是文件系统 | What is a File Systemb. Linux文件系统的重要性 | Importance of Linux File Systems Linux文件系统的种类 | Types of Linux File Systemsa. 常见的Linux文件…

腾讯云4核8g服务器支持多少人在线使用?

腾讯云轻量4核8G12M轻量应用服务器支持多少人同时在线&#xff1f;通用型-4核8G-180G-2000G&#xff0c;2000GB月流量&#xff0c;系统盘为180GB SSD盘&#xff0c;12M公网带宽&#xff0c;下载速度峰值为1536KB/s&#xff0c;即1.5M/秒&#xff0c;假设网站内页平均大小为60KB…