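The EMR creation wizard selects "Automatically terminate cluster after idle time (Recommended)" by default. With this option the cluster is treated as a short-lived job cluster and is terminated automatically once it has been idle for the configured time. Automatic termination means the cluster is released and its EC2 instances are deleted for good. In the cluster termination settings, choose "Manually terminate cluster" instead. This creates a long-running cluster that never shuts down or terminates by itself and can only be deleted by hand. In addition, "Use termination protection" guards against accidental deletion of the EC2 instances, and selecting it is recommended. Leave the optional "Bootstrap actions" section empty; nothing needs to be set there. See the screenshot below.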
aws emr create-cluster \
--name "Presto-workshop" \
--log-uri "s3n://emr-dev-exp-133129065110/logs/presto-workshop/" \
--release-label "emr-6.15.0" \
--service-role "arn:aws:iam::133129065110:role/EMR_DefaultRole" \
--termination-protected \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","EmrManagedMasterSecurityGroup":"sg-081d6bd2f2e24eb49","EmrManagedSlaveSecurityGroup":"sg-0a73ac72d490b95ae","KeyName":"lxy-oregon","AdditionalMasterSecurityGroups":[],"AdditionalSlaveSecurityGroups":[],"SubnetId":"subnet-0de831893a87a7861"}' \
--applications Name=Hadoop Name=Hue Name=Presto \
--configurations '[{"Classification":"presto-connector-hive","Properties":{"hive.metastore.glue.datacatalog.enabled":"true"}}]' \
--instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","Name":"Primary","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}},{"InstanceCount":2,"InstanceGroupType":"CORE","Name":"Core","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}}]' \
--auto-scaling-role "arn:aws:iam::133129065110:role/EMR_AutoScaling_DefaultRole" \
--scale-down-behavior "TERMINATE_AT_TASK_COMPLETION" \
--region "us-west-2"
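The termination settings described above can also be changed from the AWS CLI once the cluster exists. The following is a minimal sketch, not a definitive procedure. The cluster ID `j-XXXXXXXXXXXXX` is a placeholder, and the helper names (`run`, `set_termination_protection`, `set_idle_timeout`, `clear_idle_timeout`) are hypothetical. With `DRY_RUN=1`, the default, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: adjust the termination settings of an existing EMR cluster.
# CLUSTER_ID is a placeholder; replace it with your own cluster ID.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"
REGION="${REGION:-us-west-2}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Turn termination protection on or off (argument: on | off).
set_termination_protection() {
  case "$1" in
    on)  flag="--termination-protected" ;;
    off) flag="--no-termination-protected" ;;
    *)   echo "usage: set_termination_protection on|off" >&2; return 1 ;;
  esac
  run aws emr modify-cluster-attributes --region "$REGION" \
    --cluster-id "$CLUSTER_ID" "$flag"
}

# Attach an auto-termination policy; the idle timeout is given in seconds.
set_idle_timeout() {
  run aws emr put-auto-termination-policy --region "$REGION" \
    --cluster-id "$CLUSTER_ID" --auto-termination-policy "IdleTimeout=$1"
}

# Remove the policy so the cluster stays up until it is terminated manually.
clear_idle_timeout() {
  run aws emr remove-auto-termination-policy --region "$REGION" \
    --cluster-id "$CLUSTER_ID"
}

# Long-running workshop cluster: protection on, no idle timeout.
set_termination_protection on
clear_idle_timeout
```

Set `DRY_RUN=0` to execute the commands for real. A cluster with termination protection enabled has to have protection switched off before it can be terminated.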
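5. Log in to the Master node of the EMR cluster and submit a query with Presto
Open the EMR console and click the name of the EMR cluster to open its details page. See the screenshot below.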
-- The text of this query starts mid-statement. The outer SELECT and the opening
-- of the first subquery are a reconstruction, and TripCt is an assumed column name.
SELECT TipPrctCtgry,
       COUNT(DISTINCT TripID) AS TripCt
FROM
  (SELECT TripID,
     (CASE
        WHEN fare_prct < 0.7 THEN 'FL70'
        WHEN fare_prct < 0.8 THEN 'FL80'
        WHEN fare_prct < 0.9 THEN 'FL90'
        ELSE 'FL100'
      END) FarePrctCtgry,
     (CASE
        WHEN tip_prct < 0.1 THEN 'TipsLowerThan10'
        WHEN tip_prct < 0.15 THEN 'TipsLowerThan15'
        WHEN tip_prct < 0.2 THEN 'TipsLowerThan20'
        ELSE 'TipsGreaterThan20'
      END) TipPrctCtgry
   FROM
     -- Each ratio is named after the column it is derived from.
     (SELECT TripID,
        (fare_amount / total_amount) AS fare_prct,
        (extra / total_amount) AS extra_prct,
        (mta_tax / total_amount) AS mta_taxprct,
        (tolls_amount / total_amount) AS tolls_prct,
        (tip_amount / total_amount) AS tip_prct,
        (improvement_surcharge / total_amount) AS imprv_suchrgprct,
        total_amount
      FROM
        (SELECT *,
           (CAST(pickup_longitude AS VARCHAR(100)) || '_' || CAST(pickup_latitude AS VARCHAR(100))) AS TripID
         FROM presto_workshop_db.taxi
         WHERE total_amount > 0
        ) AS t
     ) AS t
  ) AS ct
GROUP BY TipPrctCtgry
ORDER BY TipPrctCtgry ASC;
When execution finishes, the results are returned. See the screenshot below.
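The same query can also be run non-interactively on the Master node. The sketch below assumes the query has been saved to a file and that the Glue tables are reachable through the `hive` catalog. The file names `tips_by_category.sql` and `tips_by_category.csv` and the helper `run_query_file` are hypothetical. With `DRY_RUN=1`, the default, only the command line is printed.

```shell
#!/bin/sh
# Sketch: run a saved SQL file with presto-cli on the EMR Master node.
CATALOG="${CATALOG:-hive}"               # assumption: Glue Data Catalog via the hive connector
SCHEMA="${SCHEMA:-presto_workshop_db}"   # schema used by the query above
DRY_RUN="${DRY_RUN:-1}"

# Run the SQL in file $1 and write the result to $2 as CSV with a header row.
run_query_file() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "presto-cli --catalog $CATALOG --schema $SCHEMA --file $1 --output-format CSV_HEADER > $2"
  else
    presto-cli --catalog "$CATALOG" --schema "$SCHEMA" \
      --file "$1" --output-format CSV_HEADER > "$2"
  fi
}

run_query_file tips_by_category.sql tips_by_category.csv
```

Writing the result to a CSV file makes it easy to compare runs or to load the output into another tool, which the interactive prompt does not offer directly.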
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/mac_arm64/sessionmanager-bundle.zip" -o "sessionmanager-bundle.zip"
unzip sessionmanager-bundle.zip
sudo ./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b /usr/local/bin/session-manager-plugin
session-manager-plugin
If the following message is returned, the installation succeeded:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
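Before moving on, it can help to confirm that both command-line tools are actually on the `PATH`. The following is a small sketch of such a check. The function names `have_tool` and `check_prereqs` are hypothetical and not part of the AWS CLI.

```shell
#!/bin/sh
# Return 0 if the named command is on PATH, non-zero otherwise.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite; the exit status is the number of missing tools.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if have_tool "$tool"; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_prereqs aws session-manager-plugin || echo "Install the missing tools before continuing."
```

This only checks that the binaries can be found. It does not verify that credentials are configured, which `aws configure list` can show.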
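3. Connect to the Presto GUI using Session Manager
Make sure that, following the previous step, you have set up the AWS CLI, an AK/SK pair (access key and secret key), and the Session Manager plugin for the AWS CLI.
Next, use the EMR console to look up the EC2 instance ID of the EMR primary node. Open the EMR cluster page and, under "Cluster Management", find "Primary node public DNS". Click the small square icon to copy it to the clipboard. See the screenshot below.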
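The instance ID and public DNS name of the primary node can also be looked up with the AWS CLI instead of the console. The following is a sketch under assumptions. The cluster ID `j-XXXXXXXXXXXXX` is a placeholder, and the helper names `run`, `primary_instance_id` and `primary_public_dns` are hypothetical. With `DRY_RUN=1`, the default, the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: look up the primary node of an EMR cluster from the command line.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"   # placeholder cluster ID
REGION="${REGION:-us-west-2}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# EC2 instance ID of the primary (MASTER) node.
primary_instance_id() {
  run aws emr list-instances --region "$REGION" --cluster-id "$CLUSTER_ID" \
    --instance-group-types MASTER \
    --query 'Instances[0].Ec2InstanceId' --output text
}

# Public DNS name of the primary node.
primary_public_dns() {
  run aws emr describe-cluster --region "$REGION" --cluster-id "$CLUSTER_ID" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

primary_instance_id
primary_public_dns
```

With `DRY_RUN=0`, the two values printed are the ones passed as `--target` and `host` in the port-forwarding command.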
aws ssm start-session \
--region us-west-2 \
--target i-09023c1a32ed56185 \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters '{"host":["ec2-35-164-60-179.us-west-2.compute.amazonaws.com"],"portNumber":["8889"], "localPortNumber":["58889"]}'
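The command above forwards port 8889, where the Presto GUI of the EMR cluster listens, to port 58889 on the developer's local machine. Run it on the developer's local machine. See the screenshot below.
After it runs, the console returns the following: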
Starting session with SessionId: xxxxx-01e3172aac81d458b
Port 58889 opened for sessionId xxxxx-01e3172aac81d458b.
Waiting for connections...
The connection is established. See the screenshot below.
aws ssm start-session \
--region us-west-2 \
--target i-09023c1a32ed56185 \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters '{"host":["ec2-35-164-60-179.us-west-2.compute.amazonaws.com"],"portNumber":["8888"], "localPortNumber":["58888"]}'
The process of running the command is the same as in the previous step.
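The two port-forwarding commands differ only in their port numbers, so the `--parameters` JSON can be generated rather than edited by hand. The following is a minimal sketch. The helper names `forward_params` and `forward_command` are hypothetical, and the script only prints the commands without starting a session.

```shell
#!/bin/sh
# Build the --parameters JSON for AWS-StartPortForwardingSessionToRemoteHost.
# $1 = remote host, $2 = remote port, $3 = local port
forward_params() {
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}' "$1" "$2" "$3"
}

# Print the start-session command for one forwarded port; nothing is executed.
# $1 = instance ID, $2 = remote host, $3 = remote port, $4 = local port, $5 = region
forward_command() {
  printf "aws ssm start-session --region %s --target %s --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '%s'\n" \
    "$5" "$1" "$(forward_params "$2" "$3" "$4")"
}

# Values taken from the commands above.
INSTANCE_ID="i-09023c1a32ed56185"
HOST="ec2-35-164-60-179.us-west-2.compute.amazonaws.com"

forward_command "$INSTANCE_ID" "$HOST" 8889 58889 us-west-2   # Presto GUI
forward_command "$INSTANCE_ID" "$HOST" 8888 58888 us-west-2   # Hue
```

Each session stays in the foreground while it forwards traffic, so the two printed commands need to run in separate terminal windows.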
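Now open a browser and visit http://localhost:58888 on the local machine. Note that this is http, not https. The page loads successfully. Because this is the first time Hue is used, you need to enter a user name and password and click the "Create Account" button. The user name and password entered here become an account with superuser privileges. See the screenshot below.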