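By default, the EMR creation wizard selects Automatically terminate cluster after idle time (Recommended). With that option the cluster is a short-lived job cluster that terminates automatically once its work is finished and it sits idle. Automatic termination also releases the cluster: the associated EC2 instances are deleted permanently. In the cluster termination settings, select Manually terminate cluster instead. This creates a long-running cluster that never shuts down or terminates on its own and can only be deleted manually. The Use termination protection option guards against accidental deletion of the EC2 instances, and selecting it is recommended. Leave the optional Bootstrap actions section empty; nothing needs to be set there. See the screenshot below.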

aws emr create-cluster \
 --name "Presto-workshop" \
 --log-uri "s3n://emr-dev-exp-133129065110/logs/presto-workshop/" \
 --release-label "emr-6.15.0" \
 --service-role "arn:aws:iam::133129065110:role/EMR_DefaultRole" \
 --termination-protected \
 --ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","EmrManagedMasterSecurityGroup":"sg-081d6bd2f2e24eb49","EmrManagedSlaveSecurityGroup":"sg-0a73ac72d490b95ae","KeyName":"lxy-oregon","AdditionalMasterSecurityGroups":[],"AdditionalSlaveSecurityGroups":[],"SubnetId":"subnet-0de831893a87a7861"}' \
 --applications Name=Hadoop Name=Hue Name=Presto \
 --configurations '[{"Classification":"presto-connector-hive","Properties":{"hive.metastore.glue.datacatalog.enabled":"true"}}]' \
 --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","Name":"Primary","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}},{"InstanceCount":2,"InstanceGroupType":"CORE","Name":"Core","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}}]' \
 --auto-scaling-role "arn:aws:iam::133129065110:role/EMR_AutoScaling_DefaultRole" \
 --scale-down-behavior "TERMINATE_AT_TASK_COMPLETION" \
 --region "us-west-2"
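
Because this cluster is created with --termination-protected and manual termination, a later cleanup takes two steps: protection has to be switched off before the cluster can be terminated. A minimal sketch of that sequence follows. The function name and the cluster ID j-EXAMPLE are hypothetical placeholders, and the function only prints the commands so they can be reviewed before being run.

```shell
# Print the two AWS CLI commands needed to delete a termination-protected
# EMR cluster. Nothing is executed; review the output before running it.
# $1 = cluster ID (hypothetical placeholder in the usage note)
# $2 = region (defaults to us-west-2, the region used above)
emr_cleanup_commands() {
  cluster_id="$1"
  region="${2:-us-west-2}"
  # Step 1: turn off termination protection.
  echo "aws emr modify-cluster-attributes --cluster-id $cluster_id --no-termination-protected --region $region"
  # Step 2: terminate the cluster, which releases its EC2 instances.
  echo "aws emr terminate-clusters --cluster-ids $cluster_id --region $region"
}
```

Calling emr_cleanup_commands j-EXAMPLE prints both commands in the order they need to run.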

5. Log in to the EMR cluster's master node and submit a query with Presto

Open the EMR console and click the cluster name to open its details page. See the screenshot below.

    WHEN fare_prct < 0.7 THEN 'FL70'
    WHEN fare_prct < 0.8 THEN 'FL80'
    WHEN fare_prct < 0.9 THEN 'FL90'
    ELSE 'FL100'
  END) FarePrctCtgry,
  (CASE
    WHEN tip_prct < 0.1 THEN 'TipsLowerThan10'
    WHEN tip_prct < 0.15 THEN 'TipsLowerThan15'
    WHEN tip_prct < 0.2 THEN 'TipsLowerThan20'
    ELSE 'TipsGreaterThan20'
  END) TipPrctCtgry
FROM
  (SELECT TripID,
     -- Note: as written, tip_prct is derived from mta_tax, mta_taxprct from tolls_amount, and tolls_prct from tip_amount.
     (fare_amount / total_amount) AS fare_prct,
     (extra / total_amount) AS extra_prct,
     (mta_tax / total_amount) AS tip_prct,
     (tolls_amount / total_amount) AS mta_taxprct,
     (tip_amount / total_amount) AS tolls_prct,
     (improvement_surcharge / total_amount) AS imprv_suchrgprct,
     total_amount
   FROM
     (SELECT *,
        (CAST(pickup_longitude AS VARCHAR(100)) || '_' || CAST(pickup_latitude AS VARCHAR(100))) AS TripID
      FROM presto_workshop_db.taxi
      WHERE total_amount > 0
     ) AS t
  ) AS t
GROUP BY TipPrctCtgry
ORDER BY TipPrctCtgry ASC;

When execution finishes, the results are returned. See the screenshot below.
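
For a non-interactive run on the master node, a query can also be passed to the Presto command-line client with --execute. The sketch below assumes the client is available as presto-cli on the master node and that the Glue-backed tables are exposed through the hive catalog. The function name and the PRESTO_CLI override variable are hypothetical.

```shell
# Run one SQL statement through the Presto CLI and print the result.
# $1 = SQL text.
# PRESTO_CLI (hypothetical variable) may name a different client binary.
run_presto_query() {
  "${PRESTO_CLI:-presto-cli}" \
    --catalog hive \
    --schema presto_workshop_db \
    --execute "$1"
}
```

For example, run_presto_query "SELECT count(*) FROM taxi WHERE total_amount > 0;" would count the rows that the query above works on.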

curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/mac_arm64/sessionmanager-bundle.zip" -o "sessionmanager-bundle.zip"
unzip sessionmanager-bundle.zip
sudo ./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b /usr/local/bin/session-manager-plugin
session-manager-plugin

If the following message is returned, the installation succeeded:

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.

3. Connect to the Presto GUI with Session Manager

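Make sure that the AWS CLI, the access key and secret key (AK/SK), and the Session Manager plugin for the AWS CLI have been set up as described in the previous step.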
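
These prerequisites can be checked from the terminal before a session is started. The following is a minimal sketch, not a required step; the function names missing_tools and preflight are hypothetical.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report what is still missing before a session is started.
preflight() {
  missing=$(missing_tools aws session-manager-plugin)
  if [ -n "$missing" ]; then
    echo "missing: $missing"
    return 1
  fi
  # Confirms that the configured AK/SK credentials are accepted by AWS.
  aws sts get-caller-identity
}
```

Running preflight either lists the missing tools or prints the identity that the configured credentials resolve to.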

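Next, use the EMR console to look up the EC2 instance ID of the EMR primary node. Open the EMR cluster page and, under Cluster Management, find Primary node public DNS; click the small square icon to copy it to the clipboard. See the screenshot below.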
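
The same two values can also be read with the AWS CLI instead of the console. This is a sketch under assumptions: the function name is hypothetical, and the cluster ID has to be supplied by the reader.

```shell
# Print the EC2 instance ID and public DNS name of the primary node.
# $1 = EMR cluster ID, $2 = region (defaults to us-west-2)
emr_primary_node() {
  aws emr list-instances \
    --cluster-id "$1" \
    --instance-group-types MASTER \
    --region "${2:-us-west-2}" \
    --query 'Instances[0].[Ec2InstanceId,PublicDnsName]' \
    --output text
}
```

With text output, the instance ID and the DNS name are printed on one line, separated by a tab.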

aws ssm start-session \
 --region us-west-2 \
 --target i-09023c1a32ed56185 \
 --document-name AWS-StartPortForwardingSessionToRemoteHost \
 --parameters '{"host":["ec2-35-164-60-179.us-west-2.compute.amazonaws.com"],"portNumber":["8889"], "localPortNumber":["58889"]}'

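The command above forwards port 8889, where the Presto GUI of the EMR cluster listens, to port 58889 on the developer's local machine. Run it on the local machine. See the screenshot below.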

After it runs, the console prints the following:

Starting session with SessionId: xxxxx-01e3172aac81d458b
Port 58889 opened for sessionId xxxxx-01e3172aac81d458b.
Waiting for connections...

The connection is established. See the screenshot below.

aws ssm start-session \
 --region us-west-2 \
 --target i-09023c1a32ed56185 \
 --document-name AWS-StartPortForwardingSessionToRemoteHost \
 --parameters '{"host":["ec2-35-164-60-179.us-west-2.compute.amazonaws.com"],"portNumber":["8888"], "localPortNumber":["58888"]}'

The procedure for running the command is the same as in the previous step.
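
The two port-forwarding commands differ only in their port numbers, so they can be wrapped in one helper. A minimal sketch, assuming the us-west-2 region used throughout; the function names port_forward_params and forward_port are hypothetical.

```shell
# Build the JSON passed to --parameters.
# $1 = remote host, $2 = remote port, $3 = local port
port_forward_params() {
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}\n' "$1" "$2" "$3"
}

# Start a port-forwarding session through the given instance.
# $1 = instance ID, $2 = remote host, $3 = remote port, $4 = local port
forward_port() {
  aws ssm start-session \
    --region us-west-2 \
    --target "$1" \
    --document-name AWS-StartPortForwardingSessionToRemoteHost \
    --parameters "$(port_forward_params "$2" "$3" "$4")"
}
```

With the values from this walkthrough, forward_port i-09023c1a32ed56185 ec2-35-164-60-179.us-west-2.compute.amazonaws.com 8889 58889 would open the tunnel on port 58889, and the same call with 8888 58888 would open the one on port 58888.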

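Now open a browser and go to http://localhost:58888 on the local machine. Note that the scheme is http, not https. The page loads successfully. Because this is the first time Hue is used, enter a username and password and click the Create Account button; the newly entered account is granted superuser privileges. See the screenshot below.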
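
Whether a forwarded port actually answers can also be checked from a second terminal while the session is open. This is a sketch only; the function names are hypothetical, and it assumes curl is installed locally.

```shell
# Build the local URL for a forwarded port (plain http, not https).
forwarded_url() {
  printf 'http://localhost:%s\n' "$1"
}

# Print the HTTP status code returned through the tunnel.
# curl reports 000 when nothing is listening on the port.
check_forwarded_ui() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$(forwarded_url "$1")") || true
  echo "port $1: HTTP $code"
}
```

For example, check_forwarded_ui 58888 probes the forwarded Hue port and check_forwarded_ui 58889 probes the forwarded Presto GUI port.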


November 2024