SELECT syntax in MaxCompute - Cloud-native Big Data Computing Service MaxCompute - Alibaba Cloud Help Center

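Overview

A select statement retrieves the rows that meet specified conditions from a table. Depending on your scenario, you can combine it with the following features to run more varied queries.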

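Type	Description
Subquery (SUBQUERY)	Runs a further query on top of the result of another query.
Intersection (INTERSECT), union (UNION), and complement (EXCEPT)	Computes the intersection, union, or complement of query result sets.
JOIN	Joins tables with a `join` operation and returns the rows that satisfy both the join condition and the query condition.
SEMI JOIN	Filters the left table by the right table; data from the right table does not appear in the result set.
MAPJOIN HINT	When you `join` one large table with one or more small tables, you can explicitly specify the `mapjoin` hint in the `select` statement to improve query performance.
DISTRIBUTED MAPJOIN	`distributed mapjoin` is an upgraded version of `mapjoin` for scenarios in which a small table is joined to a large table.
SKEWJOIN HINT	When a join between two tables has hot keys that cause long tails, you can extract the hot keys, process hot and non-hot data separately, and then merge the results to make the join more efficient.
Lateral View	Used with a UDTF (table-generating function) to split one row into multiple rows.
GROUPING SETS	Aggregates and analyzes data across multiple dimensions.
SELECT TRANSFORM	The `select transform` syntax starts a specified child process, feeds it input data in a given format through standard input, and obtains the output data by parsing the child process's standard output.
Split Size Hint	Controls the degree of parallelism by changing the split size.
TimeTravel and Incremental queries	For Transaction Table 2.0 tables: a TimeTravel query goes back to a historical point in time or version of the source table and queries that snapshot; an Incremental query reads the incremental data of the source table over a specified historical time range or version range.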

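Limits

When you run a select statement, at most 10,000 rows are displayed on screen, and the returned result must be smaller than 10 MB. These limits do not apply when select is used as a clause: the select clause returns all of its results to the outer query.
By default, a select statement is not allowed to perform a full table scan on a partitioned table.

For projects created after 20:00:00 on January 10, 2018, SQL statements by default cannot perform full table scans on the partitioned tables in the project. A query on a partitioned table must specify partitions, which cuts unnecessary I/O, wasted computing resources, and, under pay-as-you-go billing, unnecessary computing fees.
To run a full table scan on a partitioned table, put the command set odps.sql.allow.fullscan=true; before the SQL statement and submit the two together. Assuming sale_detail is a partitioned table, run the following statements together to query the whole table: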
```
set odps.sql.allow.fullscan=true;
select * from sale_detail;
```
To enable full table scans for an entire project, the project owner runs the following command to turn on the switch:
```
setproject odps.sql.allow.fullscan=true;
```
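When you query a clustered table, the current version applies bucket pruning only when a single table scan covers 400 or fewer partitions. When bucket pruning does not take effect, more data is scanned. Under pay-as-you-go billing this increases fees; under subscription billing it slows down SQL computation.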
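Where a full scan is not actually needed, a query can usually avoid the switch altogether by filtering on the partition columns. The following is a minimal sketch against the sale_detail sample table used later in this topic; it assumes the sale_date='2013', region='china' partition exists.

```sql
-- Filtering on both partition columns restricts the read to the matching
-- partition, so odps.sql.allow.fullscan does not need to be set.
select shop_name, customer_id, total_price
from sale_detail
where sale_date = '2013' and region = 'china';
```

Listing the needed columns instead of * also keeps the on-screen result small, which helps with the row and size limits described above.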

Syntax

[with <cte>[, ...] ]
select [all | distinct] <select_expr>[, <except_expr>][, <replace_expr>] ...
       from <table_reference>
       [where <where_condition>]
       [group by {<col_list>|rollup(<col_list>)}]
           [having <having_condition>]
       [order by <order_condition>]
       [distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
       [limit <number>]
       [window <window_clause>]

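For the order in which the clauses of the command are executed, see SELECT clause order.

Sample data

To make the usage easier to follow, this topic provides source data and bases its examples on it. The following commands create the table sale_detail and add data to it.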

--Create a partitioned table named sale_detail.
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
--Add a partition to the source table.
alter table sale_detail add partition (sale_date='2013', region='china');
--Append data to the source table.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);

Query the data in the partitioned table sale_detail. Example:

--Enable full table scan for this session, because sale_detail is a partitioned table.
set odps.sql.allow.fullscan=true;
select * from sale_detail;
--Result:
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

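WITH clause (cte)

Optional. A WITH clause contains one or more common table expressions (CTEs). A CTE acts as a temporary table in the current execution context, and later parts of the query can reference it. The rules for CTEs are:

CTE names must be unique within the same WITH clause.
A CTE defined in a WITH clause can be referenced only by other CTEs in the same WITH clause.
Assume that A is the first CTE in the clause and B is the second:
- A references A: invalid. Example of an incorrect statement: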
```
with 
A as (select 1 from A) 
select * from A;
```
- A references B and B references A: invalid, because circular references are not allowed. Example of an incorrect statement:
```
with 
A as (select * from B ), 
B as (select * from A ) 
select * from B;
```

Example of a correct statement:

with
A as (select 1 as C),
B as (select * from A)
select * from B;

Result:

+---+
| c |
+---+
| 1 |
+---+
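The same chaining pattern can be applied to a real table. The sketch below runs against the sale_detail sample table; the CTE names china_2013 and priced_above are illustrative and not part of the original topic.

```sql
with
  china_2013 as (
    -- First CTE: restrict the read to a single partition.
    select shop_name, total_price
    from sale_detail
    where sale_date = '2013' and region = 'china'
  ),
  priced_above as (
    -- Second CTE: references the earlier CTE in the same WITH clause.
    select shop_name, total_price
    from china_2013
    where total_price > 100.15
  )
select shop_name, total_price from priced_above;
```

With the three sample rows, this should return the rows for s2 and s3; the row order is not guaranteed without an order by.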

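Column expression (select_expr)

Required. select_expr takes the form col1_name, col2_name, column expression, ... and specifies the regular columns, partition columns, or regular expressions to query. The rules for column expressions are:

Specify the columns to read by name.

Read the shop_name column of the sale_detail table. Example:

select shop_name from sale_detail;

Result: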

+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

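Use an asterisk (*) to query all columns. You can add a where clause to filter rows.

Read all columns of the sale_detail table. Example:

--Enable full table scan. The setting applies only to the current session.
set odps.sql.allow.fullscan=true;
select * from sale_detail;

Result: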

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Specify a filter condition in the where clause. Example:

select * from sale_detail where shop_name='s1';

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

You can use regular expressions.

Select all columns of the sale_detail table whose names start with sh. Example:

select `sh.*` from sale_detail;

Result:

+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

Select all columns of the sale_detail table except shop_name. Example:

select `(shop_name)?+.+` from sale_detail;

Result:

+-------------+-------------+------------+------------+
| customer_id | total_price | sale_date  | region     |
+-------------+-------------+------------+------------+
| c1          | 100.1       | 2013       | china      |
| c2          | 100.2       | 2013       | china      |
| c3          | 100.3       | 2013       | china      |
+-------------+-------------+------------+------------+

Select all columns of the sale_detail table except shop_name and customer_id. Example:

select `(shop_name|customer_id)?+.+` from sale_detail;

Result:

+-------------+------------+------------+
| total_price | sale_date  | region     |
+-------------+------------+------------+
| 100.1       | 2013       | china      |
| 100.2       | 2013       | china      |
| 100.3       | 2013       | china      |
+-------------+------------+------------+

Select all columns of the sale_detail table except those whose names start with t. Example:
```
select `(t.*)?+.+` from sale_detail;
```
Result:
```
+------------+-------------+------------+------------+
| shop_name  | customer_id | sale_date  | region     |
+------------+-------------+------------+------------+
| s1         | c1          | 2013       | china      |
| s2         | c2          | 2013       | china      |
| s3         | c3          | 2013       | china      |
+------------+-------------+------------+------------+
```

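Put distinct before the column names to remove duplicates and return only the de-duplicated values. all returns every value, including duplicates, and is the default when neither keyword is specified.

- ```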
select distinct region from sale_detail;
```
```
+------------+
| region     |
+------------+
| china      |
+------------+
```
- ```
select distinct region, sale_date from sale_detail;
```
```
+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
+------------+------------+
```

Excluded columns (except_expr)

--Read the data of the sale_detail table, excluding the region column.
select * except(region) from sale_detail;

+-----------+-------------+-------------+-----------+
| shop_name | customer_id | total_price | sale_date |
+-----------+-------------+-------------+-----------+
| s1        | c1          | 100.1       | 2013      |
| s2        | c2          | 100.2       | 2013      |
| s3        | c3          | 100.3       | 2013      |
+-----------+-------------+-------------+-----------+
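except can also take more than one column. The sketch below assumes the comma-separated form except(col1, col2) and drops both partition columns, which are constant once the query is restricted to a single partition.

```sql
-- The where clause can still filter on columns that except removes from the output.
select * except(sale_date, region)
from sale_detail
where sale_date = '2013' and region = 'china';
```

With the sample data, the output should contain only the shop_name, customer_id, and total_price columns.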

Modified columns (replace_expr)

--Read the data of the sale_detail table and replace the values of the total_price and region columns.
select * replace(total_price+100 as total_price, 'shanghai' as region) from sale_detail;

+-----------+-------------+-------------+-----------+----------+
| shop_name | customer_id | total_price | sale_date | region   |
+-----------+-------------+-------------+-----------+----------+
| s1        | c1          | 200.1       | 2013      | shanghai |
| s2        | c2          | 200.2       | 2013      | shanghai |
| s3        | c3          | 200.3       | 2013      | shanghai |
+-----------+-------------+-------------+-----------+----------+

Target table information (table_reference)

select customer_id from sale_detail;

+-------------+
| customer_id |
+-------------+
| c1          |
| c2          |
| c3          |
+-------------+

select * from (select region,sale_date from sale_detail) t where region = 'china';

+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
| china      | 2013       |
| china      | 2013       |
+------------+------------+
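A nested subquery used as the table reference can also aggregate before the outer query filters. In the sketch below, region_total is an illustrative alias for the subquery result and the threshold 300 is an arbitrary example value.

```sql
select region_total.region, region_total.total
from (
  -- Inner query: total sales per region within the 2013 partitions.
  select region, sum(total_price) as total
  from sale_detail
  where sale_date = '2013'
  group by region
) region_total
-- Outer query: keep only the regions whose total exceeds the threshold.
where region_total.total > 300;
```

With the three sample rows, this should return a single row for china with a total of about 300.6.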

WHERE clause (where_condition)

select *
from sale_detail
where sale_date >= '2008' and sale_date <= '2014';
--Equivalent to the following statement.
select *
from sale_detail
where sale_date between '2008' and '2014';

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
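Several conditions can be combined in one where clause. The following sketch mixes a partition filter with in, like, and is not null against the sale_detail sample table; the specific values are illustrative only.

```sql
select shop_name, total_price
from sale_detail
where sale_date = '2013'               -- partition filter
  and region in ('china', 'shanghai')  -- set membership
  and shop_name like 's%'              -- pattern match: names starting with s
  and total_price is not null;         -- skip rows without a price
```

Keeping the partition filter in the where clause means the query does not depend on the full table scan switch.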

- ```
@com.aliyun.odps.udf.annotation.UdfProperty(isDeterministic=true)
```
- ```
--The UDF must appear in the where condition of the source table being queried:
select key, value from srcp where udf(ds) = 'xx';
```
- ```
--Partition pruning does not take effect when the UDF is placed in the join on condition.
select A.c1, A.c2 from srcp1 A  join srcp2  B on A.c1 = B.c1 and udf(A.ds) ='xx';
```

select  task_name
        ,inst_id
        ,settings
        ,GET_JSON_OBJECT(settings, '$.SKYNET_ID') as skynet_id
        ,GET_JSON_OBJECT(settings, '$.SKYNET_NODENAME') as user_agent
from    Information_Schema.TASKS_HISTORY
where   ds = '20211215' and skynet_id is not null
limit 10;

GROUP BY grouping query (col_list)

```
select region from sale_detail group by region;
```
```
+------------+
| region     |
+------------+
| china      |
+------------+
```

select sum(total_price) from sale_detail group by region;

+------------+
| _c0        |
+------------+
| 300.6      |
+------------+

select region, sum (total_price) from sale_detail group by region;

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
+------------+------------+

select region as r from sale_detail group by r;
--Equivalent to the following statement.
select region as r from sale_detail group by region;

+------------+
| r          |
+------------+
| china      |
+------------+

select 2 + total_price as r from sale_detail group by 2 + total_price;

+------------+
| r          |
+------------+
| 102.1      |
| 102.2      |
| 102.3      |
+------------+

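--Invalid: total_price is not aggregated and does not appear in group by, so this statement returns an error.
select region, total_price from sale_detail group by region;

--Valid: every non-aggregated column in the select list appears in group by.
select region, total_price from sale_detail group by region, total_price;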

+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
+------------+-------------+

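--Run this setting together with the next SQL statement.
set odps.sql.groupby.position.alias=true;
--1 refers to the first column in the select list, region. Rows are grouped by region, and the statement returns each group's region value (unique within the group) and total sales.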
select region, sum(total_price) from sale_detail group by 1;

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
+------------+------------+
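Several aggregate functions can share one group by, and giving each an alias avoids generated column names such as _c1. A minimal sketch against the sale_detail sample table; the alias names are illustrative.

```sql
select region,
       count(*)         as order_cnt,    -- number of rows in the group
       sum(total_price) as total_sales,  -- sum of prices in the group
       max(total_price) as max_price     -- highest price in the group
from sale_detail
where sale_date = '2013'
group by region;
```

With the three sample rows, this should produce one row for china with order_cnt 3, total_sales of about 300.6, and max_price 100.3.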

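HAVING clause (having_condition)

--To make the effect visible, append more data to the sale_detail table.
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
--Use the having clause with an aggregate function to filter groups.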
select region,sum(total_price) from sale_detail 
group by region 
having sum(total_price)<305;

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
| shanghai   | 200.9      |
+------------+------------+
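where and having filter at different stages: where removes rows before grouping, and having removes groups after aggregation. The sketch below shows both in one statement; the threshold 250 is an arbitrary example value.

```sql
select region, sum(total_price) as total_sales
from sale_detail
where sale_date in ('2013', '2014')  -- row-level filter, applied before grouping
  and total_price is not null
group by region
having sum(total_price) > 250;       -- group-level filter, applied after aggregation
```

With the sample rows plus the shanghai rows appended above, only china (about 300.6) should pass the having condition, because the shanghai total is about 200.9.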

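ORDER BY global sorting (order_condition)

--The results in this section assume that sale_detail contains only the three rows of the sale_date='2013', region='china' partition.
select * from sale_detail order by total_price limit 2;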

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

select * from sale_detail order by total_price desc limit 2;

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s3         | c3          | 100.3       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

select total_price as t from sale_detail order by total_price limit 3;
--Equivalent to the following statement.
select total_price as t from sale_detail order by t limit 3;

+------------+
| t          |
+------------+
| 100.1      |
| 100.2      |
| 100.3      |
+------------+

--Run this setting together with the next SQL statement.
set odps.sql.orderby.position.alias=true;
select * from sale_detail order by 3 limit 3;

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

```
select customer_id,total_price from sale_detail order by total_price limit 3 offset 2;
--Equivalent to the following statement.
select customer_id,total_price from sale_detail order by total_price limit 2, 3;
```
```
+-------------+-------------+
| customer_id | total_price |
+-------------+-------------+
| c3          | 100.3       |
+-------------+-------------+
```
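limit with offset lends itself to simple paging. The sketch below fetches two pages of two rows each; the page size and the extra sort key customer_id, added so that ties in total_price are ordered consistently, are illustrative choices.

```sql
-- Page 1: the first two rows in price order.
select customer_id, total_price
from sale_detail
where sale_date = '2013'
order by total_price, customer_id
limit 2;

-- Page 2: skip the first two rows, then return up to two more.
select customer_id, total_price
from sale_detail
where sale_date = '2013'
order by total_price, customer_id
limit 2 offset 2;
```

With the three sample rows, page 1 should contain c1 and c2, and page 2 should contain only c3.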

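DISTRIBUTE BY hash partitioning (distribute_condition)

--Query the region column of the sale_detail table and hash-partition the rows by region value.
select region from sale_detail distribute by region;
--Equivalent to the following statements.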
select region as r from sale_detail distribute by region;
select region as r from sale_detail distribute by r;

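SORT BY local sorting (sort_condition)

--To make the effect visible, the sale_detail table needs the extra rows below. Skip this insert if you already ran the identical statement earlier in this topic; running it twice would duplicate the rows.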
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
select region,total_price from sale_detail distribute by region sort by total_price;

+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | NULL        |
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
| shanghai   | 100.4       |
| shanghai   | 100.5       |
+------------+-------------+

select region,total_price from sale_detail distribute by region sort by total_price desc;

+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | NULL        |
+------------+-------------+

select region,total_price from sale_detail sort by total_price desc;

+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| shanghai   | NULL        |
+------------+-------------+

Split Size Hint

--Set the split size to 1 MB. When the src table is read, this hint splits tasks by 1 MB.
select a.key from src a /*+split_size(1)*/ join src2 b on a.key=b.key;

