DuckDB can read multiple files at the same time, using either glob syntax or a list of files to read. This works for each supported file type (CSV, Parquet, JSON), with each read function handling one format. See the combining schemas page for tips on reading files with different schemas.
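As a minimal sketch of the same idea for JSON, the DuckDB SQL below writes two small newline-delimited JSON files and reads them back, once through a list and once through a glob. The file names part1.json and part2.json and their contents are hypothetical, and the sketch assumes DuckDB's json extension is available (it is autoloaded in standard builds).

```sql
-- Create two hypothetical newline-delimited JSON files with the same keys.
COPY (SELECT 1 AS id, 'first' AS label) TO 'part1.json' (FORMAT json);
COPY (SELECT 2 AS id, 'second' AS label) TO 'part2.json' (FORMAT json);

-- Read both files as one table from an explicit list.
SELECT id, label
FROM read_json(['part1.json', 'part2.json'])
ORDER BY id;

-- Read the same two files through a glob pattern.
SELECT id, label
FROM read_json('part*.json')
ORDER BY id;
```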

Read the CSV files flights1.csv and flights2.csv, unifying schemas by name and outputting a filename column:

SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
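To see what union_by_name and filename do, here is a self-contained sketch that first writes small, hypothetical versions of flights1.csv and flights2.csv whose columns differ in order and number. The extra header = true is only there to make header detection explicit for such tiny files.

```sql
-- Hypothetical flights1.csv: columns origin, dest.
COPY (SELECT 'AMS' AS origin, 'JFK' AS dest)
TO 'flights1.csv' (FORMAT csv, HEADER);
-- Hypothetical flights2.csv: different column order plus an extra delay column.
COPY (SELECT 'LAX' AS dest, 'SFO' AS origin, 42 AS delay)
TO 'flights2.csv' (FORMAT csv, HEADER);

-- Columns are matched by name, so origin and dest line up across files;
-- delay is NULL for the row that came from flights1.csv.
SELECT origin, dest, delay, filename
FROM read_csv(['flights1.csv', 'flights2.csv'],
              union_by_name = true, filename = true, header = true)
ORDER BY origin;
```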

Parquet

DuckDB can also read a series of Parquet files and treat them as if they were a single table. Note that this only works if the Parquet files have the same schema. You can specify which Parquet files you want to read using a list parameter, glob pattern matching syntax, or a combination of both.

List Parameter

The read_parquet function can accept a list of filenames as the input parameter.

Read three Parquet files and treat them as a single table:

SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
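A minimal, self-contained sketch of the list parameter: it writes three hypothetical single-column Parquet files with the same schema and then aggregates over the list, showing that the files behave as one table.

```sql
-- Write three hypothetical Parquet files that share the schema (id BIGINT).
COPY (SELECT range AS id FROM range(0, 3)) TO 'file1.parquet' (FORMAT parquet);
COPY (SELECT range AS id FROM range(3, 5)) TO 'file2.parquet' (FORMAT parquet);
COPY (SELECT range AS id FROM range(5, 10)) TO 'file3.parquet' (FORMAT parquet);

-- Aggregates span all three files: n_rows = 10, first_id = 0, last_id = 9.
SELECT count(*) AS n_rows, min(id) AS first_id, max(id) AS last_id
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```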

Glob Syntax

Note that the ? wildcard in globs is not supported for reads over S3 due to HTTP encoding issues.

The following example reads all files ending in .parquet in the test folder:

SELECT *
FROM read_parquet('test/*.parquet');
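DuckDB's glob syntax supports * (any run of characters), ? (exactly one character), [...] (one character from a set or range) and ** (any number of subdirectories). The sketch below exercises the first three on hypothetical files in the current directory; the names data_1, data_2 and data_10 are chosen only so that each pattern matches a different subset.

```sql
-- Hypothetical files in the current directory.
COPY (SELECT 1 AS n) TO 'data_1.parquet' (FORMAT parquet);
COPY (SELECT 2 AS n) TO 'data_2.parquet' (FORMAT parquet);
COPY (SELECT 10 AS n) TO 'data_10.parquet' (FORMAT parquet);

-- * matches any run of characters: all three files (total 13).
SELECT sum(n) AS total FROM read_parquet('data_*.parquet');

-- ? matches exactly one character: data_1 and data_2 only (total 3).
SELECT sum(n) AS total FROM read_parquet('data_?.parquet');

-- [2-9] matches one character in that range: data_2 only (total 2).
SELECT sum(n) AS total FROM read_parquet('data_[2-9].parquet');
```

Keep in mind the note above: the ? pattern should be avoided for reads over S3.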

List of Globs

The glob syntax and the list input parameter can be combined to scan files that meet one of multiple patterns.

Read all Parquet files from two specific folders:

SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
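Creating folders from SQL adds noise, so this self-contained sketch uses file-name prefixes in the current directory as stand-ins for folder1 and folder2 (all names are hypothetical). Adding filename = true, which read_parquet also accepts, shows which file each row came from.

```sql
-- Hypothetical files; the prefixes jan_ and feb_ play the role of two folders.
COPY (SELECT 'a' AS v) TO 'jan_1.parquet' (FORMAT parquet);
COPY (SELECT 'b' AS v) TO 'jan_2.parquet' (FORMAT parquet);
COPY (SELECT 'c' AS v) TO 'feb_1.parquet' (FORMAT parquet);
COPY (SELECT 'd' AS v) TO 'mar_1.parquet' (FORMAT parquet);

-- Files matching either glob are read as one table; mar_1.parquet matches neither.
SELECT v, filename
FROM read_parquet(['jan_*.parquet', 'feb_*.parquet'], filename = true)
ORDER BY v;
```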

CSV

DuckDB can read multiple CSV files at the same time, using either glob syntax or a list of files to read.

Filename

The filename argument can be used to add an extra filename column to the result that indicates which row came from which file. For example:

SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
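The filename column is an ordinary column, so it can be grouped on. This sketch combines it with a glob pattern to count how many rows each matched CSV file contributed; sales_a.csv and sales_b.csv are hypothetical files created just for the example.

```sql
-- Hypothetical CSV files with 3 and 5 rows respectively.
COPY (SELECT range AS id FROM range(3)) TO 'sales_a.csv' (FORMAT csv, HEADER);
COPY (SELECT range AS id FROM range(5)) TO 'sales_b.csv' (FORMAT csv, HEADER);

-- One output row per matched file, with the number of rows it contributed.
SELECT filename, count(*) AS n_rows
FROM read_csv('sales_*.csv', filename = true)
GROUP BY filename
ORDER BY filename;
```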

Glob Function to Find Filenames

The glob pattern matching syntax can also be used to search for filenames using the glob table function. It accepts one parameter: the path to search (which may include glob patterns).

Search the current directory for all files:

SELECT *
FROM glob('*');
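The glob function returns one row per matching path in a column named file, and its result can be filtered or aggregated like any other table. A minimal sketch, using a hypothetical file glob_demo.parquet so the pattern is guaranteed to match something:

```sql
-- Create one hypothetical file so the pattern below has something to match.
COPY (SELECT 42 AS answer) TO 'glob_demo.parquet' (FORMAT parquet);

-- List the matching paths.
SELECT file FROM glob('glob_demo*.parquet');

-- Count all Parquet files in the current directory.
SELECT count(*) AS n_files FROM glob('*.parquet');
```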