Support data source sampling with TABLESAMPLE

### Is your feature request related to a problem or challenge?

Sampling support in queries would make it easier to explore data.

### Describe the solution you'd like

It should be supported at the SQL level (`SAMPLE` or `TABLESAMPLE` syntax). The sampling construct should be passed to the table source so that the sampling is performed in the scan itself (e.g. in an optimised Parquet reader).

This feature could be implemented in three sequential stages:
1. Support additional SQL syntax but fail in the physical plan builder
2. Transparently convert to `WHERE RANDOM() < P` filter
3. For eligible data sources push the sampling to the table source
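
To illustrate the difference between stages 2 and 3, here is a minimal Python sketch (not DataFusion code). The function names `bernoulli_sample` and `system_sample` are hypothetical, and "block" stands in for whatever unit a source can skip cheaply, such as a Parquet row group. Stage 2 must still read every row, while stage 3 can skip whole blocks before reading them.

```python
import random
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], fraction: float, rng: random.Random) -> Iterator[T]:
    """Stage 2 (hypothetical): keep each row independently,
    equivalent to filtering with WHERE RANDOM() < fraction."""
    for row in rows:
        if rng.random() < fraction:
            yield row


def system_sample(blocks: Sequence[Sequence[T]], fraction: float, rng: random.Random) -> Iterator[T]:
    """Stage 3 (hypothetical): decide once per block, so a skipped
    block never has to be read or decoded."""
    for block in blocks:
        if rng.random() < fraction:
            yield from block


if __name__ == "__main__":
    rng = random.Random(42)
    rows = list(range(1000))
    blocks = [rows[i:i + 100] for i in range(0, 1000, 100)]
    print(len(list(bernoulli_sample(rows, 0.1, rng))))
    print(len(list(system_sample(blocks, 0.1, rng))))
```

Block-level sampling is cheaper but coarser: rows from the same block are always kept or dropped together, so the result size varies more than with the row-level filter.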

### Describe alternatives you've considered

It is possible to filter with `WHERE RANDOM() < 0.1` (see the discussion in https://github.com/apache/datafusion/issues/13268), but dedicated SQL syntax is clearer.

Existing query engines and databases already implement sampling. `TABLESAMPLE` has been part of the SQL standard since SQL:2003, but implementations differ in the details. Essentially, they all let the user pick a sampling method and a percentage (or sometimes a number of rows): `TABLESAMPLE [SYSTEM | BERNOULLI] (PERCENTAGE | ROWS)`.

[DuckDB](https://duckdb.org/docs/sql/samples.html#table-samples):
```sql
SELECT * FROM tbl TABLESAMPLE SYSTEM (10%);
```

[PostgreSQL](https://www.postgresql.org/docs/current/sql-select.html#SQL-FROM) and [Trino](https://trino.io/docs/current/sql/select.html#tablesample):
```sql
SELECT * FROM tbl TABLESAMPLE SYSTEM (10);
```

[Spark](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-sampling.html):
```sql
SELECT * FROM tbl TABLESAMPLE (10 PERCENT);
```

[ClickHouse](https://clickhouse.com/docs/en/sql-reference/statements/select/sample) is different:
```sql
SELECT * FROM tbl SAMPLE 0.1
```
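
Because the dialects above write the sample size differently (`10%`, `10`, `10 PERCENT`, `0.1`, or a row count), a planner would need to normalise them into one representation. The following Python sketch shows one possible way to do that. `SampleSize`, `parse_sample_size` and the `bare_unit` parameter are hypothetical names introduced here for illustration; they are not part of DataFusion or any of the engines listed.

```python
import re
from typing import NamedTuple


class SampleSize(NamedTuple):
    value: float
    unit: str  # "percent", "rows" or "fraction"


# Matches a non-negative number with an optional %, PERCENT or ROWS suffix.
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(%|PERCENT|ROWS)?\s*$", re.IGNORECASE)


def parse_sample_size(text: str, bare_unit: str = "percent") -> SampleSize:
    """Normalise the text between the parentheses of a sampling clause.

    `bare_unit` says how to read a number without a suffix; this is
    dialect-specific (assumed "percent" for the PostgreSQL/Trino example,
    "fraction" for the ClickHouse one).
    """
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported sample size: {text!r}")
    value = float(match.group(1))
    suffix = (match.group(2) or "").upper()
    if suffix in ("%", "PERCENT"):
        return SampleSize(value, "percent")
    if suffix == "ROWS":
        return SampleSize(value, "rows")
    return SampleSize(value, bare_unit)


if __name__ == "__main__":
    for text in ("10%", "10", "10 PERCENT", "100 ROWS"):
        print(text, "->", parse_sample_size(text))
    print("0.1 ->", parse_sample_size("0.1", bare_unit="fraction"))
```

Keeping the unit explicit avoids silently treating ClickHouse's `0.1` (a fraction) as 0.1 percent.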

### Additional context

Also requested in #11554. The filter for sampling was refined in #13268.
