Commit 456c920

🎉 New Destination: Apache Iceberg (#18836)
* wip: developing Iceberg(s3 & hive catalog) Destination
* wip: developing Iceberg(s3 & hive catalog) Destination 2
* wip: developing Iceberg(s3 & hive catalog) Destination 3
* wip: developing Iceberg(s3 & hive catalog) Destination 3
* wip: developing Iceberg(s3 & hive catalog) Destination 2
* refactor: config
* feat: add hadoop and jdbc catalog implements
* docs: add docs and config examples
* style
* feat: S3Config
* fix: acceptance test, and unit test
* chore: remove sensitive logs
* docs: builds.md
* refactor: 1.add flush batch size and auto compact configs 2.refactor package 3. add unit tests
* test: add integration test
* test: Add HadoopCatalog integration tests
* docs: add bootstrap.md
* test: Add HiveCatalog integration tests
* perf: purge drop temp Iceberg table
* chore: delete unnecessary log
* remove iceberg accpt test file
* run format
* readd iceberg
* regenrate spec

Co-authored-by: marcosmarxm <[email protected]>
1 parent 2c451b3 commit 456c920

49 files changed: +3479 -1 lines changed

airbyte-config/init/src/main/resources/seed/destination_definitions.yaml

Lines changed: 6 additions & 0 deletions
@@ -24,6 +24,12 @@
   dockerImageTag: 0.1.0
   documentationUrl: https://docs.airbyte.com/integrations/destinations/doris
   releaseStage: alpha
+- name: Apache Iceberg
+  destinationDefinitionId: df65a8f3-9908-451b-aa9b-445462803560
+  dockerRepository: airbyte/destination-iceberg
+  dockerImageTag: 0.1.0
+  documentationUrl: https://docs.airbyte.com/integrations/destinations/iceberg
+  releaseStage: alpha
 - name: AWS Datalake
   destinationDefinitionId: 99878c90-0fbd-46d3-9d98-ffde879d17fc
   dockerRepository: airbyte/destination-aws-datalake

airbyte-config/init/src/main/resources/seed/destination_specs.yaml

Lines changed: 275 additions & 0 deletions
@@ -249,6 +249,281 @@
     supported_destination_sync_modes:
     - "append"
     - "overwrite"
+- dockerImage: "airbyte/destination-iceberg:0.1.0"
+  spec:
+    documentationUrl: "https://docs.airbyte.com/integrations/destinations/iceberg"
+    connectionSpecification:
+      $schema: "http://json-schema.org/draft-07/schema#"
+      title: "Iceberg Destination Spec"
+      type: "object"
+      required:
+      - "catalog_config"
+      - "storage_config"
+      - "format_config"
+      properties:
+        catalog_config:
+          title: "Iceberg catalog config"
+          type: "object"
+          description: "Catalog config of Iceberg."
+          oneOf:
+          - title: "HiveCatalog: Use Apache Hive MetaStore"
+            required:
+            - "catalog_type"
+            - "hive_thrift_uri"
+            properties:
+              catalog_type:
+                title: "Catalog Type"
+                type: "string"
+                default: "Hive"
+                enum:
+                - "Hive"
+                order: 0
+              hive_thrift_uri:
+                title: "Hive Metastore thrift uri"
+                type: "string"
+                description: "Hive MetaStore thrift server uri of iceberg catalog."
+                examples:
+                - "host:port"
+                order: 1
+              database:
+                title: "Default database"
+                description: "The default database tables are written to if the source\
+                  \ does not specify a namespace. The usual value for this field is\
+                  \ \"default\"."
+                type: "string"
+                default: "default"
+                examples:
+                - "default"
+                order: 2
+          - title: "HadoopCatalog: Use hierarchical file systems as same as storage\
+              \ config"
+            description: "A Hadoop catalog doesn’t need to connect to a Hive MetaStore,\
+              \ but can only be used with HDFS or similar file systems that support\
+              \ atomic rename."
+            required:
+            - "catalog_type"
+            properties:
+              catalog_type:
+                title: "Catalog Type"
+                type: "string"
+                default: "Hadoop"
+                enum:
+                - "Hadoop"
+                order: 0
+              database:
+                title: "Default database"
+                description: "The default database tables are written to if the source\
+                  \ does not specify a namespace. The usual value for this field is\
+                  \ \"default\"."
+                type: "string"
+                default: "default"
+                examples:
+                - "default"
+                order: 1
+          - title: "JdbcCatalog: Use relational database"
+            description: "Using a table in a relational database to manage Iceberg\
+              \ tables through JDBC. Read more <a href=\"https://iceberg.apache.org/docs/latest/jdbc/\"\
+              >here</a>. Supporting: PostgreSQL"
+            required:
+            - "catalog_type"
+            properties:
+              catalog_type:
+                title: "Catalog Type"
+                type: "string"
+                default: "Jdbc"
+                enum:
+                - "Jdbc"
+                order: 0
+              database:
+                title: "Default schema"
+                description: "The default schema tables are written to if the source\
+                  \ does not specify a namespace. The usual value for this field is\
+                  \ \"public\"."
+                type: "string"
+                default: "public"
+                examples:
+                - "public"
+                order: 1
+              jdbc_url:
+                title: "Jdbc url"
+                type: "string"
+                examples:
+                - "jdbc:postgresql://{host}:{port}/{database}"
+                order: 2
+              username:
+                title: "User"
+                description: "Username to use to access the database."
+                type: "string"
+                order: 3
+              password:
+                title: "Password"
+                description: "Password associated with the username."
+                type: "string"
+                airbyte_secret: true
+                order: 4
+              ssl:
+                title: "SSL Connection"
+                description: "Encrypt data using SSL. When activating SSL, please\
+                  \ select one of the connection modes."
+                type: "boolean"
+                default: false
+                order: 5
+              catalog_schema:
+                title: "schema for Iceberg catalog"
+                description: "Iceberg catalog metadata tables are written to catalog\
+                  \ schema. The usual value for this field is \"public\"."
+                type: "string"
+                default: "public"
+                examples:
+                - "public"
+                order: 6
+          order: 0
+        storage_config:
+          title: "Storage config"
+          type: "object"
+          description: "Storage config of Iceberg."
+          oneOf:
+          - title: "S3"
+            type: "object"
+            description: "S3 object storage"
+            required:
+            - "storage_type"
+            - "access_key_id"
+            - "secret_access_key"
+            - "s3_warehouse_uri"
+            properties:
+              storage_type:
+                title: "Storage Type"
+                type: "string"
+                default: "S3"
+                enum:
+                - "S3"
+                order: 0
+              access_key_id:
+                type: "string"
+                description: "The access key ID to access the S3 bucket. Airbyte requires\
+                  \ Read and Write permissions to the given bucket. Read more <a href=\"\
+                  https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys\"\
+                  >here</a>."
+                title: "S3 Key ID"
+                airbyte_secret: true
+                examples:
+                - "A012345678910EXAMPLE"
+                order: 0
+              secret_access_key:
+                type: "string"
+                description: "The corresponding secret to the access key ID. Read\
+                  \ more <a href=\"https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys\"\
+                  >here</a>"
+                title: "S3 Access Key"
+                airbyte_secret: true
+                examples:
+                - "a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY"
+                order: 1
+              s3_warehouse_uri:
+                title: "S3 Warehouse Uri for Iceberg"
+                type: "string"
+                description: "The Warehouse Uri for Iceberg"
+                examples:
+                - "s3a://my-bucket/path/to/warehouse"
+                - "s3://my-bucket/path/to/warehouse"
+                order: 2
+              s3_bucket_region:
+                title: "S3 Bucket Region"
+                type: "string"
+                default: ""
+                description: "The region of the S3 bucket. See <a href=\"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions\"\
+                  >here</a> for all region codes."
+                enum:
+                - ""
+                - "us-east-1"
+                - "us-east-2"
+                - "us-west-1"
+                - "us-west-2"
+                - "af-south-1"
+                - "ap-east-1"
+                - "ap-south-1"
+                - "ap-northeast-1"
+                - "ap-northeast-2"
+                - "ap-northeast-3"
+                - "ap-southeast-1"
+                - "ap-southeast-2"
+                - "ca-central-1"
+                - "cn-north-1"
+                - "cn-northwest-1"
+                - "eu-central-1"
+                - "eu-north-1"
+                - "eu-south-1"
+                - "eu-west-1"
+                - "eu-west-2"
+                - "eu-west-3"
+                - "sa-east-1"
+                - "me-south-1"
+                - "us-gov-east-1"
+                - "us-gov-west-1"
+                order: 3
+              s3_endpoint:
+                title: "Endpoint"
+                type: "string"
+                default: ""
+                description: "Your S3 endpoint url. Read more <a href=\"https://docs.aws.amazon.com/general/latest/gr/s3.html#:~:text=Service%20endpoints-,Amazon%20S3%20endpoints,-When%20you%20use\"\
+                  >here</a>"
+                examples:
+                - "http://localhost:9000"
+                - "localhost:9000"
+                order: 4
+              s3_path_style_access:
+                type: "boolean"
+                description: "Use path style access"
+                examples:
+                - true
+                - false
+                default: true
+                order: 5
+          order: 1
+        format_config:
+          title: "File format"
+          type: "object"
+          required:
+          - "format"
+          description: "File format of Iceberg storage."
+          properties:
+            format:
+              title: "File storage format"
+              type: "string"
+              default: "Parquet"
+              description: ""
+              enum:
+              - "Parquet"
+              - "Avro"
+              order: 0
+            flush_batch_size:
+              title: "Data file flushing batch size"
+              description: "Iceberg data file flush batch size. Incoming rows write\
+                \ to cache firstly; When cache size reaches this 'batch size', flush\
+                \ into real Iceberg data file."
+              type: "integer"
+              default: 10000
+              order: 1
+            auto_compact:
+              title: "Auto compact data files"
+              description: "Auto compact data files when stream close"
+              type: "boolean"
+              default: false
+              order: 2
+            compact_target_file_size_in_mb:
+              title: "Target size of compacted data file"
+              description: "Specify the target size of Iceberg data file when performing\
+                \ a compaction action. "
+              type: "integer"
+              default: 100
+              order: 3
+          order: 2
+    supportsNormalization: false
+    supportsDBT: false
+    supported_destination_sync_modes:
+    - "overwrite"
+    - "append"
 - dockerImage: "airbyte/destination-aws-datalake:0.1.1"
   spec:
     documentationUrl: "https://docs.airbyte.com/integrations/destinations/aws-datalake"
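
For orientation, a configuration that satisfies the spec added in this hunk might look like the sketch below. The key names, defaults, and example values are taken from the spec itself; the region choice is arbitrary, and the `REQUIRED` table and `missing_required` helper are hypothetical illustrations, not part of the connector. Only the HiveCatalog variant of `catalog_config` is shown.

```python
# Hypothetical example config: keys follow the spec above, values are placeholders.
config = {
    "catalog_config": {
        "catalog_type": "Hive",
        "hive_thrift_uri": "host:port",  # example value from the spec
        "database": "default",
    },
    "storage_config": {
        "storage_type": "S3",
        "access_key_id": "A012345678910EXAMPLE",
        "secret_access_key": "a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY",
        "s3_warehouse_uri": "s3a://my-bucket/path/to/warehouse",
        "s3_bucket_region": "us-east-1",  # arbitrary pick from the enum
        "s3_path_style_access": True,
    },
    "format_config": {
        "format": "Parquet",
        "flush_batch_size": 10000,
        "auto_compact": False,
        "compact_target_file_size_in_mb": 100,
    },
}

# Required keys per section, copied from the `required` lists in the spec
# (catalog_config uses the HiveCatalog variant here).
REQUIRED = {
    "catalog_config": ["catalog_type", "hive_thrift_uri"],
    "storage_config": [
        "storage_type",
        "access_key_id",
        "secret_access_key",
        "s3_warehouse_uri",
    ],
    "format_config": ["format"],
}


def missing_required(cfg):
    """Return dotted paths of required keys that are absent from cfg."""
    missing = []
    for section, keys in REQUIRED.items():
        if section not in cfg:
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in cfg[section])
    return missing
```

Calling `missing_required(config)` on the dictionary above returns an empty list; dropping a section or a required key makes its dotted path appear in the result. This checks presence only and is not a substitute for validating against the JSON schema.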

airbyte-integrations/builds.md

Lines changed: 2 additions & 1 deletion
@@ -164,7 +164,8 @@
 | Google Cloud Storage (GCS) | [![destination-gcs](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-gcs%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-gcs) |
 | Google Firestore | [![destination-firestore](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-firestore%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-firestore) |
 | Google PubSub | [![destination-pubsub](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-pubsub%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-pubsub) |
-| Google Sheets | [![destination-sheets](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-sheets%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-sheets) |
+| Google Sheets | [![destination-sheets](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-sheets%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-sheets) | |
+| Apache Iceberg | [![destination-iceberg](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-iceberg%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-iceberg)
 | Kafka | [![destination-kafka](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-kafka%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-kafka) |
 | Keen (Chargify) | [![destination-keen](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-keen%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-keen) |
 | Local CSV | [![destination-csv](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-csv%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-csv) |
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+*
+!Dockerfile
+!build
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+FROM airbyte/integration-base-java:dev AS build
+
+WORKDIR /airbyte
+ENV APPLICATION destination-iceberg
+
+COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar
+
+RUN tar xf ${APPLICATION}.tar --strip-components=1 && rm -rf ${APPLICATION}.tar
+
+FROM airbyte/integration-base-java:dev
+
+WORKDIR /airbyte
+ENV APPLICATION destination-iceberg
+
+ENV JAVA_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED \
+--add-opens java.base/java.util=ALL-UNNAMED \
+--add-opens java.base/java.lang.reflect=ALL-UNNAMED \
+--add-opens java.base/java.text=ALL-UNNAMED \
+--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
+--add-opens java.base/java.nio=ALL-UNNAMED "
+
+COPY --from=build /airbyte /airbyte
+
+LABEL io.airbyte.version=0.1.0
+LABEL io.airbyte.name=airbyte/destination-iceberg
