Partition field source columns should be primitive typed #25144

Draft: wants to merge 11 commits into dev


oleiman (Member) commented Feb 23, 2025

The source columns of partition fields, selected by ID, must be of a primitive type and cannot be contained in a map or list, though they may be nested in a struct.
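The rule above can be sketched as a check over a flattened schema. This is a minimal illustration, not the implementation in this PR: the `type_kind`, `field_info` and `flat_schema` types and the `validate_partition_source` function are hypothetical. The sketch also assumes that every field (including list elements and map keys/values) carries a unique ID and a link to its parent.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, flattened view of an Iceberg schema: every field, including
// list elements and map keys/values, has a unique ID and an optional parent.
enum class type_kind { primitive, struct_type, list_type, map_type };

struct field_info {
    type_kind kind;
    std::optional<int> parent_id; // std::nullopt for top-level columns
};

using flat_schema = std::unordered_map<int, field_info>;

// Returns an error message if `source_id` cannot be used as a partition
// source column, or std::nullopt if it is acceptable.
std::optional<std::string>
validate_partition_source(const flat_schema& schema, int source_id) {
    auto it = schema.find(source_id);
    if (it == schema.end()) {
        return "source column not found";
    }
    if (it->second.kind != type_kind::primitive) {
        return "source column must be a primitive type";
    }
    // Walk up the ancestry: structs are allowed, lists and maps are not.
    // Assumes the parent links are acyclic.
    auto parent = it->second.parent_id;
    while (parent.has_value()) {
        auto p = schema.find(*parent);
        if (p == schema.end()) {
            return "parent field not found";
        }
        if (p->second.kind == type_kind::list_type
            || p->second.kind == type_kind::map_type) {
            return "source column may not be nested in a list or map";
        }
        parent = p->second.parent_id;
    }
    return std::nullopt;
}
```

Structs are transparent to this check. Only the field's own kind and any list or map ancestors matter.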

This PR is stacked on #25114; the only commit of interest is 62c61f7.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

The logical DATE type should be bound to AVRO_INT only. Previously we used
AVRO_LONG, so the Avro library would throw when exercising this code path.

Also adds a unit test for Iceberg types that are backed by Avro logicalTypes.

Note that timestamptz serialization is currently buggy, as it relies on an
extra non-standard field ('adjust-to-utc') in the Avro type description.
Technically this field should appear in both microsecond-precision timestamp
types: 'false' for 'timestamp' and 'true' for 'timestamptz'. As written, both
timestamp types serialize to logical type timestamp-micros, with no way to
distinguish between the two on deserialization.

See https://iceberg.apache.org/spec/#avro for detail
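For reference, the Avro mapping that the Iceberg spec describes for these types can be sketched as follows. This is an illustration only. The names `iceberg_type`, `to_avro_schema_json` and `from_avro_logical_type` are hypothetical, and the sketch emits schema JSON strings rather than using the Avro C++ library's types. The reverse mapping shows why dropping `adjust-to-utc` makes the two timestamp types indistinguishable on deserialization.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical subset of Iceberg primitive types backed by Avro logical types.
enum class iceberg_type { date, time, timestamp, timestamptz };

// Avro schema JSON for each type, following the Iceberg spec's Avro mapping.
// Note the physical types: date is an Avro int (days since the epoch); the
// others are Avro longs (microseconds).
std::string to_avro_schema_json(iceberg_type t) {
    switch (t) {
    case iceberg_type::date:
        return R"({"type":"int","logicalType":"date"})";
    case iceberg_type::time:
        return R"({"type":"long","logicalType":"time-micros"})";
    case iceberg_type::timestamp:
        return R"({"type":"long","logicalType":"timestamp-micros","adjust-to-utc":false})";
    case iceberg_type::timestamptz:
        return R"({"type":"long","logicalType":"timestamp-micros","adjust-to-utc":true})";
    }
    return {};
}

// Reverse mapping. Without the adjust-to-utc field, a timestamp-micros long
// could be either timestamp or timestamptz, so the result is ambiguous.
std::optional<iceberg_type>
from_avro_logical_type(std::string_view logical_type,
                       std::optional<bool> adjust_to_utc) {
    if (logical_type == "date") {
        return iceberg_type::date;
    }
    if (logical_type == "time-micros") {
        return iceberg_type::time;
    }
    if (logical_type == "timestamp-micros" && adjust_to_utc.has_value()) {
        return *adjust_to_utc ? iceberg_type::timestamptz
                              : iceberg_type::timestamp;
    }
    return std::nullopt; // unknown or ambiguous
}
```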

Signed-off-by: Oren Leiman <[email protected]>
This supports special type promotions that affect partition transforms.

type_promoted:
  - no
  - yes
  - unless_partition
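The following minimal sketch shows how a caller might consume a tri-state result, assuming the caller knows whether the promoted column is a partition source. `promotion_allowed` is a hypothetical helper, not part of this PR.

```cpp
// Tri-state promotion result, mirroring the values listed above.
enum class type_promoted { no, yes, unless_partition };

// Hypothetical helper: may a column's type change proceed, given the
// promotion result and whether the column is a partition source?
bool promotion_allowed(type_promoted result, bool is_partition_source) {
    switch (result) {
    case type_promoted::no:
        return false;
    case type_promoted::yes:
        return true;
    case type_promoted::unless_partition:
        // Fine for ordinary columns, but the promotion could affect a
        // partition transform, so reject it for partition sources.
        return !is_partition_source;
    }
    return false;
}
```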

Signed-off-by: Oren Leiman <[email protected]>
Specify either an annotate error or a validate error. This simplifies some of
the conditional logic around instantiating test suites. No functional changes.
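One way to encode "either an annotate error or a validate error" is a `std::variant`, which prevents a test case from specifying both. This is a hypothetical sketch. `annotate_error`, `validate_error`, `expected_error`, `test_case` and `failing_stage` are illustrative names, not the test suite's actual types.

```cpp
#include <string>
#include <variant>

// Hypothetical expected-failure descriptions for a schema-evolution test case.
struct annotate_error {
    std::string what;
};
struct validate_error {
    std::string what;
};

// Exactly one of the two can be specified, so the harness never has to
// handle "both" or reconcile two independent optionals.
using expected_error = std::variant<annotate_error, validate_error>;

struct test_case {
    std::string name;
    expected_error expected;
};

// Reports which stage the harness should expect to fail.
std::string failing_stage(const test_case& tc) {
    return std::holds_alternative<annotate_error>(tc.expected) ? "annotate"
                                                               : "validate";
}
```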

Signed-off-by: Oren Leiman <[email protected]>
Though the spec does not forbid it, dropping a data column that also appears
in the table's partition spec can cause validation errors in downstream
clients. It may be possible to avoid this by reconciling the partition spec
inline with the schema update, but for now we simply reject such an update
outright.

Also refactors slightly to consolidate error checks in validate_schema_transform.
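The rejection described above can be sketched as a set-difference check between the current and proposed schemas. This is a hypothetical illustration: `check_dropped_partition_sources` and its inputs (plain sets of field IDs) are assumptions. The sketch does not reproduce validate_schema_transform.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified inputs: the field IDs present in the current and
// proposed schemas, and the source IDs referenced by the partition spec.
std::optional<std::string> check_dropped_partition_sources(
  const std::set<int>& current_ids,
  const std::set<int>& proposed_ids,
  const std::vector<int>& partition_source_ids) {
    for (int id : partition_source_ids) {
        // A column is dropped if it exists now but not after the update.
        bool dropped = current_ids.count(id) > 0
                       && proposed_ids.count(id) == 0;
        if (dropped) {
            return "cannot drop column " + std::to_string(id)
                   + ": it is referenced by the partition spec";
        }
    }
    return std::nullopt;
}
```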

Signed-off-by: Oren Leiman <[email protected]>