[SUPPORT] HoodieStreamer: Encountering ClassNotFoundException: io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider after upgrading to the latest version
#12838 · Open
YousifS7 opened this issue on Feb 13, 2025 · 1 comment
Hello,
We are using the org.apache.hudi.utilities.streamer.HoodieStreamer class to extract data out of Kafka and write it to a Hudi table. The Kafka topic is populated by Debezium from a SQL Server table, and the Debezium converter is Avro. We run spark-submit on EMR 7.6.0.
This works perfectly with Hudi 0.15.0. However, after switching to Hudi 1.0.1, we started encountering the error shown below under Error Message. The SQL transformation applied in the pipeline is:
CACHE TABLE dbz_filtered AS
SELECT CONCAT(source.commit_lsn, ':', ifnull(source.change_lsn, 0), ':', ifnull(source.event_serial_no, 0)) AS ts,
       CASE WHEN op = 'd' THEN before ELSE after END AS source_fields,
       CASE WHEN op = 'd' THEN true ELSE false END AS is_deleted
FROM <SRC>
WHERE op IN ('d', 'u', 'c', 'r');

SELECT ts, is_deleted, source_fields.*,
       CONCAT(trim(source_fields.some_col1), source_fields.some_col2, latest) AS hudi_key,
       from_unixtime(source_fields.some_col2/1000, 'yyyyMM') AS partition_path
FROM dbz_filtered;
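For context, the job is launched roughly as in the sketch below; the jar path, bucket, table name, transformer class, and properties file are illustrative placeholders rather than our exact configuration:

# illustrative invocation only; paths, names, and the transformer class are placeholders
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  /path/to/hudi-utilities-bundle_2.12-1.0.1.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --transformer-class org.apache.hudi.utilities.transform.SqlFileBasedTransformer \
  --source-ordering-field ts \
  --target-base-path s3://example-bucket/hudi/my_table \
  --target-table my_table \
  --props s3://example-bucket/configs/kafka-source.properties

The properties file carries the Kafka bootstrap servers, topic, schema registry URL, and a pointer to the SQL above.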
Environment Description
Hudi version : 1.0.1
Spark version : 3.5.3
Hive version : Glue Metastore
Hadoop version : N/A
Storage : S3
Running on Docker? : No
Error Message
ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoClassDefFoundError: io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
at org.apache.hudi.utilities.schema.SchemaRegistryProvider.lambda$new$1e9d4812$1(SchemaRegistryProvider.java:113)
at org.apache.hudi.utilities.schema.SchemaRegistryProvider.fetchSchemaFromRegistry(SchemaRegistryProvider.java:173)
at org.apache.hudi.utilities.schema.SchemaRegistryProvider.parseSchemaFromRegistry(SchemaRegistryProvider.java:141)
at org.apache.hudi.utilities.schema.SchemaRegistryProvider.getSourceSchema(SchemaRegistryProvider.java:252)
at org.apache.hudi.utilities.streamer.SourceFormatAdapter.avroDataInRowFormat(SourceFormatAdapter.java:212)
at org.apache.hudi.utilities.streamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:238)
at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:639)
at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:582)
at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:554)
at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:464)
at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:911)
at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:226)
at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:646)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)
Caused by: java.lang.ClassNotFoundException: io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
We are not using protobuf anywhere in the pipeline, so we are not sure why this version complains about it. If we switch back to 0.15.0, the error goes away. Any help would be appreciated.
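One workaround we may try (untested; the Confluent coordinates, repository, and version below are assumptions about where ProtobufSchemaProvider is published) is to put the protobuf provider jar on the classpath at submit time:

# assumption: the missing class ships in Confluent's kafka-protobuf-provider artifact,
# hosted on the Confluent Maven repository; 7.4.0 is a placeholder version
spark-submit \
  --repositories https://packages.confluent.io/maven/ \
  --packages io.confluent:kafka-protobuf-provider:7.4.0 \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  ... (remaining HoodieStreamer arguments as above)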
Thank you