I went through the Cassandra Sink doc but I don't see how to specify the partition and clustering keys.

The doc says this:

You can configure this connector to manage the schema on the Cassandra cluster. When altering an existing table the key is ignored. This is to avoid the potential issues around changing a primary key on an existing table. The key schema is used to generate a primary key for the table when it is created.

If it is a new table, the connector will use the key schema (from the KStream, I suppose) to create the primary key. That might be OK for the partition key, but not for the clustering key.

So are we forced to create all the tables with the right keys before running the streaming app, or is there a way to adjust things?

Answers

Confluent's connector requires that all columns in the primary key be present in the key of the topic (as a struct, if I remember correctly). This is one of its limitations, as it may not match your application's output. In that case you'll need to transform the topic to meet this requirement.
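One way to do that transformation without changing the application is Kafka Connect's built-in `ValueToKey` single message transform, which copies fields from the record value into the record key. A minimal sketch in the sink connector's config (the field names here are made up for illustration):

```properties
# Hypothetical example: promote value fields into the record key
# so they can form the struct the connector expects for the primary key.
transforms=createKey
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=customer_id,order_ts
```

This only restructures the record key; it does not let you control which of those columns become partition vs. clustering keys, so the table itself still needs to be created with the right key layout.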

Instead of Confluent's connector, I recommend the DataStax Kafka Connector, which is carefully designed for efficient loading of data into Cassandra/DSE. It has the following features (more information is in the following blog post):

  • Stores data from one topic into one or multiple Cassandra tables (to support data denormalization);
  • Mapping of topic data to Cassandra columns is defined in the configuration, so you can take any piece of the message key or value and map it to a column;
  • Very efficient, using unlogged batches where possible, and lightweight;
  • Supports the different security features of Cassandra/DSE;

The connector is free to use with DSE starting from version 4.8, and with Cassandra starting from 2.1.
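As a sketch of the mapping feature, a DataStax connector configuration looks roughly like this (the topic, keyspace, table, and column names are hypothetical, and the exact `connector.class` value depends on your connector version, so check the connector's documentation):

```properties
name=cassandra-sink
connector.class=com.datastax.oss.kafka.sink.CassandraSinkConnector
topics=orders
# Map pieces of the record key/value to table columns.
# The target table, with its partition and clustering keys,
# must already exist in Cassandra.
topic.orders.my_ks.orders_by_customer.mapping=customer_id=key.customer_id, order_ts=value.order_ts, amount=value.amount
```

Because the mapping addresses `key.*` and `value.*` paths explicitly, the record key does not have to mirror the table's primary key, which works around the limitation described above.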
