Kafka Summit 2023: Announcements & Trends


To drive successful digital transformation in today's fast-paced business landscape, organisations need robust streaming platforms that can handle the scale, speed, and complexity of data in real time. Enter Apache Kafka, a distributed streaming platform that has emerged as an integral component of modern data architectures.

Our CTO Sabri Skhiri recently travelled to London for the Kafka Summit 2023, an annual event dedicated to Kafka and its ecosystem. In this article, he will delve into some of the keynotes and talks that took place during the summit, highlighting the noteworthy insights, the practical applications of event streaming, and the innovations shared by industry leaders.


Main Announcements for Kafka Enthusiasts
Before starting, let me mention that I will not go into technical details in this article, but feel free to consult my longer blog post on our research website, where I go deeper into Jay’s keynote, practical applications, tech discoveries, and favourite talks.

The first keynote speech was delivered by Jay Kreps, the Kafka guru who co-created the platform with Neha Narkhede and Jun Rao at LinkedIn in 2011 and later co-founded Confluent in 2014. Kreps presented the main trends of the conference and highlighted key announcements that are set to transform the Kafka experience.
Interestingly, it reminded me of the Spark Summit 2018, when Databricks went 100% cloud with Delta Lake. At that time, there was a notable trend among the major players in the data processing and analytics space, such as Confluent, Cloudera, and Databricks, to shift towards offering their platforms as cloud-based services. They realised that their existing customer bases were huge, but they were not financially exploiting them due to the limitations of on-premises deployments. The solution: transitioning to the cloud.

Back in 2017, Confluent released a fully managed Kafka platform that could welcome these millions of unreachable customers, relieved of the burden of infrastructure management. Indeed, that should have made their Kafka experience way easier: auto-scaling, high availability, multi-AZ management (a nightmare in self-managed Kafka), etc. It should have been great, right?

But if you use a central component like Kafka, you need several things:

  • Integration with your existing data landscape
  • Integration with strong governance
  • A strong data flow processing system that can offer a real-time data service, but also cataloguing, data mining, etc.
  • Integration with your partners' ecosystem by sharing your data.

All these elements were missing until... the announcements of Kafka Summit 2023:

  1. A new cloud management platform: KORA is a large-scale, multi-tenant Kafka platform in the cloud. With KORA, users can now scale their Kafka infrastructure, ensure high availability, and manage multi-AZ deployments, thereby simplifying the Kafka experience for millions of users previously beyond reach.
  2. Integration of Flink SQL within the Confluent Cloud platform: this integration brings powerful stream processing capabilities, data exploration, and real-time analytics to users. This is a really nice piece of work. By connecting to a cloud-native Flink service integrated into the Confluent Cloud platform, users gain access to a streamlined experience that includes viewing the database catalogue, exploring tables, materialising results in persistent Kafka topics, auto-scaling capabilities, and more. The Confluent Cloud management platform seamlessly handles deployment, upgrades, and continuous integration across different environments. (A sketch of what such a Flink SQL pipeline looks like follows this list.)
  3. Enhanced Connectivity with Custom Connectors: Confluent answered the demand for greater flexibility in data integration by announcing the general availability (GA) of custom connectors within the Confluent Cloud platform. Users can now deploy their own connectors or choose from a selection of over 100 open-source connectors, which should help boost both on-premises and cloud integration.
  4. Empowering Data Governance: they also announced at the summit the GA of the governance feature, which supports schema management, data validation, and lineage tracking, and now incorporates data quality rules! Users can apply data quality rules directly to their data streams, configure the necessary transformations, and implement a dead-letter queue pattern (a sketch of that pattern also follows this list). That said, from the discussions I had with Confluent experts, it is mainly a constraint language applied directly at the schema level, compatible with Google CEL. For now, there is nothing to support more than that, such as referential integrity or cross-message constraints.
  5. Stream Sharing Beyond Enterprise Boundaries: they announced the GA of the stream-sharing feature, which enables users to seamlessly share streams of data outside their enterprise, facilitating data-driven partnerships and enabling deeper insights and innovation across the ecosystem.
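
To make the Flink SQL announcement concrete, here is a minimal sketch of the kind of pipeline such a service runs: read a Kafka topic as a table, compute a windowed aggregate, and materialise the result into a persistent topic. It uses the open-source Flink Table API rather than Confluent's managed service, and the topic names, schema, and broker address are hypothetical placeholders.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PageviewsPerMinute {
    public static void main(String[] args) {
        // Streaming-mode table environment; in Confluent Cloud the equivalent
        // runs as a managed Flink SQL statement instead of a self-hosted job.
        TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source: a Kafka topic exposed as a dynamic table (names are hypothetical).
        env.executeSql(
                "CREATE TABLE pageviews ("
              + "  user_id STRING,"
              + "  url STRING,"
              + "  ts TIMESTAMP(3),"
              + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND"
              + ") WITH ("
              + "  'connector' = 'kafka',"
              + "  'topic' = 'pageviews',"
              + "  'properties.bootstrap.servers' = 'localhost:9092',"
              + "  'scan.startup.mode' = 'earliest-offset',"
              + "  'format' = 'json')");

        // Sink: results are materialised into a persistent Kafka topic.
        env.executeSql(
                "CREATE TABLE pageviews_per_minute ("
              + "  window_start TIMESTAMP(3),"
              + "  url STRING,"
              + "  views BIGINT"
              + ") WITH ("
              + "  'connector' = 'kafka',"
              + "  'topic' = 'pageviews-per-minute',"
              + "  'properties.bootstrap.servers' = 'localhost:9092',"
              + "  'format' = 'json')");

        // Continuous query: one-minute tumbling-window view counts per URL.
        env.executeSql(
                "INSERT INTO pageviews_per_minute "
              + "SELECT window_start, url, COUNT(*) AS views "
              + "FROM TABLE(TUMBLE(TABLE pageviews, DESCRIPTOR(ts), INTERVAL '1' MINUTE)) "
              + "GROUP BY window_start, window_end, url");
    }
}
```

In the managed version, the same SQL can be typed straight into the cloud workspace, with the service taking care of the cluster, upgrades, and scaling.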
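
As for the data quality rules in point 4, the sketch below shows the generic dead-letter queue pattern that such rules automate, written by hand with the plain Kafka clients. This is not Confluent's managed feature: the topic names (orders, orders.valid, orders.dlq), the broker address, and the toy validation predicate (standing in for a schema-level CEL condition such as message.total >= 0) are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DlqRouter {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        consumerProps.put("group.id", "orders-validator");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofMillis(500))) {
                    if (isValid(record.value())) {
                        producer.send(new ProducerRecord<>(
                                "orders.valid", record.key(), record.value()));
                    } else {
                        // Route the bad record to the dead-letter topic, keeping
                        // the failure reason in a header for later inspection.
                        ProducerRecord<String, String> dead = new ProducerRecord<>(
                                "orders.dlq", record.key(), record.value());
                        dead.headers().add("dlq.reason",
                                "validation failed".getBytes(StandardCharsets.UTF_8));
                        producer.send(dead);
                    }
                }
            }
        }
    }

    // Toy rule standing in for a schema-level CEL condition.
    private static boolean isValid(String value) {
        return value != null && !value.isBlank();
    }
}
```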

The Big Trends

Categories of Talks

Most of the talks at the Kafka Summit can be categorised into three key areas:

  1. Customer deployments showcased specific use cases where Kafka is implemented in production contexts.
  2. Speakers delved into best practices to enhance understanding and utilisation of the Kafka ecosystem. Topics such as partition rebalancing, the role of consumer groups, Kafka Connect architecture, transactional operations, and other intricate aspects were discussed, providing attendees with insights to optimise their Kafka implementations (see the consumer sketch after this list).
  3. Observability emerged as a central theme, with a focus on distributed application tracing.
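
To make the second category more tangible, here is a minimal sketch of one of those best practices: a consumer joining a consumer group with a ConsumerRebalanceListener, so that it can commit its offsets before a rebalance revokes its partitions and avoid reprocessing. The broker address, group id, and topic are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("group.id", "orders-processor");         // the consumer group
        props.put("enable.auto.commit", "false");          // we commit manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before ownership moves: commit what has been processed
                    // so the next owner does not reprocess the same records.
                    consumer.commitSync();
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    System.out.println("Assigned after rebalance: " + partitions);
                }
            });

            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
                consumer.commitSync(); // commit only after the batch is processed
            }
        }
    }
}
```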

The Rise of Apache Flink and Java Usage

Two more observations are worth adding here. One notable trend throughout the summit was the widespread adoption of Apache Flink as the preferred stream processing framework. Talks highlighted Flink's scalability, stateful streaming capabilities, and seamless integration with Kafka, positioning it as the de facto standard for stateful streaming applications.

It was fascinating to see the complete absence of Apache Spark Structured Streaming. Is there a strong communication strategy from Confluent at play? I don't know, but it was cool to see Flink everywhere!

Another interesting observation was the prevalent usage of Java as the primary development language in the Kafka ecosystem. Presenters predominantly showcased examples and demonstrations using Java, emphasising its widespread adoption among developers. This reinforces the notion that Java and Scala remain prominent languages for building event-driven architectures and leveraging the Kafka ecosystem.

To Sum Up
The summit showcased the industry's commitment to simplifying the Kafka experience and addressing the evolving needs of businesses, with advancements like the KORA cloud management platform, the integration of Flink SQL within Confluent Cloud, custom connectors, and the new data governance features. For companies seeking to lead their digital transformation initiatives, Kafka offers the capacity to harness the power of real-time data and drive innovation at an unprecedented pace. Yet, as we move forward, it is essential to stay abreast of the ever-evolving streaming ecosystem, collaborate with industry experts, and explore new ways to leverage Kafka's capabilities.

Do not hesitate to reach out to us for guidance, support or custom solutions to navigate the complexities of event streaming, microservices, and real-time data processing. 
