Event-driven architecture on OKD
This article walks through the design and implementation of an event-driven microservices architecture on OpenShift (OKD). We detail the design and implementation process and explore how the resulting architecture handled high web traffic and optimized resource utilization, ultimately leading to operational cost savings, an improved user experience, and faster deployment cycles.
In this Article:
- Explore the technical and business objectives of a project aimed at redesigning a client’s web application using a microservices-based architecture.
- Uncover the features and functionalities of an asynchronous, event-driven communication system hosted on OpenShift (OKD).
- Understand the process and outcomes of stress testing on Apache Flink, PostgreSQL, and Nginx, revealing the resilience and robustness of the implemented architecture.
- Get insights into the suite of technologies and tools used in the project, including Grafana, Prometheus, Apache Kafka, PostgreSQL, and more.
- Discover the extensive client benefits delivered by the project, including business agility, cost efficiency, enhanced user experience, and improved security measures.
Venturing into the realm of microservices, we embarked on an ambitious project aimed at designing an advanced web application capable of handling heavy web traffic and changing workload conditions. Our client sought not just an upgrade, but a transformation that could yield operational cost savings, an enhanced user experience, and a faster time-to-market. This article outlines our approach and experience throughout this project, along with its significant results. The upcoming sections will provide a detailed account of our architectural approach, the critical features we implemented, the technologies we utilized, the stress tests we conducted, and the substantial benefits our client received as a result.
Project Background
Our client, operating in a competitive digital space, found their existing system increasingly unable to cope with the demands of their expanding user base and the dynamic nature of web traffic. This created a pressing need to optimize resource utilization, improve system scalability and resilience, and gain real-time insight into system performance. The shortcomings of the existing system were not just about coping with traffic; they translated into missed opportunities and mounting business costs.
The decision was taken to transform their web application into a microservices-based architecture, specifically utilizing an asynchronous event-driven communication model. This transformative step was driven by both a technical necessity and a business imperative.
In the sections that follow, we will take a deeper dive into the approach we took to achieve this, the technologies and tools that enabled our journey, the testing scenarios we navigated, and the immense benefits that resulted from this transition.
Our Approach and Implementation
Our task was to construct a microservices architecture using asynchronous, event-driven communication hosted on OpenShift (OKD). This design had to demonstrate resilience under various workloads and potential component overloads, all while maintaining an optimized user experience. In line with our client’s expectations, our efforts focused on implementing liveness and readiness probes, pod auto-scaling, and a rate-limited reverse proxy in front of the system.
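To make this concrete, the sketch below shows what such probe and auto-scaling configuration can look like on OKD/Kubernetes. It is a minimal, illustrative fragment rather than the project’s actual manifests: the service name, image, health-check paths, and scaling thresholds are assumptions.

```yaml
# Minimal sketch of a Deployment fragment and autoscaler for a hypothetical
# "gateway" service; names, ports, paths, and thresholds are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
        - name: gateway
          image: registry.example.com/gateway:latest
          ports:
            - containerPort: 8080
          # Readiness gates traffic until the pod can actually serve requests.
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          # Liveness restarts the pod if it stops responding.
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
---
# Horizontal Pod Autoscaler scales the gateway on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Separating readiness from liveness in this way lets the platform stop routing traffic to a busy pod without restarting it, while the autoscaler adds replicas before saturation turns into failed requests.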
Project Features & Functionalities
The architecture and design of the project were characterized by a range of features and functionalities, each carefully chosen and developed to meet the project’s business and technical objectives. Together, these features formed the bedrock of the project, providing the scalability, resilience, and real-time monitoring it required.
From a business perspective, the key features included scalability and resilience. The architecture demonstrated auto-scalability, a critical requirement for any system dealing with peak times or an expanding user base. Auto-scalability ensured a high-quality user experience at all times, regardless of traffic volumes. Resilience was another core feature, highlighting the system’s capability to continue functioning efficiently under potentially overloaded conditions, thereby safeguarding business continuity.
Real-time monitoring and alerts provided a significant business advantage. This feature offered immediate visibility into the system’s performance, enabling timely interventions to prevent or minimize service disruptions. In a rapidly evolving business environment, such timely insights can be invaluable in maintaining high levels of user satisfaction and achieving operational efficiency.
On the technical side, the project included a number of key features and functionalities. Asynchronous communication over message queues was central, facilitating effective inter-service communication. The stateless gateway application, integrated with Keycloak, managed identity and access, while real-time event processing was handled by Apache Flink.
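As an illustration of that pattern, the following is a minimal sketch of how a stateless Spring gateway can hand incoming requests off to Kafka instead of processing them synchronously. The controller, endpoint, and topic name are hypothetical and not taken from the project’s codebase.

```java
// Hypothetical REST endpoint in the stateless gateway: authentication is handled
// upstream (Keycloak), and the request is published to Kafka rather than being
// processed synchronously. Class, endpoint, and topic names are illustrative.
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class EventGatewayController {

    private static final String TOPIC = "incoming-events"; // assumed topic name

    private final KafkaTemplate<String, String> kafkaTemplate;

    public EventGatewayController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @PostMapping("/events")
    public ResponseEntity<Void> publishEvent(@RequestBody String payload) {
        // Fire-and-forget publish; downstream Flink jobs consume the topic
        // asynchronously, so the gateway stays stateless and responsive.
        kafkaTemplate.send(TOPIC, payload);
        return ResponseEntity.accepted().build();
    }
}
```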
Database operations were performed by Flink jobs against PostgreSQL, providing robust and efficient data management. The project also demonstrated the system’s behavior under load and component overload, giving valuable insight into its resilience. Additionally, pod auto-scaling combined with liveness and readiness checks was vital for maintaining the system’s health and performance.
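The sketch below outlines what such a Flink job can look like: it consumes events from a Kafka topic and writes them to PostgreSQL through Flink’s JDBC sink. The topic, table, connection details, and batching values are assumptions chosen for illustration, not the project’s actual job.

```java
// Illustrative Flink job: read events from Kafka and persist them to PostgreSQL.
// Topic, table, credentials, and batch sizes are placeholders.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventPersistenceJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: the topic the gateway publishes to (name assumed).
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("incoming-events")
                .setGroupId("event-persistence")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-events")
           // Sink: batched inserts into PostgreSQL via the JDBC connector.
           .addSink(JdbcSink.sink(
                   "INSERT INTO events (payload) VALUES (?)",
                   (statement, payload) -> statement.setString(1, payload),
                   JdbcExecutionOptions.builder()
                           .withBatchSize(500)
                           .withBatchIntervalMs(200)
                           .build(),
                   new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                           .withUrl("jdbc:postgresql://postgres:5432/events")
                           .withDriverName("org.postgresql.Driver")
                           .withUsername("flink")
                           .withPassword("change-me")
                           .build()));

        env.execute("event-persistence");
    }
}
```

Because Kafka retains the messages, a job structured like this can fall behind or even fail and then catch up once resources return, which is exactly the behavior observed in the stress tests described below.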
A salient feature was the use of Nginx as a reverse proxy with rate limits. This setup optimized the distribution of network traffic to the servers, ensuring that the system could handle large volumes of data and user requests efficiently.
The combination of these features and functionalities was instrumental in shaping the project’s successful outcome. By seamlessly integrating business and technical features, the project was able to meet its objectives and deliver a robust, scalable, and efficient system capable of driving substantial business benefits.
Stress Testing & Observations
An essential aspect of any robust technical system lies in its ability to withstand stress, heavy loads, and unexpected failures. For this project, stress testing was conducted on three crucial components: Apache Flink, PostgreSQL, and Nginx. The results of these tests provided invaluable insights into the system’s behavior and response under stress, further strengthening its resilience and reliability.
The Apache Flink component was subjected to overload scenarios, both when the database was responding slowly and when the Kafka topic was flooded with incoming messages. Even in these cases, the overall system was unaffected, thanks to the asynchronous nature of data processing. Furthermore, if Flink jobs failed due to a lack of resources or an Apache Flink cluster failure, the messages produced by the gateway remained in the Kafka topic and were processed once the cluster was restored.
PostgreSQL, the database component, was also tested under reduced resources and heavy data-processing loads. Although operations ran more slowly in the overloaded state, all of them completed, proving the system’s resilience under stress. When PostgreSQL was unavailable, the backlog of unprocessed messages grew in Kafka without affecting the rest of the system. This behavior underlined the system’s ability to maintain performance and reliability even under failure scenarios.
Nginx, the reverse proxy, was configured with specific limits to test its behavior under heavy loads. The tests showed that once the limits were reached, new requests were rejected by the proxy server, indicating an effective protection mechanism against potential overloads. An example configuration illustrated the versatility of Nginx, showing how different limit and buffer settings affect system behavior.
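A configuration of that kind might look roughly like the following; the zone size, request rate, burst value, upstream name, and buffer settings are illustrative rather than the exact values used in the project.

```nginx
# Illustrative rate-limiting setup; rates, burst sizes, and upstream names are assumed.
http {
    # Track clients by IP; allow up to 10 requests per second per client.
    limit_req_zone $binary_remote_addr zone=per_client:10m rate=10r/s;

    upstream gateway {
        server gateway:8080;
    }

    server {
        listen 80;

        location / {
            # Allow short bursts of 20 extra requests; reject the rest with 503.
            limit_req zone=per_client burst=20 nodelay;
            limit_req_status 503;

            proxy_pass http://gateway;
            # Buffer settings influence how the proxy absorbs slow upstream responses.
            proxy_buffers 8 16k;
            proxy_buffer_size 16k;
        }
    }
}
```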
These stress tests and the subsequent observations provided essential insights into the system’s robustness and reliability. By testing under extreme conditions and potential failure scenarios, the project demonstrated its readiness to handle real-world challenges and maintain its performance and efficiency, ensuring the client’s business continuity and user satisfaction.
Tools & Technologies Used
The successful execution and delivery of this project involved harnessing a suite of modern tools and technologies, each playing a crucial role in designing, implementing, and monitoring the microservices-based architecture.
OKD, the community distribution of Kubernetes that powers Red Hat OpenShift, was the foundation of the project, providing a robust and scalable platform for hosting the microservices architecture. Its built-in features for container orchestration, security, and management made it the ideal choice for this project.
Helm was utilized for the management of Kubernetes applications, simplifying the deployment and configuration processes. For visualization and real-time monitoring of system metrics, Grafana was paired with Prometheus and Prometheus PushGateway, creating an effective monitoring and alerting system.
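As an example of how metrics from short-lived work can reach this stack, the snippet below uses the Prometheus Java client (simpleclient_pushgateway) to push a gauge to the PushGateway, which Prometheus then scrapes. The metric name, job label, and gateway address are assumptions for illustration only.

```java
// Illustrative metrics push from a batch of work to the Prometheus PushGateway;
// metric name, job label, and address are assumed, not taken from the project.
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

public class BatchMetricsPusher {

    public static void main(String[] args) throws Exception {
        CollectorRegistry registry = new CollectorRegistry();

        // Record how many events the last run processed.
        Gauge processed = Gauge.build()
                .name("batch_events_processed")
                .help("Number of events processed in the last run.")
                .register(registry);
        processed.set(1250);

        // Push to the PushGateway, which Prometheus scrapes on its own schedule.
        PushGateway gateway = new PushGateway("pushgateway:9091");
        gateway.pushAdd(registry, "event_batch_job");
    }
}
```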
JFrog Artifactory served as a universal artifact repository. For asynchronous communication, Apache Kafka and RabbitMQ were chosen, while Apache Flink enabled real-time event processing.
PostgreSQL, renowned for its reliability, data integrity, and extensibility, was used for database operations. For identity and access management, Keycloak played a pivotal role, integrating seamlessly with the stateless gateway application. Java and Spring were used to develop the gateway application, offering a comprehensive programming and configuration model.
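A minimal sketch of that integration, assuming the gateway acts as a Spring Security OAuth2 resource server (Spring Boot 3 / Spring Security 6 style) validating Keycloak-issued JWTs, might look like this; the path rules and the issuer URI in the comment are illustrative.

```java
// Minimal sketch: the gateway validates Keycloak-issued JWTs as an OAuth2 resource
// server, so no session state is kept in the gateway itself. Path rules are assumed.
//
// application.yml (assumed):
//   spring.security.oauth2.resourceserver.jwt.issuer-uri: https://keycloak.example.com/realms/app
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class GatewaySecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/actuator/health/**").permitAll() // keep probes open
                .anyRequest().authenticated())
            // Validate bearer tokens against the configured Keycloak issuer.
            .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()));
        return http.build();
    }
}
```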
Nginx acted as the reverse proxy, efficiently managing and directing network traffic. It was also used to implement rate limits, enhancing the security of the system. For continuous integration and delivery, GitLab CI/CD was utilized, automating the software development lifecycle.
Finally, Locust and wrk were employed for simulating application traffic and stress testing, providing valuable insights into the system’s behavior under varying workloads.
The judicious selection and effective use of these tools and technologies were instrumental in building a resilient, scalable, and efficient system that not only met the project’s technical objectives but also aligned strategically with the client’s business goals.
Client Benefits
The project delivered significant benefits for the client, fundamentally enhancing their technical capabilities while positioning them for improved operational efficiency, business agility, and customer satisfaction.
The event-driven microservices architecture proved to be a major win for the client’s business. Its flexibility allows individual services to be modified, scaled, or introduced independently, so the system can adapt quickly to changing business requirements. As a result, the client can now bring new features and updates to market faster than before.
The project also led to significant cost-efficiencies. By showcasing auto-scaling and optimal resource allocation, the client is now empowered to make more informed decisions regarding resource usage. This can result in substantial cost savings by efficiently scaling resources based on demand and preventing resource overprovisioning or underprovisioning.
Another crucial advantage for the client was the enhancement of user experience. The system’s ability to maintain performance and availability under varying workloads and potential component failures ensures a consistently high-quality user experience. Furthermore, the implementation of rate limiting ensures fair resource usage, contributing to maintaining a consistent service level for all users.
One of the key aspects of this project was the focus on risk mitigation. The stress testing and observation of system behavior under overload conditions provided the client with a deep understanding of their system’s capabilities. This knowledge can be invaluable in formulating robust contingency plans, thereby improving system reliability and mitigating potential business risks.
The use of modern monitoring tools like Grafana and Prometheus provides the client with a robust framework for data-driven decision making. With real-time insights into their system, they can proactively identify potential bottlenecks or areas of improvement, leading to continuous system optimization.
This rich pool of real-time data and intuitive visualization helps the client make proactive and informed decisions about their system. They can now anticipate issues before they become problems, ensuring the smooth operation of their services and ultimately contributing to continuous system optimization.
This approach extends beyond resolving immediate issues—it helps shape the strategic direction of the client’s system infrastructure, enabling them to prioritize resources effectively, optimize their architecture, and consistently enhance system performance over time.
Security also received a significant boost from the project. The implementation of rate limiting at the Nginx proxy level brought about a vital layer of security to the client’s web application. This precautionary measure guards against potential abuse or denial-of-service attacks by controlling the request load that reaches their application.
The client can now better protect their services without compromising on user experience. Rate limiting ensures fair usage of resources among clients and maintains consistent service levels, significantly contributing to the overall integrity, reliability, and resilience of the system. This security enhancement not only safeguards digital assets but also fortifies their reputation for providing secure, reliable services to their users.
A Summary
This project served as a definitive testament to the technical robustness of an event-driven microservices architecture, as well as its strategic alignment with business objectives. We demonstrated a system that not only withstood stress testing under various conditions but also proved scalable and resilient in handling large volumes of data and user requests. All in all, an event-driven microservices approach, backed by a well-designed architecture and effective use of modern tools and technologies, can deliver a resilient and scalable system that aligns with both technical and business objectives.
For a more in-depth and technical analysis of the project, check out this article, which delves into the specifics of the architecture, tools, and methodologies used in more detail. It’s an essential resource for those looking to gain a deeper understanding of implementing a resilient and scalable microservices architecture.