Manufacturing

AWS Redshift Transforms BSH's Data Management and Efficiency

AWS Redshift and associated services transformed BSH’s data management, enhancing operational efficiency, decision-making, and global strategic initiatives, while also providing a scalable foundation for future endeavors.


About the Customer

BSH Hausgeräte GmbH (German for ‘BSH Home Appliances’, stylized as B/S/H/) is the largest manufacturer of home appliances in Europe and one of the leading companies in the sector worldwide. The group stemmed from a joint venture set up in May 1967 between Robert Bosch GmbH (Stuttgart) and Siemens AG (Munich), and it posted annual sales of 15.6 billion euros in the year 2021. BSH is an abbreviation for Bosch und Siemens Hausgeräte.

Customer Challenge

The goal was to develop a robust data warehousing solution for BSH, which faced significant challenges in managing and analyzing the vast amounts of data generated across its global operations. The primary challenge was to consolidate diverse data streams from multiple sources into a single, coherent framework to enable efficient data analysis and decision-making. This data included user interactions, operational metrics, and machine-generated data from various internal applications, each using different formats and protocols. The lack of a unified data platform led to siloed data pools, complicating data processing and analytics tasks. BSH needed a solution that could not only integrate these disparate data sources seamlessly but also scale dynamically to accommodate fluctuating data volumes and complex query requirements.

Furthermore, BSH required a robust data warehousing solution that could support high-performance computing and real-time data processing to serve the needs of its dynamic and fast-paced business environment. The existing systems were not equipped to handle the intensive workloads effectively, often resulting in delayed responses and slower time-to-insight for data-driven decisions. The challenge was to establish a high-performance and scalable architecture capable of handling simultaneous data queries from multiple global endpoints securely and efficiently. This architecture needed to ensure data integrity, provide high availability, and maintain strict compliance with international data security standards, as BSH operates in multiple regulatory environments.

How the solution was deployed to meet the challenge

The Commencis team, leveraging AWS cloud services, developed a comprehensive solution to integrate and optimize BSH’s global data management processes. The solution was deployed in two key phases: Data Consolidation and Integration Services, and Data Analysis and Reporting Services.

  • AWS Glue jobs were configured to transform raw data into a structured format suitable for analytical queries and to load the processed data into Redshift. These jobs standardized and deduplicated data from different sources, improving data quality and reliability.
  • Once data was consolidated in Redshift, the next phase focused on using this centralized repository for advanced analytics and business intelligence.
  • Redshift served as the core data warehouse, where all consolidated data was queried and analyzed. Its data warehousing capabilities allowed BSH to run complex queries across large datasets efficiently.
  • Redshift queries were extended to data in S3, enabling BSH to analyze large volumes of data without having to load them into Redshift.
  • BI tools such as Tableau, Power BI, and QuickSight connected directly to Redshift to fetch data for visual analytics, providing BSH stakeholders with interactive dashboards and reports.
  • Site-to-Site VPN connectivity was established to ensure a secure and reliable link between BSH’s on-premises environments and the AWS Cloud. This setup maintained data security and network integrity across all data transactions.
  • Access control was implemented so that only authorized personnel could perform operations on the AWS services involved.
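
The standardization and deduplication step described above can be illustrated with a minimal sketch. It is plain Python rather than AWS Glue code, and the field names (`device_id`, `metric`, `value`, `ts`) and sample records are hypothetical, chosen only to show the idea of normalizing records from different sources before removing duplicates.

```python
from datetime import datetime, timezone


def to_utc_iso(ts) -> str:
    """Convert epoch seconds or an ISO-8601 string to a UTC ISO-8601 string."""
    if isinstance(ts, (int, float)):
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            # Assumption for this sketch: naive timestamps are already in UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        dt = dt.astimezone(timezone.utc)
    return dt.isoformat()


def standardize(record: dict) -> dict:
    """Normalize one raw record into a common shape (hypothetical fields)."""
    return {
        "device_id": str(record["device_id"]).strip().upper(),
        "metric": str(record["metric"]).strip().lower(),
        "value": float(record["value"]),
        "ts": to_utc_iso(record["ts"]),
    }


def deduplicate(records: list) -> list:
    """Keep the first record seen for each (device_id, metric, ts) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["device_id"], rec["metric"], rec["ts"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records from two sources that use different formats.
raw = [
    {"device_id": " wm-001 ", "metric": "Temp", "value": "40.5", "ts": 1700000000},
    {"device_id": "WM-001", "metric": "temp", "value": 40.5, "ts": "2023-11-14T22:13:20+00:00"},
    {"device_id": "dw-007", "metric": "cycles", "value": 3, "ts": "2023-11-14T22:13:20"},
]

clean = deduplicate([standardize(r) for r in raw])
print(len(clean))
```

The first two records describe the same reading in different formats, so they collapse into one only after both have been standardized. This is why standardization comes before deduplication in the sketch.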

Third party applications or solutions used

  • Tableau Server: Tableau Server is utilized to provide business intelligence capabilities by connecting directly to AWS Redshift. It allows BSH to create interactive and shareable dashboards, which help in visualizing the processed data stored in Redshift, facilitating better decision-making across the organization.
  • Power BI Server: Like Tableau, Power BI Server is employed to enhance data visualization and reporting capabilities. It is integrated with AWS Redshift, enabling BSH stakeholders to perform advanced data analysis and generate real-time business intelligence reports from the consolidated data.
  • Denodo Server: Denodo Server acts as a data virtualization and integration platform within BSH’s architecture. It integrates data from various sources, including AWS Redshift and other databases, providing a unified view of information across the organization. This helps in simplifying data access and management, enhancing overall data agility.
  • Celonis: Celonis is used for process mining to analyze and visualize processes based on the data collected in AWS Redshift. It helps BSH identify process inefficiencies and optimize operations by discovering the most impactful improvements based on data.

AWS Services used as part of the solution

  • AWS Redshift: Central to BSH’s data warehousing strategy, AWS Redshift serves as the primary data warehouse where all integrated data is analyzed. Redshift’s powerful data processing capabilities enable BSH to perform complex queries across large datasets efficiently, supporting extensive business intelligence activities.
  • AWS Lambda: Utilized extensively across the BSH architecture, Lambda functions automate and streamline data processing tasks. They handle data ingestion, transformation, and loading, ensuring that data flows smoothly between AWS services and that business logic is executed efficiently without managing servers.
  • Amazon S3: Acts as the primary data lake for BSH, where all raw and processed data is stored. S3 provides durable, highly available, and scalable cloud storage, making it ideal for backing up and archiving significant amounts of data.
  • AWS Glue: Used for ETL (Extract, Transform, Load) operations, AWS Glue automates the preparation and transformation of data for analytics. By handling data integration tasks, it ensures that data from various sources is homogenized and ready for analysis in Redshift.
  • Amazon EMR (Elastic MapReduce): Deployed to process big data across dynamically scalable Amazon EC2 instances. BSH uses EMR for running big data frameworks like Hadoop and Spark, which are essential for processing vast amounts of data efficiently.
  • Amazon Athena: Integrated to allow SQL querying directly on data stored in Amazon S3. Athena provides BSH with the flexibility to run ad-hoc queries against S3 data without the need to load it into Redshift, facilitating quick access to data insights.
  • Amazon VPC (Virtual Private Cloud): Used to provision a logically isolated section of the AWS Cloud where BSH can launch AWS resources in a virtual network that they define. This setup allows BSH to control the virtual networking environment, including selection of IP address range, creation of subnets, and configuration of route tables and network gateways.
  • VPC Peering: Connects different VPCs to enable BSH to route traffic between them using private IP addresses. This service is essential for BSH to maintain a secure and efficient network architecture, ensuring that data flows seamlessly between different components of the cloud infrastructure without exposure to the public internet.
  • Site-to-Site VPN: Provides a secure connection between BSH’s on-premises network and the AWS cloud, ensuring that all data transferred remains encrypted and secure from unauthorized access.
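
One practical constraint when connecting VPCs through peering is that the peered VPCs cannot have overlapping CIDR blocks. The following sketch, using only the Python standard library, checks a set of planned address ranges for overlaps. The VPC names and CIDR ranges are hypothetical and do not reflect BSH’s actual network layout.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(vpc_cidrs: dict) -> list:
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan, for illustration only.
vpcs = {
    "warehouse": "10.10.0.0/16",
    "emr": "10.20.0.0/16",
    "staging": "10.10.128.0/20",
}

conflicts = overlapping_pairs(vpcs)
print(conflicts)
```

Here the hypothetical `staging` range falls inside the `warehouse` range, so that pair is reported and would need to be re-addressed before the two VPCs could be peered.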

Outcomes

Following the successful implementation of AWS Redshift, BSH has significantly enhanced its data warehousing capabilities, providing a robust, scalable platform for data integration and analytics. This centralized system allows for high-concurrency data access, enabling real-time data analysis and decision-making across various departments. The ability to handle large data volumes seamlessly, together with the asynchronous updating mechanism, ensures that BSH maintains up-to-date data insights without impacting system performance.

The architecture’s scalability and robustness prepare BSH for future expansions, with plans to incorporate AWS CloudFront to optimize access speeds globally. This strategic use of AWS services not only boosts operational efficiency but also aligns with BSH’s long-term goals of enhancing global data accessibility and security, demonstrating the transformative impact of AWS Redshift in supporting complex, data-driven business environments.

Architecture Diagrams of the specific customer deployment

BSH Architecture Diagram

For the BSH project, the architecture diagram of the specific customer deployment illustrates a sophisticated and secure cloud environment designed to optimize data management and analytics. The deployment features a robust integration of on-premises resources with AWS cloud services, where BSH’s network is securely connected to AWS through a combination of VPC peering and site-to-site VPN. This configuration ensures secure, seamless, and reliable access across BSH’s data management infrastructure.

Within this architecture, AWS Redshift serves as the central data warehouse, with additional AWS services such as Lambda, Glue, and S3 integrated to support data ingestion, transformation, and storage. Data is efficiently managed across multiple VPCs, with each hosting specific components like EMR for big data processing and SQL Server databases acting as intermediary layers for data access and manipulation. Updates and management of these services are streamlined through AWS management tools, enhancing the operational workflow. This architecture not only supports current data workflows but is also designed to scale with BSH’s growing global operations.


In its current setup, the Redshift cluster runs six ra3.xlplus nodes with 15 TB of data storage and serves a user base of 500. The system handles around 5,000 concurrent connections daily and processes queries involving approximately 5 TB of data. Key users such as i40, Data Virtuality, and PCS depend on this infrastructure for critical business insights. However, high demand creates challenges: CPU utilization often surpasses 80%, leading to performance bottlenecks during peak times. This not only delays query responses but also affects industry stakeholders who rely on Redshift for their web applications, causing availability issues during heavy load periods. Because Data Virtuality uses Redshift for analytics, its performance is also compromised during these peak times.
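
To make the 80% CPU threshold concrete, the sketch below finds the stretches in a series of utilization samples where the threshold is exceeded. The sample values are invented for illustration; in practice such a series would come from the cluster’s monitoring metrics, which are not shown here.

```python
def peak_windows(samples: list, threshold: float = 80.0) -> list:
    """Return (start_index, end_index) pairs for runs of samples above threshold."""
    windows = []
    start = None
    for i, cpu in enumerate(samples):
        if cpu > threshold and start is None:
            start = i  # a new peak window begins
        elif cpu <= threshold and start is not None:
            windows.append((start, i - 1))  # the window ended at the previous sample
            start = None
    if start is not None:
        # The series ended while still above the threshold.
        windows.append((start, len(samples) - 1))
    return windows


# Hypothetical CPU utilization samples in percent, one per time interval.
cpu = [55.0, 72.5, 83.1, 91.4, 79.9, 64.0, 85.2, 88.0]

windows = peak_windows(cpu)
print(windows)
```

Grouping consecutive samples into windows, rather than counting individual breaches, makes it easier to relate each peak period to the workloads that were running at the time.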

Technical Requirements

  • Establish a secure connection between BSH’s on-premises data centers and the AWS cloud environment using site-to-site VPN and VPC peering. This connectivity is crucial for ensuring safe and reliable data transfer across BSH’s network boundaries.
  • Implement AWS Redshift for high-performance data warehousing needs, ensuring it can handle large-scale data loads and complex queries efficiently. Redshift will serve as the central repository for analytics and business intelligence.
  • Utilize AWS Glue for ETL processes to integrate data from various sources into Redshift. This includes setting up Glue crawlers and jobs to automate the data transformation and loading processes.
  • Deploy AWS Lambda to handle real-time data processing needs. Lambda functions will be used to trigger data processing tasks based on events in S3 or modifications in data streams.
  • Use Amazon S3 for secure, scalable, and durable storage of raw and processed data. Implement proper data lifecycle policies to manage the storage efficiently.
  • Integrate Amazon Athena to enable direct SQL querying of data stored in S3, providing flexible and powerful data analysis capabilities without the need to load data into Redshift for every query.
  • Deploy Amazon EMR for processing large datasets using big data frameworks like Apache Hadoop and Apache Spark. EMR clusters will be used for complex data processing tasks that require significant compute resources.
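
The requirement that Lambda functions react to events in S3 can be sketched as follows. This is a minimal, hypothetical handler, not BSH’s implementation: it reads the bucket name and object key from the S3 event notification structure and returns them, with the actual processing left as a placeholder. The bucket name and keys in the sample event are invented.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of each newly created S3 object in the event."""
    objects = []
    for record in event.get("Records", []):
        # Only react to object-creation events, e.g. "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Placeholder: a real function would start the downstream processing here.
    return {"processed": objects}


# Hypothetical event with one creation and one deletion record.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-raw-bucket"},
                "object": {"key": "incoming/2024/sensor+data.json"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-raw-bucket"},
                "object": {"key": "incoming/2024/old.json"},
            },
        },
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```

Decoding the key before use matters because characters such as spaces are encoded in the notification, so the raw value would not match the stored object name.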

Conclusion

The deployment of AWS Redshift and accompanying AWS services revolutionized BSH’s data management capabilities. By creating a unified, scalable, and secure environment for data consolidation and analysis, BSH was able to enhance operational efficiencies, improve decision-making processes, and support strategic business initiatives across global markets. This solution not only met the immediate challenges but also provided a scalable foundation for future data initiatives.
