
How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture


This blog post is co-written with Hardeep Randhawa and Abhay Kumar from HPE.

HPE Aruba Networking, formerly known as Aruba Networks, is a Santa Clara, California-based security and networking subsidiary of Hewlett Packard Enterprise. HPE Aruba Networking is an industry leader in wired, wireless, and network security solutions. Hewlett-Packard acquired Aruba Networks in 2015, making it a wireless networking subsidiary with a range of next-generation network access solutions.

Aruba offers networking hardware such as access points, switches, routers, software, security devices, and Internet of Things (IoT) products. Their large inventory requires extensive supply chain management to source components, make products, and distribute them globally. This complex process involves suppliers, logistics, quality control, and delivery.

This post describes how HPE Aruba automated their supply chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS.

Challenges with the on-premises solution

As demand surged over time, it was imperative that Aruba build a sophisticated and powerful supply chain solution that could help them scale operations, enhance visibility, improve predictability, elevate the customer experience, and drive sustainability. To achieve their vision of a modern, scalable, resilient, secure, and cost-efficient architecture, they chose AWS as their trusted partner because of the range of low-cost, scalable, and reliable cloud services it offers.

Through a commitment to cutting-edge technologies and a relentless pursuit of quality, HPE Aruba designed this next-generation solution as a cloud-based, cross-functional supply chain workflow and analytics tool. The application supports custom workflows that allow demand and supply planning teams to collaborate, plan, source, and fulfill customer orders, then track fulfillment metrics via persona-based operational and management reports and dashboards. The solution also includes an industry-standard integrated data repository as a single source of truth, operational reporting through real-time metrics, data quality monitoring, a 24/7 helpdesk, and revenue forecasting through financial and supply availability projections. Overall, this new solution has empowered HPE teams with persona-based access to 10 full-scale business intelligence (BI) dashboards and over 350 report views across demand and supply planning, inventory and order management, SKU dashboards, deal management, case management, backlog views, and big deal trackers.

Overview of the solution

This post describes how HPE Aruba automated their supply chain management pipeline, starting with data migration from disparate data sources into centralized Amazon Simple Storage Service (Amazon S3) based storage, followed by building their data warehouse on Amazon Redshift, with the publication layer built on a third-party BI tool and a user interface built with ReactJS.

The following diagram illustrates the solution architecture.


In the following sections, we go through the key components in the diagram in more detail:

  1. Source systems
  2. Data migration
  3. Regional distribution
  4. Orchestration
  5. File processing
  6. Data quality checks
  7. Archiving processed files
  8. Copying to Amazon Redshift
  9. Running stored procedures
  10. UI integration
  11. Code deployment
  12. Security and encryption
  13. Data consumption
  14. Final steps

1. Source systems

Aruba’s source repository includes data from three different operating Regions (AMER, EMEA, and APJ), along with one worldwide (WW) data pipeline, sourced from systems like SAP S/4 HANA, Salesforce, Enterprise Data Warehouse (EDW), Enterprise Analytics Platform (EAP) SharePoint, and more. The data sources include 150+ files, with 10-15 mandatory files per Region, ingested in various formats like xlsx, csv, and dat. Aruba’s data governance guidelines required a single centralized tool that could securely and cost-effectively review all source files, with their multiple formats, sizes, and ingestion times, for compliance before exporting them out of the HPE environment. To achieve this, Aruba first copied the respective files to a centralized on-premises staging layer.

2. Data migration

Aruba chose AWS Transfer Family for SFTP for secure and efficient file transfers from the on-premises staging layer to an Amazon S3 based landing zone. AWS Transfer Family integrates seamlessly with other AWS services, automates transfers, and makes sure data is protected with encryption and access controls. To prevent duplication issues and maintain data integrity, Aruba customized these data transfer jobs to verify that previous transfers are complete before copying the next set of files.
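The following is a minimal sketch of such a completeness check using boto3, assuming the landing zone keeps each source file alongside a "<name>.tail.csv" metadata file; the bucket, prefix, and naming convention are illustrative, not Aruba's actual values.

```python
# Hypothetical sketch: before releasing the next SFTP batch, confirm that every
# source file from the previous transfer arrived with its matching tail metadata
# file in the S3 landing zone. Bucket, prefix, and naming pattern are assumptions.
import boto3

s3 = boto3.client("s3")

def previous_transfer_complete(bucket: str, prefix: str) -> bool:
    """Return True only if every source file has a matching '<name>.tail.csv' file."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])

    source_keys = {k for k in keys if not k.endswith(".tail.csv")}
    return all(f"{k}.tail.csv" in keys for k in source_keys)

if previous_transfer_complete("aruba-landing-zone", "incoming/"):
    print("Previous batch complete; safe to start the next transfer.")
```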

3. Regional distribution

On average, Aruba transfers roughly 100 files, with a total size of 1.5–2 GB, into the landing zone every day. The data volume increases each Monday with the weekly file loads and at the beginning of each month with the monthly file loads. The files follow a consistent naming pattern, with a daily system-generated timestamp appended to each file name. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. This metadata file is later used to read the source file names during processing into the staging layer.

The source data contains files from three different operating Regions and one worldwide pipeline that need to be processed per local time zones. Therefore, separating the files and running a distinct pipeline for each was necessary to decouple them and improve failure tolerance. To achieve this, Aruba used Amazon S3 Event Notifications. With each file uploaded to Amazon S3, an Amazon S3 PUT event invokes an AWS Lambda function that distributes the source and metadata files Region-wise and loads them into the respective Regional landing zone S3 bucket. To map a file to its Region, this Lambda function uses a Region-to-file mapping stored in a configuration table in Amazon Aurora PostgreSQL-Compatible Edition.
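The following is a minimal sketch of such a routing Lambda function, assuming a hypothetical file_region_mapping configuration table in Aurora PostgreSQL and per-Region landing zone buckets; the table, column, bucket names, and use of psycopg2 are illustrative assumptions.

```python
# Hypothetical sketch: an S3 PUT event triggers this handler, which looks up the
# file's Region in an Aurora PostgreSQL config table and copies the object into
# the matching Regional landing zone bucket. Names are placeholders.
import os
import boto3
import psycopg2

s3 = boto3.client("s3")

def get_region_for_file(file_name: str) -> str:
    """Look up the target Region (AMER, EMEA, APJ, or WW) for a file name."""
    conn = psycopg2.connect(
        host=os.environ["AURORA_HOST"],
        dbname=os.environ["AURORA_DB"],
        user=os.environ["AURORA_USER"],
        password=os.environ["AURORA_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT region FROM file_region_mapping WHERE %s LIKE file_pattern",
            (file_name,),
        )
        row = cur.fetchone()
    conn.close()
    return row[0] if row else "WW"

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        region = get_region_for_file(key.split("/")[-1])
        target_bucket = f"aruba-landing-{region.lower()}"  # e.g. aruba-landing-amer
        s3.copy_object(
            Bucket=target_bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
```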

4. Orchestration

The next requirement was to set up orchestration for the data pipeline to seamlessly apply the required logic to the source files and extract meaningful data. Aruba chose AWS Step Functions for orchestrating and automating their extract, transform, and load (ETL) processes to run on a fixed schedule. In addition, they use AWS Glue jobs for orchestrating validation jobs and moving data through the data warehouse.

They used Step Functions with Lambda and AWS Glue for automated orchestration to minimize the cloud solution deployment timeline by reusing the on-premises code base where possible. The prior on-premises data pipeline was orchestrated using Python scripts, so integrating the existing scripts with Lambda within Step Functions and AWS Glue helped accelerate their deployment timeline on AWS.
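As an illustration, a Step Functions state machine that chains a validation Lambda function and a Glue ETL job can be created with boto3 as in the following sketch; the ARNs, job name, role, and state machine name are placeholders, and the real workflow contains additional states for data quality checks, archiving, and error handling.

```python
# Hypothetical sketch: a minimal Step Functions state machine that runs a
# file-validation Lambda function and then a Glue ETL job. All names and ARNs
# are placeholders for illustration.
import json
import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "Comment": "Regional supply chain ETL pipeline (simplified)",
    "StartAt": "ValidateFiles",
    "States": {
        "ValidateFiles": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:validate-files",
            "Next": "RunGlueEtl",
        },
        "RunGlueEtl": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "aruba-dq-and-load"},
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="aruba-regional-etl",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsEtlRole",
)
```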

5. File processing

With each pipeline running at 5:00 AM local time, the data is further validated and processed, and then moved to the processing zone folder in the same S3 bucket. Unsuccessful file validation results in the source files being moved to the reject zone S3 bucket directory. The following file validations are run by the Lambda functions invoked by the Step Functions workflow (a simplified sketch follows the list):

  • The Lambda function validates whether the tail file is available with the corresponding source data file. When each complete file pair lands in the Regional landing zone, the Step Functions workflow considers the source file transfer complete.
  • By reading the metadata file, the file validation function verifies that the names and sizes of the files landing in the Regional landing zone S3 bucket match the files on the HPE on-premises server.
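The following sketch shows how such a pair-and-size check could look in Python, assuming the tail metadata CSV carries file_name and file_size columns; the key layout and column names are assumptions, not Aruba's actual schema.

```python
# Hypothetical sketch: read the tail metadata CSV and confirm the source file
# landed with the expected name and size. Column names and key layout are assumed.
import csv
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def validate_file_pair(bucket: str, tail_key: str) -> bool:
    """Check that the source file named in the tail file exists with the declared size."""
    tail_obj = s3.get_object(Bucket=bucket, Key=tail_key)
    rows = list(csv.DictReader(tail_obj["Body"].read().decode("utf-8").splitlines()))
    if not rows:
        return False

    expected_name = rows[0]["file_name"]
    expected_size = int(rows[0]["file_size"])

    prefix = tail_key.rsplit("/", 1)[0]
    source_key = f"{prefix}/{expected_name}"
    try:
        head = s3.head_object(Bucket=bucket, Key=source_key)
    except ClientError:
        return False  # source file missing; the pair is routed to the reject zone
    return head["ContentLength"] == expected_size
```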

6. Data quality checks

When the files land in the processing zone, the Step Functions workflow invokes another Lambda function that converts the raw files to CSV format, followed by stringent data quality checks. The final validated CSV files are loaded into the temp raw zone S3 folder.

The data quality (DQ) checks are managed using DQ configurations stored in Aurora PostgreSQL tables. Examples of DQ checks include duplicate data checks, null value checks, and date format checks. The DQ processing is managed through AWS Glue jobs, which are invoked by Lambda functions from within the Step Functions workflow. Various data processing logics are also integrated into the DQ flow, such as the following (a simplified sketch follows the list):

  • Flag-based deduplication – For specific files, when a flag maintained in the Aurora configuration table is enabled, the process removes duplicates before processing the data
  • Preset values replacing nulls – Similarly, a preset value of 1 or 0 signifies a NULL in the source data, based on the value set in the configuration table
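The following PySpark sketch illustrates how these configuration-driven checks might look inside a Glue job; the configuration dictionary stands in for the Aurora PostgreSQL table, and the S3 paths and column names are illustrative.

```python
# Hypothetical sketch of config-driven data quality checks in an AWS Glue (PySpark)
# job: deduplication, null replacement, and a date format check. The dq_config dict
# stands in for values read from Aurora PostgreSQL; names and paths are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aruba-dq-checks").getOrCreate()

# In the real flow, this configuration is read from an Aurora PostgreSQL table
dq_config = {"dedupe": True, "null_default": 0, "date_column": "order_date"}

df = (
    spark.read.option("header", True)
    .option("inferSchema", True)
    .csv("s3://aruba-temp-raw-zone/orders/")
)

# Flag-based deduplication
if dq_config["dedupe"]:
    df = df.dropDuplicates()

# Replace nulls in numeric columns with the preset value from the config table
df = df.fillna(dq_config["null_default"])

# Date format check: rows that cannot be parsed are routed to a reject output
date_col = dq_config["date_column"]
parsed = df.withColumn("_parsed", F.to_date(F.col(date_col), "yyyy-MM-dd"))
valid = parsed.filter(F.col("_parsed").isNotNull()).drop("_parsed")
rejected = parsed.filter(F.col("_parsed").isNull()).drop("_parsed")

valid.write.mode("overwrite").csv("s3://aruba-temp-raw-zone/orders_validated/")
rejected.write.mode("overwrite").csv("s3://aruba-reject-zone/orders/")
```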

7. Archiving processed files

When the CSV conversion is complete, the original raw files in the processing zone S3 folder are archived for 6 months in the archive zone S3 bucket folder. After 6 months, the files on AWS are deleted, with the original raw files retained in the HPE source system.
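One way to implement this roughly 6-month retention is an S3 lifecycle rule on the archive prefix, as in the following sketch; the bucket and prefix names are placeholders, and the lifecycle mechanism itself is an assumption rather than Aruba's confirmed configuration.

```python
# Hypothetical sketch: an S3 lifecycle rule that expires archived raw files after
# about six months (180 days). Bucket and prefix names are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="aruba-archive-zone",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-archived-raw-files",
                "Filter": {"Prefix": "archive/"},
                "Status": "Enabled",
                "Expiration": {"Days": 180},
            }
        ]
    },
)
```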

8. Copying to Amazon Redshift

When the data quality checks and data processing are complete, the data is loaded from the S3 temp raw zone into the curated zone on an Amazon Redshift provisioned cluster, using the COPY command.
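The following sketch shows a COPY statement issued through the Redshift Data API; the cluster, database, schema, table, and IAM role names are placeholders.

```python
# Hypothetical sketch: load validated CSV files from the temp raw zone into a
# curated-zone table with the Redshift COPY command, issued via the Redshift
# Data API. All identifiers are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

copy_sql = """
    COPY curated.supply_orders
    FROM 's3://aruba-temp-raw-zone/orders_validated/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

redshift_data.execute_statement(
    ClusterIdentifier="aruba-sc360-cluster",
    Database="sc360",
    DbUser="etl_user",
    Sql=copy_sql,
)
```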

9. Running stored procedures

From the curated zone, AWS Glue jobs orchestrate Redshift stored procedures that load the data from the curated zone into the Redshift publish zone. The Redshift publish zone is a different set of tables in the same Redshift provisioned cluster. The Redshift stored procedures process and load the data into fact and dimension tables in a star schema.

10. UI integration

Amazon OpenSearch Service is also integrated with the flow for publishing mass notifications to end users through the user interface (UI). Users can also send messages and post updates via the UI through the OpenSearch Service integration.
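The following sketch indexes a notification document into OpenSearch Service using the opensearch-py client; the domain endpoint, credentials, index name, and document fields are assumptions for illustration.

```python
# Hypothetical sketch: publish a mass notification document to an Amazon OpenSearch
# Service index so the UI can surface it to users. Endpoint, credentials, index
# name, and fields are placeholders.
from datetime import datetime, timezone
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-aruba-sc360-example.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("notification_user", "********"),
    use_ssl=True,
)

notification = {
    "title": "Weekly supply plan refreshed",
    "message": "The AMER weekly load completed; dashboards now reflect the latest data.",
    "audience": "demand-planning",
    "created_at": datetime.now(timezone.utc).isoformat(),
}

client.index(index="sc360-notifications", body=notification)
```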

11. Code deployment

Aruba uses AWS CodeCommit and AWS CodePipeline to deploy and manage a bi-monthly code release cycle, the frequency of which can be increased on demand as per deployment needs. Releases move through four environments (Development, Testing, UAT, and Production) deployed with DevOps discipline, enabling shorter turnaround times for ever-changing user requirements and upstream data source changes.

12. Security and encryption

User access to the Aruba SC360 portal is managed via SSO with MFA authentication, and data security is managed via direct integration of the AWS solution with HPE IT’s unified access management API. All data pipelines between HPE on-premises sources and S3 are encrypted for enhanced security.

13. Data consumption

The Aruba SC360 application provides a “Private Space” feature to other BI and analytics teams within HPE so they can run and manage their own data ingestion pipelines. This was built using the Amazon Redshift data sharing feature, which has enabled Aruba to securely share access to live data in their Amazon Redshift cluster without manually moving or copying the data. As a result, HPE internal teams can build their own data workloads on core Aruba SC360 data while maintaining data security and code isolation.
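The following sketch shows how a producer-side datashare could be set up with the Redshift Data API; the datashare, schema, cluster, and consumer namespace identifiers are placeholders, not Aruba's actual configuration.

```python
# Hypothetical sketch of a "Private Space" setup with Redshift data sharing: the core
# SC360 cluster exposes a curated schema through a datashare that a consuming team's
# cluster namespace can attach as its own database. All identifiers are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

statements = [
    "CREATE DATASHARE sc360_core_share;",
    "ALTER DATASHARE sc360_core_share ADD SCHEMA curated;",
    "ALTER DATASHARE sc360_core_share ADD ALL TABLES IN SCHEMA curated;",
    # Grant access to the consumer team's cluster namespace (placeholder ID)
    "GRANT USAGE ON DATASHARE sc360_core_share "
    "TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';",
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="aruba-sc360-cluster",
        Database="sc360",
        DbUser="admin_user",
        Sql=sql,
    )

# On the consumer cluster, the team would then run:
#   CREATE DATABASE sc360_shared FROM DATASHARE sc360_core_share
#   OF NAMESPACE '<producer-namespace-id>';
```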

14. Final steps

The data is finally fetched into the publication layer, which consists of a ReactJS-based user interface accessing the data in the Amazon Redshift publish zone using Spring Boot REST APIs. Along with data from the Redshift data warehouse, notifications stored in the OpenSearch Service indexes are also fetched and loaded into the UI. Amazon Aurora PostgreSQL is used to maintain the configuration values for populating the UI. To build BI dashboards, Aruba opted to continue using their existing third-party BI tool because of its familiarity among internal teams.

Conclusion

In this post, we showed you how HPE Aruba Supply Chain successfully re-architected and deployed their data solution by adopting a modern data architecture on AWS.

The new solution has helped Aruba integrate data from multiple sources while optimizing cost, performance, and scalability. It has also allowed the Aruba Supply Chain leadership to receive in-depth and timely insights for better decision-making, thereby elevating the customer experience.

To learn more about the AWS services used to build modern data solutions on AWS, refer to the AWS public documentation and stay up to date through the AWS Big Data Blog.


About the authors

Hardeep Randhawa is a Senior Manager for Big Data & Analytics Solution Architecture at HPE, recognized for stewarding enterprise-scale programs and deployments. He recently led a Big Data EAP (Enterprise Analytics Platform) build with one of the largest global SAP HANA/S4 implementations at HPE.

Abhay Kumar is a Lead Data Engineer in Aruba Supply Chain Analytics and manages the cloud infrastructure for the application at HPE. With 11+ years of experience in the IT industry across domains like banking and supply chain, Abhay has a strong background in cloud technologies, data analytics, data management, and big data systems. In his spare time, he likes reading, exploring new places, and watching movies.

Ritesh Chaman is a Senior Technical Account Manager at Amazon Web Services. With 14 years of experience in the IT industry, Ritesh has a strong background in data analytics, data management, big data systems, and machine learning. In his spare time, he loves cooking, watching sci-fi movies, and playing sports.

Sushmita Barthakur is a Senior Solutions Architect at Amazon Web Services, supporting enterprise customers as they architect their workloads on AWS. With a strong background in data analytics and data management, she has extensive experience helping customers architect and build business intelligence and analytics solutions, both on premises and in the cloud. Sushmita is based out of Tampa, FL, and enjoys traveling, reading, and playing tennis.
