- Needed improvement/redesign of the Customer Loyalty Program to improve user experience.
- A huge amount of data in various formats from multiple sources including website clickstream data, customer data from relational databases, data from brick and mortar stores and social media data from Twitter and Facebook were collected but weren’t being managed properly.
- Scalability issues impacted the management of large volumes and varieties of data using the existing process.
- Needed a scalable solution to manage and exploit the growing data.
- Helped them in setting up low-cost storage – highly available infrastructure on Hadoop environment.
- Implemented a data lake on Hadoop cluster for different apps to store and consume data.
- Used Hortonworks Data Platform 2.4, which pulls the data from multiple formats including semi-structured, unstructured and relational data and stores the same as or files in HDFS.
- Built real-time analytics data pipeline as well as designed workflows using Oozie with a combination of Pig, Hive, and Python to perform data cleaning, storage, notifications and more.
- Secured data lake with Kerberos authentication, defined authorization and auditing with a comprehensive set of rules managed using Ranger
Tools & Technologies
Hadoop, Hive, Oosie, Hortonworks, Hive
- Ability to hold a diverse mix of structured, unstructured and semi-structured information, which was integrated for ready consumption
- Faster data processing and reporting
- Easy to maintain and highly available Hadoop infrastructure with notifications and review
- The mechanism in place is a single system for both storage and processing
- Better control over system resource allocation among various applications.