- SAS database is used to store all client data. The amount of data is huge and continues to grow fast. SAS licensing cost is high for each GB of data stored. Needed a solution to reduce the operations costs involved in data storage.
- Processing time for mathematical operations is slower compared to a distributed system.
- Process the unstructured dataset and convert it into structured ones.
- Implemented an open-source framework to store and process big data using unique programming models – Hadoop distribution (HDP2.4).
- Migrated to Hadoop ecosystem to enhance the performance of queries at a faster rate.
- Facilitated business logic upfront by creating a data-lake.
- Executed an ETL process to extract data out of the source systems and placing them into the data warehouse.
Tools & Technologies
Hadoop, SAS, Microsoft Active Directory, Apache Ambari, oozie
- Increased scalability by automating fault-tolerance production.
- Excelled at high-volume batch processing resulting in cheaper hardware cost, faster performance, reliability of data.
- Pulled pre-aggregated datasets to a SAS database for faster querying.