I am a results-driven Data Engineer with over 3 years of experience building scalable ETL/ELT pipelines, optimizing data workflows, and ensuring data accuracy. Proficient in Python, SQL, and big data tools such as Spark and Hive, I specialize in designing and implementing cloud-based data systems on AWS and orchestrating data workflows with Apache Airflow and Autosys. My expertise includes data modeling, query optimization, and stream processing. I thrive in collaborative, agile environments and am committed to delivering high-quality, reliable data solutions that support data-driven decision-making.
CitiBank
May 2022 - August 2023
1. Built and maintained robust ETL/ELT pipelines using Python and SQL for ingesting and transforming data into AWS S3.
2. Optimized data workflows and implemented data validation frameworks, improving data accuracy and pipeline reliability.
3. Developed scalable data transformations using PETL and integrated Airflow for task orchestration and monitoring.
4. Designed data models and schemas aligned with business logic for efficient data access and reporting.
5. Partnered with analysts and engineering teams to support data requirements, enhancing reporting turnaround by 15%.
Guardian Life Insurance Company of America
November 2019 - April 2022
1. Designed and implemented ETL pipelines using Syncsort and integrated data from Oracle, Mainframe, and SQL Server into AWS S3, increasing data availability by 25%.
2. Automated data ingestion pipelines using PySpark, improving batch processing speeds by 35% and reducing manual overhead by 60%.
3. Worked on data modeling and schema design for BI reporting and analytics systems.
4. Collaborated with product and analytics teams to support data-driven initiatives and improve forecast accuracy through predictive modeling.
5. Ensured data quality via automated validation and unit testing frameworks.
RxLogix
January 2019 - June 2019
1. Developed and optimized PL/SQL procedures, views, and database objects, improving system performance by 15%.
2. Performed query optimization and assisted in production issue resolution, reducing query run time by 25%.
3. Developed automated testing frameworks for database procedures, significantly reducing manual testing efforts and improving code quality assurance.
University of East London, October 2024
Data Science
Galgotias University, April 2019
Computer Science
1. Collected and cleaned extensive datasets, ensuring data quality and reliability.
2. Developed predictive models using machine learning algorithms such as decision trees, random forests, and linear regression, achieving 95% model accuracy.
3. Conducted hypothesis testing to identify key socio-economic factors impacting COVID-19 spread and severity.
4. Presented insights through visualizations using ggplot2 and Matplotlib, influencing policy recommendations.
1. Analyzed clickstream data from the UCI repository using R and SQL, identifying customer behavior patterns.
2. Conducted exploratory data analysis and feature engineering to identify relevant variables and patterns in the data.
3. Built predictive classification models such as Decision Tree and Naïve Bayes to forecast customer preferences and improve the recommendation system by 18%.
4. Applied hypothesis tests such as t-tests, the Wilcoxon test, and ANOVA to analyze data and draw business insights.
Verified Data Engineer
3-5 years of experience
Preferred commitment: Hourly
Hire Shikha for your team