About Me
I'm a passionate Data Engineer with 5+ years of experience designing, implementing, and maintaining data pipelines and infrastructure. I specialize in big data technologies and cloud platforms and help organizations make data-driven decisions.
Skills
Python, SQL, Apache Spark, Hadoop, AWS, Google Cloud Platform, Docker, Kubernetes, Airflow, Kafka
Experience
Senior Data Engineer - TechCorp (2020-Present)
- Designed and implemented scalable data pipelines processing 10+ TB of data daily
- Led the migration of an on-premises data warehouse to a cloud-based solution
- Mentored junior engineers and conducted knowledge-sharing sessions
Data Engineer - DataInc (2017-2020)
- Developed ETL processes for various data sources
- Optimized existing data pipelines, reducing processing time by 40%
- Collaborated with data scientists to deploy machine learning models to production
Projects
Real-time Analytics Platform
Developed a real-time analytics platform using Apache Kafka, Spark Streaming, and Elasticsearch, enabling insights from streaming data as it arrives.
Data Lake Implementation
Architected and implemented a data lake on AWS using S3, Glue, and Athena, providing a scalable, cost-effective platform for data storage and analysis.
Contact
john.doe@email.com
linkedin.com/in/johndoe
github.com/johndoe