
Data ingestion with Python cookbook: a practical guide helping you ingest, monitor, and identify errors in the data ingestion process


Deploy your own data ingestion pipelines, then orchestrate and monitor them efficiently to prevent data loss and maintain data quality.

Key Features

  • Implement best practices to create a data ingestion pipeline using Python and PySpark
  • Automate and orchestrate your data pipelines using Apache Airflow
  • Build a monitoring framework while applying the concept of data observability to your pipelines

Book Description

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It provides real-world examples using the most widely adopted open-source tools and sheds light on common questions and obstacles.

You will learn to design pipelines that work with or without data schemas and to build monitored pipelines with Airflow and data observability principles, following best practices throughout. The book also addresses the challenges of reading from different data sources and data formats. You will then gain a broad understanding of best practices for logging errors, identifying and resolving them, orchestrating data, monitoring pipelines, and choosing where to store logs for later reference.

By the end of the book, you will have a complete automated setup for ingesting data and monitoring your pipeline, making it easier to connect it to the later steps of the ETL process.

What you will learn

  • Apply data observability using monitoring tools
  • Automate your data ingestion pipeline
  • Read analytical and partitioned data with or without a schema
  • Debug and prevent data loss using efficient monitoring and logging
  • Apply data access policies using a data governance framework
  • Create a data orchestration framework to improve data quality

Who This Book Is For

This book is for data engineers and data enthusiasts who want a better understanding of how to ingest data using the most popular tools in the open-source community.

For more advanced learners, this book covers the theoretical pillars of data governance and gives practical examples of real scenarios that data engineers encounter daily.

Table of Contents

  1. Introduction to Data Ingestion
  2. Data Access Principles - Accessing your data
  3. Data Discovery - Understanding our data before ingesting
  4. Reading CSV and JSON files and solving problems
  5. Ingesting Data from Structured and Unstructured Databases
  6. Using PySpark with defined and non-defined schemas
  7. Ingesting Analytical Data
  8. Designing Monitored Data Workflows
  9. Putting everything together with Airflow
  10. Logging and Monitoring Your Data Ingestion in Airflow
  11. Automate your Data Pipelines
  12. Using Data Observability for Debugging, Error Handling, and Preventing Downtime

Available
£26.99
Available on VLeBooks
Product Details
Publisher: Packt Publishing
ISBN: 1837633096 / 9781837633098
Format: eBook (EPUB)
Dewey classification: 005.74
Publication date: 09/06/2023
Country: United Kingdom
Language: English
Pages: 1
Copy: 100%; print: 100%
Description based on CIP data; resource not viewed.