How to ingest CSV, Parquet & JSON files into the Snowflake data warehouse using PySpark DataFrames
Hi,
In this blog, I ingest various types of data files (CSV, Parquet & JSON) into the Snowflake data warehouse. PySpark uses its session to perform the ingestion task; while writing, it creates a temporary stage in Snowflake and loads the data from there into the database table.
You may already be familiar with the PySpark engine and with converting file data from various formats into DataFrames, on which you can then run actions and transformations. I use a few of them here to check the schema, count the records, find and remove duplicate records, and finally ingest the data into the Snowflake data warehouse.
I performed the data ingestion using the following steps:
- Create a Snowflake connection using a private key.
- Create a Spark instance using SparkSession in local cluster mode (see the sketch after this list).
- Read data from files (CSV, JSON, Parquet) and perform transformations on the DataFrames.
- Ingest the data into the Snowflake data warehouse using the PySpark write API.
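A minimal sketch of the SparkSession step: a local-mode session with the Snowflake Spark connector on the classpath. The package versions below are assumptions; pick the ones matching your Spark and Scala versions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("snowflake-ingestion")
    .master("local[*]")  # local cluster mode
    .config(
        "spark.jars.packages",
        # assumed versions; align with your Spark/Scala build
        "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4,"
        "net.snowflake:snowflake-jdbc:3.14.4",
    )
    .getOrCreate()
)
```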
Create a Snowflake connection using a private key:
Add a config.properties file inside your project folder:
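A sample of what the file might contain; every key name and value here is a placeholder you should replace with your own account details.

```properties
# placeholder values only
account_url=xy12345.snowflakecomputing.com
user=INGEST_USER
database=DEMO_DB
schema=PUBLIC
warehouse=COMPUTE_WH
role=SYSADMIN
private_key_path=keys/rsa_key.p8
```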
Create a Snowflake connection method using the details above:
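A minimal sketch of a helper that reads the private key file; the function name, path handling, and passphrase argument are assumptions. It returns the unencrypted key body, which is what the Spark Snowflake connector's pem_private_key option expects.

```python
from cryptography.hazmat.primitives import serialization

def load_private_key(key_path, passphrase=None):
    """Read a PKCS#8 private key file and return its unencrypted PEM body."""
    with open(key_path, "rb") as key_file:
        private_key = serialization.load_pem_private_key(
            key_file.read(),
            password=passphrase.encode() if passphrase else None,
        )
    pem_key = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ).decode()
    # Keep only the base64 body between the BEGIN/END markers.
    return "".join(pem_key.strip().split("\n")[1:-1])
```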
The connection is then ready as the connector options shown below:
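A minimal sketch of the options dictionary passed to the connector, assuming the config.properties keys shown earlier and the hypothetical load_private_key helper above.

```python
from configparser import ConfigParser

parser = ConfigParser()
with open("config.properties") as f:
    # a properties file has no section header, so prepend one for configparser
    parser.read_string("[snowflake]\n" + f.read())
cfg = parser["snowflake"]

sf_options = {
    "sfURL": cfg["account_url"],   # <account>.snowflakecomputing.com
    "sfUser": cfg["user"],
    "sfDatabase": cfg["database"],
    "sfSchema": cfg["schema"],
    "sfWarehouse": cfg["warehouse"],
    "sfRole": cfg["role"],
    "pem_private_key": load_private_key(cfg["private_key_path"]),
}
```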
Read the CSV file data:
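A minimal sketch, assuming a headered CSV at a hypothetical path:

```python
csv_df = (
    spark.read
    .option("header", "true")        # first row holds the column names
    .option("inferSchema", "true")   # let Spark infer column types
    .csv("data/employees.csv")       # hypothetical path
)
csv_df.printSchema()
print("record count:", csv_df.count())
```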
Please go through the video below; it is a complete practical walkthrough on the Spark engine showing how to perform CSV data ingestion into the Snowflake data warehouse.
Read the Parquet file data:
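A minimal sketch; the path is an assumption. Parquet files carry their own schema, so no schema inference option is needed.

```python
parquet_df = spark.read.parquet("data/employees.parquet")  # hypothetical path
parquet_df.printSchema()
```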
Read the JSON file data:
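A minimal sketch; the path is an assumption. By default spark.read.json expects one JSON object per line, so enable multiLine for a single pretty-printed document or array.

```python
json_df = (
    spark.read
    .option("multiLine", "true")
    .json("data/employees.json")  # hypothetical path
)
json_df.show(5)
```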
Please go through the video below; it is a complete practical walkthrough on the Spark engine showing how to perform Parquet and JSON data ingestion into the Snowflake data warehouse.
Transformations:
Below are the duplicate records in the DataFrame:
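One way to list them, sketched against the csv_df DataFrame from earlier: group on all columns and keep the groups that occur more than once.

```python
from pyspark.sql import functions as F

duplicates_df = (
    csv_df.groupBy(csv_df.columns)   # group on every column
    .count()
    .filter(F.col("count") > 1)      # keep only repeated rows
)
duplicates_df.show()
```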
Remove the duplicate records using the distinct() or dropDuplicates() method:
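A short sketch of both options; the "id" key column is hypothetical.

```python
# distinct() removes rows duplicated across all columns
dedup_df = csv_df.distinct()

# dropDuplicates keeps one row per key column instead
# dedup_df = csv_df.dropDuplicates(["id"])

print("records after dedup:", dedup_df.count())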
Finally, we perform the data ingestion step :)
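A minimal sketch of the write, assuming the sf_options dictionary built earlier and a hypothetical target table name. The connector stages the DataFrame in a temporary Snowflake stage and then loads it into the database table.

```python
(
    dedup_df.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEES")  # hypothetical target table
    .mode("append")
    .save()
)
```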
Please watch and subscribe to my videos; they will give you the full picture of how to perform the above steps:
Thank You :)