How to ingest CSV, Parquet and JSON files into a Snowflake data warehouse using PySpark DataFrames

PySpark-Snowflake data ingestion with CSV, Parquet and JSON data files

Hi,

In this blog, I ingest various types of data files (CSV, Parquet and JSON) into a Snowflake data warehouse. PySpark works through its Spark session, and while it performs the ingestion task it creates a temporary stage in Snowflake to insert the data into the database table.

You should be familiar with the PySpark data engine: how to convert file data in various formats into DataFrames, and how to perform actions and transformations on those DataFrames. I use a few of them here to check the schema, count the records, find duplicate records, remove the duplicates, and finally ingest the data into the Snowflake data warehouse.

I performed the data ingestion using the following steps:

  1. Create a Snowflake connection using a private key.
  2. Create a Spark instance using SparkSession in local mode.
  3. Read data from files (CSV, JSON, Parquet) and perform transformations on the DataFrames.
  4. Ingest the data into the Snowflake data warehouse using the PySpark write API.
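
Step 2 can be sketched roughly as follows. This is a minimal sketch, not the exact code used for this post: the helper names `spark_conf` and `create_spark_session` are hypothetical, and the Maven coordinates are placeholders whose versions you would pick to match your own Spark and Scala build.

```python
def spark_conf(app_name, packages):
    """Build the Spark settings for a session running in local mode."""
    return {
        "spark.app.name": app_name,
        "spark.master": "local[*]",  # use all local cores
        "spark.jars.packages": ",".join(packages),
    }


def create_spark_session(conf):
    """Create (or reuse) a SparkSession from a dict of settings."""
    # Imported lazily so the settings above can be built without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Placeholder coordinates (assumption): fill in real versions yourself.
PACKAGES = [
    "net.snowflake:spark-snowflake_2.12:<connector_version>",
    "net.snowflake:snowflake-jdbc:<jdbc_version>",
]

conf = spark_conf("snowflake-ingestion", PACKAGES)
# spark = create_spark_session(conf)  # needs pyspark and Java installed
```

Keeping the settings in a plain dict makes it easy to print and check them before the session is started.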

Create a Snowflake connection using a private key:

Create a config.properties file inside your project folder:

Snowflake connection properties with rsa_key.p8
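
A config.properties file of this kind might look like the sample below, together with a small reader for it. The property names (`sf.url`, `sf.user`, and so on) and the helper names are assumptions for illustration, not the ones from the original screenshot, and every value is a placeholder.

```python
from pathlib import Path

# Hypothetical file content; replace every <...> placeholder with your own value.
SAMPLE_PROPERTIES = """\
# Snowflake connection properties
sf.url=<account_identifier>.snowflakecomputing.com
sf.user=<user_name>
sf.database=<database>
sf.schema=<schema>
sf.warehouse=<warehouse>
sf.role=<role>
sf.private_key_path=rsa_key.p8
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def load_properties(path="config.properties"):
    """Read a properties file from disk and parse it."""
    return parse_properties(Path(path).read_text(encoding="utf-8"))


props = parse_properties(SAMPLE_PROPERTIES)
```

Only the path to the key file is stored here, so the private key itself stays out of the properties file.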

Write a method that creates the Snowflake connection using the above details:

Make the Snowflake connection using the account details and rsa_key.p8

The connection is then ready as a set of options in the format below:
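
Reading the key file could be sketched like this. It assumes that rsa_key.p8 is an unencrypted PKCS#8 key in PEM format and that the Spark connector is given the key body without its header and footer lines through the `pem_private_key` option; please verify both against the Snowflake connector documentation. The helper names are hypothetical.

```python
from pathlib import Path


def strip_pem_armor(pem_text):
    """Return the base64 body of a PEM key as one line, without header/footer."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    if not lines or "BEGIN" not in lines[0]:
        raise ValueError("not a PEM-formatted key")
    if "ENCRYPTED" in lines[0]:
        # An encrypted key has to be decrypted first; that is not shown here.
        raise ValueError("encrypted keys are not handled by this sketch")
    return "".join(
        line for line in lines if line and not line.startswith("-----")
    )


def load_private_key(path="rsa_key.p8"):
    """Read the key file and return its body for the connector options."""
    return strip_pem_armor(Path(path).read_text(encoding="ascii"))
```

If your key is protected by a passphrase, decrypt it first with a suitable crypto library before stripping the header and footer.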
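
As a rough illustration, the options can be collected in a plain dictionary. The option names (`sfURL`, `sfUser`, `sfDatabase`, `sfSchema`, `sfWarehouse`, `sfRole`, `pem_private_key`) are the ones the Snowflake Spark connector uses; the property keys, the helper name `build_sf_options` and all values are assumptions made for this sketch.

```python
def build_sf_options(props, pem_private_key):
    """Map connection properties to Snowflake Spark connector options."""
    return {
        "sfURL": props["sf.url"],
        "sfUser": props["sf.user"],
        "sfDatabase": props["sf.database"],
        "sfSchema": props["sf.schema"],
        "sfWarehouse": props["sf.warehouse"],
        "sfRole": props["sf.role"],
        "pem_private_key": pem_private_key,
    }


# Placeholder values only (assumption); use your own account details.
example_props = {
    "sf.url": "<account_identifier>.snowflakecomputing.com",
    "sf.user": "<user_name>",
    "sf.database": "<database>",
    "sf.schema": "<schema>",
    "sf.warehouse": "<warehouse>",
    "sf.role": "<role>",
}
sf_options = build_sf_options(example_props, "<key_body_without_header_footer>")
```

The same dictionary can later be passed to both the reader and the writer, so the connection details live in one place.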

Read the CSV file data:

Read a CSV file using PySpark

Please go through the video below. It is a complete practical walkthrough of the Spark data engine, showing how to perform CSV data ingestion into the Snowflake data warehouse.
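
Reading a CSV file into a DataFrame might look like the sketch below. It assumes the file has a header row and that letting Spark infer the column types is acceptable; `read_csv`, `inspect` and the file path are hypothetical names, and `spark` is an existing SparkSession.

```python
def read_csv(spark, path):
    """Read a CSV file with a header row and inferred column types."""
    return (
        spark.read
        .option("header", True)       # first line holds the column names
        .option("inferSchema", True)  # extra pass over the data to guess types
        .csv(path)
    )


def inspect(df):
    """Print the schema and return the number of records."""
    df.printSchema()
    return df.count()


# Usage, assuming a running SparkSession named `spark`:
# csv_df = read_csv(spark, "data/sample.csv")
# print(inspect(csv_df))
```

For large files it can be cheaper to pass an explicit schema instead of inferring it, since inference reads the data an extra time.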

Read the Parquet file data:

Read the JSON file data:

Read JSON file data using PySpark

Please go through the video below. It is a complete practical walkthrough of the Spark data engine, showing how to perform Parquet and JSON data ingestion into the Snowflake data warehouse.
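
A minimal sketch for Parquet, assuming `spark` is an existing SparkSession; the helper name `read_parquet` and the path are hypothetical. Parquet files carry their own schema, so no header or schema options are needed.

```python
def read_parquet(spark, path):
    """Read a Parquet file or directory; the schema comes from the file itself."""
    return spark.read.parquet(path)


# Usage, assuming a running SparkSession named `spark`:
# parquet_df = read_parquet(spark, "data/sample.parquet")
# parquet_df.printSchema()
```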
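
For JSON, a sketch along the same lines. By default Spark expects one JSON object per line; the `multiLine` option is for files that hold a single document or an array spread over several lines. Which of the two applies to your files is an assumption you need to check, and `read_json` is a hypothetical helper name.

```python
def read_json(spark, path, multiline=False):
    """Read JSON Lines by default, or a multi-line JSON document if requested."""
    return spark.read.option("multiLine", multiline).json(path)


# Usage, assuming a running SparkSession named `spark`:
# json_df = read_json(spark, "data/sample.json", multiline=True)
# json_df.printSchema()
```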

Transformations:
Below are the duplicate records in the DataFrame:

Duplicate records in a PySpark DataFrame

Remove the duplicate records using the distinct() or dropDuplicates() method:

Remove duplicate records using distinct()

Finally, we perform the data ingestion step :)
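
One possible way to find the duplicates is to group by every column and keep the groups that occur more than once. This is a sketch, not necessarily the approach shown in the screenshot; the helper names are hypothetical, and it assumes the DataFrame has no column that is already called `count`.

```python
def duplicate_groups(df):
    """Return each fully duplicated row once, with how often it occurs."""
    return df.groupBy(df.columns).count().filter("`count` > 1")


def duplicate_row_count(df):
    """Return how many rows would disappear after de-duplication."""
    return df.count() - df.distinct().count()


# Usage, assuming `df` is a PySpark DataFrame:
# duplicate_groups(df).show()
# print(duplicate_row_count(df))
```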
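
The two methods can be wrapped in one small helper, sketched here with the hypothetical name `remove_duplicates`. `distinct()` compares whole rows, while `dropDuplicates()` can also be limited to a subset of columns.

```python
def remove_duplicates(df, subset=None):
    """Drop duplicate rows, optionally comparing only the columns in `subset`."""
    if subset is None:
        return df.distinct()
    return df.dropDuplicates(subset)


# Usage, assuming `df` is a PySpark DataFrame with an `id` column:
# clean_df = remove_duplicates(df)            # whole-row duplicates
# clean_df = remove_duplicates(df, ["id"])    # one row per id
```

When a subset is used, Spark keeps one row per key without a guaranteed choice of which one, so sort or aggregate first if that matters.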
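
The ingestion step with the PySpark write API can be sketched as below. It assumes the Snowflake Spark connector and JDBC driver are on the Spark classpath and that `sf_options` is a dictionary of connector options such as `sfURL`, `sfUser` and `pem_private_key`; the helper name `write_to_snowflake` and the table name are hypothetical.

```python
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def write_to_snowflake(df, sf_options, table, mode="append"):
    """Write a DataFrame to a Snowflake table through the Spark connector."""
    (
        df.write
        .format(SNOWFLAKE_SOURCE_NAME)
        .options(**sf_options)
        .option("dbtable", table)
        .mode(mode)  # "append" adds rows, "overwrite" replaces the table
        .save()
    )


# Usage, assuming `clean_df` and `sf_options` already exist:
# write_to_snowflake(clean_df, sf_options, "<target_table>")
```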

Please watch and subscribe to my videos to get the full idea of how to perform the above steps:

Thank You :)
