Atlanta Snowflake Study Group

SnowPro Core exam preparation

Data Loading and Unloading

Back to Home

Define concepts and best practices that should be considered when loading data

Stages and stage types

Stages are intermediate storage locations for data files:

Stage Types: External (your cloud storage) vs Internal (Snowflake-managed). External preferred for data lake integration.

File size and formats

Optimal File Size: 100-250MB compressed. Enables parallel loading. Binary formats (Parquet, Avro, ORC) faster than text.

Folder structures

Organize files by logical partitions (e.g., bucket/app/YYYY/MM/DD/). Enables loading specific time periods by pointing COPY to the relevant path.

Ad hoc/bulk loading

Loading Methods: COPY INTO for bulk (large files), INSERT for small row counts, Web UI for local ad-hoc.

Snowpipe

Continuous ingestion service. Automatically loads new files from external stages using serverless compute. No warehouse management required.

Snowpipe: Continuous, automatic file ingestion. Serverless compute. Near-real-time loading.

Outline different commands used to load data and when they should be used

CREATE STAGE

Defines a stage location and credentials. Required before loading data.

CREATE FILE FORMAT

Creates a named format specification (file type, compression, delimiter, null handling). Reusable across multiple COPY commands.

Named File Formats: Define once, reuse across COPY commands. Specify type, compression, delimiter, null handling.

CREATE PIPE

Defines a Snowpipe for continuous ingestion. Includes the COPY INTO statement and source stage.

CREATE EXTERNAL TABLE

Query files in external stages without loading. Reads data directly from your data lake.

External Tables: Query stage files directly without loading into Snowflake tables.

COPY INTO

Primary bulk loading command. Syntax: COPY INTO <table> FROM @<stage>. Options include:

COPY INTO: Primary bulk load command. Supports error handling, column mapping, and load-time transformations.

INSERT/INSERT OVERWRITE

INSERT for small row counts. INSERT OVERWRITE replaces table contents entirely. Neither efficient for large-scale loading.

PUT

Uploads local files to internal stages. Required step when loading from local filesystem to internal stage.

VALIDATE

Tests COPY INTO without executing. Reports errors (type mismatches, formatting issues) before actual load.

VALIDATE: Dry-run for COPY INTO. Find errors before loading.

Define concepts and best practices that should be considered when unloading data

File size and formats

Overview of compression methods

Supported: GZIP, BZ2, BROTLI, ZSTD. Compression reduces file size, storage cost, and transfer time.

Unload Compression: GZIP, BZ2, BROTLI, ZSTD. Reduces storage and transfer costs.

Empty strings and NULL values

Control representation with EMPTY_FIELD_AS_NULL and NULL_IF options in file format or COPY command.

Unloading to a single file

Default: parallel unload to multiple files. Use SINGLE = TRUE for one file (slower).

Single File Unload: SINGLE = TRUE option. Slower than parallel multi-file default.

Unloading relational tables

Separate COPY INTO @stage per table. No built-in multi-table export with relationship preservation.

Outline the different commands used to unload data and when they should be used

GET

Downloads files from internal stage to local machine. Counterpart to PUT.

LIST

Lists files in a stage (alias: LS). Verify unload success and see created file names.

COPY INTO

Primary unload command. Syntax: COPY INTO @<stage> FROM <table>. Same format options as loading.

Unload Command: COPY INTO @stage FROM table. Same format options as loading.

CREATE STAGE

Required destination for unloaded files.

CREATE FILE FORMAT

Specifies output file format.