select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;
Writing the Method or Function
When writing the method or function for the stored procedure, note the following:
Specify the Snowpark Session
object as the first argument of your method or function.
When you call your stored procedure, Snowflake automatically creates a Session
object and passes it to your stored procedure.
(You cannot create the Session
object yourself.)
For the rest of the arguments and for the return value, use the Python types that correspond to
Snowflake data types. Snowflake supports the Python data types listed in
SQL-Python Data Type Mappings for Parameters and Return Types
Handling Errors
You can use the normal Python exception-handling techniques to catch errors within the procedure.
If an uncaught exception occurs inside the method, Snowflake raises an error that includes the stack trace for the exception. When
logging of unhandled exceptions is enabled, Snowflake logs data
about unhandled exceptions in an event table.
Making Dependencies Available to Your Code
If your handler code depends on code defined outside the handler itself (such as code defined in a module) or on resource files, you can
make those dependencies available to your code by uploading them to a stage.
Refer to Making Dependencies Available to Your Code, or for Python worksheets, refer to Add a Python File from a Stage to a Worksheet.
If you create your stored procedure using SQL, use the IMPORTS clause when writing the
CREATE PROCEDURE statement, to point to the dependency files.
Accessing Data in Snowflake from Your Stored Procedure
To access data in Snowflake, use the Snowpark library APIs.
When handling a call to your Python stored procedure, Snowflake creates a Snowpark Session
object and passes the object to the method or function for your stored procedure.
As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and
schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see
Accessing and Setting the Session State.
You can use this Session
object to call APIs in the
Snowpark library.
For example, you can create a DataFrame for a table or execute an SQL statement.
See the Snowpark Developer Guide for more information.
Data Access Example
The following is an example of a Python method that copies a specified number of rows from one table to another table.
The method takes the following arguments:
A Snowpark Session
object
The name of the table to copy the rows from
The name of the table to save the rows to
The number of rows to copy
The method in this example returns a string. If you run this example in a
Python worksheet,
change the return type for the worksheet to a String
def run(session, from_table, to_table, count):
session.table(from_table).limit(count).write.save_as_table(to_table)
return "SUCCESS"
Reading Files
You can dynamically read a file from a stage in your Python handler using the SnowflakeFile
class in the Snowpark snowflake.snowpark.files
module.
Snowflake supports reading files with SnowflakeFile
for both stored procedures and user-defined functions. For more information about reading files in your handler code, as well as more examples, refer to Reading a File with a Python UDF Handler.
This example demonstrates how to create and call an owner’s rights stored procedure that reads a file using the SnowflakeFile
class.
Create the stored procedure with an in-line handler, specifying the input mode as binary by passing rb
for the mode
argument:
CREATE OR REPLACE PROCEDURE calc_phash(file_path string)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','imagehash','pillow')
HANDLER = 'run'
from PIL import Image
import imagehash
from snowflake.snowpark.files import SnowflakeFile
def run(ignored_session, file_path):
with SnowflakeFile.open(file_path, 'rb') as f:
return imagehash.average_hash(Image.open(f))
Call the stored procedure:
CALL calc_phash(build_scoped_file_url(@my_files, 'my_image.jpg'));
Using Third-Party Packages from Anaconda
You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages
from Anaconda, see the Anaconda Snowflake channel.
These third-party packages are built and provided by Anaconda.
You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.
For limitations, see Limitations.
Getting Started
Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge
the External Offerings Terms.
You must be the organization administrator (use the ORGADMIN role) to accept the terms. You only need to accept the terms once for your
Snowflake account. See Enabling the ORGADMIN role in an account.
Sign in to Snowsight.
Select Admin » Billing & Terms.
In the Anaconda section, select Enable.
In the Anaconda Packages dialog, click the link to review the External Offerings Terms page.
If you agree to the terms, select Acknowledge & Continue.
If you see an error when attempting to accept the terms of service, your user profile might be missing a first name, last name,
or email address. If you have an administrator role, refer to Add user details to your user profile to update your profile
using Snowsight. Otherwise, contact an administrator to update your account.
If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with
these limitations:
You can’t use any third-party packages from Anaconda.
You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.
You can’t use the to_pandas
method when interacting with a DataFrame object.
Displaying and Using Packages
You can display all available packages and their version information by querying the PACKAGES view in the Information Schema:
select * from information_schema.packages where language = 'python';
For more information, see Using Third-Party Packages
in the Snowflake Python UDF documentation.
Creating the Stored Procedure
You can create a stored procedure from a Python worksheet, or using SQL.
To create a stored procedure with SQL, see Creating a Stored Procedure.
To create a stored procedure from a Python worksheet, see Creating a Python Stored Procedure to Automate Your Python Worksheet Code.
Creating a Python Stored Procedure to Automate Your Python Worksheet Code
Create a Python stored procedure from your Python worksheet to automate your code. For details on writing Python worksheets, see
Writing Snowpark Code in Python Worksheets.
Prerequisites
Your role must have OWNERSHIP or CREATE PROCEDURE privileges on the database schema in which you run your Python worksheet to deploy it
as a stored procedure.
Deploy a Python Worksheet as a Stored Procedure
To create a Python stored procedure to automate the code in your Python worksheet, do the following:
Sign in to Snowsight.
Open Projects » Worksheets.
Open the Python worksheet that you want to deploy as a stored procedure.
Select Deploy.
Enter a name for the stored procedure.
(Optional) Enter a comment with details about the stored procedure.
(Optional) Select Replace if exists to replace an existing stored procedure with the same name.
For Handler, select the handler function for your stored procedure. For example, main
.
Review the arguments used by your handler function and if needed, override the SQL data type mapping for a typed argument.
For details about how Python types are mapped to SQL types, see SQL-Python Data Type Mappings.
(Optional) Select Open in Worksheets to open the stored procedure definition in a SQL worksheet.
Select Deploy to create the stored procedure.
After the stored procedure is created, you can go to the procedure details or select Done.
You can create multiple stored procedures from one Python worksheet.
After you create a stored procedure, you can automate it as part of a task. Refer to Introduction to tasks.
Returning Tabular Data
You can write a procedure that returns data in tabular form. To write a procedure that returns tabular data, do the following:
Specify TABLE(...)
as the procedure’s return type in your CREATE PROCEDURE statement.
As TABLE parameters, you can specify the returned data’s column names and types if you know them.
If you don’t know the returned columns when defining the procedure – such as when they’re specified at run time – you can leave out the
TABLE parameters. When you do, the procedure’s return value columns will be converted from the columns in the DataFrame returned by its
handler. Column data types will be converted to SQL according to the mapping specified in SQL-Python Data Type Mappings.
Write the handler so that it returns the tabular result in a Snowpark DataFrame.
For more information about dataframes, see Working with DataFrames in Snowpark Python.
Example
The examples in this section illustrate returning tabular values from a procedure that filters for rows where a column matches a string.
Defining the Data
Code in the following example creates a table of employees.
CREATE OR REPLACE TABLE employees(id NUMBER, name VARCHAR, role VARCHAR);
INSERT INTO employees (id, name, role) VALUES (1, 'Alice', 'op'), (2, 'Bob', 'dev'), (3, 'Cindy', 'dev');
Specifying Return Column Names and Types
This example specifies column names and types in the RETURNS TABLE()
statement.
CREATE OR REPLACE PROCEDURE filterByRole(tableName VARCHAR, role VARCHAR)
RETURNS TABLE(id NUMBER, name VARCHAR, role VARCHAR)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'filter_by_role'
from snowflake.snowpark.functions import col
def filter_by_role(session, table_name, role):
df = session.table(table_name)
return df.filter(col("role") == role)
Omitting Return Column Names and Types
Code in the following example declares a procedure that allows return value column names and types to be extrapolated from columns in the
handler’s return value. It omits the column names and types from the RETURNS TABLE()
statement.
CREATE OR REPLACE PROCEDURE filterByRole(tableName VARCHAR, role VARCHAR)
RETURNS TABLE()
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'filter_by_role'
from snowflake.snowpark.functions import col
def filter_by_role(session, table_name, role):
df = session.table(table_name)
return df.filter(col("role") == role)
Calling the Procedure
The following example calls the stored procedure:
CALL filterByRole('employees', 'dev');
The procedure call produces the following output:
+----+-------+------+
| ID | NAME | ROLE |
+----+-------+------+
| 2 | Bob | dev |
| 3 | Cindy | dev |
+----+-------+------+
Calling Your Stored Procedure
After creating a stored procedure, you can call it from SQL or as part of a scheduled task.
For information on calling a stored procedure from SQL, refer to Calling a Stored Procedure.
For information on calling a stored procedure as part of a scheduled task, refer to Introduction to tasks.
Examples
Running Concurrent Tasks with Worker Processes
You can run concurrent tasks using Python worker processes. You might find this useful when you need to run parallel tasks that take
advantage of multiple CPU cores on warehouse nodes.
Snowflake recommends that you not use the built-in Python multiprocessing module.
To work around cases where the Python Global Interpreter Lock prevents a
multi-tasking approach from scaling across all CPU cores, you can execute concurrent tasks using separate worker processes, rather than threads.
You can do this on Snowflake warehouses by using the joblib
library’s Parallel
class, as in the following example.
CREATE OR REPLACE PROCEDURE joblib_multiprocessing_proc(i INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'joblib_multiprocessing'
PACKAGES = ('snowflake-snowpark-python', 'joblib')
AS $$
import joblib
from math import sqrt
def joblib_multiprocessing(session, i):
result = joblib.Parallel(n_jobs=-1)(joblib.delayed(sqrt)(i ** 2) for i in range(10))
return str(result)
The default backend used for joblib.Parallel
differs between Snowflake standard and Snowpark-optimized warehouses.
Standard warehouse default: threading
Snowpark-optimized warehouse default: loky
(multiprocessing)
You can override the default backend setting by calling the joblib.parallel_backend
function, as in the following example.
import joblib
joblib.parallel_backend('loky')