technology

SQL User Defined Functions

At Jana, we use snowflake as our data warehouse. Snowflake comes with a lot of cool features and it is a lot faster at processing huge amounts of data. One of the places where we use the ability to process faster is in computing data in the SQL queries instead of doing it in Java or Python.

The downside to this is that SQL code is at time less readable and difficult to maintain than Java or Python. This led me to learn more about SQL user defined functions and good practices while writing them.

A user defined function is a set of sql statements that encapsulates commonly used formulas or  business logic and is stored in the database for reusability.

There are two types of UDFs supported in snowflake, scalar functions which returns a scalar value like an integer or a varchar and a table function that returns a set of rows.

A simple scalar function look like this:

-- calculate_fees: float, float = float
-- input: amount, fee_percentage
-- output: fees for the given, 0 if the fee percentage is NULL
-- tests:
-- calculate_fees(100, 0.05) = 5.0
-- calculate_fees(100, NULL) = 0
CREATE OR REPLACE FUNCTION calculate_fees(amount FLOAT, fee_percentage FLOAT)
   RETURNS FLOAT
   AS '
       CASE
           WHEN fee_percentage IS NULL THEN 0
           ELSE amount * fee_percentage
       END
   ';

As you can see, user defined functions are really good in doing smaller computations. These functions can be used in select, where and having clauses of the sql statement.

At times, these functions can end up being too big. I realized that having a contract and a purpose for the functions before writing them can be really helpful. This also makes it easier to have unit tests.

If the function is getting to be too big to write, it will definitely be too big to read later on. It’s pretty easy to break bigger functions up into smaller functions and combine the results. I used the functional programming style of coding here and it worked very well for me.   

Another cool thing is that UDFs support function overloading. This can be very useful but also confusing at times. Since the functions are stored in the database, the only way to replace a function is to drop the existing function and then write a new one. I made the mistake of not dropping the older function. This led to there being two functions, the new one that is right and the old one that I wanted to replace. Some parts of the code were still calling the older function and giving out wrong results. Figuring out where these were happening was confusing. And we definitely don’t want a wrong function to be living in the database.

Other than that, I found user defined functions very helpful and easy to use.

If you like working with data and sql, check us out!

Discussion

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s