technology

Area Under a Snowflake

Ever want to calculate the area under a curve in SQL? I know, I know.. who hasn’t right?

sum

It’s pretty straightforward to approximate the area under a curve in SQL by computing a Riemann sum using the trapezoidal rule.

First, we’ll make up some coordinates that represent the curve we want to integrate. Let’s find the area under the square root function from 0 to 1. Just because I want to use

\LaTeX in this post (even though I can’t figure out how to make it display inline), we are approximating this integral:

\displaystyle \int_0^1\! \sqrt x \, \mathrm{d}x. Boom.

Anyway, back to getting the coordinates, we will use 101 evenly spaced points on the curve:

WITH coordinates AS
(
  SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x,
         SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y
    FROM TABLE(GENERATOR(rowCount => 101))
),

Next, we compute the width of the bottom side of the trapezoid and the difference in height between the left and right sides.

deltas AS
(
  SELECT x,
         y,
         LEAD(x) OVER (ORDER BY x) - x AS delta_x,
         LEAD(y) OVER (ORDER BY x) - y AS delta_y
    FROM coordinates
),

Now we can compute the area of each trapezoid in the Riemann sum.

partial_areas AS
(
  SELECT *,
         (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area
    FROM deltas
)

Here’s a look at what we have so far:

  SELECT *
    FROM deltas
X Y DELTA_X DELTA_Y PARTIAL_AREA
0.000 0 0.010 0.1 0.0005
0.010 0.1 0.010 0.04142135624 0.001207106781
0.020 0.1414213562 0.010 0.03178372452 0.001573132185
0.030 0.1732050808 0.010 0.02679491924 0.001866025404
0.980 0.9899494937 0.010 0.005037943445 0.009924684654
0.990 0.9949874371 0.010 0.005012562893 0.009974937186
1.000 1 NULL NULL NULL

To get the final answer, we can just sum the area of all the small trapezoids.

  SELECT SUM(partial_area) AS area
    FROM partial_areas

If we evaluate the integral above, we get exactly 2/3. With this method, we get an answer of 0.6664629471, which is pretty close, close enough for what I’m using it for, and may or not be close enough for your needs.

Here’s all the SQL together:

WITH coordinates AS
(
  SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x,
         SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y
    FROM TABLE(GENERATOR(rowCount => 101))
),
deltas AS
(
  SELECT x,
         y,
         LEAD(x) OVER (ORDER BY x) - x AS delta_x,
         LEAD(y) OVER (ORDER BY x) - y AS delta_y
    FROM coordinates
),
partial_areas AS
(
  SELECT *,
         (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area
    FROM deltas
)
  SELECT SUM(partial_area) AS area
    FROM partial_areas
;

Discussion

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s