
json deep copy: An alternative solution to deep copy

The other day, I ran into a bug caused by a function that modified and returned a dictionary, without knowing that the same dictionary was later updated and saved to the database. The quick fix would have been to deep copy the dictionary, then modify and return the copy while leaving the original untouched. However, I’ve learned here at Jana that deep copying is extremely expensive and slow.
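This class of bug can be reproduced in a few lines (a minimal sketch with made-up data, not the actual code from our codebase):

```python
import copy

def for_display(record):
    # returns a display-friendly view of the record --
    # but silently mutates the caller's dict!
    record['amount'] = '$%d' % record['amount']
    return record

record = {'amount': 8643}
for_display(record)
print(record['amount'])  # '$8643' -- the original was changed

def for_display_safe(record):
    # deep copying fixes the aliasing, at a runtime cost
    record = copy.deepcopy(record)
    record['amount'] = '$%d' % record['amount']
    return record

record = {'amount': 8643}
view = for_display_safe(record)
print(record['amount'])  # 8643 -- the original is untouched
```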

I managed to solve the original bug by reordering the code so that the part that updates and saves to the database runs first, and the dictionary is modified only right before it is returned. Still, there were other parts of our codebase where deep copying seemed inevitable. That’s when I came across a function our CTO Craig wrote a while back as part of our effort to minimize response time.


import ujson
import json
import copy
import timeit
import random
import numpy as np

def update_dict(data_copy, data):
    for k, v in data_copy.items():
        if v != data[k]:
            if isinstance(v, dict):
                update_dict(v, data[k])
            elif isinstance(v, list):
                update_list(v, data[k])
            elif isinstance(v, float):
                data_copy[k] = data[k]

def update_list(data_copy, data):
    for i, value in enumerate(data_copy):
        if value != data[i]:
            if isinstance(value, dict):
                update_dict(value, data[i])
            elif isinstance(value, list):
                update_list(value, data[i])
            elif isinstance(value, float):
                data_copy[i] = data[i]

def json_deep_copy(data):
    if data is None:
        return data

    #  precise_float is slower, but we get more reports of diffs
    #  without it (floats being floats)
    try:
        data_copy = ujson.loads(
            ujson.dumps(data, double_precision=15),
            precise_float=True)
        if isinstance(data_copy, list):
            update_list(data_copy, data)
        else:
            update_dict(data_copy, data)
    except OverflowError:
        data_copy = json.loads(json.dumps(data))
    except Exception:
        print("non-json safe object passed. falling back to deepcopy")
        data_copy = copy.deepcopy(data)
    return data_copy

data = {
    'name': 'Jana',
    'completed_offers': [12, 15, 17, 23, 37],
    'balance': {
        'currency': 'USD',
        'amount': 8643
    },
    'age': 25,
    'country': 'US',
    'random_number': random.random()
}

original_deep_copy_trial = timeit.Timer(
    "copy.deepcopy(data)", "from __main__ import copy, data"
)
json_deep_copy_trial = timeit.Timer(
    "json_deep_copy(data)", "from __main__ import json_deep_copy, data"
)

print(np.mean(original_deep_copy_trial.repeat(10, number=10000)))
# 0.2825556867599
print(np.mean(json_deep_copy_trial.repeat(10, number=10000)))
# 0.0869469165802

We can see that this customized json_deep_copy function is more than three times faster than the built-in copy.deepcopy method!
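One caveat worth keeping in mind with any JSON-based copy (my own observation, not from the original code): the round trip only preserves JSON types, so non-string dict keys and tuples come back changed:

```python
import json

original = {1: ('a', 'b')}
roundtrip = json.loads(json.dumps(original))
# the int key became a str, and the tuple became a list
print(roundtrip)  # {'1': ['a', 'b']}
```

Objects that cannot be serialized at all will raise and hit the copy.deepcopy fallback in json_deep_copy, but coercions like these succeed silently, so the function is best reserved for data that is already JSON-shaped.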

Here’s another blog post I found on the same topic.

In short, if you need to deep copy, rethink your logic and avoid it if possible. If it’s still inevitable, use an alternative solution like json_deep_copy.
