Django Database Performance Optimization

As a project grows, our database size turns larger and we may experience longer waiting time when query an API.
Then it is time to make some performance optimizations. DB query is one of the items that causes performance reduction.

Profiling

The fisrt thing we need to do to address performance issues is profiling. We need to know what are the SQL queries we executes and
what how long does executing a query take.

profiling tools

silk: It contains SQL Inspection and cProfile. On SQL Inspection part, it is easy for you to know what are the actual SQL you executes and which line of
code triggers the query.

Retrieve by indexed column

Try to use

1	Employee.objects.filter(id__in=[id1, id2, ...])

instead of

1	Employee.objects.filter(name__in=[name1, name2, ...])

Avoid N+1 queries

N+1 queries are caused by retrieve foreign key or many to many key model properties. For each model instance, it will produce 1 extra query.
Therefore, if we have N model instances, then it will end up N + 1 queries.

`select_related()`, `prefetch_related()`, `Prefetch()` object

select_related() implements joins tables on database level, while prefetch_related() implements joins on Python code level.

Note that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two optimizations do not make sense together.

Besides, for prefetch_related(), it needs to be noted that it only works for QuerySet all(), any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.

For example:

1 2	>>> pizzas = Pizza.objects.prefetch_related('toppings') >>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]

Here, pizza.toppings.all() is the cached result. pizza.toppings.filter(spicy=True) will produce a new query.

What if we need to pre-fetch from sub query set instead of all query set like above or control the queryset that order by a specific field? Prefetch() object can help.

1 2	spicy_toppings = Toppings.objects.filer(spicy=True) pizzas = Pizza.objects.prefetch_related(Prefetch('toppings', queryset=spicy_toppings, to_attr='spicy_toppings'))

Note that assign the prefetched result to a custom attribute with the optional to_attr argument. The result will be stored directly in a list and it can improve speed proformance.

caching, @cached_property, @lru_cache()

Retrieving data from cache is much faster than retrieving from database. One thing that needs to keep in mind is that when some data is updated/deleted/created from the database, we should modified tha related data from cache.

Django provides @cached_property decorator that allows to cache the result of a class method(without parameters) as a property.

When we wanna cache methods with parameters, we can use @lru_cache().

Only retrieve fields you need

Try to use `values()` or `values_list()`

When you know you only need some specific fields, use values() or values_list().

Avoid a single model serializer with multi-layer nested serizlizers

Try to provide multiple serializers for a model to support different functionalities

ListSerializer: no more than one layer nested serializers (Remember to use select_related() and prefetch_related() on the queryset)
ReadDetailSerializer: can have multi layer nested serializers (Remember to use select_related() and prefetch_related() on the queryset)
WriteDetailSerializer (for creating or updating): no nested serializers

Further reading list:
Django DB Optimation Docs

Django Database Performance Optimization

Django Database Performance Optimization

Profiling

profiling tools

Retrieve by indexed column

Avoid N+1 queries

select_related(), prefetch_related(), Prefetch() object

caching, @cached_property, @lru_cache()

Only retrieve fields you need

Try to use values() or values_list()

Avoid a single model serializer with multi-layer nested serizlizers

`select_related()`, `prefetch_related()`, `Prefetch()` object

Try to use `values()` or `values_list()`