Normalized updates are usually faster than denormalized updates.
• When the data is well normalized, there’s little or no duplicated data, so there’s
less data to change.
• Normalized tables are usually smaller, so they fit better in memory and perform
better.
• The lack of redundant data means there’s less need for DISTINCT or GROUP BY que-
ries when retrieving lists of values. Consider the preceding example: it’s impossi-
ble to get a distinct list of departments from the denormalized schema without
DISTINCT or GROUP BY, but if DEPARTMENT is a separate table, it’s a trivial query.
The drawbacks of a normalized schema usually have to do with retrieval. Any non-
trivial query on a well-normalized schema will probably require at least one join, and
perhaps several. This is not only expensive, but it can make some indexing strategies
impossible. For example, normalizing may place columns in different tables that
would benefit from belonging to the same index.
The problem is the join, which is keeping you from sorting and filtering simulta-
neously with a single index. If you denormalize the data by combining the tables and
add an index on (account_type, published), you can write the query without a join.
This will be very efficient.
Given that both normalized and denormalized schemas have benefits and draw-
backs, how can you choose the best design?
The truth is, fully normalized and fully denormalized schemas are like laboratory
rats: they usually have little to do with the real world. In the real world, you often
need to mix the approaches, possibly using a partially normalized schema, cache
tables, and other techniques.
The most common way to denormalize data is to duplicate, or cache, selected col-
umns from one table in another table. In MySQL 5.0 and newer, you can use trig-
gers to update the cached values, which makes the implementation easier.
Faster Reads, Slower Writes
You’ll often need extra indexes, redundant fields, or even cache and summary tables
to speed up read queries. These add work to write queries and maintenance jobs, but
this is still a technique you’ll see a lot when you design for high performance: you
amortize the cost of the slower writes by speeding up reads significantly.
However, this isn’t the only price you pay for faster read queries. You also increase
development complexity for both read and write operations.
