Database Isolation Levels And Their Effects on Performance and Scalability
Some of us are not aware of the tremendous job databases perform, particularly their efforts to maintain the Isolation aspect of ACID. For example, some people believe that transactions are only related to data manipulation and not to queries, which is an incorrect assumption. Transaction Isolation is all about queries, and the consistency and completeness of the data retrieved by queries. This is how it works:
Isolation gives the querying user the feeling that he owns the database. It does not matter that hundreds or thousands of concurrent users work with the same database and the same schema (or even the same data). These other uses can generate new data, modify existing data or perform any other action. The querying user must be able to get a complete, consistent picture of the data, unaffected by other users’ actions.
Let’s take the following scenario, which is based on an Orders table that has 1,000,000 rows, with a disk size of 20 GB:
- 8:00: UserA started a query “SELECT * FROM orders”, which queries all the rows of the table. In our scenario, this query usually takes approximately five minutes to complete, as the database must fully scan the table’s blocks from start to end and extract the rows. This is called a FULL TABLE SCAN query, and is not recommended from a performance perspective.
- 8:01: UserB updates the last row in the in the Orders table, and commits the change.
- 8:04: UserA’s query process arrives at the row modified by UserB. What will happen?
Any guess? Will UserA get the original row value or the new row value? The new row value is legitimate and committed, but it was updated after UserA’s query started.
The answer is not clear cut, and depends on the isolation level of the transaction. There are four isolation levels, as follows (see more information at: http://en.wikipedia.org/wiki/Isolation_(database_systems):
- READ UNCOMMITTED: UserA will see the change made by UserB. This isolation level is called dirty reads, which means that read data is not consistent with other parts of the table or the query, and may not yet have been committed. This isolation level ensures the quickest performance, as data is read directly from the table’s blocks with no further processing, verifications or any other validation. The process is quick and the data is asdirty as it can get.
- READ COMMITTED: UserA will not see the change made by UserB. This is because in the READ COMMITTED isolation level, the rows returned by a query are the rows that were committed when the query was started. The change made by UserB was not present when the query started, and therefore will not be included in the query result.
- REPEATABLE READ: UserA will not see the change made by UserB. This is because in the REPEATABLE READ isolation level, the rows returned by a query are the rows that were committed when the transaction was started. The change made by UserB was not present when the transaction was started, and therefore will not be included in the query result.
This means that “All consistent reads within the same transaction read the snapshot established by the first read” (from MySQL documentation. See http://dev.mysql.com/doc/refman/5.1/en/innodb-consistent-read.html). - SERIALIZABLE: This isolation level specifies that all transactions occur in a completely isolated fashion, meaning as if all transactions in the system were executed serially, one after the other. The DBMS can execute two or more transactions at the same time only if the illusion of serial execution can be maintained.
In practice, SERIALIZABLE is similar to REPEATABLE READ, but uses a different implementation for each database engine. In Oracle, the REPEATABLE READ level is not supported and SERIALIZABLE provides the highest isolation level. This level is similar to REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to “SELECT … LOCK IN SHARE MODE.
The default isolation level in MySQL’s InnoDB is REPEATABLE READ, which provides a rather high isolation. Consider again this key sentence for the REPEATABLE READ isolation level: “All consistent reads within the same transaction read the snapshot established by the first read”.
Since old values of row data are required for current queries, databases use a special segment to store old row values and snapshots. MySQL calls this segment a Rollback Segment. Oracle once called it this as well, but now calls it an Undo Segment. The premise is the same.
During query execution, each row is examined and if it is found to be too new, an older version of this row is extracted from the rollback segment to comprise the query result. This examination‑lookup‑comprise action chain takes time to complete, resulting in a performance penalty. It also produces a snowball effect. Updates occur during a query, which makes that query slower so that it takes more time. During the time it takes to process the query, more updates come in, making query execution time even longer!
This is why a query that executes in 10 seconds in our testing environment may take a full 10 minutes to execute in a much stronger production environment.
You can read the entire post here - http://www.scalebase.com/isolation-levels-in-relational-databases/