Optimizer - New Optimizer Design
The new design of the optimizer has both advantages and disadvantages when compared to the old one.
In the new design the optimizer is easier to test and is more likely to give correct answers. For example, a query that should return 50 rows is much less likely to return 52 rows because of an error in a statistics update.
On the other hand, the new optimizer is harder to code and debug. Testing is also harder in one respect: it is harder to define the correct configuration for the optimizer, and harder to see exactly what is happening inside it.
The old optimizer ran the query in batches, each a group of rows whose size was set by the optimizer, and sent the results of each batch to the application for processing.
Each batch was run as a separate, independent transaction. The optimizer would run one batch, wait for the application to send the results of that batch back, then start the next batch, and so on.
The optimizer would use its statistics information to estimate how many rows would be returned in each batch.
The optimizer kept a cache of the most recently returned rows and used it to determine how many rows would be returned for the next batch. This meant that if the optimizer's statistics were out of date, the query would run slower than necessary.
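The batched flow described above can be sketched roughly as follows. This is an illustration only, not the optimizer's actual code: the function name run_batched, the fixed scan window, and the choice to average the cached row counts are all assumptions made for the example.

```python
from collections import deque

def run_batched(table, predicate, window, cache_len=3):
    """Hypothetical sketch of the old batched flow.

    Scans `table` one window at a time. Before each batch, the number of
    rows it will return is estimated from a small cache of the row counts
    of the most recent batches (an assumed averaging rule).
    Returns a list of (estimated_rows, actual_rows) pairs, one per batch.
    """
    recent_counts = deque(maxlen=cache_len)  # cache of recent batch results
    log = []
    for start in range(0, len(table), window):
        # Estimate from the cache; there is nothing to go on for the first batch.
        estimate = (sum(recent_counts) / len(recent_counts)) if recent_counts else None
        # In the old design each batch would be its own transaction, and the
        # batch would be handed to the application before the next one starts.
        batch = [row for row in table[start:start + window] if predicate(row)]
        recent_counts.append(len(batch))
        log.append((estimate, len(batch)))
    return log

# The first 8 values match the predicate and the rest do not, so the
# cached counts go stale halfway through the scan.
table = list(range(16))
log = run_batched(table, lambda v: v < 8, window=4)
```

In this run the estimate for the third batch is still based on the first two batches, which illustrates how stale information leads to a poor prediction.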
The new optimizer runs the entire query, including the part that loads the rows to the application, in the same transaction. This eliminates the batching requirement: the optimizer runs the query as though all the rows are returned at the same time. As a result, the optimizer's view is out of date if any new rows are added to the rowset of the query.
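A minimal sketch of the single-transaction idea, using Python's standard sqlite3 module as a stand-in for the real database. The helper name run_whole_query and the table t are hypothetical, and the sketch shows only the transaction boundary, not any optimizer logic.

```python
import sqlite3

def run_whole_query(conn, sql, params=()):
    """Run a query and load all of its rows inside one explicit transaction."""
    conn.execute("BEGIN")
    try:
        rows = conn.execute(sql, params).fetchall()  # rows are loaded for the application here
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows

# isolation_level=None disables implicit transactions so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(5)])

query = "SELECT v FROM t WHERE v >= ? ORDER BY v"
snapshot = run_whole_query(conn, query, (2,))

# A row added after the query has run is not part of the result already returned.
conn.execute("INSERT INTO t (v) VALUES (9)")
later = run_whole_query(conn, query, (2,))
```

Here snapshot reflects the rowset at the time the query ran, while later includes the newly added row, which is the sense in which an earlier result goes out of date.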
The new optimizer runs more of the query itself than the old optimizer did, because it runs the entire query. It executes more code and does more work, which is not always a good thing.
The optimizer always processes the table in chunks, up to a maximum number. This means that in a simple query, the optimizer can determine that no rows will be returned from the table.
In a more complex query, where the optimizer cannot determine that no rows will be returned, it must use statistics information to estimate how many rows will be returned. This is time consuming.
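The text does not say how the optimizer decides that no rows will be returned, or how it turns statistics into an estimate. One plausible approach, sketched here purely as an assumption, is to keep a row count and a minimum and maximum value for each chunk, skip chunks whose range cannot match, and assume values are spread evenly within the remaining chunks. The names ChunkStats and estimate_rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ChunkStats:
    """Assumed per-chunk summary: row count plus min and max of an integer column."""
    row_count: int
    min_value: int
    max_value: int

def estimate_rows(chunks: List[ChunkStats], lo: int, hi: int) -> float:
    """Estimate how many rows satisfy lo <= value <= hi.

    A result of 0.0 means no chunk's range overlaps the predicate, so no
    rows can match. Otherwise the estimate assumes values are uniformly
    distributed between each chunk's min and max.
    """
    total = 0.0
    for chunk in chunks:
        overlap_lo = max(lo, chunk.min_value)
        overlap_hi = min(hi, chunk.max_value)
        if overlap_lo > overlap_hi:
            continue  # this chunk provably contributes no rows
        width = chunk.max_value - chunk.min_value + 1
        total += chunk.row_count * (overlap_hi - overlap_lo + 1) / width
    return total

chunks = [ChunkStats(100, 0, 99), ChunkStats(100, 100, 199)]
no_match = estimate_rows(chunks, 500, 600)  # outside every chunk's range
partial = estimate_rows(chunks, 50, 149)    # overlaps half of each chunk
```

The first call is the simple case, where the chunk ranges alone show that nothing can match. The second is the more complex case, where the answer is only an estimate and is only as good as the stored statistics.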