Friday, March 13, 2009

Auto Increment Stability

Over the last few years I have come to realize that Auto-Increment is a difficult problem for RDMS vendors to solve. When I was working with Sybase, the company I was with had a rule against using auto-increment for stability reasons.

Mysql storage engines seem to be a bit troublesome with auto increment as well. I believe that auto increment in Innodb was implemented using some global variables and mutexes which recently was patched.

For some fun with auto increment try executing the somewhat seemingly simple SQL on Innodb, listed at the end of this article. Which gives what to me appears to be unintuitive results.

The message here is not that the implementators of auto-increment have not done a good job, but that creating this functionality is actually really difficult. Personally I would recommend against using auto-increment in large scale complex applications, unless you really know what you are doing and are accepting the limitations that exist.

INTERESTING QUERY

drop table if exists t1;
CREATE TABLE t1
( id integer auto_increment primary key,
k integer NOT NULL,
INDEX k(k)
) engine = innodb;

insert into t1 (k) values (0);
insert into t1 (k) values (1);
insert into t1 (k) values (2);
insert into t1 (k) values (3);
insert into t1 (k) values (4);
insert into t1 (k) values (5);
insert into t1 (k) values (6);
insert into t1 (k) values (7);
insert into t1 (k) values (8);
insert into t1 (k) values (9);
insert into t1 (k) select k from t1;
insert into t1 (k) select k from t1;
select * from t1 order by id;

--------------------------------------------------------------
Here are the results I get. Notice the gaps in the numbers:

+----+---+
| id | k |
+----+---+
| 1 | 0 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 5 |
| 7 | 6 |
| 8 | 7 |
| 9 | 8 |
| 10 | 9 |
| 11 | 0 |
| 12 | 1 |
| 13 | 2 |
| 14 | 3 |
| 15 | 4 |
| 16 | 5 |
| 17 | 6 |
| 18 | 7 |
| 19 | 8 |
| 20 | 9 |
| 26 | 0 |
| 27 | 0 |
| 28 | 1 |
| 29 | 1 |
| 30 | 2 |
| 31 | 2 |
| 32 | 3 |
| 33 | 3 |
| 34 | 4 |
| 35 | 4 |
| 36 | 5 |
| 37 | 5 |
| 38 | 6 |
| 39 | 6 |
| 40 | 7 |
| 41 | 7 |
| 42 | 8 |
| 43 | 8 |
| 44 | 9 |
| 45 | 9 |
+----+---+

15 comments:

  1. So it skips a few numbers... how is this a "stability" issue, exactly?

    ReplyDelete
  2. Oracle has a good implementation of, as Oracle calls them, sequences. The trick is to pre-generate them and cache them. The consumption of them is separate from the generation of them.

    Plus, don't set the unreasonable expectation on them that they must not have gaps as that will cause nasty scalability issues.

    ReplyDelete
  3. Pre-generation sounds like an interesting idea.

    ReplyDelete
  4. I do not see this issue.

    mysql> CREATE TABLE t1
    -> ( id integer auto_increment primary key,
    -> k integer NOT NULL,
    -> INDEX k(k)
    -> ) engine = innodb;
    Query OK, 0 rows affected (0.40 sec)

    mysql>
    mysql> insert into t1 (k) values (0);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (1);
    Query OK, 1 row affected (0.01 sec)

    mysql> insert into t1 (k) values (2);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (3);
    Query OK, 1 row affected (0.02 sec)

    mysql> insert into t1 (k) values (4);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (5);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (6);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (7);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (8);
    Query OK, 1 row affected (0.00 sec)

    mysql> insert into t1 (k) values (9);
    Query OK, 1 row affected (0.01 sec)

    mysql> insert into t1 (k) select k from t1;
    Query OK, 10 rows affected (0.03 sec)
    Records: 10 Duplicates: 0 Warnings: 0

    mysql> insert into t1 (k) select k from t1;
    Query OK, 20 rows affected (0.00 sec)
    Records: 20 Duplicates: 0 Warnings: 0

    mysql> select * from t1 order by id;
    +----+---+
    | id | k |
    +----+---+
    | 1 | 0 |
    | 2 | 1 |
    | 3 | 2 |
    | 4 | 3 |
    | 5 | 4 |
    | 6 | 5 |
    | 7 | 6 |
    | 8 | 7 |
    | 9 | 8 |
    | 10 | 9 |
    | 11 | 0 |
    | 12 | 1 |
    | 13 | 2 |
    | 14 | 3 |
    | 15 | 4 |
    | 16 | 5 |
    | 17 | 6 |
    | 18 | 7 |
    | 19 | 8 |
    | 20 | 9 |
    | 21 | 0 |
    | 22 | 0 |
    | 23 | 1 |
    | 24 | 1 |
    | 25 | 2 |
    | 26 | 2 |
    | 27 | 3 |
    | 28 | 3 |
    | 29 | 4 |
    | 30 | 4 |
    | 31 | 5 |
    | 32 | 5 |
    | 33 | 6 |
    | 34 | 6 |
    | 35 | 7 |
    | 36 | 7 |
    | 37 | 8 |
    | 38 | 8 |
    | 39 | 9 |
    | 40 | 9 |
    +----+---+
    40 rows in set (0.01 sec)

    mysql> select version();
    +------------+
    | version() |
    +------------+
    | 5.0.70-log |
    +------------+

    mysql> show create table t1\G
    *************************** 1. row ***************************
    Table: t1
    Create Table: CREATE TABLE `t1` (
    `id` int(11) NOT NULL auto_increment,
    `k` int(11) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `k` (`k`)
    ) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8

    ReplyDelete
  5. Hmmm... I guess it is not completely reproduceable. I also tried the same thing on both my Windows and Linux boxes and got the same results ... the gaps in the auto-increment numbers are there.

    Thats kind of the entire point. There is some tricky logic in the auto-increment internals.

    ReplyDelete
  6. Another MySQL / innodb problem (that's bit me) is if you insert some rows, they get id's then delete the top few id's now restart the DB.

    You will get max(id) + 1 -- even if that id has been used before.

    usually not a problem but it could be if you're counting on having a new unique id.

    ReplyDelete
  7. Yeah I guess they start the counter at boot time with the largest value in the actual data. There is no other persistent data that stores the last counter.

    ReplyDelete
  8. It's not only the implementation of AUTOINC that is difficult, it's their use and meaning.

    It seems many people associate auto-increment values as having some correlation with physical ordering. A relational database does not guarantee the order in which rows are stored (or retrieved). You can only really rely on ordering with an ORDER BY clause.

    Auto-incremented values are guaranteed to be unique, and that's all. It isn't necessarily the case that they will be monotonically increasing and it is unwise to expect or rely on the lack of gaps.

    What's more, "numbers are cheap". There are infinitely many of them. ;-) Gaps should not be a concern if all you're looking for is a unique identifier.

    If you try to assign numbers without gaps, you cause a serialization (bottleneck) point. If a transaction rolls back, what happens to the allocated numbers? What about committed rows that are later deleted? Each scenario would leave gaps in the rows in the database.

    If you need a number like a real-world serial number (say on an order or invoice, or legal document) without gaps, don't rely on auto-increment numbers. Instead, it would make sense to pre-generate a number of stored rows of "not-yet-used documents" and then allocate one (perhaps the highest one) when you need it. This avoids the serialization.

    Please review the MySQL documentation for how InnoDB handles auto-increment: http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html.

    ReplyDelete
  9. Ken, I'm going to have to disagree with you that pre-allocating serial numbers in a table avoids serialization. To ensure there are no gaps that table must effectively have a table lock on it. This is to avoid the case where number 1 is assigned but not committed to a document, then number 2 is assigned for the next document, and then number 1 is rolled back.

    It avoids the case where an identity will cause gaps, but it doesn't avoid the serialization issue.

    The fact that "serial" number are being generated implies that serialization must occur.

    Since you understand this issue, I suspect you imply a more complicated solution where documents are not rolled back or deleted but instead just marked as deleted, eliminating all gaps.

    Having avoided this issue due to scalability problems, I'm not sure how you can eliminate all rollback scenarios, however.

    ReplyDelete
  10. dbscience: Nobody is trying to avoid gaps and in a highly concurrent system we are not even trying to guarantee monotonity. We just want unique numbers and want them fast. Everything else is just a big misunderstanding.

    ReplyDelete
  11. There is an example of an emulated SEQUENCE given in the InnoDB section of the MySQL manual. That achieves a concurrency-safe gap-free series.

    ReplyDelete
  12. Right, dbscience, I meant to say the approach I outlined avoids gaps (not serialization). I assume that if you need a unique document, you pick one of the not yet used ones and set it to used (as part of a transaction that does some other things like assign a document name, etc.). The app would be designed never to delete the document, and record its history (e.g., revisions, obsolescence, etc.). Thus, no gaps.

    However, with this approach, the only serialization would be with respect to a specific document. Two sessions couldn't claim the same unused document at the same time. This is "less bad" than the case where where multiple users simply inserting rows into a table.

    Instead of a table lock, I think you could do it with row locks. So, one session might

    SELECT dodid FROM doclist
    WHERE status = 'UNUSED'
    FOR UPDATE;

    This would lock the first available doc, and the application could (quickly) do what it needed to that document and commit. (You could reduce contention with additional WHERE clause predicates (e.g., look for odd-numbered documents).)

    Other sessions doing the same thing would wait for that first transaction to commit. Then they'd find that the document they were waiting for was no longer unused, and subsequently acquire a different document.

    So yes, there is a serialization, but it's not so bad as what happens with autoinc to get a unique key. The autoinc does ensure uniqueness, but if you don't want gaps, I think you have to do something like this and you can at least minimize serialization.

    ReplyDelete
  13. "On MySQL Insights, Ivan Novick looks into the matter of auto increment stability..."

    Log Buffer #140

    ReplyDelete
  14. I am not much familier with autoincrement feature of MySQL but as per my understanding for the example you provided.

    The field 'k' is indexed so when MySQL will execute 'insert into t1 (k) select k from t1', it will append 'order by k' automatically at the end of select query.

    So the output of above example is correct.

    (only my 2 cents)

    ReplyDelete
  15. holland to win euros odds - Vntopbet.com bk8 bk8 우리카지노 계열사 우리카지노 계열사 온라인카지노 온라인카지노 7392Mohegan Sun Novelle - ThauberBet

    ReplyDelete