Mysql Insights: Auto Increment Stability

Friday, March 13, 2009

Auto Increment Stability

Over the last few years I have come to realize that Auto-Increment is a difficult problem for RDMS vendors to solve. When I was working with Sybase, the company I was with had a rule against using auto-increment for stability reasons.

Mysql storage engines seem to be a bit troublesome with auto increment as well. I believe that auto increment in Innodb was implemented using some global variables and mutexes which recently was patched.

For some fun with auto increment try executing the somewhat seemingly simple SQL on Innodb, listed at the end of this article. Which gives what to me appears to be unintuitive results.

The message here is not that the implementators of auto-increment have not done a good job, but that creating this functionality is actually really difficult. Personally I would recommend against using auto-increment in large scale complex applications, unless you really know what you are doing and are accepting the limitations that exist.

INTERESTING QUERY

drop table if exists t1;
CREATE TABLE t1
( id integer auto_increment primary key,
k integer NOT NULL,
INDEX k(k)
) engine = innodb;

insert into t1 (k) values (0);
insert into t1 (k) values (1);
insert into t1 (k) values (2);
insert into t1 (k) values (3);
insert into t1 (k) values (4);
insert into t1 (k) values (5);
insert into t1 (k) values (6);
insert into t1 (k) values (7);
insert into t1 (k) values (8);
insert into t1 (k) values (9);
insert into t1 (k) select k from t1;
insert into t1 (k) select k from t1;
select * from t1 order by id;

--------------------------------------------------------------
Here are the results I get. Notice the gaps in the numbers:

+----+---+
| id | k |
+----+---+
| 1 | 0 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 5 |
| 7 | 6 |
| 8 | 7 |
| 9 | 8 |
| 10 | 9 |
| 11 | 0 |
| 12 | 1 |
| 13 | 2 |
| 14 | 3 |
| 15 | 4 |
| 16 | 5 |
| 17 | 6 |
| 18 | 7 |
| 19 | 8 |
| 20 | 9 |
| 26 | 0 |
| 27 | 0 |
| 28 | 1 |
| 29 | 1 |
| 30 | 2 |
| 31 | 2 |
| 32 | 3 |
| 33 | 3 |
| 34 | 4 |
| 35 | 4 |
| 36 | 5 |
| 37 | 5 |
| 38 | 6 |
| 39 | 6 |
| 40 | 7 |
| 41 | 7 |
| 42 | 8 |
| 43 | 8 |
| 44 | 9 |
| 45 | 9 |
+----+---+

15 comments:

AnonymousMarch 13, 2009 at 11:01 AM
So it skips a few numbers... how is this a "stability" issue, exactly?
ReplyDelete
Replies
dbscienceMarch 13, 2009 at 11:14 AM
Oracle has a good implementation of, as Oracle calls them, sequences. The trick is to pre-generate them and cache them. The consumption of them is separate from the generation of them.

Plus, don't set the unreasonable expectation on them that they must not have gaps as that will cause nasty scalability issues.
ReplyDelete
Replies
Ivan NovickMarch 13, 2009 at 11:19 AM
Pre-generation sounds like an interesting idea.
ReplyDelete
Replies
qu1j0t3March 13, 2009 at 11:28 AM
I do not see this issue.

mysql> CREATE TABLE t1
-> ( id integer auto_increment primary key,
-> k integer NOT NULL,
-> INDEX k(k)
-> ) engine = innodb;
Query OK, 0 rows affected (0.40 sec)

mysql>
mysql> insert into t1 (k) values (0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 (k) values (2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (3);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t1 (k) values (4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (5);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (7);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (8);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (k) values (9);
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 (k) select k from t1;
Query OK, 10 rows affected (0.03 sec)
Records: 10 Duplicates: 0 Warnings: 0

mysql> insert into t1 (k) select k from t1;
Query OK, 20 rows affected (0.00 sec)
Records: 20 Duplicates: 0 Warnings: 0

mysql> select * from t1 order by id;
+----+---+
| id | k |
+----+---+
| 1 | 0 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 4 |
| 6 | 5 |
| 7 | 6 |
| 8 | 7 |
| 9 | 8 |
| 10 | 9 |
| 11 | 0 |
| 12 | 1 |
| 13 | 2 |
| 14 | 3 |
| 15 | 4 |
| 16 | 5 |
| 17 | 6 |
| 18 | 7 |
| 19 | 8 |
| 20 | 9 |
| 21 | 0 |
| 22 | 0 |
| 23 | 1 |
| 24 | 1 |
| 25 | 2 |
| 26 | 2 |
| 27 | 3 |
| 28 | 3 |
| 29 | 4 |
| 30 | 4 |
| 31 | 5 |
| 32 | 5 |
| 33 | 6 |
| 34 | 6 |
| 35 | 7 |
| 36 | 7 |
| 37 | 8 |
| 38 | 8 |
| 39 | 9 |
| 40 | 9 |
+----+---+
40 rows in set (0.01 sec)

mysql> select version();
+------------+
| version() |
+------------+
| 5.0.70-log |
+------------+

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) NOT NULL auto_increment,
`k` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8
ReplyDelete
Replies
Ivan NovickMarch 13, 2009 at 11:43 AM
Hmmm... I guess it is not completely reproduceable. I also tried the same thing on both my Windows and Linux boxes and got the same results ... the gaps in the auto-increment numbers are there.

Thats kind of the entire point. There is some tricky logic in the auto-increment internals.
ReplyDelete
Replies
AnonymousMarch 13, 2009 at 11:53 AM
Another MySQL / innodb problem (that's bit me) is if you insert some rows, they get id's then delete the top few id's now restart the DB.

You will get max(id) + 1 -- even if that id has been used before.

usually not a problem but it could be if you're counting on having a new unique id.
ReplyDelete
Replies
Ivan NovickMarch 13, 2009 at 12:11 PM
Yeah I guess they start the counter at boot time with the largest value in the actual data. There is no other persistent data that stores the last counter.
ReplyDelete
Replies
AnonymousMarch 13, 2009 at 12:35 PM
It's not only the implementation of AUTOINC that is difficult, it's their use and meaning.

It seems many people associate auto-increment values as having some correlation with physical ordering. A relational database does not guarantee the order in which rows are stored (or retrieved). You can only really rely on ordering with an ORDER BY clause.

Auto-incremented values are guaranteed to be unique, and that's all. It isn't necessarily the case that they will be monotonically increasing and it is unwise to expect or rely on the lack of gaps.

What's more, "numbers are cheap". There are infinitely many of them. ;-) Gaps should not be a concern if all you're looking for is a unique identifier.

If you try to assign numbers without gaps, you cause a serialization (bottleneck) point. If a transaction rolls back, what happens to the allocated numbers? What about committed rows that are later deleted? Each scenario would leave gaps in the rows in the database.

If you need a number like a real-world serial number (say on an order or invoice, or legal document) without gaps, don't rely on auto-increment numbers. Instead, it would make sense to pre-generate a number of stored rows of "not-yet-used documents" and then allocate one (perhaps the highest one) when you need it. This avoids the serialization.

Please review the MySQL documentation for how InnoDB handles auto-increment: http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html.
ReplyDelete
Replies
dbscienceMarch 13, 2009 at 4:09 PM
Ken, I'm going to have to disagree with you that pre-allocating serial numbers in a table avoids serialization. To ensure there are no gaps that table must effectively have a table lock on it. This is to avoid the case where number 1 is assigned but not committed to a document, then number 2 is assigned for the next document, and then number 1 is rolled back.

It avoids the case where an identity will cause gaps, but it doesn't avoid the serialization issue.

The fact that "serial" number are being generated implies that serialization must occur.

Since you understand this issue, I suspect you imply a more complicated solution where documents are not rolled back or deleted but instead just marked as deleted, eliminating all gaps.

Having avoided this issue due to scalability problems, I'm not sure how you can eliminate all rollback scenarios, however.
ReplyDelete
Replies
hingoMarch 15, 2009 at 7:22 AM
dbscience: Nobody is trying to avoid gaps and in a highly concurrent system we are not even trying to guarantee monotonity. We just want unique numbers and want them fast. Everything else is just a big misunderstanding.
ReplyDelete
Replies
qu1j0t3March 16, 2009 at 1:29 PM
There is an example of an emulated SEQUENCE given in the InnoDB section of the MySQL manual. That achieves a concurrency-safe gap-free series.
ReplyDelete
Replies
AnonymousMarch 17, 2009 at 3:19 PM
Right, dbscience, I meant to say the approach I outlined avoids gaps (not serialization). I assume that if you need a unique document, you pick one of the not yet used ones and set it to used (as part of a transaction that does some other things like assign a document name, etc.). The app would be designed never to delete the document, and record its history (e.g., revisions, obsolescence, etc.). Thus, no gaps.

However, with this approach, the only serialization would be with respect to a specific document. Two sessions couldn't claim the same unused document at the same time. This is "less bad" than the case where where multiple users simply inserting rows into a table.

Instead of a table lock, I think you could do it with row locks. So, one session might

SELECT dodid FROM doclist
WHERE status = 'UNUSED'
FOR UPDATE;

This would lock the first available doc, and the application could (quickly) do what it needed to that document and commit. (You could reduce contention with additional WHERE clause predicates (e.g., look for odd-numbered documents).)

Other sessions doing the same thing would wait for that first transaction to commit. Then they'd find that the document they were waiting for was no longer unused, and subsequently acquire a different document.

So yes, there is a serialization, but it's not so bad as what happens with autoinc to get a unique key. The autoinc does ensure uniqueness, but if you don't want gaps, I think you have to do something like this and you can at least minimize serialization.
ReplyDelete
Replies
AnonymousMarch 20, 2009 at 12:36 PM
"On MySQL Insights, Ivan Novick looks into the matter of auto increment stability..."

Log Buffer #140
ReplyDelete
Replies
SahilJuly 17, 2009 at 2:41 AM
I am not much familier with autoincrement feature of MySQL but as per my understanding for the example you provided.

The field 'k' is indexed so when MySQL will execute 'insert into t1 (k) select k from t1', it will append 'order by k' automatically at the end of select query.

So the output of above example is correct.

(only my 2 cents)
ReplyDelete
Replies
AnonymousJanuary 17, 2022 at 6:20 AM
holland to win euros odds - Vntopbet.com bk8 bk8 우리카지노 계열사 우리카지노 계열사 온라인카지노 온라인카지노 7392Mohegan Sun Novelle - ThauberBet
ReplyDelete
Replies

Add comment

Mysql Insights

Friday, March 13, 2009

Auto Increment Stability

15 comments:

About Me

Blog Archive

Mysql Dev Blogs

Followers