Sunday, January 25, 2009

Mysql BLOBs and memory allocation

BLOBs have performance limits due to the way they are currently handled by mysql. First I will give you a little background info, then describe the problem, and potential solutions.

BLOB is an acronym for Binary Large Object. It is a datatype available in Mysql in 4 different sizes. There are also 4 text data types that are almost the same as BLOBs accept they contain string information instead of binary information. Text also has different comparison characteristics than BLOBs since it is a string oriented type. BLOBs can be used to store JPEG files or any other large binary data. BLOB data types come in 4 sizes: tiny, regular, medium, and long which have size limitations respectively of 25 bytes, 64 KB, 16 MB, and 4GB.

Ok, so here is the issue: when a mysql storage engine returns a BLOB from disk to mysql it has to provide a contiguous buffer where the data is stored and a pointer to that buffer. In other words "The entire BLOB has to fit into memory at once!!" even for a 4GB long blob. Now imaging you have 100 different 4GB long BLOB files (let's imagine movie files) and there are select statements being run simultaneously to retrieve all 100 files. Your storage engine would need to allocate 400 GB of memory to put all those BLOBs into. Clearly, this is not scalable since boxes today cap out around 64GB of ram.

So what is the solution? Conceptually, the storage engine needs to give the data back to the user in chunks. Mysql could have some fixed buffers, say 100 MB large and the storage engine could pump data into the buffers as the last chunk was sent to the client over the network. Seems pretty straight forward, but it does require changes in mysql server and maybe mysql client as well as all of the storage engines. I wonder if this is already on Mysql's roadmap (anyone know, please leave a comment).

One storage engine did try work around the problem. The PBXT storage engine has created a streaming BLOB infrastructure that seems to be pretty generic. The problem is it requires changes to the mysql client and server in addition to the storage engine and those changes don't seem to be integrated into the main line mysql development branch as far as I can tell.

Any comments or additional ideas on the subject please post a comment. Thanks,
Ivan

5 comments:

  1. Thanks for giving good information about blob memoryallocation

    ReplyDelete
  2. Excellent information. I have another question which I am not sure if you are the best person to ask but you seem to be very knowledgeable in the area. I am trying to make a script that compares two HTML pages for differences (or similarities) and I plan to store the whole HTML page (excluding images and other assets so it is not that big) and I plan to track changes over time to old copies are saved. Since I will be doing some sort of comparison, do you think a blob is ideal for this or should I do text instead?

    ReplyDelete
  3. Is this problem hasn't any scheduled fix yet?

    ReplyDelete
  4. I haven't heard of any proposed solutions that integrate with INNODB or MYISAM, but I think the PBMS storage engine has a solution for it.

    http://www.blobstreaming.org

    ReplyDelete
  5. Assume i have a blob datatype in my table, will the query still perform bad if i retrieve fields other than the blob field.

    ReplyDelete