Thursday, February 19, 2009

Understanding UNIV_LIKELY and other Optimization Macros in the Innodb Codebase

When you first start browsing the code for Innodb (a Mysql Storage Engine) you will notice several strange macros like UNIV_LIKELY popping up all over the place. Its good to get clear on what those mean so as not to distract you from understanding the actual logic of the code you are reading.

The set of macros described below all wrap GCC builtin functions which are used for performance optimization. The macros are defined as blank space on other platforms not supporting these builtin functions. The macros are in two categories the expect and prefetch macros which I will describe separately:


EXPECT MACROS

These macros take expressions and return the exact same value as the expression would return without the macros. From the user's point of view, they are the same as just enclosing the expression in an extra set of parenthesis. However the macros, provide hints to the optimizer about the expected value of the expression which lets them optimize branches.

UNIV_EXPECT(expr, constant)
This is a wrapper around the gnu builtin function '__builtin_expect' . The macro is intended to be used within an if condition or other branching statement to tell the compiler the expected outcome of the expression. The compiler than uses that information to optimize the generated code for that outcome. The args to the function UNIV_EXPECT are an expression that will be evaluated and the expected output which must be a compile time constant integer. Here is an example of how to use this macro:
if (UNIV_EXPECT(retCode, 0)){

fprintf(stderr, "failure, got a non-zero return code\n");

}
This will evaluate the same as doing just if (retCode) but the optimizer will understand that retCode is most likely to always be 0.

UNIV_LIKELY(cond)
The macro returns the value of the condition, and indicates a hint to the compiler that the likely value of the condition is TRUE or 1

UNIV_UNLIKELY(cond)
The macro returns the value of the condition, and indicates a hint to the compiler that the likely value of the condition is FALSE or 0

UNIV_LIKELY_NULL(ptr)
The macro returns the value of the ptr, and indicates a hint to the compiler that the likely value of the ptr is NULL.



Prefetch macros

There are two prefetch macros in Innodb, that are both wrappers around GCC's '__builtin_prefetch'. By telling the system you will be reading or writing to a given address in the near future, the compiler/hardware can try to pre-fetch this address, making the actual memory accesses faster.

UNIV_PREFETCH_R(addr)
Give a hint to the compiler that the addr specified will soon be accessed for reading.

UNIV_PREFETCH_RW(addr)
Give a hint to the compiler that the addr specified will soon be accessed for writing.



More details on the inner workings of these Macros can be found in the GCC documentation, Builtin Functions section. Just look for info about '__builtin_expect' and '__builtin_prefetch'.

Now that you will not be distracted by these macros anymore, Happy Browsing,

Ivan Novick

Wednesday, February 4, 2009

Understanding the InnoDB Source Code Structure

Innodb source code is organized into directories where each directory holds the C source code files for a given module. Within the directory are 1 or more files that are part of the module. The file names have a structure which I will describe below.

The first part of the file name indicates the module name. The module name is followed by a '0' character which is a separator. The second part of the file name represents the sub-module. Most modules have one file where the sub-module is the same name as the module, this file represents the primary file in the module.

For, example the main file for Btree is located in the btr directory and named: btr0btr.c
The first btr indicates the module name, the 0 is a separator and the the second btr is the sub module name which is the same for the primary file.

The file that handles the Btree cursor is also located in the btr directory and named: btr0cur.c
The first btr is the module name, the 0 is a separator, and the cur is the sub module name which stands for cursor in this case.

There is also a include directory which contains all the header files. The header files are also named in a similar way.

For example, the file 'btr0cur.h' is the header file for the Btree cursor module. Note the include files contain both '.h' files and '.ic' files which are included C files.

Now that we understand the naming conventions, I am going to give you a list of all the modules in the Innodb source code base and a description of what they are for. The list below is sorted by lines of code in the module, with the biggest modules coming first.

INNODB MODULES

row

Row Abstraction, 19,768 lines
sub-modules: Updates, Undo, Undo Modify, Undo Insert, Select, Purge, Mysql Interface
The logic for the mysql row formatting and the innodb row formatting is quite lengthy. This module also seems to have a lot of the high-level business logic for Innodb.

trx

Transactions, 13,138 lines
sub-modules: Rollback, Rollback Segment, Undo, Log Records, Purge
As Innodb is a transactional storage engine there is a lot of logic to implement this

btr

Btree data structure, 12,228 lines
sub-modules: Btree Cursor, Btree Persistent Cursor, Btree Adaptive Search
Btree is the index of Innodb and is core functionality

pars
SQL parser, 11,691 lines
sub-modules: Symbol Table, Optimizer, Lexical Analyzer, Grammar
I am pretty sure you can ignore this directory if you are using innodb with mysql, as it is dead code in that case (please do correct me if I am wrong)

dict
Data Dictionary (meta-data), 10,446 lines
sub-modules: Boot, Creation, Load, Memory
Table names, column names, key names, etc. all in this code

handler

Mysql Storage Engine Interface, 8498 lines
This is the primary interface between Mysql and the innodb storage engine and the entry point for all mysql API calls.

log
Database Log, 8379 lines
sub-modules: Recovery
Database logging is core functionality

buf
Buffer Pool, 7784 lines
sub-modules: Buffer Flush Algorithm, Buffer Replacement Algorithm, Buffer Read Abstraction

os
Operating System Interface, 7659 lines
sub-modules: Files, Processes, Threads, Synchronization
This is the fun stuff, all the low level OS specific code

lock

Transaction Lock System, 6224 lines
sub-modules: Lock Queue Iterator

page
Index Page, 5675 lines
sub-modules: Page Cursor

srv
Main Server Driver, 5469 lines
sub-modules: Startup, Query Execution
Look here for configuration option handling coding and other startup issues

sync
Synchronization, 5361 lines
sub-modules: ReadWrite Lock, Wait Array

fil
Table Space Memory Cache, 5282 lines

rem
Records Abstraction, 4965 lines
sub-modules: Record Manager, Record Comparison Service

fsp
File Space Management, 4405 lines

ibuf
Insert Buffer, 4125 lines

ut
Utilities, 4113 lines
sub-modules: Vector, Random Numbers, Memory, List, Debug, Byte Manipulation, Work Queue

mem
Memory Management, 3598 lines
sub-modules: Memory Debug, Memory Pool

data
Data Element Abstraction, 2867 lines
sub-modules: Data Types

que
Query Graph, 2255 lines

mtr
Mini-transaction Buffer, 1967 lines
sub-modules: Mini-transaction Log

eval
SQL evaluator, 1603 lines
sub-modules: Stored Procedures

ha
Hash table, 1422 lines

mach
Machine Dependent Utilities, 1198 lines

fut
File Based Utilities, 951 lines
sub-modules: File Based List

read

Cursor Read, 788 lines

dyn
Dynamically Allocated Array, 560 lines

thr
Threads, 302 lines
sub-modules: Thread Local Storage

usr
Sessions, 163 lines

Monday, February 2, 2009

Portable condition variables in Mysql codebase

How can you write good multi-threaded code without using condition variables?

On most flavors of unix you have the pthread api which gives you pthread_cond_t structures and the related API functions. However on all version of Windows before Windows Vista and Windows Server 2008, the Windows API's did not come with any kind of condition variables. This makes it really hard to write multi-threaded code that works on both Windows and Unix systems especially if you want the code to behave in a similar way on all platforms.

Well Mysql developers solved this issue by writing their own version pthread_cond_t and related functions which can be found in the codebase in the files my_wincond.c. The code is #ifdef enabled on Windows platforms otherwise the standard pthread api is used. The implementation of condition variables in the mysql code base utilizes one CRITICAL_SECTION variable, and 3 arrays of Windows' Event objects. The code is short but hard to truly understand. Please do take a look at this code.

There have been several other attempts by people to create Windows versions of Condition variables, which you can find by searching the web. Almost always these solutions include notes about how the solutions are not perfect.

Thankfully Microsoft has released Condition variables as part of their latest OS versions which they are supporting as part of the OS and system APIs. I don't know if Mysql developers are going to add #ifdef's to use these new APIs when available but I sure think it would be a good idea.

Ivan