Saturday, July 16, 2016

My PL/pgSQL code is stuck, but where?

In the short time that I am enhancing the PL profiler (https://bitbucket.org/openscg/plprofiler) I have been asked multiple times if it can be abused as a debugging aid. Not directly. The conversation goes something like this:

Q: When my PL code is stuck somewhere, can I turn on profiling and see where?
A: No.
Q: Will this feature be added?
A: No.

Of course would that be a useful feature. I don't argue that. And it seems to be that this is precisely how Oracle users find out where their PL/SQL code gets stuck or moves like molasses. However, the reason why I am against adding this is because a profiler, or even parts of it, should not be enabled 24x7 on a production database server for the purpose of eventually using it to extract some debugging information some day. There are other ways to get that information and I will show you one of them.

If a program is stuck somewhere, one uses a debugger to get a backtrace. This works with C programs, like the database server, provided that symbol information is available. In that backtrace (and some variables) we also find the PL/pgSQL backtrace. Having symbol information available is also useful in case of a program crash, to find out why it crashed by loading the core dump into the debugger. 

Every line of PL code, that is executed, goes through the PL/pgSQL executor function exec_stmt(). At that place we find the current PL code's line in a variable. Likewise every function execution goes through either plpgsql_exec_function() or plpgsql_exec_trigger(). In those stack frames we find the OID of the function as well as its signature (name and call argument types).

Doing this eventually several times for a deeply nested PL/pgSQL program is tedious and no fun. So here is a little script called plbacktrace (https://github.com/wieck/plbacktrace.git) that does all of that. It is invoked with the PID of the PostgreSQL backend and will output information like



[postgres@db1 tmp]$ ./plbacktrace.py 13239
fn_oid=105672 lineno=5 func="life_get_cell_pl(life_board,integer,integer)"
fn_oid=105673 lineno=12 func="life_num_neighbors_pl(life_board,integer,integer)"
fn_oid=105675 lineno=11 func="life_tick_pl(life_board)"
fn_oid=105674 lineno=22 func="life_pl(integer,integer,boolean)"
 I will explain another time why I have a database server playing Conway's Game of Life written in PL/pgSQL. 


Regards, Jan

3 comments:

  1. I wish --enable-debug was the default...

    ReplyDelete
    Replies
    1. Having debug symbols by default encourages bad behavior. You can freeze a whole production database server by attaching to a backend with gdb just at the wrong moment. Doing that in a script for only a fraction of a second may be tolerable. Attaching with GDB manually to a live production system is certainly not.

      Delete