CMD: Aurynn Shaw's Blog http://www.commandprompt.com/blogs/aurynn_shaw/ Command Prompt Blog Feed: Aurynn Shaw's Blog Thu, 07 May 2009 09:56:09 -0700 www.commandprompt.com http://www.commandprompt.com/images/small_mammoth.gif Command Prompt Mammoth http://www.commandprompt.com/blogs/aurynn_shaw/ 46 46 http://www.commandprompt.com/blogs/aurynn_shaw/2009/05/long_running_request_handlers_and_python_garbage_collection/ Thu, 07 May 2009 09:56:09 -0700 Long Running Request Handlers and Python Garbage Collection http://www.commandprompt.com/blogs/aurynn_shaw/2009/05/long_running_request_handlers_and_python_garbage_collection/ While working on a <a href="https://projects.commentprompt.com/public/simpycity/">Simpycity</a> + <a href="http://pylonshq.com">Pylons</a> environment the other day, I noticed that my app was leaking Postgres connection handles. This is not behaviour you ever want to see, especially in software as vital as Simpycity. Investigation and testing demonstrated pretty conclusively <a href="https://projects.commandprompt.com/public/simpycity/pastebin?show=f5d51e358%0D">here,</a> and <a href="https://projects.commandprompt.com/public/simpycity/pastebin?show=f41b90427%0D">here</a> that the Python garbage collector was *not* immediately cleaning up dead/unreferenced objects. Specifically, in the test case below, <pre> def foo(): r = Raw("SELECT count(*) as cnt, usename, current_query FROM pg_catalog.pg_stat_activity GROUP BY usename,current_query ORDER BY cnt DESC") return r() foo() </pre> when executed in the Python interactive interpreter, a dangling (unassigned) ResultSet object in remained in memory. Even though no references to the object exist, and it should be cleaned up, it will *not* be cleaned up until another garbage collection event is triggered. Calling locals(), globals() or by forcibly executing the garbage collection via import gc; gc.collect() will clean up the stray ResultSet as expected. Model-based objects exhibit the same behaviour. As the Postgres handles are being created (by default) READ COMMITTED, they are left in the state "<IDLE> in transaction". This part can be handled via calling .commit() on all handles, but, still prone to the leaking/cleanup issue. At its core, the leaking issue stems from design choices behind Simpycity; specifically, abstracting away the connection management logic leaves us in the position where automatic management cannot depend purely on the Python garbage collector for handle cleanup. This lifecycle issue affects Simpycity under Pylons, as well. By relying on the Python garbage collection, a request's Postgres connections fall out of scope, but are not reaped until *later*, when the mod_wsgi subprocess accepts and handles a new request. Once we realized how this happens in the interactive interpreter, we saw that in an environment such as mod_wsgi or mod_python, with multiple active subprocesses, lurking/hanging sessions will inevitably occur. When Lacey and I studied <a href="http://www.sqlalchemy.org/">SQLAlchemy</a>, it appeared vulnerable to the same behaviour; Reading over their pooling logic, as well as examining a Pylons app configured to use SQLAlchemy in the Paster template showed they are *explicitly* cleaning up all DB connections after every Pylons request. This was implemented using a <pre> try: ... finally: ... </pre> block in the core Pylons controller, as well as the global initialization of a db connection during Pylons setup (see model/meta.py, model/__init__.py, lib/base.py in a default Pylons project setup as an example). As Pylons + SQLAlchemy is not our default environment, this was initially missed. Discussion with James Bennett (ubernostrum, irc.freenode.net) shows that the Django ORM has a similar issue. Django resolves this with an asynchronous message-passing system, which, at the end of a Request cycle, notifies the ORM backend that the opened connections should be terminated. Therefore: * Reliance on Python's garbage-collection cycle is fickle at best, especially in long-running, asynchronous processes such as Pylons. * Based on the SQLAlchemy mechanisms within Pylons, specifically the end-of-request session cleanup, it seems reasonable to implement a session manager for Simpycity. This would exist to only to handle long-running scenarios where automatic garbage collection would not be able to consistently run. The connection manager would need to do little more than maintain a list of all active/open handles, and offer a .cleanup() call to free all handles at the end of the request. Coupled with the explicit try-finally cleanup pattern that SQLAlchemy uses within the Pylons WSGIController, a connection manager would succinctly solve the problem. * This would maintain correct transactional integrity, because implicit commits are never issued to the PostgreSQL backend. * The leaking handle issues would vanish, because we know that the only handles used within Simpycity will be tracked within the connection management object, and correctly cleaned up after each request. * In short-run applications, this is largely a non-issue, as the exiting of the interpreter closes all extant handles automatically. As this is a subtle issue, we'll be providing an upstream patch for Pylons, to provide a Paster template specifically to set up a connection management environment for Simpycity. This will also be documented on the Simpycity <a href="https://projects.commandprompt.com/public/simpycity/wiki">wiki</a>, discussing how to set up a connection manager and use it for long-running programs. http://www.commandprompt.com/blogs/aurynn_shaw/2009/05/long_running_request_handlers_and_python_garbage_collection/ OpenSource PostgreSQL Python http://www.commandprompt.com/blogs/aurynn_shaw/2009/02/using_simpycity_in_pylons/ Tue, 10 Feb 2009 17:02:37 -0700 Using Simpycity in Pylons http://www.commandprompt.com/blogs/aurynn_shaw/2009/02/using_simpycity_in_pylons/ <strong>Project Design</strong> Simpycity's core philosophy is that the DBA is going to perform the vast majority of the schema design without the aid of a conventional ORM. This is a marked divergence from most other ORMs and database abstraction layers, and it has an impact on how your project should be designed. http://www.commandprompt.com/blogs/aurynn_shaw/2009/02/using_simpycity_in_pylons/ PostgreSQL OpenSource http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simpycity_a_quick_tutorial/ Wed, 21 Jan 2009 16:20:16 -0700 Simpycity: A Quick Tutorial http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simpycity_a_quick_tutorial/ Using Simpycity is as easy as its name suggests - quick, easy. Simple. In keeping with that philosophy, Simpycity offers 3 constructs for database access: Function, Query, and Raw. http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simpycity_a_quick_tutorial/ PostgreSQL OpenSource http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simple_postgresql_database_mapping/ Wed, 21 Jan 2009 16:16:14 -0700 Simple PostgreSQL Database Mapping http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simple_postgresql_database_mapping/ With everyone and their mothers trying to build the next awesome website and next amazing web service, you'll probably find that SQL databases are getting more and more popular. As it turns out though, writing complex software on databases is harder than it looks. Hard enough that database abstractions are growing in popularity, pushing "database agnosticism" and turning the database into a dumb, interchangable data store. As a database developer, trying to write app code in this environment is really frustrating. I know how to design a DB schema. I know how to abstract that schema with views and stored procedures. When I try to use an ORM? All it does is get in my way, especially when you lose most of the ORM functionality by going to raw SQL. Writing SQL straight against the PostgreSQL DB API isn't much better - I end up writing the same boilerplate connection code, return handling and other management that comes with working with low-level interfaces. Honestly, this is more trouble than it's worth. Simpycity changes that. Model definition? Unnecessary. One line to declare a function signature. One line to declare a simple query. One line to declare raw SQL. Returned rows are nothing but standard Python dicts. Got a model definition? All three support wrapping returned rows in any arbitrary model definition. Need to change your data model? Update your app by doing nothing more than adhering to the same API you developed the first time. A minor change in the database is a minor change in Simpycity. Instead of designing your entire API in terms of Python, you seamlessly connect your well-structured database to the Pythonic representation of your choice, cleanly and easily maintaining separation of logic and duty. It's simplicity. Here's how it works: <pre> &gt;&gt;&gt; from simpycity.core import Function &gt;&gt;&gt; function_def = Function("get_rows") &gt;&gt;&gt; results = function_def() &gt;&gt;&gt; function_def.query SELECT * FROM get_rows() &gt;&gt;&gt; function_def = Function("get_row",['id']) &gt;&gt;&gt; function_def.query SELECT * FROM get_row(%s) &gt;&gt;&gt; results = function_def(1) - or - &gt;&gt;&gt; results = function_def(id=1) </pre> That's it. That's Simpycity. Simple, easy, seamless mapping of PostgreSQL stored procedures to Python. Simpycity is a Commandprompt Open Source project, licensed under the terms of the Lesser GPL and you can get it from <a href="https://projects.commandprompt.com/public/simpycity/">https://projects.commandprompt.com/public/simpycity/</a> http://www.commandprompt.com/blogs/aurynn_shaw/2009/01/simple_postgresql_database_mapping/ PostgreSQL OpenSource http://www.commandprompt.com/blogs/aurynn_shaw/2007/10/recursive_plpgsql_lookup_table_and_ltree/ Mon, 08 Oct 2007 18:58:33 -0700 Recursive PLPGSQL, Lookup Table and Ltree http://www.commandprompt.com/blogs/aurynn_shaw/2007/10/recursive_plpgsql_lookup_table_and_ltree/ There are many reasons and needs to implement a tree structure inside of an SQL database. In our particular example today, we're using a basic knowledge base tree as an example of a small- to large-scale tree inside of an SQL database. Additionally, any sort of directory/file system layout would be a good match for these concepts. http://www.commandprompt.com/blogs/aurynn_shaw/2007/10/recursive_plpgsql_lookup_table_and_ltree/