Mammoth
  • |
  • Contact
  • |
CMD | Command Prompt, Inc. - PostgreSQL Solutions, Support & Hosting
  • |
  • |
  • |
  • |
  • |
Remembering to check the docs: Autovacuum
Posted Friday Mar 1st, 2013 10:06am
by Joshua Drake
| Permalink

I was on a call very late last night with a good customer. Well, it was very early. They were having some performance problems and we were talking through how to resolve them before the EST wake up. It is late, we are all tired and of course there are too many people on the call.


So what is the problem? The problem is they weren't running Autovacuum. Now many of my brethren would say, "HERESY!" but in reality there are good reasons not to run Autovacuum (although I would say not Autoanalyze). Autovacuum is unpredictable, and can cause performance problems. 99% of the time you should run Autovacuum but there is a 1% reason to consider other alternatives.


The point I trying to make here in my sleep deprived state is that Autovacuum can be turned on with just a reload. It does not require a restart. I swore up and down that it required a restart and I ended up being wrong. I am not sure why I thought it needed a restart.


So there you go folks. Tip of the day, "autovacuum = on" only needs a reload.


Tallyho read the docs!


Categories: Business, OpenSource, PostgreSQL, SQL

GNU and the FSF should be split up
Posted Wednesday Feb 6th, 2013 12:30pm
by Joshua Drake
| Permalink

The FSF should be broken up.


Yes, I really did just write that. I believe the the FSF no longer fulfills its
mission. Wait, let's back up a step. I can feel the torches started to be covered
in pitch and the frankenstein cry of, "kill the heretic" starting to rumble
through the old streets of the Free Software country. I am not here to say that
the FSF is useless or that it doesn't have purpose. I am not here to say that
Richard Stallman shouldn't continue on his political mission to save the world
from the use of rightfully produced and licensed closed source software.


What I am saying is that the FSF and GNU should separate and that this
separation will act as a catalyst to allow for both to complete its
mission in a more productive manner. That's right, fsf.org and gnu.org should
be two separate non profits, with two different boards. Yes, I am aware that
the two are one and it shall and always be. I am declaring that "for better or
worse" is now worse and a divorce is now in order.


I have the deepest respect for the GNU project. I write this blog entry largely
on software that would not be possible without GNU components. I run Linux (no,
not GNU/Linux). I run KDE 4.10. I write this in Kate, although I normally prefer
Joe. I run PostgreSQL. I run Pidgin, Thunderbird, Gimp, Google Chrome (no not
Chromium), Wine, Netflix Desktop, Python, LibreOffice and my music is playing
using Amarok. And this my fine Open Source (yes Open Source, not Free Software)
denizens is exactly why I think GNU should fork from FSF.


FSF/Richard Stallman is a political movement. A political ideal full zealotry.
It is uncompromising, unrelenting, stalwart and venerable. It has done a lot of
good, it continues to strive to do a lot of good. However...


"The Free Software Foundation (FSF) is a nonprofit with a worldwide mission to
promote computer user freedom and to defend the rights of all free software
users."[1]


On the other hand:


"The primary and continuing goal of GNU is to offer a Unix-compatible system
that would be 100% free software. Not 95% free, not 99.5%, but 100%. The name
of the system, GNU, is a recursive acronym meaning GNU's Not Unix a way of
paying tribute to the technical ideas of Unix, while at the same time saying
that GNU is something different. Technically, GNU is like Unix. But unlike
Unix, GNU gives its users freedom."[2]


As you can see, although both are inextricably intertwined but they are also
fundamentally different in their purpose. One is about freedom and rights
of users. The other is about developing Free Software.


It is my assertion that the continued political movement of the FSF is causing
the GNU Project to suffer from slow, politicized and in some ways arcane
development. Consider, what tools you use. How many of those tools are actually
from the GNU project? All the tools I previously listed, do they need GNU?
Absolutely. Are any of them from GNU? Only GIMP. I think you will find this is
the case with most modern Open Source (and Free Software) users.


It is time for developers not lobbyists to run GNU.



  1. http://www.fsf.org/about/
  2. http://www.gnu.org/gnu/about-gnu.html


Categories: Business, OpenSource, PostgreSQL, Python, SQL

Writing a custom conflict handler for Bucardo 4
Posted Monday Nov 26th, 2012 01:25pm
by Alexey Klyukin
| Permalink

Bucardo, an asynchronous multi-master replication system (or, maybe the asynchronous multi-master replication system for PostgreSQL, cause I know of no other actively developed ones), deals with replication conflicts by providing conflict handlers in a form of standard (built-in) or custom code procedures. The built-in conflict handlers resolve a conflict by taking a row from the source or the target databases (in Bucardo 4 only 2 masters are supported, called 'source' and 'target' below), or using a delta row timestamp to determine the winner; other options include skipping conflicting rows altogether or picking one at random, but they are not very useful if you need the data to be consistent between replicas. For more complex conflict resolution rules it's necessary to write a custom conflict handler and, since the documentation is really scarce on that matter, I've decided to show how to create a simple one.


Read more...

Categories: OpenSource, PostgreSQL

... and that was my last day
Posted Thursday Aug 2nd, 2012 09:12am
by Alvaro Herrera
| Permalink


This has been cooking for a while now, and now it's time to open it up:
July 31st, 2012 was my last day with Command Prompt, Inc.



I joined Command Prompt in October 2005.
Back then I wasn't a very prolific blogger,
it seems,
because it took me six months to
get this fact out.
I haven't improved much since then, even though boss Josh Drake kept telling me
to publish my thoughts and ideas on various PostgreSQL-related matters.



During my time with them I had the opportunity to work on many
interesting things.
I got a number of patches into PostgreSQL,
some of them thanks to some Command Prompt customer sponsoring it;
I worked on the Command Prompt proprietary PostgreSQL fork, Mammoth Replicator,
and on PL/php;
and I got to talk to very smart people,
learn a lot,
and generally have tons of fun.



I enjoyed my time with Command Prompt very much;
the colleagues I leave are very capable and knowledgeable.
I particularly have to thank Alexey Klyukin
with whom I shared thousands of hours of work
in all kinds of projects.



But I decided a needed a change,
so I'm leaving Command Prompt.
I had some job offers at PGCon, and I have already decided what I'm going to do;
I look forward to many more years of PostgreSQL hacking and bug-hunting.



Command Prompt has been growing in business and number of employees lately;
I hope that trend continues and that they have great success.
I heartfully thank Command Prompt very much for the great opportunity they gave me
all these years.



See you in pgsql-hackers.



Categories: Business, PostgreSQL

Fedora 17 not so easy PostgreSQL configuration
Posted Monday Jul 30th, 2012 02:07am
by Alexey Klyukin
| Permalink

I don't usually post rants here, but this one might be actually helpful to others, so let's make an exception. It will be related to installing PostgreSQL from distro-specific packages. I usually prefer setting PostgreSQL from sources, unlike the majority of users; nevertheless, I'm familiar with how popular distros, like Debian or Fedora, manage their PostgreSQL layouts. Or so I thought until today.

My task was simple: install PostgreSQL instance for testing on a fresh Linux box, use a non-standard port. The catch: the box was running a relatively new Fedora 17.

Read more...


Categories: OpenSource, PostgreSQL

The Write–Ahead Log
Posted Thursday Jul 19th, 2012 09:38pm
by Alvaro Herrera
| Permalink

WAL (acronym for Write–Ahead Log) is the mechanism that Postgres uses to implement durability of data changes in the face of abrupt system crashes. > >The WAL system's central concept is that changes to data files must be written to such data files only after those changes are guaranteed to have been logged to WAL, that is, when WAL records have been flushed to permanent storage. (This flushing is accomplished by the fsync system call or equivalent mechanism.) If we follow this procedure, we need not flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using WAL — any changes that Postgres determines not to have been written will be redone from the WAL records.


One of the major benefits of using WAL is a significantly reduced number of disk flushes, since only the log file needs to be flushed to disk at the time of transaction commit; in multiuser environments, commits of many transactions may be accomplished with a single flush of the transaction log file. Furthermore, the log file is written sequentially, and so the cost of flushing it is much less than the cost of flushing the data pages, especially in instances where you have a lot of small transactions working on different parts of the database.


Checkpoints



Put simply, a checkpoint is a point in the transaction sequence at which all data files (where tables and indexes reside) are guaranteed to be written and flushed to disk to reflect the data modified in memory by regular operation. All those changes have previously been written and flushed to WAL.


There are two configuration parameters that determine the interval between checkpoints. The checkpoint_segments parameter determines the number of transaction log segments (16 MB in size apiece) that will be allowed to be filled before a checkpoint is forced. Similarly, at most checkpoint_timeout seconds will be allowed to elapse between one checkpoint and the next one, regardless of the number of segments in use.


The downside of having too much time/data between checkpoints is that in case of a crash, the time it takes to restore the data from the transaction logs is proportional to the number of segments to restore. Also, the space used in the pg_xlog/ subdirectory grows.


Spread checkpoints



Starting in Postgres 8.3, checkpoints are executed slowly, that is, instead of trying to write and flush pages as quickly as possible, an attempt is made to spread the writes so that the checkpoint itself does not consume all the available I/O bandwidth. Regular operation can, of course, continue while the checkpoint is in progress. In fact, users are happy with this delay behavior, because they observe that query time does not suffer from a latency spike during checkpoints, thanks to checkpointing leaving more I/O bandwidth available for them to use. Such latency spikes were quite noticeable in previous releases.


How much to spread checkpoints is configurable with the checkpoint_completion_target parameter; this value is a fraction of the checkpoint interval (in either number of segments or the timeout setting, whichever is sooner). For example, the default value of 0.5 means to take half the time of a checkpoint interval to do the writes.


Full page writes


Another consideration is that of “full page writes.” For the first change of a data file page (each 8kb unit) after a checkpoint, a full page image must be written to WAL. So the first change on a page after a checkpoint should be considered to be very expensive in terms of WAL space used, but subsequent ones before the next checkpoint occupy a comparatively small space in the log file.


This has one very important effect which is important to consider when configuring checkpoints: too small an interval between two checkpoints might mean that full page writes use almost all the available space in WAL before the next checkpoint, causing a new checkpoint to happen and thus a large number of full page images again. Increasing the checkpoint interval might mean a significantly reduced WAL traffic.



Full page writes can be turned off in postgresql.conf, but this is strongly discouraged; the circumstances in which it is safe to do so are extremely limited, and not completely understood.


References


All of the information found on this article can also be found in the official Postgres documentation: http://www.postgresql.org/docs/current/static/wal-configuration.html
and links therein. >


Categories: PostgreSQL, SQL

In considerations of closed source development
Posted Friday Jul 13th, 2012 01:21pm
by Joshua Drake
| Permalink

Open Source development has a lot going for it, as Bruce Momjian readily points out in a recent blog [1]. However, I believe he missed some key points that are positive for closed source development. Bruce asserts that with Open Source development the developers are the face of the software. That is true but certainly isn't always a good thing. There is a reason that the majority of software development, revenue
generation, and software developer employment is closed source (and no it
isn't because management and marketing are trying to keep the developer
down).


It is simple. Most of us Open Source developers aren't generally good with average people. We are good with our "breed" of people but move us out of our element and suddenly we can be awkward, offensive, and generally weird. We talk differently than other people, we have inside humor that doesn't span directions, and are just as inclusive as the richest Skull & Bones society members. Is this bad? No, it is reality. Whenever you take a group of individuals who are on a different playing field than the average person you are going to end up in this situation.


The second point that Bruce states is that closed source users have very little interaction with users. I think this is misunderstood. To say that Open Source has more interaction with users is, in my opinion, completely false or is at least given much more weight than is reality. Ask any consultant: the majority of their customers have zero idea about the workings of the community, how to communicate with the community, or interactions with developers. Frankly, they don't want to. They have software to run, businesses to operate, and employees to pay.


This can be further illustrated by watching the community. It tooks PostgreSQL years longer than it should have to get replication, and the community is just now starting to look at logical replication, features that were available in closed source versions of PostgreSQL and as open source addons years ago. The users wanted integrated replication but the community wasn't willing to implement them at the time.


Please don't get me wrong, I love Open Source. I love Open Source development. Heck, the only closed source software I run is to play Civ5 occasionally. Everything else is Open Source but I do think that we need to keep perspective on what is going on in the very large world that does not involve Open Source. It is much bigger, in a lot of ways more productive, and employ smore people (a rarity in today's economy) than Open Source
could ever hope to.


1. http://momjian.us/main/blogs/pgblog/2012.html#July_2_2012


Categories: Business, OpenSource, PostgreSQL, Python, SQL

Binding PostgreSQL server to specific CPU cores in Linux
Posted Friday Jun 22nd, 2012 06:15am
by Alexey Klyukin
| Permalink

Recently we had a customer who was running PostgreSQL 8.2 on a 32 cores system with 64GB of memory. They were deploying this server in addition to the already running one with 24 total cores and 32GB of memory. PostgreSQL configuration has been adjusted for extra resources, the database has been partitioned roughly in half between the 2 servers and the queries running against both servers were similar.

Suprisingly, when compared to the old server, the extra resources didn't improve the performance. Quite the contrary, the load average and CPU utilization on a new system was much higher during load spikes, while the TPS number plummeted. After performing initial examination of their server (applying our Audit & Tune package) we've decided that 8.2 might itself become an issue. This version of PostgreSQL is outdated and no longer supported by the community. What was suggested is that 8.2 doesn't scale well for 32 cores. How can we verify that hypothesis? Since they were running a relatively modern Linux kernel (2.6.32, supplied with RHEL 6) we were able to take advantage of the interface provided by the taskset utility.

Read more...


Categories: PostgreSQL

Migrating hierarchical queries from Oracle to PostgreSQL
Posted Thursday Apr 26th, 2012 10:55am
by Alexey Klyukin
| Permalink

This is the second part in a series of blog posts describing PostgreSQL analogs of common Oracle queries



One of the most intricate Oracle specific constructions is "START WITH ... CONNECT BY". According to Oracle's documentation, the syntax is:
SELECT [query] [START WITH initial_condition] CONNECT BY [nocycle] condition.
This statement is commonly used to traverse hierarchical data in the parent-child order. It's easier to illustrate how it works with an example.



Consider a table that stores opponents moves in a game of chess. Each table row contain coordinates (in algebraic notation) of a single move by whites and the move in response by blacks, as well as a column that references a preceding move, making it possible to keep multiple continuations of a specific move for the post-game analysis.



CREATE TABLE moves(id integer, parent integer, white varchar(10), 
black varchar(10));


The following statements describe 2 variants of a very short game, the first one leading to the early checkmate (known as a scholar's mate), and the second one to the position where black successfully avoids being checkmated.


INSERT INTO moves VALUES(1, 0, 'e4', 'e5');
INSERT INTO moves VALUES(2, 1, 'Qh5', 'Nc6');
INSERT INTO moves VALUES(3, 2, 'Bc4', 'g6');
INSERT INTO moves VALUES(4, 3, 'Qf3', 'Nf6'); -- checkmate is avoided
INSERT INTO moves VALUES(5, 2, 'Bc4', 'Nf6');
INSERT INTO moves VALUES(6, 5, 'Qxf7#', NULL); -- blacks being checkmated

Let's build an Oracle query showing a sequence of moves that leads to the checkmate:


SELECT DISTINCT id AS final_move_id, 
LTRIM(SYS_CONNECT_BY_PATH(NVL(white,'')||':'||NVL(black,''),';'),';')||';' 
AS moves, LEVEL AS mate_in 
FROM moves WHERE white LIKE '%#' OR black LIKE '%#' START WITH id = 1 
CONNECT BY PRIOR id = parent;















FINAL_MOVE_ID

MOVES

MATE_IN

6

e4:e5;Qh5:Nc6;Bc4:Nf6;Qxf7#:;

4



The query instructs Oracle to look for a checkmate:



  • The search starts at the move with id = 1, as indicated in the START WITH clause, and considers all possible continuations that lead to a checkmate, denoted by the final '#' in the move's description.
  • Each move and its direct continuation, for instance, moves 2 and 5 represent the parent-child relationship, described by the PRIOR condition.
  • The search depth is stored in the LEVEL pseudo-column.

As a result, Oracle goes from one row to another only if the parent column of the new row contains the id of the current row, accumulating all visited rows in a result set. The SYS_CONNECT_BY_PATH clause produces a string out of the specified columns of the visited rows, connecting each (parent, child) pair by the designated character (';' in our case).



Being Oracle SQL extension, CONNECT BY is not available in PostgreSQL. Recent versions of PostgreSQL implement Common Table Expressions (CTE), SQL-standard way of dealing with hierarchical data. Here's one possible rewrite of the query above for PostgreSQL using recursive CTEs:
Read more...


Categories: OpenSource, PostgreSQL, SQL

Cool and Sexy: Open Source PostgreSQL enterprise contenders
Posted Friday Apr 20th, 2012 10:43am
by Joshua Drake
| Permalink

As with any healthy project, there will be offshoots and people will take the source, fork it and try to create something new, better, different or just.... How that person feels it should be. This is a good thing, it leads to new ideas, new communities and sometimes truly interesting pieces of software.


Postgres-XC has been around for a while, it is primarily developed by NTT and EnterpriseDB. It has a small community but a dedicated engineering/hacker backing. Postgres-XC is interesting because it keeps reasonably up to date with the latest Postgres (1.0 is set to be based on 9.1 of PostgreSQL) but provides a shared nothing clustering architecture. This type of infrastructure is one of the holy grails of web based applications.


Should Postgres-XC deliver on its promises (hint: it does), you will be able to scale out (as opposed to up which Postgres already does extremely well) at an almost 1 to 1 ratio. This means that instead of having to purchase 2 large machines at 10-12k a piece you could purchase 4 machines at 1.5k a piece and achieve similar performance (theoretically, I need to test this). It also means that scaling out in the "cloud" will be easier.


I invite everyone interested in PostgreSQL to take a look at Postgres-XC. It is going to 1.0 soon and it needs community members to help flesh out the warts that haven't been found yet.


Another Postgres fork that has recently appeared is tPostgres. tPostgres (doesn't that look wrong at the beginning of a sentence?) is set to do to Microsoft SQL what EnterpriseDB did to Oracle, with one minor, small, interesting, exception: tPostgres is Open Source. Further Microsoft SQL is more in line with PostgreSQL in the types of workloads you usually see it performing. Imagine a tPostgres with Postgres-XC. Imagine an open source way to easily port Microsoft SQL apps to PostgreSQL.


Now don't get me wrong, the latest versions of Microsoft SQL are actually good products. Yes, I did just say that. However, they are not Open Source, they are expensive (comparatively) and let's get real, we want everyone to run Postgres.


Unfortunately tPostgres is only just announced and they are literally at the beginning of building their community but as it is being initiated by Denis Lussier (co-founder of EnterpriseDB), I imagine that he will come through with something very interesting indeed.


Categories: Business, OpenSource, PostgreSQL, Python, SQL


Copyright © 2000-2013 Command Prompt, Inc. All Rights Reserved. All trademarks property of their respective owners.