Along the lines of GCE, here are some prices
Posted Thursday Sep 18th, 2014 01:38pm
by Joshua Drake
| Permalink

I was doing some research for a customer who wanted to know where the real price-to-performance value is. Here are some price comparisons among GCE, AWS, and SoftLayer. Note that the SoftLayer configurations are bare metal, while GCE and AWS are virtual.


GCE: 670.00
16 CPUs, 60GB memory, 2500GB disk

GCE: 763.08
16 CPUs, 104GB memory, 2500GB disk

Amazon: 911.88
16 CPUs, 30GB memory, 3000GB disk

Amazon: 1534.00 (r3.4xlarge)
16 CPUs, 122GB memory, 1 x 320GB SSD, 3000GB disk

Amazon: 1679.00 (c3.8xlarge)
32 CPUs, 60GB memory, 2 x 320GB SSD, 3000GB disk

None of the above include egress bandwidth charges. Ingress is free.

SoftLayer: ~815 (with 72GB memory, ~950)
16 cores, 48GB memory, RAID 10, 4TB (4 x 2TB drives)

SoftLayer: ~1035 (with 72GB memory, ~1150)
16 cores, 48GB memory, RAID 10, 3TB (6 x 1TB drives; 8 x 750GB drives priced out the same, as did configurations using 2TB drives)


Categories: Business, OpenSource, PostgreSQL, Python, SQL

GCE, A little advertised cloud service that is perfect for PostgreSQL
Posted Monday Sep 15th, 2014 09:48am
by Joshua Drake
| Permalink

Maybe...


I have yet to run PostgreSQL on GCE in production. I am still testing it, but I have learned the following:



  1. A standard provisioned disk for GCE will give you ~80MB/s random write.
  2. An SSD provisioned disk for GCE will give you ~240MB/s.


Either disk type can be provisioned as a raw device, allowing you to use Linux software RAID to build a RAID 10, which further increases speed and reliability. Think about that: 4 SSD provisioned disks in a RAID 10...
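
As a rough sketch, assembling such an array with mdadm might look like the following. The device names are assumptions and vary by instance, so verify them with lsblk before running anything:

    # Hypothetical example: four raw SSD persistent disks attached to the instance.
    # Device names are assumptions; check lsblk first.
    sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /var/lib/postgresql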


The downside I see, outside of the general arguments against cloud services (shared tenancy, all your data in Big Brother's hands, lack of control over your resources, general distaste for $vendor, or whatever else we in our right minds can think up), is that GCE is currently limited to 16 virtual CPUs and 104GB of memory.


What does that mean? It means GCE is likely perfect for 99% of PostgreSQL workloads. By far the majority of PostgreSQL deployments need less than 104GB of memory. Granted, we have customers running 256GB, 512GB, and even more, but those are few and far between.


It also means that EC2 is no longer your only choice for dynamically provisioned cloud VMs for PostgreSQL. Give it a shot; the more competition in this space, the better.


Categories: Business, OpenSource, PostgreSQL, Python, SQL

PDXPGDay 2014
Posted Monday Sep 8th, 2014 11:54am
by Joshua Drake
| Permalink

I had the honor of being asked to give the introduction at PDXPGDay 2014 this past Saturday. I didn't speak very long but it was great to see a lot of the old stomping ground. It had been quite some time since I had been in the group of Wheeler, Roth, Wong, Berkus and a few others.


The conference was really a mini-conference, but it was great. It was held in the exact same room where PostgreSQL Conference West was held all the way back in 2007. It is hard to believe that was so long ago. I will say it was absolutely awesome that PDX still has the exact same vibe and presentation! (Read: I got to wear shorts and a t-shirt.)


Some items of note: somebody was perverse enough to write a FUSE driver for PostgreSQL, and it is even bi-directional. This means PostgreSQL gets mounted as a filesystem, and you can even use joe (or, yes, vim) to edit values and have them saved back to the table.


Not nearly enough of the audience was aware of PGXN. This was a shock to me and illustrates a need for better documentation and visibility through .Org.


The success of this PgDay continues to illustrate that other PUGs should consider doing the same, perhaps annually!


Thanks again Gab and Mark for entrusting me with introducing your conference!


Categories: Business, OpenSource, PostgreSQL, Python

apt.postgresql.org... a wonderful if flawed apt repository
Posted Wednesday Sep 3rd, 2014 10:52am
by Joshua Drake
| Permalink

The site apt.postgresql.org is a great resource for those who live in the Debian-derived world. It keeps up to date with the latest PostgreSQL packages and has a whole team dedicated to creating them. Of course, this is the Open Source world, so not everyone agrees 100% with the way things are done in this project. As I noted here, there are some issues.


These issues do not detract from otherwise excellent work; they are a note to those who use the repository to watch for potential problems. I also have a video demonstrating specifically what the issues are, here.
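
For those who haven't used the repository, setup on a Debian/Ubuntu system looked roughly like this at the time of writing; treat it as a sketch and defer to the repository's own instructions, since key locations and suite names can change:

    # Import the PGDG signing key and add the source list (Debian/Ubuntu)
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    sudo apt-get update
    sudo apt-get install postgresql-9.3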



Categories: Business, OpenSource, PostgreSQL, SQL

Kicking the Donkey of PostgreSQL Replication
Posted Tuesday Feb 4th, 2014 12:50pm
by Joshua Drake
| Permalink

This is the title of a talk I am developing for the matured PostgreSQL Conference: PGConf NYC 2014. Formerly a PgDay, this is now a full-blown conference spanning two days with three tracks. From all reports it is set to be the largest PostgreSQL conference ever in the United States, surpassing even the old West and East series (which no conference in the U.S. has done to date). These are truly exciting times for our community.


This talk will be a departure from my standby talks on PostgreSQL performance and choosing the right hardware. Katz asked me "to bring your full East Coast from the West Coast personality." I plan on doing so. So cinch up your bootstraps; it is going to be a ride. In classic JD style I am going to be blunt and to the point about the good, the bad, and the "WTH were they thinking" of PostgreSQL replication.


So, outside of personality, what am I really trying to deliver to the community? I think the description of the talk says it all:



  • Have you ever wondered how to configure PostgreSQL Replication?

  • Have you ever wondered how to configure PostgreSQL Log Shipping?

  • Have you ever wondered: Which one is best for my application?

  • Are you aware of the pitfalls of Replication? Where it breaks? When it will
    act in a way that is counter-intuitive? Do you know how to fix it?

  • Do you know how to monitor it?


    If you have asked yourself any of these questions, this is the talk for you. We will step through PostgreSQL replication, the technology involved, the configuration parameters, and how to configure it in a manner that isn't fragile (a small taste of the configuration is sketched below). We will also cover gotchas, how to prepare for them, and understanding what replication is doing.
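
    To give that small taste, a minimal 9.x-era streaming replication setup boils down to a handful of settings. The addresses and the replicator role below are hypothetical placeholders, not a recommended production configuration:

        # postgresql.conf on the primary
        wal_level = hot_standby
        max_wal_senders = 3
        wal_keep_segments = 128

        # pg_hba.conf on the primary (standby at 10.0.0.2, hypothetical)
        host    replication     replicator      10.0.0.2/32     md5

        # recovery.conf on the standby (plus hot_standby = on in its postgresql.conf)
        standby_mode = 'on'
        primary_conninfo = 'host=10.0.0.1 user=replicator password=secret'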


    This talk is not for the faint of heart. I will take a no-holds-barred approach and give you the real deal, the dirt and the gold, of one of the most sought-after PostgreSQL features.


    On a closing note, this conference is showing how a PostgreSQL User Group, combined with the resources of United States PostgreSQL, can help grow and educate the community. They don't just help NYCPUG; they also help PDXPUG, PHILLYPUG, and SEAPUG. Maybe your PUG should consider working with PgUS? If so, give "Jonathan S. Katz" [jonathan.katz {@} excoventures.com] a jingle.


    Categories: Business, OpenSource, PostgreSQL, Python, SQL

    Security Considerations While Using ssh-agent.
    Posted Tuesday Nov 26th, 2013 03:18pm
    by Ildefonso Camargo
    | Permalink

    Recently we were contacted by a customer, and we started the typical remote access setup: ask them to open the SSH port for our bastion server, create a new user for us, and add the corresponding SSH public key. Well, they promptly responded, and we tested access, got into their bastion server, and then tried to get to one of the broken servers, but it was asking for a password. We asked the customer for the password, and got a reply stating that they had assumed we were using agent forwarding.



    The first instinct of the engineer handling the ticket was: well, let's just start ssh-agent, load the key, and forward it! Of course, after a brief discussion, it became clear that this was a bad idea.



    It is well known that agent forwarding (forwarding of the ssh authentication agent connection) can be considered, under certain conditions, a security risk, because when you do that a socket is created on the remote host, and any user with access to the socket would be able to authenticate using the keys loaded in the agent.



    Some of you may still ask: what is the risk? After all, the agent connection gets forwarded over ssh, the socket permissions are very restrictive, and the agent will not expose the keys, not even encrypted; it only answers authentication requests. Furthermore, when you read the ssh-agent man page, it sounds reassuring:



    (...)"The idea is that the agent is run in the user's local PC, laptop, or terminal. Authentication data need not be stored
    on any other machine, and authentication passphrases never go over the network. However, the connection to the agent
    is forwarded over SSH remote logins, and the user can thus use the privileges given by the identities anywhere in the
    network in a secure way."(...)


    No wonder a lot of people trust ssh-agent so much.



    But this is only partially true. The reality is that any user with enough privileges (root-level access, or access to the same account you are using to log in) on the remote host will be able to use your forwarded agent through the local socket, and thus authenticate using the keys you loaded into the agent, potentially gaining access to other servers you manage, for as long as you keep your session open. It is worth noting that this access is limited to *using* the keys, not fetching them; i.e., the agent doesn't allow access to the private keys, it only answers authentication requests.



    Because of this, it is not a good idea to forward the agent to a server you don't trust (either because you don't know and/or don't trust the other administrators, or because you can't tell for sure whether the server is compromised). If in doubt, don't do it; if you really have to, limit the keys you add to the agent.



    In general, assume that every key you add to the agent will be used by someone or something (a bot) on the remote server; ask yourself if you know who will use the key and whether you are okay with that.



    Now, back to our customer's situation. Not having a lot of time to argue or consider alternatives (the customer had reported an emergency), we decided to create a new key for this customer and use what they wanted: agent forwarding. This worked just fine and allowed us to get into all the failing servers comfortably, without putting any of our other keys at risk.



    The first thing we had to do was create the key, something like this:



    ssh-keygen -t rsa -f ~/agent_keys/new_key-id_rsa


    Running ssh-keygen and then typing in a new key destination would work too, but if you hit enter, you could unwittingly overwrite your default id_rsa file.



    Now that we have generated the new key, to use it with the agent we need to:



    1. Start the SSH agent (we don't run it by default), and have the agent run a shell.



    ssh-agent bash


    In my opinion, this is simpler than starting the agent and then exporting the variables, and it has the advantage that the agent will automatically quit when you exit the shell: no need to kill the agent, and no risk of forgetting to kill it.



    2. Add the key(s) you need to use (keys will be usable by anyone with access to the agent's socket, either on the local or remote server):



    ssh-add ~/agent_keys/new_key-id_rsa
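
    As a quick sanity check, ssh-add -l lists the fingerprints of the keys currently loaded, which confirms the agent holds only what you intend to forward:

        # List fingerprints of all keys the agent currently holds
        ssh-add -l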


    3. Connect to the remote host, enabling agent forwarding with the -A parameter:



    ssh -A remote_host


    Of course, if you modified your ~/.ssh/config file to include "ForwardAgent yes", the -A parameter would be unnecessary. Remember never to set ForwardAgent to "yes" globally, unless you really know what you are doing.
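
    If you want the convenience without enabling forwarding globally, a per-host stanza scoped to a single trusted bastion is a safer middle ground (the host name below is a hypothetical placeholder):

        # ~/.ssh/config: enable agent forwarding only for one trusted host
        Host bastion.customer.example
            ForwardAgent yes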



    4. When you are done, just exit the shell, and the agent will automatically quit.



    Most problems have several solutions, and this one is no exception. We decided on this solution because it was aligned with what the customer expected and was fast to implement.



    I hope this little post helps increase awareness of the risks associated with agent forwarding and, in the end, helps sysadmins make informed decisions with security in mind.



    Until next time!



    Categories: OpenSource

    Managing pg_hba.conf With Ansible
    Posted Thursday Nov 14th, 2013 01:06pm
    by Ivan Lezhnjov
    | Permalink

    pg_hba.conf is perhaps one of the easiest-to-understand configuration files in PostgreSQL. Its syntax is straightforward, and the concept resembles that of the IP filter or ACL mechanisms found in various software packages. pg_hba.conf is also well documented, like the rest of PostgreSQL, and we love it because it lets us do what we want without getting in our way. What else could we possibly ask of it?



    Perhaps it can be a bit of a nuisance when pg_hba.conf means not just one file to you but many. You may have a dozen read-only standbys, and your infrastructure is expanding -- a good sign that you're probably all the rage on the market -- so at some point you may realize you need to modify the contents of pg_hba.conf and introduce some changes.



    Well, there are a number of ways to do that, and if it is just one line that needs to be added or edited, it is very tempting to just go ahead and do it. Manually. However, what we often forget, blinded by the illusion of simplicity of the task at hand, is the hidden toll of the additional actions that need to be carried out, like reloading PostgreSQL to apply the changes made in pg_hba.conf. And thus the amount of manual labor has just doubled; if you made a mistake, even a teeny tiny typo, you might need to do it all again, and the workload would then be quadrupled, most likely with a bonus of the frustration that usually accompanies broken illusions.



    We at Command Prompt, Inc. have been increasingly relying on ansible to solve this very problem, and we would like to give you an idea of how empowering this rookie of configuration management can be. If you have never heard of ansible before, think chef or puppet. It is similar, but in many ways a different beast.



    Ansible can be used to do all sorts of things. For example, we have successfully deployed backup software for one of our customers, distributed SSH public keys to configure passwordless access to a dozen hosts (all under 15 minutes, including writing the playbook to do it), installed unattended-upgrades and managed its configuration files and blacklist, deployed various in-house custom report scripts that require distribution of executables, configuration files, and cron jobs, applied SQL files to PostgreSQL databases, automated the bulk of the tedious work when deploying a monitoring system, continuously updated and deployed a base toolset that we use on all servers we own or manage (tools like sar, atop, molly-guard, tree, etc.), and so on.



    It is typically run in two modes: playbooks and ad-hoc. Playbooks are just YAML files that contain a list of tasks run in successive order, much like what happens in a Bash script. Ad-hoc mode is a way to call ansible to run just one task at a time directly from your terminal's command line. Combined, they are incredibly useful not only for massive and rapid deployment, but also in the day-to-day work of a system administrator.



    And as a system administrator or DBA, you can actually update your pg_hba.conf (or a slew of them) with a touch of one command. Perhaps one of the best things about using ansible and having actual playbooks is that they can be run against your inventory of hosts (yeah, ansible parlance) again and again, which in effect means you can always ensure that your current, actual configuration is exactly what it was originally intended to be, as expressed in a playbook (in the configuration management club this is usually discussed in terms of configuration drift and idempotency).



    So, let's take a look at this default pg_hba.conf in PostgreSQL 9.1 on Ubuntu Server 12.04 Precise:



    $ sudo grep "^[a-z]" /etc/postgresql/9.1/main/pg_hba.conf
    [sudo] password for admin: 
    local   all             postgres                                peer
    local   all             all                                     peer
    host    all             all             127.0.0.1/32            md5
    host    all             all             ::1/128                 md5
    


    Suppose we want to add a new entry for the cmd_zabbix_mon user at 127.0.0.1/32, and apply this change to a bazillion hosts (yes, ansible is actually aware of this number and is fully capable of handling it).



    So, this is our new pg_hba.conf line:



    host    all             cmd_zabbix_mon  127.0.0.1/32            trust
    


    Now we need a playbook with a task that will add this new line to pg_hba.conf. Before we create one, we need to make sure that all hosts where we want to see the change happen are in ansible hosts inventory file, /etc/ansible/hosts.



    In our example it is a very simple one:



    [localhost]
    127.0.0.1
    
    [allhosts]
    precise
    


    We are going to connect to allhosts, which is the name of a group of hosts with just one hostname in it -- precise. In fact, it is the same host that we are going to run ansible from.



    It is good practice to run ansible from a control machine and SSH into managed nodes; i.e., the control machine is a separate system. In practice, you will probably end up managing the control machine too, and thus need to SSH into it as well.



    Therefore in our example we're going to SSH into the same machine where ansible is run from, which will serve as an illustration that all you need to get started with ansible is a single, modest virtual machine.



    Anyway, allhosts could also be a very long list of hostnames, e.g.



    [allhosts]
    precise
    venus
    www.commandprompt.com
    db1.nondescript.org
    db-[2:140].nondescript.org
    


    So, with the inventory all set, we can now write the playbook. Playbooks are YAML files, and thus YAML syntax is something to have a good command of.



    Ansible is extremely flexible in various ways, and that is why there are no strict rules about how your playbooks should be organized. We usually find the following layout the most convenient to work with:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ tree
    .
    |-- file
    |-- setup
    |   `-- configure.yml
    |-- templates
    `-- vars
        `-- general.yml
    
    4 directories, 2 files


    file/ is used for executables, binaries and all manner of files that need to be copied over to managed nodes.



    setup/ contains playbooks



    templates/ is where we keep template files



    vars/ is used to hold special-purpose playbooks that serve, in essence, as configuration files to keep variables separate from tasks (you'll see in a minute how this can be useful).



    So, here it is. You're looking at what we loosely (and technically incorrectly) call a pg_hba.conf playbook. To get our terminology straight, each file in setup/ is actually called a playbook, and the multitude of YAML files found in such a directory tree is referred to as a collection of playbooks.



    Now, let's take a closer look at vars/general.yml



    ---
    
    #
    # General
    sys_usr: admin
    sudo_user: postgres
    
    #
    # Miscellaneous
    pg_hba_conf: /etc/postgresql/9.1/main/pg_hba.conf
    


    It has three variables:



    sys_usr: sets the username used to connect to managed nodes and to run tasks as
    sudo_user: sets the user we want to run a task as, via sudo, on a managed node



    and finally



    pg_hba_conf: is the location of the pg_hba.conf file



    In our example we are going to get a little fancy and have ansible edit pg_hba.conf as the postgres user, by logging into the managed node -- precise -- and modifying pg_hba.conf as postgres via sudo.



    So, let's take a look at the most interesting and useful playbook file.



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ cat setup/configure.yml 
    ---
    #
    # An example playbook to configure pg_hba.conf file 
    
    - hosts: allhosts
      vars_files:
       - ../vars/general.yml
      user: $sys_usr
      sudo_user: $sudo_user
      tasks:
    
       #
       # Adding a new line
       - lineinfile: dest=$pg_hba_conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"
    


    We say here that:



    * we want to connect to each host in host group allhosts and run all tasks on each of those hosts

    * that our variables file is vars/general.yml

    * that we want to log in as $sys_usr

    * and then run tasks via sudo as $sudo_user



    Then we list our tasks. There is just one task, and it uses the monstrous-looking thing called the lineinfile: module. This mess of symbols says: "Add the new line defined by line= after an existing line that starts with the text '# IPv4 local'".



    I chose this particular example not to scare our readers, but to clarify how this module, which has a not-so-intuitive syntax but is otherwise useful, works; in all honesty, when I first encountered it I spent many hours of futile attempts just trying to understand how to use it.



    Anyway, we'll talk more about lineinfile: a bit later. For now, just try to read what it says, and you may notice that it uses regular expressions (Python flavor), and of course it gives you a headache. I understand, but it is also a great tool for making one-line changes in configuration files that are not managed completely by ansible via templates.



    Alright, let's see some magic at work.



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible-playbook setup/configure.yml -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    
    PLAY [allhosts] *************************************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"] ***
    changed: [precise]
    
    PLAY RECAP ******************************************************************** 
    precise                    : ok=2    changed=1    unreachable=0    failed=0
    


    I am personally a bit paranoid when it comes to security. I admit it, but I also know that our internal wiki has a very eloquent statement regarding what may ensue should we as a team manage to screw up our customers' environments. Just for the record, we are never abused here at CMD, only loved. We are all very friendly and positive people, and still, even though it is very convenient to rely on SSH keys to get into systems without having to enter a password, when it comes to ansible I consciously choose to type in a password each time I run a playbook, especially when the number of systems involved is large. Call it self-inflicted torture, but I'd rather think twice before letting ansible cause damage at the speed of bullets flying out of a machine gun.



    That being said, to run an ansible playbook you run the ansible-playbook command, telling it which playbook file you want it to process, while -k and -K indicate that you want to be asked for the user: and sudo_user: passwords, respectively.



    Once the passwords have been entered and submitted, ansible goes through its usual routine: it tries to SSH into allhosts (in our case just one host, but it could be a bunch) and gathers some facts about each system; these facts bubble up into the playbook and can later be used to uniquely identify your systems, which allows you to apply logical conditions or skip certain tasks for some of the hosts, or even for certain types of systems. It then proceeds by announcing which task is going to be played, following up immediately with the result: success, failure, or nothing changed (ah, idempotency!). A play summary report is shown at the end to indicate how many tasks resulted in which state, whether any of the hosts were unreachable during the attempted play, and whether there were any failures.



    In this case our task ran successfully, so let's see what happened to pg_hba.conf:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ sudo grep "^[a-z]" /etc/postgresql/9.1/main/pg_hba.conf
    [sudo] password for admin: 
    local   all             postgres                                peer
    local   all             all                                     peer
    host    all             cmd_zabbix_mon  127.0.0.1/32            trust
    host    all             all             127.0.0.1/32            md5
    host    all             all             ::1/128                 md5
    


    Alright, we just added a new line. Here's a truncated version of pg_hba.conf showing that the line was indeed added after the one that starts with "# IPv4 local":



    ...
    # "local" is for Unix domain socket connections only
    local   all             all                                     peer
    # IPv4 local connections:
    host    all             cmd_zabbix_mon  127.0.0.1/32            trust
    host    all             all             127.0.0.1/32            md5
    ...
    


    Let's run the same playbook again and see what happens:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible-playbook setup/configure.yml -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    
    PLAY [allhosts] *************************************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"] ***
    ok: [precise]
    
    PLAY RECAP ******************************************************************** 
    precise                    : ok=2    changed=0    unreachable=0    failed=0 
    


    Nothing changed. This is expected, and it is a good thing. Here you see idempotency in action (or should I say inaction?). In other words, when the changes expressed by the sum of a playbook's tasks are already in place, ansible will not needlessly try to apply them again. If the new line we just added disappeared, or were edited in some fashion, running the playbook against this host one more time would make ansible either re-add the line or add a new one, respectively (leaving the modified line intact; read more on this behavior below).



    To illustrate what happens in the second case, I changed the line in pg_hba.conf that we had added in the previous run of the playbook, and then ran it again. This is the result:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ sudo grep "^[a-z]" /etc/postgresql/9.1/main/pg_hba.conf
    local   all             postgres                                peer
    local   all             all                                     peer
    host    all             cmd_zabbix_mon  127.0.0.1/32            trust
    xhost    all             cmd_zabbix_mon  127.0.0.1/32            trust
    host    all             all             127.0.0.1/32            md5
    host    all             all             ::1/128                 md5
    


    As our line was tampered with, and because of how our lineinfile: task's regexp is written, a new line was added again, but the old, now-modified line was left intact.



    Alright, let's clean these two new lines out of pg_hba.conf and do something else with the help of the lineinfile: module, namely change an existing line.



    We will build on our example by adding a new task.



    ---
    #
    # An example playbook to manage pg_hba.conf file 
    
    - hosts: allhosts
      vars_files:
       - ../vars/general.yml
      user: $sys_usr
      sudo_user: $sudo_user
      tasks:
    
       #
       # Adding a new line
       - lineinfile: dest=$pg_hba_conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"
    
       #
       # Replacing an existing line
       - lineinfile: dest=$pg_hba_conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^host\s+all\s+cmd_zabbix_mon\s.+" line="host    all             cmd_nagios_mon 10.10.100.15/32            trust"
    


    Save the file, run the playbook. Here's the result:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible-playbook setup/configure.yml -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    
    PLAY [allhosts] *************************************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"] ***
    changed: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^host\s+all\s+cmd_zabbix_mon\s.+" line="host    all             cmd_nagios_mon 10.10.100.15/32            trust"] ***
    changed: [precise]
    
    PLAY RECAP ******************************************************************** 
    precise                    : ok=3    changed=2    unreachable=0    failed=0   
    
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ sudo grep "^[a-z]" /etc/postgresql/9.1/main/pg_hba.conf
    local   all             postgres                                peer
    local   all             all                                     peer
    host    all             cmd_nagios_mon 10.10.100.15/32            trust
    host    all             all             127.0.0.1/32            md5
    host    all             all             ::1/128                 md5
    


    Wonderful: the line was first added as in our first example, and then the entire line was replaced, changing the username from cmd_zabbix_mon to cmd_nagios_mon. Replacing an entire line is exactly how one should think about line editing with the lineinfile: module. Needless to say, this could be done to any part of pg_hba.conf.



    And there you have it, the two most frequent uses of the lineinfile: module: adding a new line and replacing (or editing) an existing one.



    If it seems a bit confusing and hard to swallow, it actually kind of is. Regardless, here's a little something to help you: an outline of the path to mastery of the lineinfile: module.



    First and foremost, know the specifics of Python regular expressions. Then know thy enemy.



    Here are the essentials.



    To insert a new line, your regexp= and line= must match each other, and insertafter= must indicate the line after which you want your new line to appear. Subsequent runs will not make any further changes if the line already exists; if it does not exist, a new line will be added. How you define whether a line exists is irrelevant; understand how lineinfile: defines your reality.



    If regexp= and insertafter= match, and the line they describe exists, it will be replaced with line=. Subsequent runs will not make any further changes. If the line you match does not exist, no changes will be made.




    Note: However, if regexp= is empty, the last line of your file will be replaced with line=, or it may well be some other unexpected place! The bottom line is: be extra cautious when working with lineinfile: unless you are confident beyond any shadow of a doubt that you know exactly what changes will happen. It's a tricky module to work with, so test multiple times in your lab before you use lineinfile: tasks in a production environment.




    lineinfile: is great for editing just one line at a time -- perhaps because it is the only module that performs line editing so far -- but let's face it, sometimes its syntax can be really confusing, especially when your lines are full of funky characters that need escaping. It is still worth the effort to learn how to use it; I can testify that it has proved very helpful in numerous instances of my work as a system administrator.



    With that in mind, there are less confusing ways to manage your pg_hba.conf with ansible.



    Still, a very attentive reader may have noticed that we never reloaded PostgreSQL configuration and thus all the changes to pg_hba.conf have not taken effect yet.



    You have an idea of the playbook mode now. Let's reload the PostgreSQL configuration by running ansible in ad-hoc mode.



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible allhosts -m service -a "name=postgresql state=reloaded" -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    precise | success >> {
        "changed": true, 
        "name": "postgresql", 
        "state": "stopped"
    }
    


    That was nice and easy, wasn't it? The syntax is very simple:



    * we already know what allhosts is for

    * -m service says that we are going to use module named service

    * -a is used to pass parameters to the service module

    * name= and state= are the service module's parameters

    * -k and -K ask us for our credentials



    Note that we reloaded PostgreSQL as root via sudo. Unless a different sudo user is specified on the command line, or a sudo_user: variable is used in a playbook, ansible assumes that you want to run commands as root. In other words, this is default sudo behavior.



    Ad-hoc mode is often what we use in a new environment where we are setting up ansible. One of the things it helps with is gathering facts about the systems we are going to manage. For example, to be able to refer to any of the managed hosts in playbooks by hostname, and then take actions based on knowing which hosts ansible is currently running tasks on, one of the first things we do is establish how ansible sees the managed hosts' hostnames. In practice, we have learned, these may differ from what you would expect based solely on the contents of /etc/hostname and /etc/hosts, or BIND configuration (it really depends on how well your systems are configured).



    To that end we use the setup module. Its purpose is to gather facts about hosts and print them on stdout:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible allhosts -m setup -k |grep fqdn
    SSH password: 
            "ansible_fqdn": "precise.localdomain",
    


    Here we chose to show only the FQDN. Toss out grep and re-run the command to see the complete list of facts for your host(s).



    In ad-hoc mode you can do other, more general yet equally useful things. For example, see if a user is in the /etc/sudoers file on all hosts:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible allhosts -m command -a "grep ansible /etc/sudoers" -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    precise | success | rc=0 >>
    ansiblemgr ALL=(ALL) ALL
    


    or check whether your managed nodes can still (or already) be logged into:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible allhosts -m ping -k
    SSH password: 
    precise | success >> {
        "changed": false, 
        "ping": "pong"
    }
    


    And should you wish to reload the PostgreSQL configuration from a playbook, just add the following extra task to the end of our example playbook:



       #
       # Reload PostgreSQL configuration
       - service: name=postgresql state=reloaded
    


    Save the file, re-run the playbook. Assuming pg_hba.conf was not previously modified, the results would look like this:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible-playbook setup/configure.yml -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    
    PLAY [allhosts] *************************************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^#\sIPv4\slocal.+" line="host    all             cmd_zabbix_mon  127.0.0.1/32            trust"] ***
    changed: [precise]
    
    TASK: [lineinfile dest=/etc/postgresql/9.1/main/pg_hba.conf regexp="^host\s+all\s+cmd_zabbix_mon\s+127\.0\.0\.1/32\s+trust$" insertafter="^host\s+all\s+cmd_zabbix_mon\s.+" line="host    all             cmd_nagios_mon 10.10.100.15/32            trust"] ***
    changed: [precise]
    
    TASK: [service name=postgresql state=reloaded] ******************************** 
    changed: [precise]
    
    PLAY RECAP ******************************************************************** 
    precise                    : ok=4    changed=3    unreachable=0    failed=0
    


    You may want to watch the PostgreSQL log file when running the playbook this time, just to make sure SIGHUP was indeed sent to the service:



    ...
    2013-11-14 07:16:17 PST LOG:  received SIGHUP, reloading configuration files
    


    Now, for the final piece of our ansible discussion, let's talk about what I mentioned earlier: arguably the easier and less messy way to manage pg_hba.conf, that is, by way of templates.



    Ansible leverages the Jinja2 templating engine. That said, it is very easy to learn how to use it by example. We are going to create another playbook to deploy our new, template-based pg_hba.conf.



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ cp setup/configure.yml setup/deploy.yml
    


    After some editing we come up with this:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ cat setup/deploy.yml 
    ---
    #
    # An example playbook to deploy pg_hba.conf file 
    
    - hosts: allhosts
      vars_files:
       - ../vars/general.yml
      user: $sys_usr
      tasks:
    
       #
       # Deploying a template
       - template: src=../templates/pg_hba.conf.j2 dest=$inst_dir/pg_hba.conf owner=postgres group=postgres mode=0640
    
       #
       # Reload PostgreSQL configuration
       - service: name=postgresql state=reloaded
    


    Let's break it down:



    * there is no sudo_user: this time; we do everything as root via a sudo call

    * the template: module takes our template file pg_hba.conf.j2 and, after some processing, produces the resulting pg_hba.conf in $inst_dir, which is defined in vars/general.yml. It also sets ownership and access permissions to postgres:postgres and 0640, respectively.



    Here's what vars/general.yml looks like now:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ cat vars/general.yml 
    ---
    
    #
    # General
    sys_usr: admin
    
    #
    # Miscellaneous
    pg_hba_conf: /etc/postgresql/9.1/main/pg_hba.conf
    inst_dir: /etc/postgresql/9.1/main
    


    And after that we create a template for pg_hba.conf:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ cp /etc/postgresql/9.1/main/pg_hba.conf templates/pg_hba.conf.j2
    


    Copying the file and using it unmodified would be enough to create a template. You could go ahead with just this template and deploy it to as many hosts as you need. Sometimes this approach is useful, but more often than not you will want slightly different pg_hba.conf files on different hosts. Sometimes, perhaps, you will want your pg_hba.conf to be very different on many hosts. At that point you may be better off using several templates, to avoid unnecessary complexity and to keep each individual template file easy to read.



    So, after some editing, our example pg_hba.conf.j2 template looks like this. It is a truncated version: as you've just seen, we used the stock pg_hba.conf to create the initial version of the template file, so anything that appears before the truncated portion of template code listed below remains unmodified.



    ...
    
    # DO NOT DISABLE!
    # If you change this first entry you will need to make sure that the
    # database superuser can access the database using some other method.
    # Noninteractive access to all databases is required during automatic
    # maintenance (custom daily cronjobs, replication, and similar tasks).
    #
    # Database administrative login by Unix domain socket
    local   all             postgres                                peer
    
    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    
    # "local" is for Unix domain socket connections only
    local   all             all                                     peer
    # IPv4 local connections:
    
    {% if ansible_fqdn == "precise.localdomain" or ansible_fqdn == "venus.localdomain" %}
    host    all             cmd_zabbix_mon 10.10.100.15/32          trust
    
    {% elif ansible_fqdn == "www.commandprompt.com" %}
    host    all             johnson 192.168.180.17/32               md5
    
    {% elif ansible_fqdn == "db1.nondescript.org" %}
    host    all             blackfox 172.16.12.9/32                 ident
    
    {% else %}
    host    all             cmd_zabbix_mon 10.10.100.12/32          md5
    {% endif %}
    
    
    host    all             all             127.0.0.1/32            md5
    # IPv6 local connections:
    host    all             all             ::1/128                 md5
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    #local   replication     postgres                                peer
    


    Our template is a bit fancy. It makes use of ansible facts (ansible_fqdn) and it has some conditional statements that will add a specific pg_hba.conf entry depending on where (what host) this template is being deployed to.



    Thus, precise.localdomain and venus.localdomain are going to get this line:



    host    all             cmd_zabbix_mon 10.10.100.15/32          trust
    


    and db1.nondescript.org this one:



    host    all             blackfox 172.16.12.9/32                 ident
    


    If the ansible_fqdn fact does not match any of the hostnames specified, a default choice of



    host    all             cmd_zabbix_mon 10.10.100.12/32          md5
    


    is going to be applied.



    You could go as far as to use variables from vars/general.yml (and any other facts gathered prior to the execution of playbook tasks) in a template. They would get expanded and make your template even more powerful.
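
    For example, a hypothetical monitor_user variable, were it defined in vars/general.yml, could be referenced directly in the template:

        # monitor_user is a made-up variable for illustration, not part of our example vars file
        host    all             {{ monitor_user }} 10.10.100.15/32          trust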



    Now, let's run this new playbook. Before we do, let's delete pg_hba.conf to simulate a situation where pg_hba.conf has been lost but PostgreSQL is still running just fine, safely keeping a cached version of the configuration in its memory. That would be an imminent problem on the next service restart, but not if we became aware of the problem in a timely manner and ran our new playbook before PostgreSQL had to be restarted:



    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ls -lah /etc/postgresql/9.1/main/pg_hba.conf
    -rw-r----- 1 postgres postgres 4.5K Nov 14 08:26 /etc/postgresql/9.1/main/pg_hba.conf
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ sudo rm /etc/postgresql/9.1/main/pg_hba.conf
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ date
    Thu Nov 14 08:30:58 PST 2013
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ansible-playbook setup/deploy.yml -kK
    SSH password: 
    sudo password [defaults to SSH password]: 
    
    PLAY [allhosts] *************************************************************** 
    
    GATHERING FACTS *************************************************************** 
    ok: [precise]
    
    TASK: [template src=../templates/pg_hba.conf.j2 dest=/etc/postgresql/9.1/main/pg_hba.conf owner=postgres group=postgres mode=0640] *** 
    changed: [precise]
    
    TASK: [service name=postgresql state=reloaded] ******************************** 
    changed: [precise]
    
    PLAY RECAP ******************************************************************** 
    precise                    : ok=3    changed=2    unreachable=0    failed=0
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ ls -lah /etc/postgresql/9.1/main/pg_hba.conf 
    -rw-r----- 1 postgres postgres 4.5K Nov 14 08:31 /etc/postgresql/9.1/main/pg_hba.conf
    admin@precise:~/CMD/svn/ansible-tests/pg_hba.conf$ sudo grep "^[a-z]" /etc/postgresql/9.1/main/pg_hba.conf 
    local   all             postgres                                peer
    local   all             all                                     peer
    host    all             cmd_zabbix_mon 10.10.100.15/32          trust
    host    all             all             127.0.0.1/32            md5
    host    all             all             ::1/128                 md5
    


    As you can see, pg_hba.conf was first removed and then recreated, and an appropriate entry for the precise.localdomain host was added, while all the others within the {% if %} ... {% endif %} conditional statement were ignored. Had there been more hosts online in my test lab, they would all have received a tailored version of pg_hba.conf according to the configuration of our template.



    This approach is easier on the eyes, less error-prone, offers a lot more flexibility, and allows you to tailor pg_hba.conf for many systems simultaneously using just one template file.



    Not only that: in a lot of cases ansible playbooks become a centrally managed configuration pool for various pieces of our infrastructure, with the added benefit of an extra backup copy of the configs. That proves useful not only in an emergency when a configuration file is lost, but also when we need to evaluate existing configuration; it is incredibly useful to be able to look at all of the configs at once in just one file.



    Now, for the final touch: if you were to deploy this template in an environment that runs a mix of systems (Red Hat Enterprise Linux, Ubuntu Server, Debian GNU/Linux, etc.) and you wanted to make changes only on hosts of a certain type, you could easily do so by adding an extra line to this example playbook:



    ...
       #
       # Deploying a template
       - template: src=../templates/pg_hba.conf.j2 dest=$inst_dir/pg_hba.conf owner=postgres group=postgres mode=0640
         when: ansible_distribution == "Debian" or ansible_distribution == "Ubuntu"
    ...
    


    Here ansible_distribution is an ansible fact, much like ansible_fqdn that we used in the pg_hba.conf template file, and as you've seen by now they can be used in various places.



    And this concludes today's discussion of ansible as a tool to help you manage your PostgreSQL configurations. Much more could be said about it; this is truly just the tip of the iceberg. However, it should help you see how you could be managing your PostgreSQL configurations more effectively.



    As I stated in the opening paragraphs, we have been increasingly relying on ansible to do all sorts of things, so if you're asking yourself whether you could start managing something with ansible, the answer is that you most probably can.



    Categories: PostgreSQL

    PgConf.eu is over, it was a blast but I am curious about the future
    Posted Wednesday Nov 6th, 2013 12:09pm
    by Joshua Drake
    | Permalink

    First let me say that I attended pgConf.eu like I attend every conference (that I am not running): I show up for a few hours on the first day, then I come back and attend my talk. I don't take travel lightly, and as much as I bromance my fellow elephant brethren, I want to explore the sights, and this was freaking Ireland, people.


    I had an odd feeling for the time I was there. The community was in full force; there were at least 240 people there, and that was great. It was the commerce side, the sponsor side, the **money** side that was lacking. EnterpriseDB, Cybertec, and 2ndQuadrant were there with booths, but I wonder if it was even worth it.


    It is great that 3 of the largest European commercial contributors were sponsors, but I think the traditional model of sponsorship is over, and those companies are not getting their investment back. This is certainly not the fault of PgConf.eu, nor is it the fault of the sponsors. It is habit to fall into what we know; the trouble is that what we know is broken. There is no true return on investment. You might pull a couple or even half a dozen leads. Yes, you get 20 minutes to give your speech at the end of the conference (when a lot of the people have left anyway), but business is about relationships, and I don't see how 3 booths at a 240-person conference enable relationships to be built.


    I wonder what the attendees are getting out of it. Do we see these sponsors as more beneficial than companies (such as Credativ) that weren't sponsoring with a booth but were speaking? Or are we just silently thankful that they are there, because that way we don't have to spend 500 Euro to attend the conference? If the latter, does it make sense for companies to continue to contribute? What could a company do for the community (and themselves) with a 20k sponsorship, versus having a booth and their name on a shirt?


    Which brings me to my final point. If pgConf.eu (or PGCon, or NYC PgConf) were to charge 500.00 and be sponsor free (or at least more creative in sponsor presence), would you still go? What if that included a half-day tutorial? What if the conference were only about PostgreSQL and community? Yes, sponsors are part of the community, and they are more than welcome to show up in matching shirts and speak when their talks are accepted, but no booths, no littered t-shirts, just good old-fashioned community and education.


    What do you think?


    Categories: Business, OpenSource, PostgreSQL, SQL

    5 Things a Non-Geek Girl Learned from Playing with Geeks at CMD
    Posted Wednesday Nov 6th, 2013 12:09pm
    by Angela Roberson
    | Permalink

    When I began at Command Prompt, Java was coffee, Python was a snake, and a ruby was best used as the color of glittery slippers. If you had asked me two and a half years ago what "PostgreSQL" does, I would have asked you what language you were speaking.


    A year later, I took my first sales call without Joshua Drake (jd, @linuxhiker). I was shaking in my boots, and it was inevitable that I was going to be sick. Then something happened as soon as I heard the customer say, "Hello".


    I understood what the customer needed and, most importantly, I knew we could do it. I was able to say confidently and without doubt, "Yes, we can take care of you."


    I still have a LONG way to go and I will never be a geek. But here are some key things a non-geek girl has learned from playing with geeks:


    1. It's a big, technical world out there. It is vast and fast-paced. To keep up, you have to continue to instigate fresh ideas and put the time in to develop them. You also have to learn from other people's ideas. The PostgreSQL world is about more than consulting. It's a community of people passionate about the dynamic capabilities of PostgreSQL.


    2. Document down to your underwear. We are passionate about documentation at CMD and you should be too. Only good things can come from documenting work and as Open Source advocates, we promote sharing the Postgres love.


    3. Keeping your software, hardware, and underwear (sorry, it rhymed) updated is key to keeping up the health of your systems. If you don't, you will have problems at some point down the road, and you will kick yourself for not making the upgrades sooner.


    4. Our goal is for people to be successful with PostgreSQL. In order to help people do that, it sometimes takes asking "stupid" questions. The hit to your technical ego will be worth it if it saves you time, money, and your sanity in the future.


    5. Take risks! Customers sometimes get nervous when we tell them what they are currently using just isn't going to work. The unknown can be unnerving but we are here to make things run as smoothly as possible. Take the risk and trust us!


    So there it is, folks. You Postgres guys (and gals) out there do some really cool stuff. You make the technical world go round and most non-technical people don't even have a base understanding of your brilliance. I'm fortunate to have the CMD team to answer my "stupid" questions with patience and kindness. This is an adventure!


    Angela


    Angela Roberson has been with Command Prompt for two and a half years. She is currently the CMD Team Manager and can help you with all of your non-geek customer needs.



    A pg_basebackup Wish List
    Posted Friday Oct 18th, 2013 09:24am
    by Andrew Nierman
    | Permalink

    pg_basebackup was introduced in Postgres 9.1 as a simple way to copy the data directory of a running database cluster. This was a great addition to a small group of PostgreSQL Client Applications.



    The pg_basebackup approach differs from the standard pg_start_backup(), rsync (or other filesystem copy), pg_stop_backup() approach in that it uses the replication protocol over a standard Postgres connection to make the base backup. A few highlights:


    • A completely standalone backup can be created easily by using the --xlog-method argument (the stream option here is particularly nice so that you don’t have to muck with the wal_keep_segments Postgres setting).

    • Thanks to the cascading replication machinery in 9.2, pg_basebackup can take a backup from an active hot standby server (where the basic pg_start_backup() approach fails).

    • As of 9.3 pg_basebackup will even create a recovery.conf file for you (with the --write-recovery-conf switch), so it's basically a one-liner to get a standby with replication set up! (See the sketch just below.)
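
    For instance, a minimal invocation might look like the following; the host and user names are hypothetical placeholders:

        # Clone the primary and write recovery.conf in one shot (9.3+)
        pg_basebackup -h primary.example.com -U replicator \
            -D /var/lib/postgresql/9.3/standby -X stream -R -P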




    There’s a lot to like, but I’m greedy....



    Wish #1: Parallelization



    If “idle hands are the devil’s playthings”, I can only imagine what the devil would do with idle CPUs and network bandwidth. It would be great if pg_basebackup were able to parallelize creation of the base backup and make use of more of the available resources. In the past I’ve written a script to parallelize rsync by first processing its --itemize-changes output and then spawning multiple rsync processes in parallel to perform the copies; this parallelization can really cut down the time it takes to get a base backup copied. There are times when getting something done as fast as possible trumps other concerns.
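
    To give an idea of the approach, here is a from-scratch sketch (not the script mentioned above); the paths and host are hypothetical, and a real script would also need to handle file names with spaces, deletions, and error checking:

        # List the files rsync would copy, split the list into 4 chunks (GNU split),
        # and run one rsync per chunk in parallel.
        rsync -a --dry-run --itemize-changes /pgdata/ standby:/pgdata/ \
            | awk '/^>f/ {print $2}' > filelist
        split -n l/4 filelist chunk.
        for f in chunk.*; do
            rsync -a --files-from="$f" /pgdata/ standby:/pgdata/ &
        done
        wait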



    Wish #2: Compression Options



    Sometimes network bandwidth is in short supply. pg_basebackup has a --gzip switch to enable compression, but this compression is actually done on the client machine rather than the server. Certainly there is the option to run pg_basebackup from the originating server and use netcat (or scp, etc.) to get the data copied to the destination if we are using pg_basebackup’s --format=tar (-Ft) option. For example, on the originating server:

    nierman@source:~$ pg_basebackup -D - -Ft -x | pbzip2 -p8 | nc -l 1201


    (In an actual use case I'd also insert pv in the pipeline to get nice status/progress updates at various stages; a variant follows this example.) And on the destination server:

    nierman@destination:~$ nc source 1201 > newbasedir/base.tar.bz2


    This process is a bit more complicated than a single command from the client/destination server, but more importantly we lose out on things like streaming WAL files when using the tar format with pg_basebackup. Note that I’ve also used pbzip2 above (a parallelized version of bzip2). It would be great to have the option to use external programs like pbzip2, pigz, etc. with pg_basebackup compression. Let’s call that Wish #2b.
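
    For completeness, a pv-instrumented version of the sender pipeline might look something like this (the pv flags shown are just one reasonable choice):

        # pv -b/-r/-a/-t: show bytes transferred, current rate, average rate, and elapsed time
        pg_basebackup -D - -Ft -x | pv -brat | pbzip2 -p8 | nc -l 1201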

    Wish #3: Don’t Die on Me Now!



    Here’s some output from a recent pg_basebackup session:

    pg_basebackup: starting background WAL receiver
    2052690927/2052718105 kB (100%), 1/1 tablespace
    pg_basebackup: could not get transaction log end position from server:
    FATAL:  could not open file "./#postgresql.conf#": Permission denied


    pg_basebackup chunked through almost 2TB of data and then died just before finishing. Ack. Why? At some point someone had edited the postgresql.conf file as root, and their editor saved the backup copy seen above (in the Postgres data directory, of course!); the Postgres backend couldn't read that root-owned file when creating the base backup. I would have loved to see a simple error message and have pg_basebackup continue on its merry way. Certainly an initial ls -l in the data directory is a good quick check that can be done manually, although that wouldn't save me from people who hide their cat photos deep inside the Postgres base directory.

    Categories: OpenSource, PostgreSQL

