
I have some data whose source of truth is in a MySQL database; its size is expected to max out in the some-thousands-of-rows range (worst case), and I'd like to use Puppet to configure files on some servers with that data (mostly by iterating through those rows in a template).

I'm currently using Puppet 3.0.x, and I cannot change the fact that MySQL will be the authoritative source for that data. Please note that the data comes from external sources, not from Puppet or from the managed nodes.

What possible approaches are there? Which one would you recommend?

Would External Node Classifiers be useful here?

My "last resort" would be regularly dumping the table to a YAML file and reading that through Hiera to a Puppet template, or to directly dump the table in one or more pre-formatted text file(s) ready to be copied to the nodes.

There is an unanswered question on SF about system users, but the fundamental issue is probably similar to mine - he's trying to get data out of MySQL.

Luke404

2 Answers


I go even further: I have a MySQL database that is authoritative for all server data (it's actually a Django app). Here's how I integrate them:

Node definitions

There are two possibilities. We currently use a generated manifest, included from site.pp, containing entries like:

node "avrdb.prod.example.com" {
    $sdb_status="live"
    $sdb_db_type="slave"
    $sdb_db_version="mysql_55"
    $sdb_os="co56"
    include "s_db::avrdb"

}

But soon-ish we will need to switch this to an ENC, as Puppet no longer supports this approach. The ENC is already written and uses almost the same template.
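For reference, an ENC of that shape is just an executable that Puppet calls with the node's certname and that prints YAML containing `classes` and `parameters`. A hypothetical sketch, with the database lookup stubbed out (the column names and the stub row are illustrative, not the real schema):

```ruby
#!/usr/bin/env ruby
# Hypothetical ENC: called by the Puppet master as `enc.rb <certname>`,
# prints classes and top-scope parameters for that node as YAML.
require 'yaml'

# Build the ENC output for one server row from the database.
def enc_output(row)
  {
    'classes'    => ["s_db::#{row['role']}"],
    'parameters' => {
      'sdb_status'     => row['status'],
      'sdb_db_type'    => row['db_type'],
      'sdb_db_version' => row['db_version'],
      'sdb_os'         => row['os'],
    },
  }
end

if __FILE__ == $PROGRAM_NAME
  certname = ARGV[0]
  # Stub standing in for a query against the server database, e.g.
  #   dbh.query("SELECT ... FROM servers WHERE fqdn = '#{certname}'")
  row = { 'role' => 'avrdb', 'status' => 'live', 'db_type' => 'slave',
          'db_version' => 'mysql_55', 'os' => 'co56' }
  puts enc_output(row).to_yaml
end
```

Pointing the master at it is then a matter of setting `node_terminus = exec` and `external_nodes` to the script's path in puppet.conf.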

Server data

Some recipes, like our DNS master recipe, need extra data about servers. Like you suggest, this is done in templates. Here's an example of one such template: named.conf, which needs to know all secondary nameservers. It simply uses the mysql gem to get to the database. The database structure is irrelevant of course, but a complete example is usually nice :)

<%
require 'rubygems'
require 'mysql'

dbh = Mysql.real_connect('serverdb.prod.example.com', 'user', 'pass', 'serverdb')
int_slaves = dbh.query("SELECT ip_address, name FROM network_vip WHERE name LIKE 'idns%' ORDER BY name")
int_hosts  = dbh.query("SELECT i.ip_address, a.name FROM servers_interface AS i LEFT JOIN servers_asset AS a ON i.asset_id = a.id WHERE a.name REGEXP '^idns-' AND i.name = 'eth0'")
ext_slaves = dbh.query("SELECT ip_address, name FROM network_vip WHERE name LIKE 'edns%' ORDER BY name")
ext_hosts  = dbh.query("SELECT i.ip_address, a.name FROM servers_interface AS i LEFT JOIN servers_asset AS a ON i.asset_id = a.id WHERE a.name REGEXP '^edns-' AND i.name = 'eth0'")
%>

options {
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    allow-recursion { none; };
    allow-transfer { any; };
    allow-query { any; };
    // default for transfers-in and transfers-out is 10
    transfers-in 128;
    transfers-out 128;
    // default for transfers-per-ns is 2
    transfers-per-ns 8;
    //  serial-query-rate on a master has undocumented side-effects
    //  of notifying slaves much faster for many zones
    serial-query-rate 4096;
    // resolvers or authoritatives
    also-notify {
        // Internal slaves
        <% int_slaves.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% } 
        int_slaves.data_seek(0) -%>

        // Internal dns hosts
        // Needed as we move all the boxes over to HA, where they
        // query from eth0, not vip
        <% int_hosts.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% }
        int_hosts.data_seek(0) -%>

        // External slaves
        <% ext_slaves.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% } -%>

        // External dns hosts
        <% ext_hosts.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% }
        ext_hosts.data_seek(0) -%>


    };
};
include "/etc/rndc.key";

// logging
logging {
    channel default_file { file "/var/log/named/named.log"    versions 10 size 100m;  print-time yes; print-category yes; };
    channel security_file { file "/var/log/named/security.log" versions 10 size 100m;  print-time yes; print-category yes; };
    channel query_file { file "/var/log/named/query.log"    versions 10 size 100m;  print-time yes; print-category yes; };
    channel debug_default { file "/var/log/named/debug.log"    versions 10 size 100m;  print-time yes; print-category yes; };

    category general  { default_file; default_debug; };
    category security { security_file; };
    category default  { default_file; };
    category queries  { query_file; };
    category edns-disabled  { null; };
};

statistics-channels {
    inet 127.0.0.1 port 8080;
};

// Do not put any zone declarations here unless they differ between views
// Put zones with data common to all views in commonzones.conf

// This view goes to all the internal nameservers
view "internal" {
    match-clients {
        // Internal slaves
        <% int_slaves.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% } -%>

        // Internal dns hosts
        // Safety catch in case they request an AXFR on their eth0 ip
        <% int_hosts.each{|slave| -%>
        // <%= slave[1] %>
        <%= slave[0] %>;
        <% } -%>

    };
    zone "example.com" IN {
        type master;
        file "master/example.com-internal.zone";
    };
    // This path relative to chroot
    include "/etc/commonzones.conf";
    include "/etc/zones.rfc1918";
};

// This view goes to everyone else
view "others" {
    match-clients { any; };
    // Special zones that differ between views
    zone "example.com" IN {
        type master;
        file "master/example.com-external.zone";
    };
    // This path relative to chroot
    include "/etc/commonzones.conf";
    include "/etc/zones.rfc1918";
};
Dennis Kaarsemaker
  • I never bothered about nodes because I don't have many of them and because the ENC is a perfect fit there. My problem is all about big server data (thousands of dns entries, thousands of mail domains (smtp routing, antispam settings, etc.), thousands of web virtual hosts, and so on). You nailed it, with queries in templates. It seems to perfectly fit the job. Too bad it doesn't exactly sound like a "puppet best practice" - I hope they will address such use cases in a future release. Meanwhile I'll probably end up adopting your answer after some further testing. – Luke404 Dec 11 '12 at 13:20

You could also use the hiera-mysql backend. If you look at the source of this backend, you can see it is easy to create a new one or customize this one. Using Hiera like this will keep your template/manifest code cleaner than the approach suggested by Dennis (which also works fine).
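For reference, hiera-mysql is configured from hiera.yaml with a single parameterized query; the configuration looks roughly like the following (key names as I recall them from the backend's README - treat the host, credentials, table, and query as illustrative):

```yaml
:backends:
  - mysql
:mysql:
  :host: localhost
  :user: root
  :pass: examplepass
  :database: config
  :query: SELECT val FROM configdata WHERE var='%{key}'
```

Each Hiera lookup substitutes the requested key into `%{key}`, which is why the backend is tied to one query shape at a time.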

Not Now
  • I did some tests with hiera-mysql and I think it's not suitable for this task. For one, if I understood correctly, you can only configure a single query in it - that would make it "funny" to support different use cases (lots of columns?). The YAML backend should handle this better with some keyspace partitioning (key prefixes?), but it still sounds inefficient to me to repeat "column names" for every single entry when you have thousands of entries... that's why it is my last resort right now :) – Luke404 Dec 11 '12 at 11:31