3 Dec 15

    Let your tools take the strain

    Automate everything! If there is something that you can’t automate, consider whether it is really needed, because it slows you down. Continuing the “How to eat an Elephant, and other IT monitoring problems” series, I will look at how monitoring teams can simplify the onboarding of monitoring.

     

    Through my previous posts we have reached a stage where we have a set of offerings that can be delivered at a known effort and cost to the requester and that meet their data needs. Although the time lag between requirements and deployment has been minimized, the time taken to fulfill the request has not changed: it is still highly dependent on people pressing buttons. Ultimately we want to evolve to a state where the request can be made and fulfilled automatically, or where no request is needed at all and it just happens automagically.

     

    So how do we do that? It starts with being able to find out automatically where the monitoring needs to be deployed. For bronze classes of monitoring this is relatively simple, since each target type has its own discrete monitoring and every target of that type gets the same monitoring. But if you are deploying silver class application monitoring, how would you do that? It depends on the automation tooling that you have at your disposal. Let’s start with the most mature scenario.

     

    Fully automated application deployment

    In this scenario all parameters for the software and the hosting platform, from databases to servers to websites and other UIs, are already known at the time of deployment. The obvious choice therefore is to expand the automation workflow to include the provisioning of the monitoring. This scenario might be typical in a highly virtualized environment, a Virtual Private Cloud system, or a DevOps practice.
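
    As a rough sketch of what "expanding the workflow" could look like, the step below turns the parameters the deployment already holds into one monitoring request per target. The `Deployment` shape, the field names and the request dictionaries are all assumptions made for illustration; a real workflow tool would supply its own.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Parameters the deployment workflow already knows (hypothetical shape)."""
    service: str
    servers: list[str]
    databases: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


def monitoring_requests(dep: Deployment) -> list[dict]:
    """Build one monitoring request per deployed target."""
    requests = []
    for kind, targets in (("server", dep.servers),
                          ("database", dep.databases),
                          ("url", dep.urls)):
        for target in targets:
            requests.append({"service": dep.service, "type": kind, "target": target})
    return requests


def deploy(dep: Deployment, steps) -> list:
    """Run every workflow step in order and collect what each one returns."""
    return [step(dep) for step in steps]
```

    Appending `monitoring_requests` to the list of steps passed to `deploy` is then all it takes for monitoring to be provisioned alongside everything else.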

     

    Discovered application deployment

    In this case a central repository of the created environment is needed. This may be a manual repository, in which case the relationships between the monitored targets need to be handled by some mechanism. In some monitoring tools this can be done within the tools themselves; in other cases it is done via an external CMDB that can be referenced to do root cause evaluation, either in real time or after the fact.
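
    To illustrate why the relationships matter, here is a minimal sketch of root cause evaluation over a dependency map. The `DEPENDS_ON` table stands in for a CMDB extract and its target names are invented; the rule used (an alerting target is a root cause only if nothing it depends on is also alerting) is one simple assumption, not how any particular tool does it.

```python
# Hypothetical CMDB extract: each target maps to the targets it depends on.
DEPENDS_ON = {
    "billing-web": ["billing-app"],
    "billing-app": ["billing-db"],
    "billing-db": ["srv-db-01"],
    "srv-db-01": [],
}


def root_causes(alerting: set[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Return the alerting targets that have no alerting dependency beneath them."""

    def has_alerting_dependency(target: str, seen: set[str]) -> bool:
        for dep in depends_on.get(target, []):
            if dep in seen:  # already visited, also guards against cycles
                continue
            seen.add(dep)
            if dep in alerting or has_alerting_dependency(dep, seen):
                return True
        return False

    return {t for t in alerting if not has_alerting_dependency(t, {t})}
```

    With the web tier, the application and the database server all alerting, only the database server is reported, because the other two sit above it in the dependency chain.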

     

    Manual application deployment

    On the face of it this seems a pointless task: if the system build is manual, then automating the monitoring deployment is hardly going to make an impact on the time to provision a system. But consider the following:

    1. In these environments monitoring is usually an afterthought and so happens last. Delays in monitoring cannot be hidden behind other activities, so they will materially delay rollout.
    2. Onboarding is about provisioning, but automation can also handle changes, either to the monitoring environment or to the monitored environment.

     

    Obviously this will require the most effort to automate, fortunately not from you in the monitoring function but from the system and application builders. Understanding how this could be done gives a solid grounding in how the other two scenarios work, so I will discuss it in more detail.

     

    The simplest scenario would be to give each target type (server, database, application, etc.) a specific naming convention that identifies Owner, Location, Function and Environment, probably with a numeric suffix to indicate how many of each there are. This obviously has to be done as part of design, probably as part of some enterprise convention. Increasingly this is frowned upon, because names, certainly for infrastructure and for some applications, are exposed through DNS, albeit hopefully internally only :)
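
    A naming convention like this is easy for automation to take apart. The sketch below assumes a made-up convention of the form `owner-location-function-environmentNN` (for example `fin-lon-db-prd03`); the pattern and field order are purely illustrative and would need to match whatever your enterprise actually uses.

```python
import re
from typing import NamedTuple, Optional


class TargetName(NamedTuple):
    owner: str
    location: str
    function: str
    environment: str
    index: int


# Assumed convention: <owner>-<location>-<function>-<environment><number>
NAME_PATTERN = re.compile(
    r"^(?P<owner>[a-z]+)-(?P<location>[a-z]+)-(?P<function>[a-z]+)-"
    r"(?P<environment>[a-z]+)(?P<index>\d+)$"
)


def parse_target_name(name: str) -> Optional[TargetName]:
    """Split a target name into its fields, or return None if it does not conform."""
    match = NAME_PATTERN.match(name.lower())
    if match is None:
        return None
    fields = match.groupdict()
    return TargetName(fields["owner"], fields["location"], fields["function"],
                      fields["environment"], int(fields["index"]))
```

    Names that do not conform return `None`, which is itself useful: it gives you a list of targets that need a human to look at them.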

     

    So where else can you put this information? Quite simple really: on the target, in a text file. If text is not appropriate, then perhaps in the registry (for Windows) or as SNMP system information. Just make sure the information is retrievable systematically by your automation solution. This also allows more information to be passed to the automation system. I call this fingerprinting.
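
    What "retrievable systematically" means in practice depends on the format you choose. As a minimal sketch, assume the fingerprint is a plain text file of `key = value` lines with `#` comments; that format and the function names below are assumptions for illustration only.

```python
from pathlib import Path


def parse_fingerprint(text: str) -> dict[str, str]:
    """Parse `key = value` lines, skipping blanks, # comments and malformed lines."""
    fingerprint = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        fingerprint[key.strip().lower()] = value.strip()
    return fingerprint


def read_fingerprint(path) -> dict[str, str]:
    """Read and parse a fingerprint file from the target."""
    return parse_fingerprint(Path(path).read_text(encoding="utf-8"))
```

    Keys are lower-cased so that `Owner` and `owner` are treated alike, which removes one small source of human error.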

     

    [Figure: Example fingerprint configuration file]

     

    In the case of the first two options it might be that the Oracle database is hosted on the server (they certainly support the same service). Reading the file on the server would, using a simple rule, trigger the automation tools to search for the second file in a known location. For databases and applications there may be several entries in the fingerprint file.
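
    The "simple rule" could be as small as the sketch below. It assumes the server fingerprint carries a comma-separated `hosts` entry and that each hosted component keeps a file named `<component>.fingerprint` in one known directory; both the key name and the file naming are hypothetical.

```python
from pathlib import Path


def parse(text: str) -> dict[str, str]:
    """Parse `key = value` lines into a dictionary (assumed fingerprint format)."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip().lower(): value.strip() for key, value in pairs}


def discover(server_file: Path, known_dir: Path) -> list[dict[str, str]]:
    """Read the server fingerprint, then follow each `hosts` entry to its own file."""
    server = parse(server_file.read_text(encoding="utf-8"))
    found = [server]
    for component in server.get("hosts", "").split(","):
        component = component.strip()
        if not component:
            continue
        candidate = known_dir / f"{component}.fingerprint"
        if candidate.is_file():  # silently skip components with no fingerprint yet
            found.append(parse(candidate.read_text(encoding="utf-8")))
    return found
```

    Each dictionary returned can then be handed to whatever deploys the monitoring for that target type.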

     

    Note that this is the minimum information that could exist in the file, though it may equally well be defined in the tool itself, depending on the level of standardization in the enterprise. Depending on the capability of your monitoring solution, examples of what else might be included at various levels are:

    • Filesystems specific to applications or databases, and not owned by the server owning team
    • Application processes of interest
    • Cluster membership
    • Passwords? If so, set the permissions on the file accordingly
    • Thresholds for event triggers
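
    If passwords do end up in the file, the automation can at least refuse to trust a fingerprint that is readable by everyone. A minimal sketch for POSIX systems, using the standard permission bits (Windows ACLs would need a different check, and the function name is hypothetical):

```python
import os
import stat


def is_private(path) -> bool:
    """True when group and others have no access at all (POSIX permission bits)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

    A file with mode `0600` passes this check; one with mode `0644` does not.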

     

    The downside of these fingerprints is that they are only as good as the information within them, and that is subject to human error: ‘crap in, crap out’, as the old adage goes. They are also subject to change. However, this is of small concern, as the first thing that should be included in your monitoring standards is to monitor for changes in these configuration files and to take action accordingly, and automatically.
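
    Monitoring for changes can be done in many ways; one simple approach is to compare a hash of each file against the last one seen. The sketch below assumes the caller keeps the `known` dictionary between runs, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def digest(path) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, known: dict[str, str]) -> list[str]:
    """Return paths that are new or whose contents changed, updating `known` in place."""
    changed = []
    for path in paths:
        current = digest(path)
        if known.get(str(path)) != current:
            changed.append(str(path))
            known[str(path)] = current
    return changed
```

    Anything returned by `changed_files` is a cue to re-read that fingerprint and re-run the monitoring deployment for the target.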