2 Aug 17

    IT Discovery – about as useful as squashing water

    unlokq

    OK, so it’s not as pointless as squashing water; it has a purpose in a traditional environment. Where governance may not always have been as good as it is now (ahem! – looks away sheepishly), people were able to install, configure and subsequently change systems and software beneath the radar of governance, or possibly in the absence of any governance at all. How many times has that Excel spreadsheet been up to date when you needed it?

    IT discovery tools have given the IT department the capability to inventory all of those separate systems and store them in a central location, post hoc. Even there, though, we run into issues; in a bizarre catch-22, you need to know that something exists before you can discover it.

    And in the most perfectly discoverable environment, what we have ended up doing is creating a solution whereby people can continue to do things in the same way, with the IT department covering the cost of employing additional people to discover what the silos have been doing. This is the Dead Man’s Curve dilemma.

    The Dilemma of Dead Man’s Curve is this: when the existing infrastructure no longer supports the demands placed upon it, causing disruption to operations and so on, the operators of that infrastructure will always try to mitigate the related risks by installing patches at the lowest possible cost. Their goal is to extend the useful life of the investment in the infrastructure, despite the losses that may result. Patching Dead Man’s Curve is always cheaper than investing in building a new, functional infrastructure. But the patches merely delay the issue: when should we decide to abandon what exists and invest in building something new that will work?

    With this in mind, let’s ask the following question: ‘Why do we need the information?’ The first response is: ‘We need to discover all the CIs and their relationships in the estate.’

    Leading to the second question: ‘For what purpose?’ I see two main reasons:

    • Cost: to understand the financial weight of the IT estate, ideally on a service-by-service basis.
    • Quality: to understand the impact of any changes made in the IT estate, whether by the addition of new services, or the amendment and removal of existing services.

    Both of these are still valid reasons to know about the components that make up the IT estate, and we could continue to use discovery tooling if the world had stood still for the last 10 years.

    IT is automated.

    If yours is not, then it should be. Automation implies increasing speed and, done correctly, can increase the agility of your organisation. What does this automation mean for discovery?

    Post-hoc discovery introduces ‘lag’ into the system. New CIs can be created, changed and deleted at an interval that is not dictated by human time-frames. Consider continuous deployment, or cloud-provided services.
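
    As a minimal sketch of this lag (all names hypothetical), the gap between the last discovery snapshot and reality is a simple set difference; anything created or deleted since the scan is invisible or stale until the next one:

        # Hypothetical sketch: how post-hoc discovery lags a fast-moving estate.
        # 'discovered' is the inventory recorded by the last scan; 'live' is what
        # actually exists now (e.g. after an autoscaling event minutes later).

        discovered = {"vm-001", "vm-002", "vm-003"}      # snapshot taken at 02:00
        live = {"vm-002", "vm-003", "vm-104", "vm-105"}  # reality at 02:07

        stale = discovered - live     # records for CIs that no longer exist
        unknown = live - discovered   # CIs the CMDB does not yet know about

        print(f"stale records: {sorted(stale)}")    # ['vm-001']
        print(f"unknown CIs: {sorted(unknown)}")    # ['vm-104', 'vm-105']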

    NOTE: Automation requires not only that automation tools are allowed access to perform their duties, but importantly that human access is removed.

    • For systems, this has the added benefit of removing the security risk associated with humans.
    • For cloud subscriptions, having humans and machines accessing the same subscription introduces a governance nightmare (a check for this is sketched below).
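
    To illustrate the governance point, here is a minimal sketch (the data shape is invented; a real check would query the provider’s IAM API) that flags any human principals with access to a subscription, so they can be removed and only the automation tooling remains:

        # Hypothetical sketch: verify a subscription is automation-only.
        # The access list is invented for illustration, not a real cloud SDK call.

        access_list = [
            {"principal": "deploy-pipeline", "type": "service"},
            {"principal": "config-scanner", "type": "service"},
            {"principal": "jsmith", "type": "human"},  # governance violation
        ]

        violations = [a["principal"] for a in access_list if a["type"] == "human"]

        if violations:
            print(f"human access must be removed: {violations}")
        else:
            print("subscription is automation-only")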

    Systems are not owned.

    Highly automated environments, where a collection of resources is combined ‘on-demand’ to meet the need of a requestor, have been given a name that you might recognise. Cloud! Whether it be public or private, these resources should NOT be considered ‘owned’ by IT. It is worth noting at this point, though, that the applications that run on these provided compute resources are owned, and may well be subject to discovery. More on this in a later post.

    NOTE: In a private cloud, these resources should be operated with their own operating budget, as a provider of resource to the wider IT community.

    What about these environments? To what level is topology (CIs and their relationships) required from a purchaser/consumer perspective?

    Clearly not to the compute resource level (memory, CPU, disk), since we cannot affect them as discrete elements (they are part of the offering). But what about a VM instance as a collection of compute resources?

    Let’s go back to what we want to achieve with our configuration management system: controlling cost and quality.

    Cost

    Is the person requesting the VM interested in an individual VM? Probably not! In almost all cases there will be multiple VMs, coupled with networking components and storage elements, that together form a platform; a platform being the base onto which we build additional functionality. For simplicity I will refer to this functionality as ‘the application’. With this context we do not need to know about the individual VMs, but about the aggregated cost of all the VMs, network and storage infrastructure (the application components) that support the application.

    If the various application components are in a single subscription then the cost accounting is simple: the cost of the subscription equals the cost of the application. If, however, as is most often the case, the application components share the subscription with other applications, then they need to be tagged in some way, and the consumption of the tagged components is the cost of the application. This cost may be affected by the actions taken as part of maintaining quality.
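
    As a sketch of the tagging approach (record shapes are hypothetical; real data would come from the provider’s billing export), the application’s cost is simply the sum of the consumption records carrying its tag, regardless of which individual VMs, networks or disks produced them:

        # Hypothetical sketch: aggregate subscription consumption by application tag.
        from collections import defaultdict

        consumption = [
            {"resource": "vm-104", "tag": "app-payments", "cost": 12.40},
            {"resource": "vm-105", "tag": "app-payments", "cost": 12.40},
            {"resource": "disk-7", "tag": "app-payments", "cost": 3.10},
            {"resource": "vm-220", "tag": "app-reporting", "cost": 8.75},
        ]

        cost_per_app = defaultdict(float)
        for record in consumption:
            cost_per_app[record["tag"]] += record["cost"]

        for app, cost in sorted(cost_per_app.items()):
            print(f"{app}: {cost:.2f}")
        # app-payments: 27.90
        # app-reporting: 8.75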

    Quality

    Slightly more difficult to explain. It boils down to the actions you can (are allowed to) take, and the reasons for taking those actions. Consider the following four scenarios:

    Scenario                        | Action                | Impacted | Reason
    --------------------------------|-----------------------|----------|-------
    Application performance is slow | Add VM instance       | Platform | The original VM remains the same; since no changes are made to it, the change is to the configuration of the platform on which the application resides.
    Application performance is slow | Resize VM instance    | VM       | The VM grows/shrinks in size. This is the same as adding/removing, for example, CPUs in a physical host.
    Application unavailable         | Delete and replace VM | Platform | This is a like-for-like replacement: although the template for deployment remains the same, the instance details are different. This is not the same as the recovery from backup that might happen in a traditional environment.
    Application performance is slow | Move VM               | Platform | The VM is moved to alternate storage, so all that changes is the relationship between components (some of which may be new).

    The odd man out in the above scenarios is the resize. But consider an alternative perspective, in which a larger VM instance is deployed first and the existing VM is then shut down; this is identical to the third scenario, in which case the only thing that changes is the platform. What does this mean for what is discovered? We only need to know about the platform.
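
    To make the ‘platform, not VM’ point concrete, here is a minimal sketch (all structures hypothetical) in which the platform is modelled as a template plus a set of instance IDs; both scaling out and delete-and-replace change only that set, never the template:

        # Hypothetical sketch: the platform as the unit of configuration.
        # Individual VM identities come and go; only the platform definition matters.

        platform = {
            "template": "web-tier-v3",          # the deployed architecture pattern
            "instances": {"vm-104", "vm-105"},  # current, disposable members
        }

        def add_instance(p, vm_id):
            """Scenario 1: scale out. The platform changes; existing VMs do not."""
            p["instances"].add(vm_id)

        def replace_instance(p, old_id, new_id):
            """Scenarios 2-4: resize, replace and move all reduce to
            'new member in, old member out' at the platform level."""
            p["instances"].discard(old_id)
            p["instances"].add(new_id)

        add_instance(platform, "vm-106")
        replace_instance(platform, "vm-104", "vm-107")  # a 'resize' seen as replace

        print(platform["template"])           # unchanged: web-tier-v3
        print(sorted(platform["instances"]))  # ['vm-105', 'vm-106', 'vm-107']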

    Architecture

    The twist here is that all of these actions can, and I would say should, be built into the application architecture, in which case they should be performed by the cloud provider as part of the design. Retroactive remediation introduces lag into the resolution process. In these days where we as consumers are very demanding of an application’s quality attributes, this should always be the case.

    The effect of this is the increasing importance of ‘the architect’ in the ‘discovery’ process. A well-architected application that is deployed into a cloud environment does not need to be discovered, since all we need to know is held in the architecture pattern (also known as a service design) and is already known at the time of deployment. Why would you discover what is already known? Some weird desire to prove that the provider has retained all the information you have told them, perhaps?
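
    A minimal sketch of that idea (all names invented): if the architecture pattern already declares its components and relationships, the configuration records can be generated from it at deployment time, leaving nothing to discover afterwards:

        # Hypothetical sketch: the service design *is* the configuration record.
        # Derive CMDB-style entries from the pattern at deployment time instead
        # of scanning for them after the fact.

        pattern = {
            "name": "web-tier-v3",
            "components": ["load-balancer", "vm-pool", "shared-storage"],
            "relationships": [
                ("load-balancer", "routes-to", "vm-pool"),
                ("vm-pool", "mounts", "shared-storage"),
            ],
        }

        def register_deployment(pattern, deployment_id):
            """Emit configuration records directly from the pattern."""
            records = [{"ci": c, "deployment": deployment_id}
                       for c in pattern["components"]]
            records += [{"from": a, "rel": rel, "to": b, "deployment": deployment_id}
                        for a, rel, b in pattern["relationships"]]
            return records

        for record in register_deployment(pattern, "dep-20170802-01"):
            print(record)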

    The temptation to believe that we need to know the application component information is nonsensical. It is an example of inertia, or the ‘this is the way we have always done it’ mentality, showing that although you may have moved with the technology, you have not moved your mind into the new paradigm.

    Let me know your thoughts. I am open to any reason why we might want to continue to do discovery for purchased cloud services, but I have been unable to find a sufficiently convincing one to sway my opinion.