Integration Candidate Analysis: Notes from the Sausage Factory

deltaTools is useless all by itself.

Depending on the use case, the tools need a build server, a version control system, maybe a test management system… and, of course, some code to deploy and against which to execute tests. Add it all up and deltaTools might integrate with hundreds of software platforms… and probably does ingrate with dozens, right out of the box. But which ones, and how well?

This is a critical question. No organization will begin with an empty dev stack and enter deltaTools as its first element. deltaTools will almost always be a late addition, in an effort to improve the existing stack. And because there are three distinct tools in the deltaTools suite and many dimensions of fitness, each integration candidate should be evaluated at a reasonable depth, which takes time: a precious resource in an open-source project.

So how to choose?

We performed a comprehensive analysis of the DevOps marketplace, with this goal: create a comprehensive list of deltaTools integration candidates, categorized by use case, linked to online resources (reviews, etc.), and rationally ordered so we can work our way down the list from top to bottom.

And we got it done in a couple of hours! Here’s how…

The Process

First, we needed a data source with the following attributes:

  • Broad coverage: lots of products and categories.
  • Deep coverage: lots of user reviews, screenshots, and other useful stuff.
  • Available coverage: data near the surface and easy to screen-scrape.

Our choice was, which offers nearly a million software reviews cross-linked into dozens of categories, conveniently organized into grids on every category page. We picked categories we were sure about (e.g. Test Automation), and then examined their top products for links to other categories. Before long, we had what felt like a complete set.

Next, we needed a way to capture the data. Our tool of choice was Data Miner, which offers a free account with enough screen-scraping capacity to handle a job like this with ease. Data Miner uses jQuery CSS selectors to identify rows on the page and capture column values, and it has smart controls to support non-coders if you need that kind of thing. Data Miner can also find the Next button in paged data, so even for large datasets all we had to do was navigate to the appropriate G2 category page, launch our Data Miner recipe, and download the result.

If you’d like to try our Data Miner recipe, follow these steps:

  1. Install the Data Miner extension from the Chrome Store (or the equivalent for browser of choice) and sign up for a free Data Miner account.
  2. Navigate to a product list (try this one), open the Data Miner extension, and click the Public tab. If you see a recipe entitled G2 Products, that’s ours.
  3. Click on the recipe and then click the Run button. If your list runs to more than one browser page—as the example above does—click the Next Page button to capture the remaining pages. When you download the XLSX file, give it a name that reflects the category in order to support the next step.

After capturing our raw data we needed a way to ingest it into Excel and condition it for analysis. For this we used PowerQuery in Excel. We set it up to combine all of our output files (conveniently located in a single directory), generate a hyperlink for each product/category combination, and pivot the result by category before populating an Excel table. The advantage of doing it this way was that adding a new category is as simple as downloading another file and refreshing the query in Excel. (Updating the follow-on analysis takes a couple more steps, see below.)

Finally: analysis. Our PowerQuery gave us the categories assigned to each product as well as the number of reviews and the average rating for each product in each category. We calculated the category count, average rating, and review count for each product, then assigned a relative weight to each of these factors (ok: a relative priority, which we defined as the inverse of weight) and used the lot to calculate a fitness value.

Ranked in descending order by fitness, the products at the top of our list offer broad functionality (lots of categories), wide adoption (lots of reviews), and good quality (high ratings). Those are our initial evaluation targets… except for the top two (Jenkins and GitHub), which were already integration targets on day one of development. Which nicely validates the analytic approach! 🙂

Note that just being on the list doesn’t guarantee a product will interface with deltaTools! That’s a question of analysis. This process just gives us our to-do list: which products to examine, in which order, to provide value to the greatest number of prospective deltaTools users with the minimum necessary work.

The Result

This table presents every reviewed product listed on in our categories of interest, sorted in descending order by fitness. This is our to-do list. We used the procedure below to generate it in just a couple of hours.

If you’d like to dig deeper into our analytic approach, this ZIP archive contains our analysis spreadsheet (with the embedded PowerQuery) as well as all of our raw data. To see it working, just download, unzip, and refresh the data connection.

One point worth knowing: when you refresh the data, every neat little hyperlink will be replaced by the ugly text of the Excel HYPERLINK function. This reflects a bug (or at least a requirements miss) in PowerQuery. To fix it, after refreshing the data just perform a Replace All of “‘=” with “=” (make sure you search in formulas, which is enabled by default). Then just reset the column widths and the spreadsheet should look fine. You can read more about this trick here.

Also note that there are a bunch of hidden columns in the analysis spreadsheet that decompose the category values returned by PowerQuery and calculate overall ratings & review counts. There was probably a better way to do that, but in this implementation adding a new category will also require the addition of a new category Score and category Reviews columns, plus a formula adjustment to the last four columns, which consume them. To remove a category, do the reverse. If you’ve got this far, neither should be any trouble at all.

And, of course, once you get the approach you can use it anywhere. 🙂

Leave a Reply