Fun With KQL – Split

Introduction

In databases, we often find columns that are stored in a hierarchy structure, not unlike a file path on your drive. For example, in the Microsoft Logs sample database the Perf table stores its counter path this way: \\computername\Memory\Available MBytes.

It would be helpful to have a way to easily break this path out into its individual parts. KQL provides us a way of doing this using the split function.

The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Split Basics

To perform a split, you simply pass the column name to split, along with the character to split on, into the split function.

Here we’ve taken the Perf table, used take to grab 100 rows at random, then piped it into project.

Within the project, we create a new column, CPSplit, using the split function. We pass in the column containing what we want to split, CounterPath. We want to perform the split on the backslash.

Because the backslash is a special character, we need to put two of them to escape it. The KQL split function will convert it to a single backslash when it does its work.

In the output pane, you can see it created the CPSplit as an array, but the first item is blank. The split function found the leading double backslashes and treated them as two individual slashes to split on. Since there was nothing between them, it returned a blank column representing the data between the opening backslashes, which of course there was none.

We can fix our issue with the double backslash with parse, which was covered in Fun With KQL – Parse.

Fixing The Double Slash

To eliminate the double backslash, we can use the parse function. This will start grabbing characters starting after the double backslash. As before, with the backslash being a special character we have to escape it by using two backslashes. And because we want to get rid of two backslashes we have to put four of them.

This resulted in a new column, ParsedCounterPath, which has the same text without the double backslashes on front.

Now we can use the split function on the new column. As you can see in the output, the PCSplit now has an array with 3 distinct pieces.

{"DC01.na.contosohotels.com", "Memory", "Available MBytes"}

Getting The Individual Parts

Let’s take the next step, and be able to grab individual items out of the array produced by split.

If you look at the extend operator, you’ll see that the split function, in addition to the column to split on and the character to use for splitting, allows a third parameter, the position of the item from the array to be returned.

From past experience we know that the CounterPath column holds the computer first, then the object, then the counter name. Because arrays are zero based, we pass in a 0 into the first line of the extend operator so the computer name is returned.

We use 1 in the second line to return the object instance, then 2 to get back the counter name.

If you look at the output pane, you may notice something interesting. Take a look at the myCounterName column, and you’ll see it contains the text {"cpuAllocatableNanoCores"}. The squiggly braces are a clear indicator this is still a dynamic (aka array) datatype, in this case with one element.

Generally you want to get only the text, and not an array, so let’s see how to resolve this issue.

Split Then Read

In this example, we’ll first use split to break out data out, then read from the new array column it created.

As before, we use parse to chop off the leading two backslashes from the CounterPath column, resulting in the new ParsedCounterPath column.

We then use split to create a new dynamic (array) column of CounterPathArray, splitting our ParsedCounterPath column on the backslashes.

Now we’ll use another extend, to read the individual elements of the CounterPathArray using positional notation. We put the position number of the element we want in square brackets after the name.

As an example, into myComputer we put element 0 (the first one) of the array by using the syntax CounterPathArray[0].

This is repeated for the rest of the array, then everything flows into project to be displayed.

In the results pane, if you look at the three "my` variables you’ll notice they are not arrays, but individual non-array data types, in this example they are all strings.

Splitting on Other Characters

In these examples we used the backslash to split on. However, you are free to use any character. Dashes, colons, a space, the letter X, any character can be used.

For example, if you used split("Arcane-Code", "-") you’d get a dynamic (array) column with Arcane in position 0, and Code in position 1.

Let’s say you had a timespan, but it was stored as a string. Thus the format_timespan function wouldn’t work with it. You could use split("10:42:33", ":") to split it into the three elements of 10 (hours), 42 (minutes), and 33 (seconds).

Conclusion

This post showed how to use split, combined with parse and normal array notation, to extract the individual pieces out of a column of text. This can be a powerful tool for breaking down formatted text into individual components.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Split

Introduction

Split Basics

Fixing The Double Slash

Getting The Individual Parts

Split Then Read

Splitting on Other Characters

See Also

Conclusion

Published by arcanecode

Leave a comment Cancel reply

Introduction

Split Basics

Fixing The Double Slash

Getting The Individual Parts

Split Then Read

Splitting on Other Characters

See Also

Conclusion

Share this:

Related

Published by arcanecode

Leave a comment Cancel reply