Fun With KQL – Percentiles

Introduction

Often we want to get data that is relative to other data. For example, we want a list of computers that have free space that is greater than the free space of other computers. We need to set a threshold, for example we want to return results where the free space is greater than 95% of the free space on other computers.

To do this, Kusto provides the percentile operator, along with its variants percentiles and percentiles_array.

One hundred percent of the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Percentile Basics

In this first example, we’ll use the percentile function to return a list of computers who have more than (or equal to) 90% of the free space of the other computers in the Perf table.

We take the Perf table and pipe it through a where to restrict the results to the Available MBytes counters.

This is piped into a summarize, where we employ the percentile function. In the first parameter we pass the column with the value to analyze, in this case it is the CounterValue column. In the second parameter, we pass in the threshold value, in this case 90.

Finally we use the by clause of summarize to indicate how we want to summarize the results. Here, we want to summarize by the ComputerName column.

In the results, we see a list of computers from the Perf table whose Available MBytes values are greater than or equal 90 percent of the other computers in the Perf table.

Do note, the Perf table actually represents a table of performance counter entries, so strictly speaking this isn’t totally accurate data. However we’ve been using the Perf table throughout this Fun With KQL series, so it will do for this example.

Percentiles Basics

The percentile function works fine for a single percentage, but what if you wanted to find values across a range of percentages? For example, you want to return values for 5, 50, and 95 percent?

Rather than having to run three different queries, Kusto provides a percentiles function so you can return multiple values at once.

The query is almost identical to the previous one, except we are using percentiles instead. As before the first parameter is the column to analyze. Next, we have multiple values to use for our percentile calculations.

Here we used three, 5, 50, and 95, however we could use just two, or more than just three.

At the end a sort by was used to order the output by the name of the computer.

In the output you see three columns for each computer, reflecting the Available MBytes values for 5, 50, and 90 percent.

Renaming The Output Columns

In the previous example the default column names that the percentile function output were rather, well ugly to put it bluntly. We could improve on it by using an operator we’ve seen before, project-rename.

Our query is identical to the previous, except the sort by was replaced with project-rename. (The sort could have been retained, I simply removed it to make the example a bit simpler.)

If you recall my post Fun With KQL – Variants of Project, all we have to do is list each new column name we want to use, then after the equal sign the existing column to assign to it.

Our new names are a lot better, but we can streamline the rename process even more. The summarize operator allows us to rename when we make the call.

After the summarize operator we list each new column name we want to use for the output in parenthesis. As you can see, the output used the new column names we provided.

Multiple Levels of Percentiles

In the previous example, we used three percentiles, however this is no limit. In this next example we’ll bump it up to five.

Here we used the same technique as the previous sample, except we have more percentile values. As you can see, we also took advantage of KQL’s flexible layout to make the query easier to read.

Percentiles As An Array

There may be times when we want the percentiles returned in an array instead of columns. For that there’s an alternate version of the percentiles function, percentiles_array.

The first parameter passed into the percentiles_array function is the column we’re evaluating, here CounterValue. The remaining parameters are the percentile values to use. Here we used our original three, but we could have used as many as we needed.

We could call on our old friend mv-expand (covred in the post Fun With KQL – MV-Expand) to turn the expand the array into rows.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – MV-Expand

Fun With KQL – Project

Fun With KQL – Sort

Fun With KQL – Summarize

Fun With KQL – Variants of Project

Fun With KQL – Where

Conclusion

In this post we saw how to use percentiles to evaluate data in relation to other data. We also saw how to return this data in array form.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Advertisement

Fun With KQL – MV-Expand

Introduction

In the previous article, Fun With KQL – Make_Set and Make_List, we saw how to get a list of items and return them in a JSON array. In this article we’ll see how to break that JSON array into individual rows of data using the mv-expand operator.

Before we expand our KQL knowledge, be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

A Reminder – Make_Set

Before we look at mv-expand, let’s take a quick reminder of make_set from the previous post.

Here Perf was piped into a where operator to limit the results.

We then used make_set to get a list of all the computers from the data that was piped in. It created a JSON array and stored it in the new Computers column. The make_set function created a list of unique computers, so each one from the dataset being piped in only appears once in the JSON array, no matter how many times it was in the incoming dataset.

MV-Expand Basics

Having a JSON array is nice, but what if we really want a dataset of individual rows, where each item from the JSON array appears in a row? As you may have guessed by now, the mv-expand operator can do this for us.

We take the same query as before, and pipe it into the mv-expand operator. We specify the column holding the JSON array.

From here, mv-expand does its thing, and converts each item in the JSON array into individual row. It uses the same name as the original column for the new on, Computers.

As you can see, mv-expand can be very useful for transforming JSON columns into easily usable rows.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Make_Set and Make_List

Fun With KQL – Summarize

Fun With KQL – Where

Conclusion

This post explored the useful mv-expand operator. With it you can extract the contents of a JSON array and pivot them into individual rows in a dataset. We also saw how it works nicely with the make_set and make_list functions.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Make_Set and Make_List

Introduction

In previous posts, I’ve mentioned using certain functions and operators to investigate conditions in your system. Naturally you’ll need to create lists of those items, based on certain conditions.

For example, you may want to get a list of the counters associated with an object. Or, you may want to get a list of computer where a certain condition is met.

In this article we’ll see how to get those lists using the Kusto make_set and make_list functions.

The set of samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Older Names – MakeSet and MakeList

Before I start, let me mention there are older versions of these functions, makeset and makelist. They were renamed to fall in line with revised Kusto naming standards, adding the underscore into the name.

While the old versions still work, you should use the newer version of the names in case Microsoft should phase them out in the future.

Make_Set Basics

For our first example, let’s see how to get a set of items, associated with another item. In this query, we’ll get a list of counter names associated with an object name.

We take the Perf table and pipe in into the summarize operator. A new column name is declared, Counters. We then use make_set, passing in the CounterName column. After the by, we use ObjectName.

This will result in Counters holding a JSON array of CounterNames associated with an ObjectName.

If you look at the output, the second row for the ObjectName of Memory as been expanded. In the Counters column you see a JSON array holding two values, Available MBytes and Available MBytes Memory.

Simply put, the Memory object has two counter names associated with it, Available MBytes and Available MBytes Memory.

Making a Set Based on a Condition

A second, and slightly more useful way to use make_set is to get a list of items where a condition is true.

In this example we again turn to the Perf table. We use a where operator to limit the results to our % Free Space counters where its value is greater than 95 (i.e. 95%).

As before, we go into a summarize operator, creating a new column Computers. We call make_set and pass in the Computer column.

Note that for this query we didn’t use the by portion. In this case, make_set takes the data in the Computer column creates a JSON array, as you can see in the output. This gave us a set of three computers who have more than 95% free space.

Make_List Basics

The second way to create these sets is the make_list function. It works almost identically to make_set, with one minor difference. Let’s see the query in action, and that difference will become clear.f

This query is identical to the one for make_set, except of course for using make_list. However, look at the results.

You’ll see the first computer, SQL01.na.contosohotels.com appears twice in the list. Likewise the computers that begin with SQL12 and SQL00 appear multiple times. And that’s just in little bit that is visible!

Now you can see the big difference, make_set creates a unique list of items. Each entry will only appear once in the JSON array. The make_list function performs no such de-duplication. If the item (in this case the computer name) appeared 100 times, it would be in the JSON array 100 times.

Crashing the User Interface

In the previous example, I attempted to click on the arrow beside the Computers in order to expand the list. The user interface came down with a bad case of "fall down go boom". It sat for a while, before just locking up on me.

I finally determined that the JSON array just had too many items to display. Fortunately, there is a way around this.

Both make_set and make_list accept an optional second parameter to indicate the maximum number of items to return.

In this make_list example, after the Computer column I passed in the value of 64. This will limit the number of items in the JSON array to sixty four items.

I could have used any number, honestly I picked 64 because I happened to glance over at my old Commodore 64 sitting on my desk and decided that would be a good number. Computer history is fun!

Now that I had limited my JSON array, I was able to expand the data in the results grid, and could see the duplicated values. Again, both of these functions support the use of the optional parameter, however you are more likely to need it with make_list.

Make_Set_If

In our first example for make_set, before calling it we had a where operator. Part of it limited the results to rows with a counter value greater than 95.

There is an alternative to make_set called make_set_if. With this function we can pass the condition in as a parameter.

Here we still used where to limit the data to the free space percentage counter. But as a second parameter to make_set_if, we pass in a condition of CounterValue >= 95.

We could have included both conditions by surrounding them with parenthesis, such as:

make_set_if(Computer, (CounterName == "% Free Space" and CounterValue >= 95))

However it turned out to be a more efficient to remove the non free space rows first.

And yes, in this version I did use greater than or equal to, instead of just greater than as I did originally, because why not?

Note that make_set_if also supports the parameter to limit the result set size. It becomes the third parameter, as in:

make_set_if(Computer, CounterValue >= 95, 64)

Make_List_If

There is also a make_list_if function.

It behaves like make_set_if, except for not removing duplicated values. In this example I added the third parameter to limit the size of the JSON array to 32 items.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Project

Fun With KQL – Summarize

Fun With KQL – Where

Conclusion

In this post we saw how to use the make_set and make_list functions, along with their corresponding make_set_if and make_list_if functions, to get a list of values in a JSON array. These are useful functions for returning a list of items, such as computers, where some condition is true.

The next article in this series will focus on the mv-expand function, which can be used to take the JSON array created by make_set (or make_list) and convert it into rows.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Arg_Max and Arg_Min

Introduction

A very common need in query languages is the ability to extract the maximum and minimum values in a column of data. The Kusto Query Language provides this capability through two functions, arg_max and arg_min. In this post we’ll take a look at these functions.

At a minimum, you need to be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Arg_Max Basics

As its name implies, the arg_max function returns the maximum value within the column passed into it.

In this example, we are going to use the summarize operator to summarize by the CounterName. The value we’ll use in the summarize is the maximum CounterValue, determined using arg_max, for each CounterName.

The first parameter we pass into arg_max is the column we want to find the maximum value for. The second argument is the column or columns to be returned, besides of course the max value of the passed in column. In this example we use an asterisk to return all of the columns piped in from Perf.

We then go into a project, to limit the output to a few columns, then sort them. (In a moment we’ll see how to limit the output of arg_max so we don’t need the project.)

Note in the output it retained the name for the column we are getting the maximum value for, CounterValue. You should consider renaming this column in the output to a name that is more reflective of the true data, such as MaxCounterValue. This could make the output clearer to the end user of your query.

Arg_Max With Columns

In this second example, we have the same basic result as the first query. In this version though, we pass in the few columns we want back.

In addition to CounterValue, we’ll pass in TimeGenerated, Computer, and ObjectName.

You’ll notice in this version we no longer need the project operator to reduce the number of columns. That is taken care of in arg_max. By taking advantage of this feature, you can make your queries more compact.

Arg_Min Basics

The arg_min function behaves identically to arg_max, with the exception of course of returning the minimum value from the passed in column. You can use the asterisk to return all columns or specify columns to be returned.

As such we’ll just demonstrate the summarize version of our query, but you can replicate the query shown in the previous section by using arg_min instead of arg_max.

As you can see, the minimum counter value across most of the data was a zero.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Project

Fun With KQL – Sort

Fun With KQL – Summarize

Conclusion

In this article we saw how to perform a common task across query languages, obtaining the maximum and minimum values for a set of data. We did so using the arg_max and arg_min Kusto functions.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – StrCat

Introduction

The strcat function has been shown in previous articles, but it’s so useful it deserves a post all of its own.

As usual, the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

StrCat Basics

In the Perf table, the three columns of Computer, ObjectName, and CounterName are often used together to identify a specific counter value. It’d be nice to combine them into a single column, and we’ll use the strcat function to do just that.

In this example we took the Perf table and piped it into a take to just grab a few rows for this demo.

Next we used the extend operator to create a new column, CompObjCounter. With strcat, you simply keep listing values in the parameters you want to join together.

We start with a column name, Computer. We then pass in a string literal, a static text string of " - ". This is followed by another column, another text string, and finally another column.

As you can see in the output, this results in the nicely formatted string of CH1-AVSMGMTVM - LogicalDisk - Disk Write Bytes/sec for the first row.

Here we passed in five values to the strcat function, but there is really no limit. We could have used as few as two, or twenty, or however many we needed. We could have also used other functions to generate values, as we’ll see in the next example.

StrCat with Functions

In this example, we’ll include a function, datetime_part, as one of the parameters to strcat.

We take Perf, and filter it down with two where operators. Then take is used to limit the rows for this simple sample.

Next we use case. You saw this code in the Fun With KQL – Case. We extract the month number from the TimeGenerated column using datetime_part. We compare it to a number and return the three character abbreviation for it.

From here we go into an extend operator, using strcat to combine the newly calculated MonthName, along with the day and year. You can see I used the datetime_part function as a parameter to strcat to demonstrate it is perfectly allowable to pass in functions.

By the way, if you refer back to the example in Fun With KQL – Case, I used the format_datetime function to get the day and year. Either method works, I just wanted to show a variety of ways to accomplish the same task.

As a final step to the query, I used project to display the column names to the output, including our new, nicely formatted DateText column.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Extend

Fun With KQL – Case

Fun With KQL – DateTime_Part

Fun With KQL – Format_DateTime

Fun With KQL – Project

Fun With KQL – Take

Fun With KQL – Where

Conclusion

In this post we took a dive into the strcat function. We saw how it can be used to concatenate columns together along with static text. In addition, we saw how functions could be used within strcat to create nicely formatted output.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Split

Introduction

In databases, we often find columns that are stored in a hierarchy structure, not unlike a file path on your drive. For example, in the Microsoft Logs sample database the Perf table stores its counter path this way: \\computername\Memory\Available MBytes.

It would be helpful to have a way to easily break this path out into its individual parts. KQL provides us a way of doing this using the split function.

The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Split Basics

To perform a split, you simply pass the column name to split, along with the character to split on, into the split function.

Here we’ve taken the Perf table, used take to grab 100 rows at random, then piped it into project.

Within the project, we create a new column, CPSplit, using the split function. We pass in the column containing what we want to split, CounterPath. We want to perform the split on the backslash.

Because the backslash is a special character, we need to put two of them to escape it. The KQL split function will convert it to a single backslash when it does its work.

In the output pane, you can see it created the CPSplit as an array, but the first item is blank. The split function found the leading double backslashes and treated them as two individual slashes to split on. Since there was nothing between them, it returned a blank column representing the data between the opening backslashes, which of course there was none.

We can fix our issue with the double backslash with parse, which was covered in Fun With KQL – Parse.

Fixing The Double Slash

To eliminate the double backslash, we can use the parse function. This will start grabbing characters starting after the double backslash. As before, with the backslash being a special character we have to escape it by using two backslashes. And because we want to get rid of two backslashes we have to put four of them.

This resulted in a new column, ParsedCounterPath, which has the same text without the double backslashes on front.

Now we can use the split function on the new column. As you can see in the output, the PCSplit now has an array with 3 distinct pieces.

{"DC01.na.contosohotels.com", "Memory", "Available MBytes"}

Getting The Individual Parts

Let’s take the next step, and be able to grab individual items out of the array produced by split.

If you look at the extend operator, you’ll see that the split function, in addition to the column to split on and the character to use for splitting, allows a third parameter, the position of the item from the array to be returned.

From past experience we know that the CounterPath column holds the computer first, then the object, then the counter name. Because arrays are zero based, we pass in a 0 into the first line of the extend operator so the computer name is returned.

We use 1 in the second line to return the object instance, then 2 to get back the counter name.

If you look at the output pane, you may notice something interesting. Take a look at the myCounterName column, and you’ll see it contains the text {"cpuAllocatableNanoCores"}. The squiggly braces are a clear indicator this is still a dynamic (aka array) datatype, in this case with one element.

Generally you want to get only the text, and not an array, so let’s see how to resolve this issue.

Split Then Read

In this example, we’ll first use split to break out data out, then read from the new array column it created.

As before, we use parse to chop off the leading two backslashes from the CounterPath column, resulting in the new ParsedCounterPath column.

We then use split to create a new dynamic (array) column of CounterPathArray, splitting our ParsedCounterPath column on the backslashes.

Now we’ll use another extend, to read the individual elements of the CounterPathArray using positional notation. We put the position number of the element we want in square brackets after the name.

As an example, into myComputer we put element 0 (the first one) of the array by using the syntax CounterPathArray[0].

This is repeated for the rest of the array, then everything flows into project to be displayed.

In the results pane, if you look at the three "my` variables you’ll notice they are not arrays, but individual non-array data types, in this example they are all strings.

Splitting on Other Characters

In these examples we used the backslash to split on. However, you are free to use any character. Dashes, colons, a space, the letter X, any character can be used.

For example, if you used split("Arcane-Code", "-") you’d get a dynamic (array) column with Arcane in position 0, and Code in position 1.

Let’s say you had a timespan, but it was stored as a string. Thus the format_timespan function wouldn’t work with it. You could use split("10:42:33", ":") to split it into the three elements of 10 (hours), 42 (minutes), and 33 (seconds).

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Extend

Fun With KQL – Format_TimeSpan

Fun With KQL – Project

Fun With KQL – Take

Fun With KQL – Parse

Conclusion

This post showed how to use split, combined with parse and normal array notation, to extract the individual pieces out of a column of text. This can be a powerful tool for breaking down formatted text into individual components.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – IsNull and IsEmpty

Introduction

In writing queries, it is not uncommon to get results where a column has missing values. This can cause concerns or questions from your users. "Why is this blank?", "There must be something wrong with your query its missing data!".

To avoid this, Kusto provides two functions to check for missing values: isnull and isempty. You can combine this with the iif function (covered in the Fun With KQL – IIF post) to provide clarifying text to the end user.

Before we begin, be aware the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

IsNull

The isnull function is used with numeric data. Let’s look at this simple example.

We’ll use the AppExceptions table, and use where to limit the output to rows where the SeverityLevel column does not have a value, in other words it is null.

This is then piped into the count operator, to get a count of the number of rows without an entry in the SeverityLevel column. As you can see, there are 2,753 rows without a SeverityLevel.

So what happens if we want to get a listing of rows in the AppExceptions table, and for rows with a null SeverityLevel print a message letting the user know it isn’t there?

We use our old friend iif to see if the SeverityLevel is null. Note that we don’t have to use isnull( SeverityLevel ) == true, the isnull function always evaluates to true or false. The iif knows what to do with that true or false, and executes the following parameters accordingly.

If it came out true, the SeverityLevel was null, the text No Severity Level is returned, to let the user know this row lacked any data in this column.

On the other hand, if there was data in the SeverityLevel column then that value is returned. Note something though, we had to wrap the column in the tostring function.

Both the true and false sections off an iif function must return the same datatype! Since the true branch returned a text string, the else must too. Since the SeverityLevel column is of datatype int, we need to convert it to a string.

We do so using the simple tostring function, which simply takes the passed in numeric value and converts it to a string datatype.

IsEmpty

While isnull is meant for numeric data, the isempty function is used with string data.

In this first example we’ll get a count of rows from the AppRequests table where the column AppRoleName is missing any text.

Here we pipe the AppRequests table into a where operator, limiting the dataset to only rows with an empty AppRoleName column. We then use the count operator and find there are 17,707 rows where this column has no data (as of the running of this query).

Lets emulate what we did with isnull, and create an iif statement to let users know that the data in the AppRoleName column wasn’t present.

Here we used iif to check if AppRoleName is empty. If so, it returns the text No App Role Name, otherwise it returns the AppRoleName column.

It wasn’t necessary to wrap the column name with tostring. AppRoleName is already a string datatype, so it will match the hard coded string we have for the true part of the iif function.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Count

Fun With KQL – Extend

Fun With KQL – IIF

Fun With KQL – Project

Fun With KQL – Take

Fun With KQL – Where

Conclusion

As you see, isnull and isempty are very easy to use. They can be combined with counts, summaries, if functions, even the case function. With these, users can have some assurance that the column was left with no data, and that it wasn’t an issue with the query.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Case

Introduction

In my previous post Fun With KQL – IIF, we saw how to use the Kusto iif function to check for a condition then perform an action based on the result of a condition.

What if you had multiple conditions you need to check? While you could string multiple iif functions together there’s better solution: the KQL case function.

Before we take a look at case, you should know that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Case Basics

The case function works by checking a condition, and if true executing the code associted with it. It then checks the next condition, and runs the code associated with it, and so on. When it runs out of conditions, it will execute the final set of code.

This is much easier to understand with a good example, so let’s take a look.

As with so many of the samples in this Fun With KQL series, we start by piping the Perf table into a where to limit the dataset to % Free Space. We then take 100 rows for a small dataset for this demo.

Now we flow into an extend, which creates a new column FreeLevel. We use the case function to get its value.

As the first parameter to case, we pass in a condition, CounterValue < 10. If this condition evaluates to true, it will return the value in the second parameter. In this case it is a simple string, Critical, but we could have done a calculated value.

If the condition is false, in other words CounterValue is greater than or equal to 10, it moves onto the next parameter.

The third parameter is another condition, CounterValue < 30. If true, it executes the code in the fourth parameter, returning the text string Danger. If false, it moves on to the next parameter.

The fifth parameter is CounterValue < 50. Again, if true it returns Look at it. If false, it moves on to the next parameter.

The seventh and final parameter is not a condition, but a value. Because of this, the case treats it as an else. It will execute this final parameter and return its value.

Short Circuiting

Its important to understand the case uses what is known as short circuit logic. When it finds the first positive match, it returns the value associated with it, then stops checking.

For example, lets say the CounterValue had been 20. It would have checked the CounterValue < 10, said "nope no match", then moved on. It would then do the CounterValue < 30 check, and said "ah, 20 is less than 30, so we have a match!". It would then execute the code associated with it, in this case returning the string Danger.

At this point it will stop checking. The CounterValue < 50 will never be checked, nor the else portion of the case function.

Using Case to Summarize

The case function can become useful when combined with the summarize operator.

Here our query is similar to the previous one, except our text is a bit more descriptive.

We then pipe this into a summarize, where we get a count of computers for each of the FreeLevel items. As you can see in the output, the "You’re OK…" level has 98,816 computers in this category. The "Look at it…" level has 2,834 computers.

Now you have an idea of the state of health of your infrastructure, and can decide if you need to do more examination of these 2,834 computers. For example you could run queries, using techniques from previous posts, to get a list of computers you may want to look at.

Number of Conditions

In the previous examples we used three conditions as well as the else parameter. You can have less conditions, or even more. A common use of case is to convert a month number into a month name, as you can see in this example.

In the first extend, we use the datetime_part function to get the number of the month. We do it here, so we only have to run the function once, and not for every line in the case statement.

We run through a series of conditional checks, comparing the new MonthNumber column to a hard coded value. When it finds a match, it returns the three character abbreviation for that month.

At the end of the case is the else part, where the value Unknown is returned. In theory, in this situation it should never fail to find a match, however the case function always expects to have an else parameter.

After the case, we pipe into the project. In the project we use strcat to combine the new MonthName with a space. Then, format_datetime is used to get the day number and the year number for the TimeGenerated column, resulting in dates like April 23, 2022.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Extend

Fun With KQL – Project

Fun With KQL – Summarize

Fun With KQL – Take

Fun With KQL – Where

Fun With KQL – Format_DateTime

Conclusion

In this post we saw how the case statement can be used to evaluate multiple conditions and return a value. We also learned how it short circuits during its decision making process.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – IIF

Introduction

Pretty much every query language, indeed any coding language, has a form of an if statement. An if statement evaluates a condition for true or false, then takes an action based on that result.

In KQL, this is manifested using the iif function. In this post we’ll see two examples of how an iif can be used in your Kusto queries.

If you recall from previous posts in the series, the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

IIF Basics

In our first example, we’ll examine the free disk space counter. If the value is under 70%, we want to display a message to the user that they should look at that computer.

As we’ve done many times in this series, we start with the Perf table, and filter the results to only rows with the % Free Space counter. We’ll then grab just 100 using the take operator to keep a small sample set for this demo.

Next, we use an extend to create a new column FreeState. Then we call our iif function.

The first parameter is the condition. In this case, CounterValue < 70. If this evaluates to true, the statement in the second parameter is executed. In this case, it returns You might want to look at this.

Should the condition evaluate to false, the statement in the third parameter is executed. Here, it returns the string You're OK!.

In the results pane, you can see the counter values, along with the FreeState column. You can verify that the iif statement functioned correctly.

Evaluating Dates with IIF

Let’s do something a little more useful with iif. We’ll examine the current time, and if the TimeGenerated is more than one hour hold, we’ll apply a weighted factor to our counter value. Sometimes we assign more importance to some data points, for example ones that are more recent may have more importance than older values. Weighted factors are a way of achieving this.

In this case, if the counter value is more than one hour hold we’ve decided it is not as important as more recent entries. Thus we’ll use a weighted factor of 0.5, in other words we’ll divide the value by 2. If the counter is within the current hour, we’ll multiply by 2, indicating it’s data is more important.

We start with our good old Perf table. We’ll use a where to limit the results to our % Processor Time counter where its value is greater than zero.

Next we use take to limit the result set. We then use an extend to capture the current time into the new RightNow column. We do this so the value will be consistent across the rest of our query, otherwise you could introduce a subtle bug.

From here we fall into a second extend with our iif statements. In the first, we use datetime_part to get the hour of day from the TimeGenerated column. We call on datetime_part again to get the current hour from the RightNow column we just created.

If they are equal we know this row of data was recent, within the current hour, so we want to apply a double weight to it. For this demo query, we’ll illustrate it by returning the text Double Weight. Else we return Half Weight. This will make it easy to identify which rows should be double and which half.

Now we fall to a second iif. The condition is the same as the previous iif. For a positive result though, we take our CounterValue column and multiply by 2. In the else portion we divide by 2. (We could have also multiplied by 0.5, and produced the same result.)

Looking over the results pane, you can see we had some rows with a double weight. Looking at the adjusted value we can verify it is indeed twice the original counter value. Likewise we can see the half weight rows are indeed half the counter value.

Note there are other methods we could have used for comparison. Rather than seeing if the TimeGenerated was in the same hour of the day as the current time we could have used some date/time arithmetic to see if the TimeGenerated occurred within the last sixty minutes.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – DateTime Arithmetic

Fun With KQL – DateTime_Part

Fun With KQL – Extend

Fun With KQL – Project

Fun With KQL – Take

Fun With KQL – Where

Conclusion

This article showed a few uses for the iif function. First we used it to return a text string based on a condition. Then we performed one of two calculations based on the iif condition and returned the result.

In the next article we’ll look at another way to make decisions based on a condition, the case function.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – DateTime_Part

Introduction

In the previous post, Fun With KQL – Format_TimeSpan, we saw how to extract pieces of a timespan datatype. In this post, we’ll see how to extract parts of a datetime datatype using the datetime_part function.

A timely reminder, the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

DateTime_Part Basics

The datetime_part function is quite easy to use. The first parameter is a value that indicates which part of the date to extract. The second parameter is a datetime column you want a part of.

In this example I grabbed the Perf table and used take to get 100 random rows, in order to keep this sample small. I then use project to create new columns using datetime_part to get each part of a date.

In the output I expanded the second row so you could see all of the different parts.

These are the valid values for the first parameter to datetime_part.

Valid Values
year
quarter
month
weekofyear (or week_of_year)
day
dayofyear
hour
minute
second
millisecond
microsecond
nanosecond

Summarizing on a DateTime Part

One of the more useful things you can do with datetime_part is create summaries based on some common time factor. In this example, we’ll get the hour of the day for the TimeGenerated column, then see how many items appear for each hour of the day.

Here we get the Perf table for all rows in the last 7 days via the where operator.

Next, the extend operator is used to create a new column, HourOfDay. The datetime_part function is used to get the hour of the day from the TimeGenerated column.

We now pipe into a project and only pass on the new HourOfDay column.

From here we move into the summarize operator, we get a count which we name PerfCount, based on the new HourOfDay column. Finally we sort the output.

We now have counts for the number of entries into the Perf table broken down by the hour of the day.

This kind of query allows you to spot for time of day issues. Perhaps there is a big upsurge in entires in the 5 o’clock hour, you can look for causes and try to mitigate them.

In this case, the distribution of values seems pretty equivalent across the 24 hours (the rest of the hours are towards the bottom just off screen) so you can be assured the load is even throughout the day.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Count

Fun With KQL – Extend

Fun With KQL – Extract

Fun With KQL – Project

Fun With KQL – Sort

Fun With KQL – Summarize

Fun With KQL – Take

Fun With KQL – Where

Conclusion

In this article we saw how the datetime_part function can be used to extract portions of a column that is of a datetime datatype. We then saw how to summarize based on the date part we extracted, in this case the hour of the day.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Format_TimeSpan

Introduction

In my previous post, Fun With KQL – Format_DateTime, we showed how to use the format_datetime function to reformat a date to your liking.

If you refer back to my post Fun With KQL – DateTime Arithmetic, you may recall that when you perform any math with datetime data types, such as subtracting one date from another to get the amount of elapsed time, the resulting data type is a timespan.

In this post, we’ll see how to format timespan data types using the format_timespan function.

A timely reminder, be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Format_TimeSpan Basics

In this first example, we’ll look at some of the simple, basic ways to format a timespan using the format_timespan function.

First, we take the Perf table and get the last three days of data. Then we use a take to grab just a few rows for this sample.

Next we use the extend operator to do some datetime arithmetic, to see how far in the past the TimeGenerated column was from the time right now. This results in the TimeGen column, which is a timespan datatype.

We use the project operator to grab a few relevant columns, then we have four entires to demonstrate the format_timespan function.

The d character is the place holder for the number of days. For each d provided it will display the days. If there are more ds than there were days, it will pad the day value with leading zeros. For example, if the timespan held 33 days, and you used dddd in the function, the result would be 0033.

Next come the placeholders for hours, minutes, seconds, and sub-seconds. These behave just like they do with format_datetime. Below is a chart with the list of allowed formatting characters.

Time Placeholder Result
d Days, number of d’s indicates how many 0s to use
H Hour, 1 to 23 (24 hour clock)
HH Hour, 01 to 23 (24 hour clock)
m Minute, 0 to 59
mm Minute, 00 to 59
s Second, 0 to 59
ss Second, 00 to 59
f Sub-second
F Sub-second for non-zero vales

Note that the 12 hour clock format of h (lowercase h’s) is not valid. This function measures the number of elapsed hours, not the time of the day.

There’s also an assortment of separator characters we can use to separate the individual parts of our timespan, namely the days, hours, minutes and so on.

Separator Character
/ Forward Slash
Dash
: Colon
, Comma
. Period
_ Underscore
[ Left square bracket
] Right square bracket
Space

Format TimeSpan Examples

Below is a screen shot with a query where I created many variations using format_timespan.

You may wish to open up this image in a new browser tab so you can easily read it. Because there were so many columns I expanded the second row so you could easily reference all the different formats.

Note that these are not set in stone. I randomly picked a variety of formats, using different characters and separators. You can adapt these to any format you need.

In the extend I used the endofweek function to get the difference between the TimeGenerated and the end of the current week. Honestly, for this example, I did this just so we could get some non zero values in the day position of our output. You could use this method though with any of the startof... or endof... functions.

Composing a Sentence using Format_TimeSpan and StrCat

In a future post I’ll go more into the strcat function, but for now just understand that strcat takes all of the values passed in as parameters, and composes them into a single string.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Between

Fun With KQL – DateTime Arithmetic

Fun With KQL – EndOf

Fun With KQL – Extend

Fun With KQL – Format_DateTime

Fun With KQL – Project

Fun With KQL – StartOf

Fun With KQL – Take

Fun With KQL – Where

Conclusion

In this article we saw how to use the format_timespan function to create custom formats for the timespan datatype, typically the result of datetime arithmetic.

Our next post will continue the discussion of datetime functions within the Kusto world.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Format_DateTime

Reformatting dates is very common when querying data. You may, for example, wish to come up with a more compact date than the default KQL returns. Perhaps you work for an international company. In some countries you may want the date in month/day/year format, while other countries would prefer day/month/year.

Fortunately Kusto provides the format_datetime function that allows you to do just this. We’ll take a look at it in this article.

Be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Format_DateTime

The format_datetime function is very simple to use.

The function requires two parameters. The first is the column to be formatted, which must be of a datetime datatype.

The second is a string, holding the format using placeholders, which I’ll cover more momentarily. In the above output I formatted the TimeGenrated column using a variety of formats.

Since it is a bit hard to read, here it is with one of the rows expanded.

Format_DateTime Placeholders

There are many placeholders, as shown in the tables below.

Date Placeholder Result
d Day, 1 to 31
dd Day, 01 to 31
M Month, 1 to 12
MM Month, 01 to 12
y Year, 0 to 9999
yy Year, 00 to 9999
yyyy Year, 0000 to 9999
Time Placeholder Result
h Hour, 1 to 12 (used with AM/PM)
hh Hour, 01 to 12 (used with AM/PM)
H Hour, 1 to 23 (24 hour clock)
HH Hour, 01 to 23 (24 hour clock)
m Minute, 0 to 59
mm Minute, 00 to 59
s Second, 0 to 59
ss Second, 00 to 59
f Sub-second
F Sub-second for non-zero vales
tt am/pm

With the above placeholder syntax, a single letter will result in a single digit. Two characters results in a two digit value.

For example, with one d, the first day of the month is returned as 1. On the other hand, with two dd, the first day of the month is returned as 01.

Note that the lower case h for hours should be used with tt, to get 12 hour dates in AM/PM format. Using upper case H will return the time in 24 hour format.

The f for sub-seconds will return the amount of sub-seconds for the number of fs you supply. Here’s an example:

Value Format Result
22.016 f 22.0
22.016 ff 22.02
22.016 fff 22.016
22.016 F 22
22.016 FF 22.02
22.016 FFF 22.016

If you use a lower case f, it will place a 0 if the value is zero. With an uppercase F, it will suppress the character if it is zero.

Separators

There are several characters you can use as separators when formatting your datetime data.

Separator Character
/ Forward Slash
Dash
: Colon
, Comma
. Period
_ Underscore
[ Left square bracket
] Right square bracket
Space

Separating Date and Time

Another use for format_datetime is separating a datetime column into two parts, a date column and a time column.

Here I created a new column using project, TheDate, and used format_datetime to set it to "yyyy-MM-dd" format. I also created another column, TheTime, and formatted it.

As you can see in the output, we now have the TimeGenerated column broken into two columns, one for the date part and the second for the time.

If I really needed to, I could have broken this down further. A column for year, another for month, and so on.

In this example I created the calculated column using project, I could have opted to do the same thing using an extend if that made more sense for my query.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Extend

Fun With KQL – Project

Fun With KQL – Take

Conclusion

The format_datetime function isn’t very complicated, but it is very flexible and useful. You can adapt the dates and times in your results to any format you want. Additionally it can be used to break a datetime column into its distinct parts.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Parse_JSON and ToDynamic

Introduction

In order to compact data, JSON is often used to store information within columns in databases. JSON is also useful when you want to store data, but the exact data is different on a row by row basis. Using JSON prevents the need to have a lot of extra columns that will be blank for many of the rows.

It becomes important then that a query language provides a simple method for unpacking that JSON data into useful columns. The Kusto Query Language provides that ability through the use of the parse_json scalar function. In this post we’ll look at examples of how to use it to expand data stored in JSON format.

Originally, parse_json was called todynamic, and the older todynamic function name still works. Both functions work and behave identically. In this post we’ll use the newer parse_json name, but be aware that for any example you could use the older todynamic name and get identical results.

The parse_json samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Parse_JSON Basics

In order to use this function, we’ll need a table that has a column with JSON formatted data. For that we’ll be using the InsightMetrics table, which has a column named Tags.

Here is a sample piece of data stored in the column. Be aware that not every row in this table has the same format for its JSON data. In addition the samples in the code below may not match the screen shots exactly.

For this example I’ve broken the JSON code into multiple lines to make it easier to read. If you look at the JSON columns inside the results pane you’ll just see one long string of text.

 {
   "vm.azm.ms/mountId":"C:"
 , "vm.azm.ms/diskSizeMB":129545.99609375
 }

Now let’s see how to use parse_json in our Kusto query.

In order to properly use parse_json, each row in the dataset must have a consistent format for its JSON formatted column. So for this example we’ll take the InsightMetrics table, then use where to restrict the rows where the Name column is FreeSpaceMB. The Tags column for FreeSpaceMB rows have the required consistent JSON formatting.

From here we use take to get a small sample for our demo (no sense in taking time to sift through 30,000 rows, the default maximum number of rows KQL will return in the user interface, for a simple demo). Next we flow into a project to just get the few columns we need.

Now we use the extend operator to create a new column, ExtProps. We use the parse_json function, and pass the Tags column into it. This will decompose the JSON data into the new ExtProps column.

In Kusto terms, they specify the new column as a datatype of dynamic. If you are familiar with PowerShell, this is actually created as a hash table, a list of key/value pairs. (If you aren’t familiar with PowerShell, you may want to take a look at my Fun With PowerShell) series.

Once we’ve unpacked the JSON into the ExtProps column, we flow into another project. The first three columns are a repeat of our original three.

Then we have our ExtProps column. In the first one, inside the square brackets [] we pass in the key we want, here vm.azm.ms/mountID. This will return the value associated with this key. If you refer to the JSON example above, this would be C:.

In the next ExtProps entry, we use the second key, vm.azm.ms/diskSizeMB. If you refer to the example above, this will return 129545.99609375.

In the image above, the last two columns are the unpacked JSON data.

More JSON columns

What if our JSON data was more than just two columns? Let’s take a look at an example with four key/values. As before I’ve expanded this to multiple lines to make it easier to read here, in the database it is all one long string.

{
  "container.azm.ms/clusterId":"/subscriptions/ebb79bc0-aa86-44a7-8111-cabbe0c43993/resourceGroups/CH1-GearamaRG/providers/Microsoft.ContainerService/managedClusters/CH1-GearamaAKS"
, "container.azm.ms/clusterName":"CH1-GearamaAKS"
, "hostName":"aks-nodepool1-18153691-vmss000000"
, "name":"sdb1"
}

We’ll again use the InsightMetrics table.

We use a where to limit the Namespace to container.azm.ms/diskio, the data associated with this Namespace has a JSON format that is consistent and resembles the four key/value pairs shown above.

We again use the same parse_json, passing in the Tags column.

In the project I simply reference the ExtProps column four times, passing in the keys for each one. I also used the project ability to rename these columns into something more human friendly.

Note, there is nothing special about the name ExtProps. I could have used any name I wanted, like TagProps, MyKeyValues, or even ArcaneCodeIsAwesome.

Case is Important!

Be aware, KQL is case sensitive when it comes to the JSON keys. When I was first developing these samples, I accidentally keyed in HostName = ExtProps["hostname"] then spent a few minutes scratching my head wondering why I wasn’t getting back any data for the HostName.

I finally realized I should have used hostName, and not hostname for the key (the N in Name is upper cased in the JSON). Once I got the casing correct, the query worked like it should have.

Property Notation

If, and only if, your JSON key has no spaces or special characters such as /, \, and so on, you can use what is called property notation.

On the last two lines of the query, I was able to use ExtProps.hostName and ExtProps.name instead of needing to use the square brackets. This makes for a bit less typing and better readability. Note the case rules still apply with property notation, I needed to use hostName just as it was in the JSON.

For the ClusterID and ClusterName I still needed to pass the JSON keys within square brackets as they had special characters in the name.

Expanding Rows

This is a good time to point out a feature of the user interface. In the results pane at the bottom, to the very left of each row you’ll see a sideways triangle. If you click it, it will expand the row to display the columns under it vertically.

In this screen shot I clicked on the triangle on the second row (highlighted by the little red box in the image). This then expanded the row to display each column of data in a row under it. For results where many columns are returned, this can make it much easier to read the result for a specific row.

But wait, there’s more!

In the above screen shot, you may have noticed there is another arrow beside the Tags column. The interface figured out that the data contained in Tags is a dynamic datatype with JSON data.

Behind the scenes it used parse_json to unpack it, and break it into the individual columns (shown on rows under the Tags).

This is a useful tool for development. With this you can easily verify you are using the correct keys for writing your KQL code.

See Also

The following articles, operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Extend

Fun With KQL – Project

Fun With KQL – Take

Fun With KQL – Where

Fun With PowerShell

Conclusion

Finding JSON data stored in a database column is becoming more and more common today. It is fortunate that Kusto provides an easy to use way of extracting that data using the parse_json function. Remember for this to work in a query, each row must have a consistent format for its JSON.

I also want to add a final reminder, there is a version of parse_json named todynamic. Both functions work and behave identically, and can be used interchangeably. It’s important to remember this as you’ll find queries written both ways. When writing your queries though, you should prefer the use of the newer parse_json name.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – Between

Introduction

It’s not uncommon to want to use a range of values when creating a Kusto query. This might be a range of numeric values, or perhaps a range of dates.

Kusto provides this ability using the between operator. In this post we’ll see how to use it when authoring your Kusto queries.

Just between you and me, the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

Between for Numeric Ranges

Let’s say you want to grab records from the Perf table for the percentage of free disk space. But you only want to see it when the CounterValue is greater than or equal to 70 percent, up to 100 percent.

You could write the where operator like where CounterValue >= 70 and CounterValue <= 100, but that gets a bit wordy. The between operator is not only easier to read, but is more efficient.

We start with the Perf table, then use a where to limit the dataset to only rows that hold the % Free Space counter data. We then use a take operator, to keep the demo sample small.

Now comes the star of this post, the between. Using a where, you specify the column name. Then comes the between operator. In parentheses, you place the starting value, two periods, then the ending value. Here, we go from 70.0 up to 100.0. This will include all rows with the starting value, 70.0, as well as rows with the ending value of 100.0.

In the results, you can see only rows in our specified range were included in the results. As I mentioned, the between is not only easier to read, it can be more efficient than using the where CounterValue >= 70 and CounterValue <= 100 construction I mentioned earlier.

Between for Date Ranges

In the introduction I mentioned you could use between for more than just a numeric range. It can also be used with dates.

In the where operator we used hard coded dates of April 12 to April 18, 2022. We had to put the dates in YYYY-MM-DD format. Because these are strings, we needed to use the datetime function to convert from a string to a datetime datatype.

In the result pane you can see all of our dates fell into the specified range.

Between Dynamic Dates

Hard coded dates may be OK for a quick, impromptu query, but for Kusto queries you’ll want to run repeatedly having dynamic dates makes more sense.

In this example we used ago(7d) to go back one week, with an ending date of right now by using the now function. If you want to read more about these two functions see the See Also section below for links to past articles I’ve written about them.

Not Between

Sometimes it makes more sense to write your Kusto query to exclude a range. For that, KQL includes a not between operator.

In this example we used !between to exclude counter values in the range of 0.00 up to and including 69.999. This produced the same results as our first example.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – Ago

Fun With KQL – DateTime Arithmetic

Fun With KQL – EndOf

Fun With KQL – Now

Fun With KQL – Project

Fun With KQL – StartOf

Fun With KQL – Take

Fun With KQL – Where

Conclusion

Using the between operator allows you to not only write more readable queries, but ones that are more efficent.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.

Fun With KQL – EndOf

Introduction

My post, Fun With KQL – DateTime Arithmetic, demonstrated how to do date / time math in Kusto. Then in my previous post, Fun With KQL – StartOf, we showed how to use the StartOf... functions, to calculate the beginning of time periods based on a date that is passed in.

If you’ve not read the previous post I’d highly encourage you to do so now, as there are parallels between it and the endof... functions we’ll discuss in this article, namely endofday, endofmonth, endofyear, and endofweek.

The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

If you’ve not read my introductory post in this series, I’d advise you to do so now. It describes the user interface in detail. You’ll find it at https://arcanecode.com/2022/04/11/fun-with-kql-the-kusto-query-language/.

Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.

Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.

EndOf Basics

Let’s start with a simple query. We’ll take the TimeGenerated column from our query and add a new column to display the end of that day.

We took our Perf table and piped it into the where operator to get all rows since the start of the current year. That in turn was piped into a take, to grab just a few rows for this demo.

Now our function comes into play. We use the extend operator to create a new column, DayGenerated. This is set equal to the endofday function, into which we pass the TimeGenerated column.

Finally we use a project to limit the columns to the results pane a the bottom.

In the results pane you can see the two date columns, TimeGenerated and DayGenerated. The both show the same date, but the time in the DayGenerated column will always be 11:59:59.999 PM. This time is always returned for an endof... functions return value, whether you use endofday, endofmonth, endofyear, or endofweek.

Summarize by End of Day

Let’s turn to common use for the endof functions, the ability to summarize for a time period.

As in the previous example we piped Perf into a where to limit to data from the start of the year. Then we used extend to add a new column which is the end of the day for the TimeGenerated column.

We then used project to narrow down the list of columns. From here we used the summarize operator to create a count of the number of entires into the Perf table based on the new DayGenerated column and the name of the Computer.

Summarize by End of Month

We can also create summaries for the end of each month.

This query is the same as the one in the previous end of day section, except we use the endofmonth function. In the output you can see, for the computer AppBE00.na.constosohotels.com, we get the last day of each month from January 2022 until April 2022. Note it is April 2022 as I write this, so April is the most recent month we have data for.

Summarize by End Of Year

I’m sure you are getting the hang of this by now, even so let’s see a few more examples. Here, we’ll summarize by the end of the year.

In the where clause of this query, I used startofyear( now(-768d) ). I showed this technique in the previous post Fun With KQL – StartOf. Passing in a negative value into the now function subtracts that many days from the current date.

As there are 365 days in a year, we double that to get 768. It’s April 21, 2022 as I write this, thus it would return April 21,2020. The start of that year was January 1, 2020. Thus the where clause will return all rows where the TimeGenerated is greater than January 1, 2020.

In this case it doesn’t make a great deal of difference, since the sample data doesn’t go back that far, but I wanted to take this chance to remind you of the techniques to return several years worth of data in your query, especially as doing year end summaries would often be done jointly with returning multiple years of data.

Summarize by End Of Week

As our last example, lets see the endofweek function used alongside the summarize operator.

This query works as the others, except of course we use the endofweek function. In the results pane you can see the WeekGenerated column has the last day of each week, by default a Saturday. This is in contrast to the startofweek function, where weeks start on Sundays.

See Also

The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.

Fun With KQL – DateTime Arithmetic

Fun With KQL – Extend

Fun With KQL – Project

Fun With KQL – Sort

Fun With KQL – StartOf

Fun With KQL – Summarize

Fun With KQL – Take

Fun With KQL – Where

Conclusion

In this post we saw how to use the endof... series of functions to calculate period ending dates dynamically. We then saw how to use the various endof... functions to group data into discrete time buckets using summarize.

The demos in this series of blog posts were inspired by my Pluralsight courses Kusto Query Language (KQL) from Scratch and Introduction to the Azure Data Migration Service, two of the many courses I have on Pluralsight. All of my courses are linked on my About Me page.

If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight . At the top is a Try For Free button you can use to get a free 10 day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.