My third course in the Pluralsight Kusto Query Language Learning Path has just been published: Kusto Query Language: Scalar Operators.
The Kusto Query Language has a rich set of scalar operators that can be used to transform your data, making the output of your queries easier to read.
In this course, we cover some of the most used Kusto Query Language scalar operators, which will enable you to author effective queries right away.
Some of the major topics that are covered include:
Working with dates and times, including datetime math as well as formatting
Logic branching
Working with strings
Advanced techniques for working with column data
By the end of the course, you’ll have enough information about scalar operators to write queries with more readable output.
Kusto Query Language: Previous Courses
If you have not seen my previous two courses in the series, you should probably watch those first if you are not familiar with KQL.
In the second Pluralsight course, Kusto Query Language: Beginning Operators, I cover some of the most useful of the KQL operators. Using this course you can write effective queries in order to do most of your data retrieval.
The newest course, which I mentioned above, shows how to take your output and format it nicely for a wider audience.
No Pluralsight Subscription? No Problem!
Every good IT organization should have a place for its staff to continue their education and stay up to date with the latest IT subjects. Pluralsight is a great choice: it can be done on demand, has a HUGE catalog of courses, is constantly being updated, and the courses are done by leading IT professionals.
Talk to your boss about it. You’d be amazed at how many things you can get in life just by asking (using a persuasive argument, of course).
While waiting for your organization to get approval, there is a way to watch for free. Just go to my list of courses on Pluralsight. At the top is a Try For Free button you can use to get a free 10-day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.
Conclusion
Just a side note for those who subscribe to my posts. Normally I publish on Mondays, but this post was a bit late this week due to wrapping up the course. I’ll probably take next week off of my KQL series and resume the week after.
I hope you enjoy the courses! And stay tuned to the blog, as I will be continuing my ongoing series Fun With KQL.
Let me tell you about let, my favorite operator in the Kusto Query Language. Why my favorite?
It is extremely flexible. It lets you create constants, variables, datasets, and even reusable functions. Let me tell you, it’s very powerful.
Before I go further, let me say the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
A Quick Note on Casing
If you have been following my posts in this Fun With KQL series, you’ll have noticed that operators and functions in KQL are all in lowercase. Table and column names, though, are almost always in mixed case, although when users create their own data clusters they can use whatever casing they wish. In mixed case, the first letter is capitalized, then the first letter of each distinct word is also capitalized. Some examples are Perf, AppRequests, and AppAvailabilityResults.
As mentioned in the intro, let allows you to create variables, constants, and functions. To keep these distinct from other KQL items I use what is known as camel case. With camel casing, the first letter is in lower case, but the first letter of each word after it is capitalized. Some examples you’ll find later in this post are timeDiffInDays, usageData, and counterName.
Using camel case immediately lets me identify a name as being created from a let statement and not an operator or table / column name. The use of camel case is not a requirement, but it is a common practice among many of us who use the Kusto Query Language.
Using Let to Create a Constant Value
Using let in its most basic form is very simple. You simply use let, followed by the name of the variable you want to create. Next comes an equal sign, and the value you want to assign to it. The let must end in a semicolon (;).
You can have multiple let statements before your query, as you see in this example. Then within the main query, you use the variable name as you would any column name, table or operator.
Here, we set two variables, minCounterValue and counterName. Placing these values in constants like we do makes them easier to change between runs of the query. For example, perhaps the first time we run the query it returns too many rows, so we can easily change the minimum counter value to say 100 to reduce the number of rows we will need to work with.
This is especially useful when we use the values in multiple places in the query. Here they are used in the strcat function. Then, they are both used in the where operator. This is a simple query, but you can see how useful it is to be able to quickly change values in one place.
Allow me to point out one other thing, a reminder really, that Kusto is case sensitive. CounterName is a different object than counterName. It is this case sensitivity that allowed us to use where CounterName == counterName in our query.
Be careful when doing this. It’s too easy to miss the difference in casing when quickly looking over a query. For that reason I suggest avoiding situations such as using CounterName and counterName in the same query. I did it here so I could demonstrate the concepts.
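A minimal sketch of a query along these lines, run against the Perf table in the demo workspace. The specific values (300 and Free Megabytes), the CounterData column name, and the direction of the comparison are assumptions made for illustration.

```kusto
// Two constants declared up front so they are easy to change between runs
let minCounterValue = 300;
let counterName = "Free Megabytes";
Perf
| project Computer
        , TimeGenerated
        , CounterName
        , CounterValue
          // Both constants used inside strcat to build a descriptive column
        , CounterData = strcat(counterName, " <= ", tostring(minCounterValue))
  // CounterName is the column, counterName is the let constant
| where CounterName == counterName
    and CounterValue <= minCounterValue
```

With the filter written this way, lowering minCounterValue from 300 to 100 returns fewer rows.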
Using Let to Create a Calculated Value
You can also use let to create a value based on a calculation. Here we are doing something very simple, and setting the startDate to twelve hours ago. This makes it very easy for us to alter the calculated value between executions of the query.
Of course you could create far more complex calculations. I kept it simple for this demo.
Also keep in mind, at the time the let statement executes, the main query has not yet executed. Thus you won’t have access to any of the columns in your query.
There’s a way around this though, through the use of functions which we’ll see in the next section.
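As a rough sketch, a calculated value like the one described could be written as follows. The projected columns and the take are assumptions added to keep the output small.

```kusto
// startDate is calculated once, before the main query runs
let startDate = ago(12h);
Perf
| where TimeGenerated >= startDate
| project Computer, TimeGenerated, CounterName, CounterValue
| take 10
```

Changing ago(12h) to another interval, such as ago(1d), is all that is needed to alter the window between executions.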
Creating a Reusable Function with Let
If you want to create a reusable calculation, but need to work with data from your dataset, you’ll need to create a function.
We start by creating the name of the function then an equal sign: let timeDiffInSeconds = .
Next we have parentheses that enclose the list of parameters we want to use. You can have as many parameters as you need, or none at all. In our function we need two.
For each parameter we need to indicate the name we want to use inside the function, then a colon, and the datatype for the parameter.
Here we named the parameters date1 and date2, then the colon followed by the datatype. In this case we specified datetime for both of them: (date1: datetime, date2: datetime).
Following the list of parameters we then define the function. The function is enclosed in squiggly braces { }. The function is pretty simple: we take the date2 parameter and subtract the date1 parameter. The result is divided by one second, so the time difference will be returned in seconds: { (date2 - date1) / 1s }.
Here we could fit our function into a single line, but you can use any number of lines you need as long as it is still valid code. For example we could have done:
{
(
date2 - date1
)
/ 1s
}
To be honest this is an example that makes the code a lot harder to read, and if you ever see production code from me that looks like this it’s a clear indicator I’ve been kidnapped by space aliens and am asking for help. But here, it serves to illustrate the point that my function can spread out over multiple lines.
Using the let defined function is simple. In the project we simply create a new column name, ElapsedSeconds and assign to it the function we created in the let, here timeDiffInSeconds. As parameters we then pass in the StartTime and EndTime. This results in:
As you can see in the output, we have our ElapsedSeconds column which shows the difference in seconds between the start and end times.
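Putting the pieces together, here is a self-contained sketch of the function in use. The datatable with its two sample rows is a hypothetical stand-in so the query can run anywhere; it is not the table used in the original demo.

```kusto
// Function taking two datetime parameters, returning the difference in seconds
let timeDiffInSeconds = (date1: datetime, date2: datetime)
{
    (date2 - date1) / 1s
};
// Hypothetical sample data with a start and end time per row
datatable(StartTime: datetime, EndTime: datetime)
[
    datetime(2023-01-01 08:00:00), datetime(2023-01-01 08:01:30),
    datetime(2023-01-01 09:00:00), datetime(2023-01-01 09:00:45)
]
| project StartTime
        , EndTime
        , ElapsedSeconds = timeDiffInSeconds(StartTime, EndTime)
// ElapsedSeconds is 90 for the first row and 45 for the second
```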
Functions with Default Values
It’s also possible to supply a default value for the last parameter in your list. Before I go on, let me warn you this is an undocumented feature. The online help for the let statement makes no mention of defaults, so USE THIS AT YOUR OWN RISK. I cannot predict how Microsoft may alter this feature in the future.
I discovered this when I created the first version of the Kusto course for Pluralsight around 2018. For it I was working closely with Microsoft on the content and it was included in the samples they wanted to use. I’m not sure why it is now undocumented; they may plan to discontinue it or change its behavior. So let me say one more time, use at your own risk.
In the list of parameters after the datatype for the last parameter we use an equal sign, followed by the default value.
In this example, if we do not pass in a value for the second parameter it uses the default value supplied. Here, date2: datetime = datetime(2023-01-01) will return January first of 2023 when no date is supplied.
For this example we also altered the function to return the time difference in days instead of seconds.
Looking at the last line of the query, you can see we used ElapesedDaysSinceStartOfYear = timeDiffInDays(TimeGenerated) and only passed in one value, the TimeGenerated.
Right above that we had a line, ElapsedDays = timeDiffInDays(EndTime, StartTime). In this line we passed in two values. Instead of using the default, it used the value in StartTime. This gave our function a lot of flexibility.
Note default values do have one issue: you can’t use dynamic values such as ago(20d). That’s why we needed to hard code the January first date.
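The sketch below shows a default value in action. The sample datatable is hypothetical, and the direction of the subtraction (date1 - date2) is an assumption chosen so that both calls return positive numbers.

```kusto
// date2 falls back to January 1, 2023 when the caller omits it
let timeDiffInDays = (date1: datetime, date2: datetime = datetime(2023-01-01))
{
    (date1 - date2) / 1d
};
// Hypothetical single row of sample data
datatable(TimeGenerated: datetime, StartTime: datetime, EndTime: datetime)
[
    datetime(2023-01-11), datetime(2023-01-08), datetime(2023-01-10)
]
| project TimeGenerated
          // Two arguments: the default is ignored
        , ElapsedDays = timeDiffInDays(EndTime, StartTime)
          // One argument: date2 takes the default value
        , ElapesedDaysSinceStartOfYear = timeDiffInDays(TimeGenerated)
// ElapsedDays is 2 and ElapesedDaysSinceStartOfYear is 10
```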
Creating Useful Functions
Back in my post Fun With KQL – Case, I showed how to use a case statement to retrieve the name of the month based on the month number that was used.
This is a useful piece of code, so I created a function out of it. I have a file full of these useful pieces of code. Putting them in functions makes them easy to reuse.
In this function I simply create one parameter, the monthNumber. It will return the text associated with the month number passed in. For more on how the case works, refer back to my Fun With KQL – Case post.
I then added a second useful function, getNiceDate, which uses string concatenation to assemble a nice date. For more on the strcat and format_datetime operators see the See Also section of this post for links to the associated blog posts I’ve done.
In my query I used a couple of extends to call my functions, then added their output to my query. I could have done away with the extends and just embedded the calls to the functions right inside my project. I just thought separating it out made it a little easier to read.
As you can see in the output, I have both the MonthName and NiceDate columns with the nicely formatted data.
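Here is a sketch of what two such helper functions could look like. The parameter name theDate, the abbreviated month names, the output format, and the sample datatable are all assumptions for illustration.

```kusto
// Returns a short month name for a month number
let getMonthName = (monthNumber: int)
{
    case( monthNumber == 1, "Jan"
        , monthNumber == 2, "Feb"
        , monthNumber == 3, "Mar"
        , monthNumber == 4, "Apr"
        , monthNumber == 5, "May"
        , monthNumber == 6, "Jun"
        , monthNumber == 7, "Jul"
        , monthNumber == 8, "Aug"
        , monthNumber == 9, "Sep"
        , monthNumber == 10, "Oct"
        , monthNumber == 11, "Nov"
        , monthNumber == 12, "Dec"
        , "Unknown"
        )
};
// Assembles a readable date by concatenating the pieces
let getNiceDate = (theDate: datetime)
{
    strcat( getMonthName(getmonth(theDate))
          , " "
          , format_datetime(theDate, "dd")
          , ", "
          , format_datetime(theDate, "yyyy")
          )
};
// Hypothetical sample row
datatable(TimeGenerated: datetime)
[
    datetime(2023-03-05)
]
| extend MonthName = getMonthName(getmonth(TimeGenerated))
| extend NiceDate = getNiceDate(TimeGenerated)
| project TimeGenerated, MonthName, NiceDate
// MonthName is Mar and NiceDate is Mar 05, 2023
```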
Using a Let to Hold a Dataset
The let operator can also hold a dataset, often referred to as a datatable in Kusto.
In the second let statement, I simply provide a name to hold the dataset, here usageData. I then supply the query needed to get the data.
In the final line I simply use the variable usageData to send the data to the output pane.
Right now this may not seem very useful, but in upcoming posts on join and union you’ll see how to use this functionality. For now just remember it as it will prove useful soon.
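A minimal sketch of holding a dataset in a let. It assumes the Usage table in the demo workspace, and that the first let holds a start date; the filter itself is illustrative.

```kusto
// First let: a calculated value
let startDate = ago(12h);
// Second let: a name that holds the result of a query
let usageData = Usage
| where TimeGenerated >= startDate;
// The variable is then used like any table
usageData
```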
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Conclusion
In this post we explored the versatile let operator. We first saw how to use let to hold constant values. Next we learned how to hold variables using let.
From there we saw how to create reusable functions, in my humble opinion let’s greatest power. Finally we saw how to use let to hold a dataset, also called a datatable. We’ll explore this last capability more in upcoming blog posts.
The demos in this series of blog posts were inspired by my Pluralsight courses on the Kusto Query Language, part of their Kusto Learning Path.
If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight. At the top is a Try For Free button you can use to get a free 10-day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.
The take_any function is a random row generator. Based on the parameters passed to it, it will select a row from the dataset being piped into it. Microsoft’s documentation describes the choice as arbitrary, so don’t rely on it for statistically rigorous sampling. It also has a variant, take_anyif. We’ll see both in this post.
Note that take_any was originally called any and was renamed. While any still works, it has been deprecated and you should now use take_any.
Any and all of the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Take_Any Basics
Like other functions we’ve covered so far, we will need to use summarize in order to execute take_any. In this example, we will pass an * (asterisk) into the take_any parameter. This will cause all columns for a random row to be returned.
Each time you execute this you should get a different row back from the piped in dataset. Note there are more columns that appear off screen to the right.
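In its simplest form the query can be sketched like this, assuming the Perf table from the demo workspace:

```kusto
// Return every column from one arbitrarily chosen row of Perf
Perf
| summarize take_any(*)
```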
Take_Any For A Column
With take_any, you can also pass in a specific column name instead of using the *.
As you can see, it returns a random value from the column passed as the parameter.
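A sketch of the single column form. The choice of the Computer column is an assumption.

```kusto
// Return the Computer value from one arbitrarily chosen row
Perf
| summarize take_any(Computer)
```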
Take_Any With Multiple Columns
take_any can also work with multiple columns. Just pass each column you want as a parameter, and it will return the values from a random row that contains the columns you requested.
Here, we passed three columns from the Perf table into take_any. It returned a random row with the three columns. We could have used more columns or fewer, according to our needs.
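With three columns the call could look like the sketch below. The particular columns are assumed for illustration.

```kusto
// All three values come from the same chosen row
Perf
| summarize take_any(Computer, CounterName, CounterValue)
```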
Returning Random Multiple Rows Based On A Column with Take_Any
You can return multiple rows from take_any. To do so, add by followed by a column name after the take_any, as you can see in this example.
Here we passed an * to get all columns, then we follow with by CounterName. KQL will get a list of unique CounterNames then return a random row for each one.
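Written out, the query described above is short:

```kusto
// One chosen row, with all columns, for each distinct CounterName
Perf
| summarize take_any(*) by CounterName
```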
Take_AnyIf
The take_anyif variant of take_any operates like other if variants we covered recently, maxif, minif, and sumif.
We pass in the name of a column in the first parameter, then the second parameter is a condition. In this case, the row that will be picked randomly must have a CounterName of % Free Space.
In the results, you can see it grabbed a random computer name from the Computer column where the CounterName for that row had a value of % Free Space.
take_anyif does have a few limitations compared to take_any. First, you cannot pass in an * and get all columns. Second, you can only enter a single column. It does not support passing in multiple columns.
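A sketch of the conditional variant, using the column and condition described above:

```kusto
// Pick a Computer value only from rows where the condition is true
Perf
| summarize take_anyif(Computer, CounterName == "% Free Space")
```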
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Conclusion
In this post we learned how to use take_any to grab a random row. We saw how to return all columns, a single column, or multiple columns. In addition we saw how to use take_anyif to grab a random value conditionally.
In my previous post, Fun With KQL – Max, MaxIf, Min and MinIf, we looked at the aggregation functions max and min. In this post we’ll talk about another aggregation function, sum. We’ll also look at a variant of it, sumif.
I should mention the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Sum Basics
The sum function is straightforward. It creates a total for the column passed into it.
In this demo we’ll take the Perf table, and filter it with a where operator to only include rows where the CounterName is Free Megabytes.
We then call summarize, which is needed if you want to use an aggregation function on its own. Finally we use sum and pass in the column we want to total, CounterValue.
In this example, the result is a total of 6,398,287,032.
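The steps just described can be sketched as the following query. Your total will differ, since the demo data changes constantly.

```kusto
// Total the CounterValue column for the Free Megabytes counter only
Perf
| where CounterName == "Free Megabytes"
| summarize sum(CounterValue)
```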
Including the Sum As A Column
We can include the sum function as a column in the output. To do so we need to include a by as part of the summarize so the data will be grouped correctly, in this case by the CounterName.
In the output you can see each CounterName on a row, along with the grand total for its CounterValue.
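A sketch of the grouped version, assuming no additional filter is applied first:

```kusto
// One row per CounterName, each with its own total
Perf
| summarize sum(CounterValue) by CounterName
```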
SumIf
The sumif uses two parameters. The first is the column to aggregate. The second is the condition which, if true, causes a value to be included in the summation process.
In the example below, the CounterValue will only be included if the CounterName equals Free Megabytes.
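In sketch form, the condition moves from a where operator into the second parameter of sumif:

```kusto
// Only rows where the condition is true contribute to the sum
Perf
| summarize sumif(CounterValue, CounterName == "Free Megabytes")
```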
Other Uses for SumIf
We can use sumif just as we did maxif and minif. In fact, let’s extend an example from the previous blog post on Max, MaxIf, Min and MinIf and add sumif to our output.
Here we added two new columns with our summed values.
You can use sum and sumif like you do max, min, maxif, and minif. Refer back to the Fun With KQL – Max and Min for other examples.
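One way such a query could look is sketched below. The two counter names and all of the output column names are assumptions chosen for illustration.

```kusto
// Max, min, and sum side by side, each restricted to one counter
Perf
| summarize FreeMbMax      = maxif(CounterValue, CounterName == "Free Megabytes")
          , FreeMbMin      = minif(CounterValue, CounterName == "Free Megabytes")
          , FreeMbSum      = sumif(CounterValue, CounterName == "Free Megabytes")
          , FreeSpacePctMax = maxif(CounterValue, CounterName == "% Free Space")
          , FreeSpacePctMin = minif(CounterValue, CounterName == "% Free Space")
          , FreeSpacePctSum = sumif(CounterValue, CounterName == "% Free Space")
```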
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Conclusion
We learned about sum and sumif in this post, seeing how they can be used. They can be used to return a single grand total, as well as be included as a column in the output of queries.
The max and min aggregation functions are common to almost every language, and the Kusto Query Language is no exception. As you would think, when you pipe in a dataset max returns the maximum value for the column name you pass in. Likewise min returns the lowest value.
In addition, there are variants for each, maxif and minif. We’ll see examples for all of these in this post.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Max
The max function is easy to use. In this example we use summarize to call an aggregation function, in this case max.
Here we can see the maximum CounterValue in the Perf table where the CounterName was Free Megabytes was 236,999.
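A query matching this description can be sketched as:

```kusto
// Largest CounterValue among the Free Megabytes rows
Perf
| where CounterName == "Free Megabytes"
| summarize max(CounterValue)
```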
Using Max in Other Places
You can use max in many other places in KQL. As one example, refer back to my previous post Fun With KQL – Top-Nested. Instead of the count or sum aggregations we used in the post, we could have also used max.
In this example, we used the max function to rank our top nested values.
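As a sketch, ranking a two-level top-nested hierarchy by max could look like this. The number of items per level, the CounterNameMax column name, and the sort are assumptions.

```kusto
// Rank each level by the largest CounterValue instead of by count or sum
Perf
| top-nested 3 of ObjectName
        by ObjectMax = max(CounterValue)
  , top-nested 3 of CounterName
        by CounterNameMax = max(CounterValue)
| sort by ObjectMax desc, CounterNameMax desc
```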
MaxIf
There is a variant to max called maxif. It allows us to include a condition in the second parameter such that in order for the value to be considered for max, the condition must be true.
In this example, we use CounterValue in the first parameter, then we put the condition CounterName == "Free Megabytes" in the second parameter, thus restricting the search for a maximum value to only rows with Free Megabytes in the CounterName.
As of now, you may not see much difference between using the combination of where and max versus the maxif. In a moment we’ll see another way to use maxif, but for now, let’s move on to min.
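The maxif call described above can be sketched as:

```kusto
// Same result as where + max, with the condition inside the aggregation
Perf
| summarize maxif(CounterValue, CounterName == "Free Megabytes")
```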
Min
The min function can be used like max, except it returns the lowest value in the column indicated.
Here, we found the minimum CounterValue in the dataset that was passed in was 34.
Again, like max, the min aggregate function can be used in many places in KQL, like the Top-Nested operator.
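A sketch of min in use. The filter on Free Megabytes is an assumption carried over from the max example.

```kusto
// Smallest CounterValue among the Free Megabytes rows
Perf
| where CounterName == "Free Megabytes"
| summarize min(CounterValue)
```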
MinIf
min also has an alternate version, minif. Just like maxif, you pass the column name as the first parameter and the condition in the second parameter.
Since it is so similar to maxif we’ll skip a detailed look at it for now, but we’ll show an example of it momentarily.
Max and Min as Output Columns
It’s possible to include max and min as output columns in your query. In this example we used summarize to calculate the max and min values, giving them better names.
Using by CounterName will group the summarized values by their CounterName, and include the CounterName column in the output.
I used the in operator to limit to just two CounterNames, but you could include all of them, or your own set.
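A sketch of such a query follows. The two counter names and the output column names are assumptions.

```kusto
// Limit to two counters, then name the max and min columns
Perf
| where CounterName in ("Free Megabytes", "% Free Space")
| summarize MaxCounterValue = max(CounterValue)
          , MinCounterValue = min(CounterValue)
         by CounterName
```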
MaxIf and MinIf as Columns
In the previous example, we saw how to create columns to hold min and max values in the output. It had one drawback though. The values were for one of the two CounterNames we limited the results to. There was no way to distinguish which CounterName these values reflected.
This is where the maxif and minif aggregate functions come into play.
In this example we create four columns using the summarize operator. For each column we use either maxif or minif to create a value for just the CounterName we want.
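The four-column approach can be sketched as follows, with hypothetical column names and the same two assumed counters:

```kusto
// Each column carries its own condition, so its meaning is unambiguous
Perf
| summarize FreeMbMax       = maxif(CounterValue, CounterName == "Free Megabytes")
          , FreeMbMin       = minif(CounterValue, CounterName == "Free Megabytes")
          , FreeSpacePctMax = maxif(CounterValue, CounterName == "% Free Space")
          , FreeSpacePctMin = minif(CounterValue, CounterName == "% Free Space")
```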
maxif and minif can be used in other places as well. Think back to the previous example with top-nested. Perhaps we were only interested in a handful of ObjectNames.
The second line of the query could have been written:
| top-nested 3 of ObjectName
by ObjectMax = maxif( CounterValue
, CounterName in ("CounterName1", "CounterName2")
)
This functionality would really let us home in on just the data we need.
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Conclusion
In this post we saw how to use the aggregate functions min and max. First we saw how to use them with summarize to return a specific value, then saw how to use them with part of another query. We also saw their alternates, maxif and minif.
Back in June of 2022 I covered the top operator in my Fun With KQL – Top post. We showed how to create your own top 10 lists, for example what were the top 5 computers ranked by free disk space.
What if you needed your top results in a nested hierarchy? For example, say you wanted to know which three objects in the Perf table had the most entries, and then, for each one of those, which three counters had the most entries.
That’s where the top-nested operator comes in. It allows you to create top lists in nested, also called hierarchical, levels.
Before we begin our discussion on top-nested, you should be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Top-Nested Basics
In the example below, we call the Perf table, and pipe it into our first top-nested operator. We then tell it how many items we want, in this case 3. Next, we tell it what we want three of, here we say we want the top 3 from our ObjectName column.
Next, after the by we indicate how we want to determine which ObjectName values are the top 3. Here we use the count aggregation, and store that result in a new ObjectCount column.
OK, we’ve now set up the uppermost level for our top hierarchy. Now we need to tell the query what should be in the nested level. To do that, we use a comma, then a second top-nested operator.
Into the second top-nested we again say we want 3 items, this time from the CounterName column. Again we’ll rank these by the count aggregation, storing the result in the CounterNameCount column.
Finally we sort the results by the counts in descending order.
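The steps above can be sketched as the following query:

```kusto
Perf
// Top level: the three ObjectNames with the most rows
| top-nested 3 of ObjectName
        by ObjectCount = count()
// Nested level: within each of those, the three CounterNames with the most rows
  , top-nested 3 of CounterName
        by CounterNameCount = count()
| sort by ObjectCount desc, CounterNameCount desc
```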
In the results, the first item was the Process object name, with 2,262,619 entries. It only had one CounterName associated with it, % Processor Time, which is why you only see one row for the Process object.
In second place is the ObjectName of LogicalDisk, with 1,286,540 entries. It had three counters associated with it. Of these, the Disk Read Bytes/sec took top place with 116,968 rows. In close second was Disk Bytes/sec with 116,965 entries in the Perf table. In third place for LogicalDisk was Disk Writes/sec.
The ObjectName that came in third was the K8SContainer, and you can see the three CounterName values associated with it.
Now that you’ve seen it in action, the top-nested operator is pretty simple to use and understand. Just tell it how many items you want, what item you want, and what aggregation you want to use to rank them.
Multiple Levels for Top-Nested
You can have many nested levels. In this next example we’ll use three levels of nesting.
Here we decided to get the top 5 of each level, and we went three levels deep, ObjectName, CounterName, then InstanceName. We could have gone even deeper, we just need additional top-nested operators for each level of our hierarchy.
Also note we decided to sort by the names of our various objects instead of the counter totals. This is a design decision that can be made by the ultimate end user of your query, and will be dependent on their needs.
Additionally, while I used 5 at every level, this isn’t a requirement. I could have used top-nested 3 at the ObjectName level, then top-nested 5 at the CounterName level, and perhaps top-nested 10 at the InstanceName level. Again, this can be determined by the needs of your end user, and Kusto is flexible enough to handle those needs.
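A three-level version could be sketched like this. The InstanceCount column name is an assumption.

```kusto
Perf
| top-nested 5 of ObjectName
        by ObjectCount = count()
  , top-nested 5 of CounterName
        by CounterNameCount = count()
  , top-nested 5 of InstanceName
        by InstanceCount = count()
// Sorted by the names rather than by the counts
| sort by ObjectName asc, CounterName asc, InstanceName asc
```

Each level takes its own number, so the 5s could just as easily be 3, 5, and 10.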
Using Other Aggregations
So far we’ve been using the count aggregation for our needs. We can use any aggregation type we need to with top-nested. Take a look at this example.
Here, we used the sum aggregation, summing up our CounterValue column in order to determine the rank within our top-nested hierarchy.
We could have used other aggregations, such as min or max (both of which we’ll see in the next blog post), or any of the many aggregations available in Kusto.
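Swapping in sum might look like the sketch below. The output column names are assumptions.

```kusto
// Rank each level by the total of CounterValue rather than by row count
Perf
| top-nested 3 of ObjectName
        by ObjectSum = sum(CounterValue)
  , top-nested 3 of CounterName
        by CounterNameSum = sum(CounterValue)
| sort by ObjectSum desc, CounterNameSum desc
```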
All Others
Sometimes it’s just as important to know what wasn’t included in your top list. The top-nested operator gives us this capability through the with others option.
In this example, on the very top of the hierarchy you can see we’ve added with others = "All Other Objects" between the column to rank and the by. It is the with others that tells top-nested to aggregate the values not included in the final top list and display those results as well.
In the output, you see a row for "All Other Objects". This is the count for all the objects that were not in the top list.
The text All Other Objects was of my own choosing. I could have used any text here, like Not in the top list, Stuff not on top or Better luck next time.
Note that when determining the value for other it used the same aggregation function as the top-nested. Here we used count, but it could have been sum or whichever aggregation function we used.
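A sketch with with others at the top level only. The number of items and the nested level are assumptions carried over from the earlier example.

```kusto
Perf
// with others sits between the column to rank and the by
| top-nested 3 of ObjectName
        with others = "All Other Objects"
        by ObjectCount = count()
  , top-nested 3 of CounterName
        by CounterNameCount = count()
| sort by ObjectCount desc, CounterNameCount desc
```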
Others at All Levels
In the previous example we only included others at the top level. We can use it at all levels if we wish. We’ll harken back to our original example with two top-nested levels, and include an others for each level.
In the second row, you can see K8SContainer, and All Other Counters in a sublevel, followed by the top 3 values for the CounterNames in the K8SContainer.
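With others at both levels, the query could be sketched as follows. The sort order is an assumption.

```kusto
Perf
| top-nested 3 of ObjectName
        with others = "All Other Objects"
        by ObjectCount = count()
  , top-nested 3 of CounterName
        with others = "All Other Counters"
        by CounterNameCount = count()
| sort by ObjectName asc, CounterName asc
```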
All Others at Sublevels Only
As you’ve seen many times in this Fun With KQL series, the Kusto Query Language is very flexible. This allows us a way to have others appear only at the sublevels, as you can see in this demonstration.
Here we only included with others at the second level of our nest. Do note this resulted in an extra row where the ObjectName is empty.
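A sketch with with others on the second level only:

```kusto
Perf
| top-nested 3 of ObjectName
        by ObjectCount = count()
  // Only the nested level reports what fell outside its top list
  , top-nested 3 of CounterName
        with others = "All Other Counters"
        by CounterNameCount = count()
| sort by ObjectName asc, CounterName asc
```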
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Conclusion
Getting a hierarchy of top items is a common business need. As you saw in this post, top-nested allows you to accomplish this easily. It also includes the ability to include the other values not included in the top-nested list.
The demos in this series of blog posts were inspired by my Pluralsight courses on the Kusto Query Language, part of their Kusto Learning Path.
If you don’t have a Pluralsight subscription, just go to my list of courses on Pluralsight. At the top is a Try For Free button you can use to get a free 10-day subscription to Pluralsight, with which you can watch my courses, or any other course on the site.
Business Analysis is becoming mainstream in today’s corporate world. A big part of that analysis is done with pivot tables. Think of an Excel spreadsheet where data is organized into rows and columns.
The pivot plugin will take one data column from your query, and flip it to become new columns in the output data grid. The other column will become the rows, and an aggregation function will be at the cross section of the rows and columns, supplying the main data. You’ll get a better understanding through the demos in this post.
You may be wondering "plugin? What’s a plugin?"
Microsoft created a set of language extensions called plugins. These are functions which add extra capability to the Kusto Query Language. Plugins are invoked using the evaluate operator. pivot is one of many plugins, and we’ll look at more in upcoming posts.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
Countries
Our ultimate endpoint for this demo is to get the count of requests, by the name of the request, for each country. In this demo we’ll be using the AppRequests table, so let’s begin by getting a list of countries.
For these demos we’ll start with the AppRequests data, filtering for rows where the success flag was false and where the column containing the countries, ClientCountryOrRegion, is not empty.
To get our list of countries we’ll simply use distinct, then sort by the country column.
We’ll be using these countries as the columns in our output data grid.
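The steps above can be sketched as the following query. It assumes the success flag lives in a boolean column named Success, and it uses isnotempty for the empty check. Both are illustrative choices, not necessarily the exact filter behind the screenshots.

```kusto
// Distinct list of countries that had failed requests
AppRequests
| where Success == false
| where isnotempty(ClientCountryOrRegion)
| distinct ClientCountryOrRegion
| sort by ClientCountryOrRegion asc
```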
Counts
Next, let’s get the data that will be the basis for our pivot. We’ll project just two columns, Name and ClientCountryOrRegion. The Name column contains the type of request that was made, typically a GET or PUT request.
Now we call the summarize operator to get a count of the unique combination of Name and ClientCountryOrRegion and store it in RequestCount.
This does give us the data we need, the number of requests for each request type (the name) in each country. This data, though, isn’t easy to read. For example, it’s difficult to compare the number of requests between countries.
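Here is a minimal sketch of that project and summarize combination. The Success filter is an assumption carried over from the description of the data set.

```kusto
// One row per request name and country, with the number of requests
AppRequests
| where Success == false and isnotempty(ClientCountryOrRegion)
| project Name, ClientCountryOrRegion
| summarize RequestCount = count() by Name, ClientCountryOrRegion
```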
We can solve this problem by using the hero of this blog post, the pivot plugin.
Pivot Basics
In this query, we get our data and project just the two columns we need. We then pipe this into the evaluate operator. As stated earlier, we have to use evaluate in order to call a plugin. Here, we follow the evaluate operator with the name of the plugin to call, in this case pivot.
Into the parameter of pivot we pass in the name of the data column from our query we want to become the column in the data grid. By default, the column not passed into pivot becomes the rows.
At the intersection of rows and columns is the aggregation function used by pivot. There are several you can use (we’ll see examples momentarily), but if you don’t specify one, count() is used by default.
After the call to evaluate pivot... we call sort, so the Name will be listed in ascending alphabetical order.
If we look at the first row, we see that "GET .aws/credentials" had one call, from the country of Russia. Looking further down, we can see "GET Employees/Create" had 1,355 calls, all in the United States.
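A basic pivot query matching this description could look like the sketch below. The filter on Success is assumed. The pivot call itself passes only the column to pivot on, so the default count() is used.

```kusto
// Countries become columns, request names become rows, cells hold counts
AppRequests
| where Success == false and isnotempty(ClientCountryOrRegion)
| project Name, ClientCountryOrRegion
| evaluate pivot(ClientCountryOrRegion)
| sort by Name asc
```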
Pivoting On A Different Column
The AppRequests table has a column ItemCount. What we want to accomplish with this next query is summing up the ItemCount value and have that be at the intersections of our pivoted table.
We need to make two modifications to the previous query. First, we have to add ItemCount to our list of projected values in order to use it in the pivot.
Next, we need to add a second parameter to the pivot plugin. We specify the type of aggregation we want to use, in this example sum. Then into the sum parameter we pass in the value to aggregate on, ItemCount.
In this case, at the intersection of the Name and ClientCountryOrRegion is the summed up ItemCount.
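The two modifications can be sketched as follows, with the same assumed filter as before. ItemCount is added to the projection, and sum(ItemCount) is passed as the second pivot parameter.

```kusto
// Cells now hold the summed ItemCount instead of a row count
AppRequests
| where Success == false and isnotempty(ClientCountryOrRegion)
| project Name, ClientCountryOrRegion, ItemCount
| evaluate pivot(ClientCountryOrRegion, sum(ItemCount))
| sort by Name asc
```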
Other Aggregations
The pivot plugin supports many aggregations. The list includes min, max, take_any, sum, dcount, avg, stdev, variance, make_list, make_bag, make_set, and the default of count.
Be aware that these aggregations are used in many places in Kusto beyond just pivot. Over the course of these Fun With KQL blog posts we’ll be devoting posts to many of them.
Additional Columns In The Output
Sometimes you need more than just one column on the rows. The pivot plugin supports that in two ways. First, if you pipe multiple columns into it, all of them except for the pivot column you pass in are returned.
The second way is to pass the column names in as additional parameters, as you’ll see in this example.
Here, for the column to pivot on we once again used ClientCountryOrRegion, and our aggregation is sum of the ItemCount. Then we begin passing in the columns to display.
Be aware you need to pass in all the columns you want; if you omit any they won’t be in the results. In this example, we added Name and AppRoleInstance as the third and fourth parameters to the pivot plugin.
I then used project-reorder, mostly for fun. You can read more about it in my Variants of Project blog post. Finally we call sort to sort the output.
While we could omit the columns as parameters and just use the default behavior of displaying everything passed in, explicitly listing the columns in the pivot parameters makes the query self documenting. It’s clear that we wanted these exact columns in the output. For that reason I prefer to list the columns as parameters.
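A sketch of the explicit-columns form is shown here. The filter and the sort order are assumptions. The project-reorder simply moves Name and AppRoleInstance to the front and leaves the pivoted country columns after them.

```kusto
// Name and AppRoleInstance are listed explicitly as the row columns
AppRequests
| where Success == false and isnotempty(ClientCountryOrRegion)
| project Name, ClientCountryOrRegion, ItemCount, AppRoleInstance
| evaluate pivot(ClientCountryOrRegion, sum(ItemCount), Name, AppRoleInstance)
| project-reorder Name, AppRoleInstance
| sort by Name asc, AppRoleInstance asc
```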
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
Back on April 25, 2022 I did a blog post on the where operator, Fun With KQL – Where. In that post, I covered several operators that can be used with where to limit the results of a query. This list includes: startswith, endswith, has, hasprefix, hassuffix, and contains.
All of these had one thing in common: they were case insensitive. Kusto ignored the case of the text passed in. If you passed in the text BYTE, for example, Kusto would match on BYTE, Byte, bYtE, byte and other combinations.
There are versions of these which are case sensitive. We’ll see a few here, focusing on the contains keyword. In addition there are "not" versions, which will also be demonstrated.
There is another operator we’ll discuss here, in. It is a bit of an odd duck, in that it is case sensitive by default. We’ll see it and its variants later in this post.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
Preface
For this section of the post I’m going to use contains for the demonstrations. Be aware everything I discuss on contains also applies to the other operators mentioned in the Introduction. Each has the same variants as contains.
Case Insensitive Contains
First, let’s take a look at the normal version of contains.
Briefly, we get the Perf table and grab three columns, TimeGenerated, CounterName, and CounterValue. This is then piped into a where, in which we use contains to look for rows in the CounterName column with the text BYTES.
In the results, you can see names like Available MBytes, Free Megabytes, and Bytes Sent/sec, to name a few. Here the casing of the text passed into contains, all uppercase, was irrelevant to the match.
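The query just described can be sketched as:

```kusto
// contains ignores case, so "BYTES" matches MBytes, Megabytes, Bytes, ...
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName contains "BYTES"
```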
Making Contains Case Sensitive
There is an alternate version of contains, contains_cs. Let’s rerun the above query, using the _cs version.
This invokes contains_cs, the case sensitive version of contains. It will look for an exact match of the text that was passed in. In this case it looked in CounterName and found no records with BYTES in all caps, so the results pane at the bottom shows "No results found…".
The other keywords I mentioned in the Introduction also have case sensitive versions, just add _cs to the end of their names.
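As a sketch, the case sensitive version differs from the plain contains query only in the operator name:

```kusto
// contains_cs requires the casing to match exactly
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName contains_cs "BYTES"
```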
Not Contains
In addition to the case sensitive versions of these commands, there is also a not version, as in "not contains". To invoke the not version of contains or any of the other operators, place a ! (exclamation mark, also called a "bang") on the front, as in !contains.
In the results you will see all rows as long as the word Bytes is not in the CounterName column.
Note that !contains is case insensitive. The casing of the value passed in, in this example Bytes, didn’t matter.
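A minimal sketch of the not version:

```kusto
// Keep only rows whose CounterName does NOT contain "bytes" in any casing
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName !contains "Bytes"
```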
Not Contains Case Sensitive
It probably won’t come as a surprise, but there is a case sensitive version of not contains. You simply combine the ! on the front and append the _cs to the end of the command. Here we do it with contains, using the command !contains_cs.
In this example we used !contains_cs to look for rows that did not contain the exact text BYTES in all caps. As you can see in the output, it returned rows including some that had the word Bytes, but it was in mixed case. No rows with upper case BYTES were returned.
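Combining both modifiers might look like this sketch:

```kusto
// Excludes only rows containing the exact, upper case text "BYTES"
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName !contains_cs "BYTES"
```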
In
In the introduction I mentioned there is one keyword that has the rules reversed when it comes to case sensitivity, and that is in.
The in operator looks for values in a list that you pass in as its parameters. Let’s take a look.
We use where on the CounterName, then call in. In the parameters we pass in three strings, and in returns the rows whose CounterName exactly matches one of them. The results show matches on some of the text that was passed in; there were many more rows not shown in the screen capture.
One thing to note, by default in is case sensitive, unlike the other commands mentioned in this post. in looked for exact matches, including case, with the values we passed in.
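Here is a sketch of such a query. The three counter names are placeholders chosen for illustration, so substitute names that exist in your own Perf data.

```kusto
// in is case sensitive: each value must match CounterName exactly
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName in ("Disk Transfers/sec", "Disk Reads/sec", "Avg. Disk sec/Write")
```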
Case Insensitive In
To make in case insensitive, you append a ~ (tilde) after the keyword, in~.
In this example, you can see it found matches for the values passed in, even though the case of the text passed in did not match the case in the results.
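A sketch of the case insensitive form, using deliberately lower case placeholder counter names (assumed for illustration):

```kusto
// in~ ignores case, so these still match "Disk Transfers/sec" and friends
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName in~ ("disk transfers/sec", "disk reads/sec", "avg. disk sec/write")
```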
Not In
The in has a not version that works like the other operators. Place a ! (exclamation mark / bang) before it.
In this version of the query, !in returned all records except for ones in the list passed into the !in.
Also note we took advantage of the flexibility of the Kusto Query Language formatting and put each parameter on its own line.
Like a regular in, !in is case sensitive by default. There is a version, as you might expect, that can be case insensitive.
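The not version, with each parameter on its own line, might be sketched like this. The counter names are again illustrative placeholders.

```kusto
// Return every row except those whose CounterName is exactly in the list
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName !in ( "Disk Transfers/sec"
                        , "Disk Reads/sec"
                        , "Avg. Disk sec/Write"
                        )
```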
Not In Case Insensitive
To call the not and case insensitive version of in, simply combine the ! and ~: !in~.
With this example, it omitted matches of the three values passed in, regardless of the case of the text in the Perf table.
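As a sketch, with placeholder counter names in lower case to show that casing no longer matters:

```kusto
// !in~ excludes the listed values regardless of their casing in the table
Perf
| project TimeGenerated, CounterName, CounterValue
| where CounterName !in~ ( "disk transfers/sec"
                         , "disk reads/sec"
                         , "avg. disk sec/write"
                         )
```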
In Parameters
I just wanted to point out that with in there is no limit to the number of parameters. in can handle 1, 2, 3, 4, or more.
In addition, while these examples used strings, in and its variants can also work with numeric values, for example where CounterValue !in (0, 42, 33, 73).
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
In this post, we covered how to make certain operators case sensitive as well as use the not versions of them. While we focused on contains, the same methods also apply to startswith, endswith, has, hasprefix, and hassuffix.
We then looked at the in operator and how it differed from the others when it comes to case sensitivity.
In my previous post, Fun With KQL – DCountIf, we saw how you could apply a filter to the data directly within the dcountif function.
You may have been thinking gosh, it sure would be nice if we could do that within the count function! (Note, you can read more about count in my Fun With KQL – Count blog post.)
Well fear not intrepid Kusto coder, there is just such a function to fit your needs: countif.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
Count – A Refresher
Before we tackle countif, let’s refresh ourselves on how count works. Here’s a simple example.
We take our Perf table and pipe it into a where operator to only get back rows where our CounterName column has the word Bytes in it. Note for this example the contains doesn’t care about case, it’ll match Bytes, bytes, BYTES, and so on.
Next we flow into a summarize, which uses the count function to count the rows for each value of the CounterName column. Finally we do a sort to make the output easy to read.
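The refresher query can be sketched as:

```kusto
// Filter first, then count the remaining rows per counter
Perf
| where CounterName contains "Bytes"
| summarize count() by CounterName
| sort by CounterName asc
```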
CountIf Basics
Now let’s see how to do the same thing with the countif function.
First, we omitted the where since the filtering will be done in the countif.
Next we use summarize, and this time set a column name of RowCount to hold the value returned by countif.
In the countif function we pass a parameter, the expression we want to use for filtering. Here we use the same CounterName contains "Bytes" we used with the where statement in the previous example. We could have used any expression though; looked for an exact match using ==, used in with a list, or any other valid Kusto expression.
Finally, we used sort to sort the output.
Notice something different about these results as compared to the previous example. Here, countif included rows with counts of 0. In the earlier query, the where removed the non-matching rows before summarize ever saw them, so counters without Bytes in their name never appeared in the output.
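A sketch of the countif version described above:

```kusto
// Every CounterName gets a row; non-matching counters get a RowCount of 0
Perf
| summarize RowCount = countif(CounterName contains "Bytes") by CounterName
| sort by CounterName asc
```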
No Zeros Here!
If we want the output of countif to match what count produces, all we need to do is add a where to suppress rows with zero values.
Here all we needed was to use where RowCount > 0 to remove those rows, making the output of countif match count.
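With the extra filter in place, the query might look like this:

```kusto
// Drop the zero rows so the output lines up with the where + count() version
Perf
| summarize RowCount = countif(CounterName contains "Bytes") by CounterName
| where RowCount > 0
| sort by CounterName asc
```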
No Guesswork Here
Just to be clear, countif, unlike dcountif, returns an exact value, not an estimate.
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
The countif function can provide a streamlined way to filter our data when we need accurate row counts. We just need to keep in mind it will return data with zero counts, which may be a benefit if your goal is to discover rows where the count is zero. Otherwise you just need to remember to filter those zero value rows out.
In the previous post of this series, Fun With KQL – DCount, we saw how to use the dcount function to get an estimated count of the distinct values in an incoming dataset.
It’s common though to want to filter out certain rows from the count. While you could do the filtering before getting to the dcount, there’s an alternative function that allows you to do the filtering right within it: dcountif.
Note if you haven’t read the previous post on dcount, I’d advise taking a quick read now as we’ll be building on it for this post.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
DCountIf Basics
The dcountif function is almost identical to dcount, except it allows for an extra parameter, as you can see in this sample.
Here we are using in to see if the EventID column is in the list of values in parentheses. We could have used any number of comparisons, for example using == to look for a single value, !in for not in, matches regex, startswith, and many more.
In this result set, only rows whose event IDs were in the list of values are included.
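A dcountif call of this shape can be sketched as follows. The SecurityEvent table, the 90 day window, the grouping by Computer, and the three event IDs are assumptions chosen for illustration.

```kusto
// Estimated number of distinct EventIDs per computer,
// counting only the IDs in the list
SecurityEvent
| where TimeGenerated >= ago(90d)
| summarize dcountif(EventID, EventID in (4688, 4624, 4625)) by Computer
| sort by Computer asc
```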
DCountIf Accuracy
Just like dcount, the dcountif function returns an estimated count. You can pass in a third parameter with the accuracy level to use. The levels are the same as in dcount.
| Accuracy Value | Error Percentage |
| --- | --- |
| 0 | 1.6% |
| 1 | 0.8% |
| 2 | 0.4% |
| 3 | 0.28% |
| 4 | 0.2% |
Let’s see an example of it in use.
Here we use a value of 0, which is the least accurate but fastest. As with dcount we can use values 0 to 4 to get the best balance of speed and accuracy for our needs. By default dcountif will use an accuracy level of 1 if it is omitted.
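As a sketch, the accuracy level goes in as the third parameter. The table, time window, and event IDs are illustrative assumptions.

```kusto
// Accuracy 0: fastest, with the largest allowed error
SecurityEvent
| where TimeGenerated >= ago(90d)
| summarize dcountif(EventID, EventID in (4688, 4624, 4625), 0) by Computer
| sort by Computer asc
```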
You can see the Fun With KQL – DCount post for a more extensive discussion on the topic of speed versus accuracy.
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
In this post we saw how dcountif can be used to get an estimated distinct count, but also allow you to filter out certain rows from the count, all with a single function.
In an earlier post in this series, Fun With KQL – Count, you saw how to use the count operator to count the number of rows in a dataset.
Then we learned about another operator, distinct, in the post Fun With KQL – Distinct. This showed how to get a list of distinct values from a table.
While we could combine these, it would be logical to have a single command that returns a distinct count in one operation. As you may have guessed by the title of this post, such a function exists: dcount.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
SecurityEvent
Before we begin, let me mention for this post we’ll move away from the Perf table and use a new one, SecurityEvent. This table is just what it sounds like. Every time an object in your Azure instance, such as a server, has a security related event it gets logged to this table.
The SecurityEvent table has data a bit better suited for demonstrating the dcount function. Plus by this point, assuming you’ve been following along in the Fun With KQL series, you’re probably tired of looking at the Perf table.
A Refresher on Distinct
As dcount is a combination of distinct and count, let’s take a moment to refresh ourselves on them. We’ll start with the distinct operator.
Distinct returns a single entry for the columns indicated, no matter how many times they occur in the dataset. Here’s a simple example, where we want a list of distinct values for the combination of the EventID and Activity columns.
Taking a look at the first row, we have the event ID of 4688 and activity value of "4688 – A new process has been created.". This combination of values could occur once, or one million times in the SecurityEvent table. No matter how many times this combination appears, it will only show up once in the dataset produced by distinct.
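The distinct example can be sketched as:

```kusto
// One row per unique EventID / Activity combination
SecurityEvent
| distinct EventID, Activity
```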
Combining Distinct with Count
In the opening I mentioned we can combine distinct with count to get a distinct count. In the example below, we’ll pipe our SecurityEvent table into a where to limit the data to the last 90 days.
Be aware this query is a little different from the previous one. Here we are getting a distinct set for the combination of the computer name and the event ID. In this result set a computer will have multiple events associated with it. Here is some example data that the distinct might output, to illustrate the point.
| Computer Name | EventID |
| --- | --- |
| WEB001 | 4668 |
| WEB001 | 5493 |
| WEB001 | 8042 |
| WEB001 | 5309 |
| SQL202 | 0867 |
| SQL202 | 5309 |
The result of our distinct is piped into the summarize operator. In the summarize we’re using the count to add up the number of EventID entries for each computer.
Finally we use a sort to list our computer names in ascending alphabetical order.
Our first entry, AppBE00.na.contosohotels.com, had 20 events associated with it. The last entry, AppFE0000CLG, only had 9 security related events.
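Put together, the distinct plus count approach might look like this sketch:

```kusto
// Exact number of distinct EventIDs per computer over the last 90 days
SecurityEvent
| where TimeGenerated >= ago(90d)
| distinct Computer, EventID
| summarize count() by Computer
| sort by Computer asc
```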
DCount Basics
While we got our results, it was quite a bit of extra work. We can make our code much more readable by using dcount.
In the example below, we’ll replace the two lines of code containing distinct and count, condensing them into a single line.
Here we use a summarize, followed by our dcount function. Into dcount we pass a parameter of the column name we want to count, in this case the EventID. We follow that with by Computer to indicate we want to sum up the number of distinct events for each computer name.
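A sketch of the dcount version, assuming the same 90 day filter and sort as the distinct plus count approach:

```kusto
// Estimated number of distinct EventIDs per computer, in a single line
SecurityEvent
| where TimeGenerated >= ago(90d)
| summarize dcount(EventID) by Computer
| sort by Computer asc
```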
Speed Versus Accuracy (Don’t Skip This Part!!!)
There’s one important thing you have to know, and that is the distinct plus count approach can be slow. Kusto has to go over every row in the incoming dataset and keep track of each distinct value it finds, and in a big dataset that can take a while.
The dcount function is much faster because it uses an estimated count. It may not be perfectly accurate, but will execute much faster.
Whether to use it is dependent on your goal. If you are trying to uncover computers with large numbers of events, it may not matter if AppBE00.na.contosohotels.com had 20 events, 19, or 21. You just need to know it had a lot (especially compared to other servers) so you can take a closer look.
On the other hand if you are dealing with, for example, financial data, you may need a very accurate value and hence avoid dcount in favor of the distinct + count combination.
Adjusting the Accuracy Level
The dcount function supports a second parameter, accuracy. This is a value in the range of 0 to 4. Below is a table which represents the error percentage allowed for each value passed in the accuracy parameter.
| Accuracy Value | Error Percentage |
| --- | --- |
| 0 | 1.6% |
| 1 | 0.8% |
| 2 | 0.4% |
| 3 | 0.28% |
| 4 | 0.2% |
An accuracy level of 0 will be the fastest, but the least accurate. Similarly, a value of 4 will be the slowest, but most accurate.
When the accuracy parameter is omitted, the default value of 1 is used.
Using the Accuracy Parameter
Here is an example of using the accuracy parameter. We’ll set it to the least accurate, but fastest level of 0.
As you can see, within the dcount after the EventID parameter we have a second parameter. We pass in a 0, indicating we want the fastest run time and will settle for a lesser accuracy.
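Such a query could be sketched like this, with the time filter and grouping assumed for illustration:

```kusto
// Second parameter 0 = fastest, least accurate
SecurityEvent
| where TimeGenerated >= ago(90d)
| summarize dcount(EventID, 0) by Computer
| sort by Computer asc
```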
Here is an example where we want a little better accuracy than the default of 1, and are willing to accept a longer query execution time.
As you can see, we are passing in a value of 2 for the accuracy parameter.
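A sketch of the higher accuracy variant, under the same assumptions about the filter and grouping:

```kusto
// Accuracy 2 trades some speed for a smaller allowed error (0.4%)
SecurityEvent
| where TimeGenerated >= ago(90d)
| summarize dcount(EventID, 2) by Computer
| sort by Computer asc
```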
What Accuracy Value to Use?
So which value should you pick? As stated earlier, that’s dependent on your dataset and the goal of your query. If you are just taking a quick look and are OK with a rough estimate, you can use a lower value. Alternatively you can bump it to a larger value if you aren’t satisfied.
The best thing you can do is experiment. Run the query using each of the five values (0 to 4) and look at the results, deciding which best suits your needs for your particular goal.
See Also
The following operators, functions, and/or plugins were used or mentioned in this article’s demos. You can learn more about them in some of my previous posts, linked below.
In this post we learned how the dcount function can return a value faster than the combination of distinct plus count, although it may not be as accurate.
We then saw how we could adjust the accuracy level used in the dcount function, and got some advice on how to choose a level that balanced your need for speed with accuracy.
Often we want to get data that is relative to other data. For example, we want a list of computers that have free space that is greater than the free space of other computers. We need to set a threshold, for example we want to return results where the free space is greater than 95% of the free space on other computers.
To do this, Kusto provides the percentile function, along with its variants percentiles and percentiles_array.
One hundred percent of the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
Percentile Basics
In this first example, we’ll use the percentile function to return, for each computer in the Perf table, the Available MBytes value that 90% of that computer’s readings fall at or below.
We take the Perf table and pipe it through a where to restrict the results to the Available MBytes counters.
This is piped into a summarize, where we employ the percentile function. In the first parameter we pass the column with the value to analyze, in this case the CounterValue column. In the second parameter, we pass in the threshold value, in this case 90.
Finally we use the by clause of summarize to indicate how we want to summarize the results. Here, we want to summarize by the Computer column.
In the results, we see one row for each computer in the Perf table, along with the Available MBytes value at its 90th percentile. Ninety percent of that computer’s readings are less than or equal to this value.
Do note, the Perf table actually represents a table of performance counter entries, so strictly speaking this isn’t totally accurate data. However we’ve been using the Perf table throughout this Fun With KQL series, so it will do for this example.
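The percentile query can be sketched as follows, assuming the computer name is held in the Perf table’s Computer column:

```kusto
// 90th percentile of Available MBytes readings, per computer
Perf
| where CounterName == "Available MBytes"
| summarize percentile(CounterValue, 90) by Computer
```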
Percentiles Basics
The percentile function works fine for a single percentage, but what if you wanted to find values across a range of percentages? For example, you want to return values for 5, 50, and 95 percent?
Rather than having to run three different queries, Kusto provides a percentiles function so you can return multiple values at once.
The query is almost identical to the previous one, except we are using percentiles instead. As before the first parameter is the column to analyze. Next, we have multiple values to use for our percentile calculations.
Here we used three, 5, 50, and 95, however we could use just two, or more than just three.
At the end a sort by was used to order the output by the name of the computer.
In the output you see three columns for each computer, reflecting the Available MBytes values for 5, 50, and 95 percent.
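A sketch of the percentiles version, assuming the Computer column is used for grouping and sorting:

```kusto
// Three percentile columns per computer from a single pass
Perf
| where CounterName == "Available MBytes"
| summarize percentiles(CounterValue, 5, 50, 95) by Computer
| sort by Computer asc
```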
Renaming The Output Columns
In the previous example the default column names that the percentiles function output were rather, well, ugly, to put it bluntly. We could improve on it by using an operator we’ve seen before, project-rename.
Our query is identical to the previous, except the sort by was replaced with project-rename. (The sort could have been retained, I simply removed it to make the example a bit simpler.)
If you recall my post Fun With KQL – Variants of Project, all we have to do is list each new column name we want to use, then after the equal sign the existing column to assign to it.
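Here is a sketch of the rename. It assumes the default output columns follow the percentile_CounterValue_N pattern, so check the names in your own result grid before relying on it. The new names Percent05, Percent50, and Percent95 are made up for illustration.

```kusto
// New name on the left of the equal sign, existing column on the right
Perf
| where CounterName == "Available MBytes"
| summarize percentiles(CounterValue, 5, 50, 95) by Computer
| project-rename Percent05 = percentile_CounterValue_5
               , Percent50 = percentile_CounterValue_50
               , Percent95 = percentile_CounterValue_95
```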
Our new names are a lot better, but we can streamline the rename process even more. The summarize operator allows us to rename when we make the call.
After the summarize operator we list, in parentheses, each new column name we want to use for the output. As you can see, the output used the new column names we provided.
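Renaming inside the summarize might look like this sketch, with illustrative column names:

```kusto
// The parenthesized list names the output columns, in order
Perf
| where CounterName == "Available MBytes"
| summarize (Percent05, Percent50, Percent95)
          = percentiles(CounterValue, 5, 50, 95)
         by Computer
```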
Multiple Levels of Percentiles
In the previous example, we used three percentiles, however this is not a limit. In this next example we’ll bump it up to five.
Here we used the same technique as the previous sample, except we have more percentile values. As you can see, we also took advantage of KQL’s flexible layout to make the query easier to read.
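A five-level version could be sketched like this. The specific percentile values (5, 25, 50, 75, 95) and the column names are assumptions chosen for illustration.

```kusto
// Five percentiles, laid out one per line for readability
Perf
| where CounterName == "Available MBytes"
| summarize ( Percent05
            , Percent25
            , Percent50
            , Percent75
            , Percent95
            ) = percentiles(CounterValue, 5, 25, 50, 75, 95)
         by Computer
```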
Percentiles As An Array
There may be times when we want the percentiles returned in an array instead of columns. For that there’s an alternate version of the percentiles function, percentiles_array.
The first parameter passed into the percentiles_array function is the column we’re evaluating, here CounterValue. The remaining parameters are the percentile values to use. Here we used our original three, but we could have used as many as we needed.
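A minimal sketch of the array version, assuming the same filter on the Available MBytes counter:

```kusto
Perf
| where CounterName == "Available MBytes"   // assumed filter
// Returns one dynamic array per computer holding the 5th, 50th and 95th percentiles
| summarize percentiles_array(CounterValue, 5, 50, 95) by Computer
```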
We could call on our old friend mv-expand (covered in the post Fun With KQL – MV-Expand) to expand the array into rows.
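Here is a sketch of that combination. The column name PctArray is a hypothetical name chosen so mv-expand has something simple to reference.

```kusto
Perf
| where CounterName == "Available MBytes"   // assumed filter
| summarize PctArray = percentiles_array(CounterValue, 5, 50, 95) by Computer
// Each array element becomes its own row, so every computer yields three rows
| mv-expand PctArray
```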
In the previous article, Fun With KQL – Make_Set and Make_List, we saw how to get a list of items and return them in a JSON array. In this article we’ll see how to break that JSON array into individual rows of data using the mv-expand operator.
Before we expand our KQL knowledge, be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Note that my output may not look exactly like yours when you run the sample queries for several reasons. First, Microsoft only keeps a few days of demo data, which are constantly updated, so the dates and sample data won’t match the screen shots.
Second, I’ll be using the column tool (discussed in the introductory post) to limit the output to just the columns needed to demonstrate the query. Finally, Microsoft may make changes to both the user interface and the data structures between the time I write this and when you read it.
A Reminder – Make_Set
Before we look at mv-expand, let’s take a quick reminder of make_set from the previous post.
Here Perf was piped into a where operator to limit the results.
We then used make_set to get a list of all the computers from the data that was piped in. It created a JSON array and stored it in the new Computers column. The make_set function created a list of unique computers, so each one from the dataset being piped in only appears once in the JSON array, no matter how many times it was in the incoming dataset.
MV-Expand Basics
Having a JSON array is nice, but what if we really want a dataset of individual rows, where each item from the JSON array appears in a row? As you may have guessed by now, the mv-expand operator can do this for us.
We take the same query as before, and pipe it into the mv-expand operator. We specify the column holding the JSON array.
From here, mv-expand does its thing, and converts each item in the JSON array into an individual row. It uses the same name as the original column for the new one, Computers.
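A minimal sketch of the full query, assuming the Perf table in the demo workspace. The where conditions are an assumption, borrowed from the free space example in the make_set post.

```kusto
Perf
| where CounterName == "% Free Space"   // assumed filter
    and CounterValue > 95
// Build a JSON array of unique computer names
| summarize Computers = make_set(Computer)
// Turn each array element into its own row, keeping the column name Computers
| mv-expand Computers
```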
As you can see, mv-expand can be very useful for transforming JSON columns into easily usable rows.
This post explored the useful mv-expand operator. With it you can extract the contents of a JSON array and pivot them into individual rows in a dataset. We also saw how it works nicely with the make_set and make_list functions.
In previous posts, I’ve mentioned using certain functions and operators to investigate conditions in your system. Naturally you’ll need to create lists of those items, based on certain conditions.
For example, you may want to get a list of the counters associated with an object. Or, you may want to get a list of computers where a certain condition is met.
In this article we’ll see how to get those lists using the Kusto make_set and make_list functions.
The set of samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Older Names – MakeSet and MakeList
Before I start, let me mention there are older versions of these functions, makeset and makelist. They were renamed to fall in line with revised Kusto naming standards, adding the underscore into the name.
While the old versions still work, you should use the newer names in case Microsoft phases out the old ones in the future.
Make_Set Basics
For our first example, let’s see how to get a set of items, associated with another item. In this query, we’ll get a list of counter names associated with an object name.
We take the Perf table and pipe it into the summarize operator. A new column name is declared, Counters. We then use make_set, passing in the CounterName column. After the by, we use ObjectName.
This will result in Counters holding a JSON array of CounterNames associated with an ObjectName.
If you look at the output, the second row, for the ObjectName of Memory, has been expanded. In the Counters column you see a JSON array holding two values, Available MBytes and Available MBytes Memory.
Simply put, the Memory object has two counter names associated with it, Available MBytes and Available MBytes Memory.
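The query described above can be sketched as follows, assuming the Perf table in the demo workspace:

```kusto
Perf
// Counters holds a JSON array of the unique CounterName values for each ObjectName
| summarize Counters = make_set(CounterName) by ObjectName
```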
Making a Set Based on a Condition
A second, and slightly more useful way to use make_set is to get a list of items where a condition is true.
In this example we again turn to the Perf table. We use a where operator to limit the results to our % Free Space counters where its value is greater than 95 (i.e. 95%).
As before, we go into a summarize operator, creating a new column Computers. We call make_set and pass in the Computer column.
Note that for this query we didn’t use the by portion. In this case, make_set takes the data in the Computer column and creates a single JSON array, as you can see in the output. This gave us a set of three computers that have more than 95% free space.
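A minimal sketch of this query, built from the description above:

```kusto
Perf
| where CounterName == "% Free Space"
    and CounterValue > 95
// No "by" clause, so the result is one row holding one array of unique computer names
| summarize Computers = make_set(Computer)
```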
Make_List Basics
The second way to create these sets is the make_list function. It works almost identically to make_set, with one minor difference. Let’s see the query in action, and that difference will become clear.
This query is identical to the one for make_set, except of course for using make_list. However, look at the results.
You’ll see the first computer, SQL01.na.contosohotels.com, appears twice in the list. Likewise the computers that begin with SQL12 and SQL00 appear multiple times. And that’s just in the little bit that is visible!
Now you can see the big difference, make_set creates a unique list of items. Each entry will only appear once in the JSON array. The make_list function performs no such de-duplication. If the item (in this case the computer name) appeared 100 times, it would be in the JSON array 100 times.
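As a sketch, the make_list version differs from the make_set query by only the function name:

```kusto
Perf
| where CounterName == "% Free Space"
    and CounterValue > 95
// make_list keeps every occurrence, so a computer appears once per matching row
| summarize Computers = make_list(Computer)
```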
Crashing the User Interface
In the previous example, I attempted to click on the arrow beside the Computers in order to expand the list. The user interface came down with a bad case of "fall down go boom". It sat for a while, before just locking up on me.
I finally determined that the JSON array just had too many items to display. Fortunately, there is a way around this.
Both make_set and make_list accept an optional second parameter to indicate the maximum number of items to return.
In this make_list example, after the Computer column I passed in the value of 64. This will limit the number of items in the JSON array to sixty four items.
I could have used any number, honestly I picked 64 because I happened to glance over at my old Commodore 64 sitting on my desk and decided that would be a good number. Computer history is fun!
Now that I had limited my JSON array, I was able to expand the data in the results grid, and could see the duplicated values. Again, both of these functions support the use of the optional parameter, however you are more likely to need it with make_list.
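A sketch of the limited version, assuming the same where conditions as the earlier make_list query:

```kusto
Perf
| where CounterName == "% Free Space"
    and CounterValue > 95
// The optional second parameter caps the array at 64 items
| summarize Computers = make_list(Computer, 64)
```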
Make_Set_If
In our earlier make_set example that was based on a condition, we had a where operator before calling it. Part of it limited the results to rows with a counter value greater than 95.
There is an alternative to make_set called make_set_if. With this function we can pass the condition in as a parameter.
Here we still used where to limit the data to the free space percentage counter. But as a second parameter to make_set_if, we pass in a condition of CounterValue >= 95.
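A minimal sketch of the query just described:

```kusto
Perf
// Keep only the free space counter rows first
| where CounterName == "% Free Space"
// The second parameter is the condition a row must meet to be added to the set
| summarize Computers = make_set_if(Computer, CounterValue >= 95)
```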
We could have included both conditions by surrounding them with parentheses, such as:
make_set_if(Computer, (CounterName == "% Free Space" and CounterValue >= 95))
However, it turned out to be more efficient to remove the non free space rows first.
And yes, in this version I did use greater than or equal to, instead of just greater than as I did originally, because why not?
Note that make_set_if also supports the parameter to limit the result set size. It becomes the third parameter, as in:
make_set_if(Computer, CounterValue >= 95, 64)
Make_List_If
There is also a make_list_if function.
It behaves like make_set_if, except for not removing duplicated values. In this example I added the third parameter to limit the size of the JSON array to 32 items.
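A sketch of the make_list_if form. It reuses the condition from the make_set_if example, which is an assumption since the text only confirms the limit of 32.

```kusto
Perf
| where CounterName == "% Free Space"
// Condition is the second parameter, the maximum array size (32) is the third
| summarize Computers = make_list_if(Computer, CounterValue >= 95, 32)
```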
In this post we saw how to use the make_set and make_list functions, along with their corresponding make_set_if and make_list_if functions, to get a list of values in a JSON array. These are useful functions for returning a list of items, such as computers, where some condition is true.
The next article in this series will focus on the mv-expand operator, which can be used to take the JSON array created by make_set (or make_list) and convert it into rows.
A very common need in query languages is the ability to extract the maximum and minimum values in a column of data. The Kusto Query Language provides this capability through two functions, arg_max and arg_min. In this post we’ll take a look at these functions.
At a minimum, you need to be aware that the samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.
Arg_Max Basics
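As its name implies, the arg_max function finds the row holding the maximum value in the column passed into it. It returns that maximum value, along with whichever other columns from that row you ask for.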
In this example, we are going to use the summarize operator to summarize by the CounterName. The value we’ll use in the summarize is the maximum CounterValue, determined using arg_max, for each CounterName.
The first parameter we pass into arg_max is the column we want to find the maximum value for. The second argument is the column or columns to be returned, besides of course the max value of the passed in column. In this example we use an asterisk to return all of the columns piped in from Perf.
We then go into a project, to limit the output to a few columns, then sort them. (In a moment we’ll see how to limit the output of arg_max so we don’t need the project.)
Note in the output it retained the name for the column we are getting the maximum value for, CounterValue. You should consider renaming this column in the output to a name that is more reflective of the true data, such as MaxCounterValue. This could make the output clearer to the end user of your query.
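A minimal sketch of this query, assuming the Perf table in the demo workspace. The columns kept by project are an assumption, as the text only says "a few columns".

```kusto
Perf
// For each CounterName, return the row with the highest CounterValue; * returns all columns
| summarize arg_max(CounterValue, *) by CounterName
| project CounterName, Computer, TimeGenerated, CounterValue   // assumed column list
| sort by CounterName asc
// Optional: give the value column a clearer name
// | project-rename MaxCounterValue = CounterValue
```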
Arg_Max With Columns
In this second example, we have the same basic result as the first query. In this version though, we pass in the few columns we want back.
In addition to CounterValue, we’ll pass in TimeGenerated, Computer, and ObjectName.
You’ll notice in this version we no longer need the project operator to reduce the number of columns. That is taken care of in arg_max. By taking advantage of this feature, you can make your queries more compact.
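Sketched out, the more compact version might look like this:

```kusto
Perf
// After the column to maximize, list only the extra columns to return from that row
| summarize arg_max(CounterValue, TimeGenerated, Computer, ObjectName) by CounterName
| sort by CounterName asc
```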
Arg_Min Basics
The arg_min function behaves identically to arg_max, with the exception of course of returning the minimum value from the passed in column. You can use the asterisk to return all columns or specify columns to be returned.
As such we’ll just demonstrate the summarize version of our query, but you can replicate the query shown in the previous section by using arg_min instead of arg_max.
As you can see, the minimum counter value across most of the data was a zero.
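A sketch of the arg_min equivalent, assuming the asterisk form and the same illustrative project list as the arg_max sketch:

```kusto
Perf
// For each CounterName, return the row with the lowest CounterValue
| summarize arg_min(CounterValue, *) by CounterName
| project CounterName, Computer, TimeGenerated, CounterValue   // assumed column list
| sort by CounterName asc
```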
In this article we saw how to perform a common task across query languages, obtaining the maximum and minimum values for a set of data. We did so using the arg_max and arg_min Kusto functions.