
What’s New in Splunk App for Microsoft Exchange v1.1


Following the successful release of the first Splunk App for Microsoft Exchange back in August 2011, we recently released an updated version. The Splunk App for Microsoft Exchange v1.1 contains over 100 community-suggested improvements and allows you to monitor server health, e-mail messages and users across your Microsoft Exchange 2007/2010 infrastructure. It’s available right now for download from Splunkbase.

Here are the top five features we added or improved in the new version.

1.  New Feature – Technology Add-on for Blackberry Enterprise Server v5.03

This was a big deal at a lot of our implementations.  In the corporate world, Blackberries still rule.  You can now find out when a specified user last synchronized his or her email via the Blackberry Enterprise Server right on the user information dashboard.  We’ve also added a throughput dashboard specifically for the Blackberry Enterprise Server.

2. New Feature – Service Health Monitoring

Each Exchange server has a PowerShell cmdlet called ‘Test-ServiceHealth’ that, when launched, outputs the services that should be running and whether they actually are. However, you normally have to run it by hand. By including this information in our health input, you can backtrack to determine exactly when a service died, and then look at other information – performance, Windows event logs, and so on – to determine why it died.
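To get a sense of what that input consumes, here is a minimal sketch – not the app’s actual health script – that flattens Test-ServiceHealth output into timestamped key-value pairs, run from the Exchange Management Shell:

# Minimal sketch, run from the Exchange Management Shell: emit one
# KV-pair line per server role, ready for Splunk to index.
Test-ServiceHealth | ForEach-Object {
    $D = Get-Date -Format 'yyyy-MM-ddTHH:mm:sszzz'
    $NotRunning = $_.ServicesNotRunning -join ","
    Write-Host "$D Role=`"$($_.Role)`" RequiredServicesRunning=`"$($_.RequiredServicesRunning)`" ServicesNotRunning=`"$NotRunning`""
}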

3. Updated Feature – Security Dashboards

New Security dashboards show where external logons to OWA and ActiveSync are originating from (courtesy of the Google Maps add-on), and the new Anomalous Logons dashboard not only tells you about failed logons, but also tells you when users are logging in from multiple countries or regions.

4. New Feature – Auditing the Administrator

If you have an Exchange 2010 infrastructure, then the Exchange service is monitoring what your administrators do on the system – right down to the underlying PowerShell cmdlets that are run by the UI.  We allow you to search on anything – host, Administrator name, cmdlet name, and parameters.  So, if you’ve ever wanted to know what was done to a particular mailbox, or who was running a particular cmdlet, this is the dashboard for you.
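You can sample the same audit trail yourself from the Exchange 2010 Management Shell. A hedged sketch – the Search-AdminAuditLog cmdlet ships with Exchange 2010 (recent service packs), while the seven-day window and cmdlet filter are just examples:

# Who ran Set-Mailbox in the last seven days, and against which objects?
Search-AdminAuditLog -Cmdlets Set-Mailbox -StartDate (Get-Date).AddDays(-7) |
    Select-Object RunDate, Caller, CmdletName, ObjectModified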

5. Updated Feature – Message Tracking

Message tracking is pushed to the edge of your organization by allowing you to include information from other (non-Microsoft) systems, such as Cisco Ironport or Sendmail Sentrion.  This allows you to see if the message in question was quarantined by your anti-virus device, or blocked by your anti-spam device, for instance.

The updated App also incorporates significant community feedback on a range of features, including the following:

  • Normalize the way you reference users, messages, clusters, mailboxes and anything else that can have more than one reference.  You no longer need to worry about how a user logs in or a message was addressed – it’s all the same to us.
  • Support for multi-master clusters.  Larger environments have clusters that have three or more member servers, where multiple servers can be the “master” at a given time.  We’ve cleaned up the references to clusters so that the information is easier than ever to read.
  • The Overview dashboard is now driven by real-time searches, so you can watch the rate at which your systems are processing messages.
  • Enhanced summary reporting allows you to see a monthly snapshot of what your Exchange infrastructure is doing and how much space your users are hogging.
  • Support for Splunk 4.3, so you can upgrade Splunk to the latest and greatest version.

We had a great set of beta customers for this product.  Their diligence in providing concise feedback on their likes and dislikes (plus putting up with the occasional bug and letting me see their systems so the bugs could get fixed) has produced a solid release that is well worth the upgrade.

If you run an Exchange infrastructure and haven’t tried the Splunk App for Microsoft Exchange yet, give it a try.  The App is free and can be downloaded from Splunkbase.


Quick Tips from .Conf 2012 – Microsoft Solutions


I’ve just got back from .Conf 2012 in Las Vegas, and it was a great conference where I met some wonderful customers. We had a booth in the Splunk Labs area demonstrating both the Splunk App for Microsoft Exchange and the Splunk App for Microsoft Windows Active Directory. We spoke to a lot of customers, many of whom were implementing the apps, and even more decided they should be after seeing the demo. We also delivered two very technical sessions on best practices for deploying each app. We found that too many gigabytes give you a hangover. And yes, the rumors are true: there was a monkey.

While at the booth and after the sessions, I answered some fairly common questions, so I’m going to start blogging a little more frequently to share those questions and of course my answers.  My first one is this:  “How do I alter the Splunk_TA_windows to log to winevents (as recommended) instead of main?”

The Splunk_TA_windows, also known as the Splunk Technology Add-on for Windows, collects and parses common logs from Microsoft Windows hosts, such as the Windows Security event log. The Splunk App for Microsoft Windows Active Directory also uses the Security event log to gather audit information, so it was a good idea not to duplicate effort here. Out of the box (or, in this case, as downloaded from Splunkbase), the Splunk_TA_windows stores these Windows event logs in the default index, known as “main”.

The best practice for the Splunk App for Active Directory is to store these common Windows event logs in a separate index. For example, “winevents” is already used to store other AD-related Windows event logs, so it is an ideal place.

Splunk recommends the use of a Deployment Server to manage the apps pushed out to the forwarders, and so we will assume this best practice. In this case, the Splunk_TA_windows is stored on the deployment server in $SPLUNK_HOME/etc/deployment-apps, and we will be editing the endpoint files in this location. 

Our process has two basic steps:

  1. Configure Splunk_TA_windows to store events in the different index
  2. Configure Splunk_for_ActiveDirectory to look for events in a different index

Yep – only two steps.  Very straightforward.  Let’s start with Step 1 – configuring Splunk_TA_windows to store events in the different index.  To do this, create a file called inputs.conf in $SPLUNK_HOME/etc/deployment-apps/Splunk_TA_windows/local and add the following entries:

[WinEventLog:Security]
index=winevents
[WinEventLog:Application]
index=winevents
[WinEventLog:System]
index=winevents

Save this file, then push out the changes with:

splunk refresh deploy-clients

The Windows Event Logs should now flow into the winevents index.  Of course, you should make sure you have created the winevents index prior to pushing out, but if you have installed the Splunk_for_ActiveDirectory app, then that’s already taken care of.

Our second step is to configure the Splunk_for_ActiveDirectory app to look for these Windows Event Logs in the new index.  To do this, we need to create a new file under $SPLUNK_HOME/etc/apps/Splunk_for_ActiveDirectory/local called eventtypes.conf, with the following contents:

[wineventlog-application]
search = index=winevents source=WinEventLog:Application
[wineventlog-system]
search = index=winevents source=WinEventLog:System
[wineventlog-security]
search = index=winevents source=WinEventLog:Security

Save the file, and then refresh the server.  You can do this simply (but slowly) by restarting through the Manager or with the “splunk restart” command-line version.  Alternatively, you can log on to the web interface as an Administrator, then open another tab and browse to http://splunk-server:8000/debug/refresh - this will refresh the event types without restarting.

If you already have data within the main index, you can use “(index=main OR index=winevents)” in the search strings until the data is no longer useful.  This will prevent you having to move the events.

As always, you can send your feedback on the Microsoft Solutions apps to me at Microsoft@splunk.com.

Live from MEC


Today marked the return of the Microsoft Exchange Conference (MEC). After a 10-year break, Microsoft revived the conference, and team Splunk was onsite in Orlando supporting the revival. At Splunk we deliver multiple solutions that support Microsoft technologies, and at MEC we are showing the Splunk App for Microsoft Exchange, which delivers real-time operations data about your messaging infrastructure. Overall, the experience on day 1 was amazing. Interest in the Exchange App had attendees lining up for demos as we showcased how we deliver insight into their messaging infrastructure.

Microsoft Exchange is complex, and the questions we are getting concern real-world problems administrators face. But these problems are not just about Microsoft Exchange. In the end, an e-mail administrator manages an e-mail service, and that service is bigger than just Exchange servers. Exchange is one part of the puzzle; an SMTP relay powered by a Cisco IronPort appliance is an example of a component you need to monitor in addition to the Exchange servers. When these components come together you have an e-mail service, and this is where the Splunk App for Microsoft Exchange comes in to save the day. As an example, by consuming multiple inputs we deliver true message tracking: correlating the Exchange Server logs with the Cisco IronPort logs lets us track a message to the enterprise edge, not just within the Exchange Server boundary.

The Splunk App for Microsoft Exchange delivers multiple features like this that save you time in monitoring your messaging infrastructure and deliver real-time insight. Check out the App to discover what’s really going on with your e-mail infrastructure, and if you’re in Orlando at MEC, stop by the Splunk booth and say hi.

Splunking PowerShell and .NET Data Structures


We are currently rocking it at the Microsoft Exchange Conference (MEC) in Orlando, and I’m being asked where we get the data to handle the reporting and monitoring requirements of the Splunk App for Microsoft Exchange. Some of the sources are relatively straightforward – things like the Windows Event Log, IIS logs and Message Tracking logs, for example. But where do we get the rich user information? The answer lies in a series of PowerShell scripts that run on a regular basis on each Exchange server. You see, PowerShell has access to the whole of the .NET Framework, and that is where a lot of the information lies.

Let’s take a quick example – splunking the Inbox Rules of all the users in Exchange. Our first step is to write a PowerShell script to gather the required information. Since we are splunking the data, the only requirements are that the output has a time stamp and is in textual format. However, our best practice is to use KV pairs for the data and to put each event on one line if we can.

The Exchange Command Shell (which is Powershell with additional cmdlets) provides a cmdlet called Get-InboxRule to allow us to pull the information we need. This is really a wrapper around the .NET Framework ExchangeService.GetInboxRule method. You can find information on all the .NET Framework methods from MSDN.

As is common with PowerShell, an object is returned by this cmdlet. We can iterate over the members to get the key-value pairs. Finally, we can output all that as a string to the console (which is where Splunk will read the data we produce). You can run this script within the Exchange Command Shell to see what sort of data we are looking at. I call this script “get-inboxrules.ps1”.

$Mailboxes = Get-Mailbox -Server $Env:ComputerName
foreach ($Mailbox in $Mailboxes) {
	$Id = 0
	$UPN = $Mailbox.UserPrincipalName
	$Quota = $Mailbox.RulesQuota.ToBytes()
	$Rules = Get-InboxRule -Mailbox $Mailbox
	if ($Rules -ne $null) {
		$Rules | Foreach-Object {
			$O = New-Object System.Collections.ArrayList
			$D = Get-Date -format 'yyyy-MM-ddTHH:mm:sszzz'
			[void]$O.Add($D)
			[void]$O.Add("Mailbox=`"$UPN`"")
			[void]$O.Add("Quota=`"$Quota`"")
			[void]$O.Add("InternalRuleID=$Id")
			foreach ($p in $_.PSObject.Properties) {
				$Val = ""
				if ($_.PSObject.Properties[$p.Name].Value -ne $null) {
					$Val = $_.PSObject.Properties[$p.Name].Value
					$Val = $Val.Replace("`"", "'")
				}
				[void]$O.Add("$($p.Name)=`"$Val`"")
			}
			Write-Host ($O -join " ")
			$Id++
		}
	}
}

Our first step is to get a list of mailboxes (or users) on the mailbox server we are running on. One of the things we do for performance is to ensure that we don’t traverse the network to get information. Now we have a list of target users, we get a list of Inbox Rules for each mailbox using the Get-InboxRule cmdlet. For each rule, we output a line that gives us all the properties of that rule. The real work of making the output Splunk ready is in the Get-Date cmdlet and the join. The Get-Date cmdlet gives the event a time stamp, and the join allows us to provide an array of key-value pairs and sends the output to Splunk as a string.

Splunk does not run PowerShell natively, so we have to help it out. In addition, the Exchange Command Shell loads the Exchange cmdlets before you run scripts, so we have to employ a wrapper cmd script to do the same. The script just needs to work out where Exchange is installed and then call PowerShell with the right arguments. I call this script “exchangepowershell.cmd”.

@ECHO OFF
SET SplunkApp=TA-Exchange-2010-MailboxStore
:: delims is a TAB followed by a space
FOR /F "tokens=2* delims=	 " %%A IN ('REG QUERY "HKLM\Software\Microsoft\ExchangeServer\v14\Setup" /v MsiInstallPath') DO SET Exchangepath=%%B
Powershell -PSConsoleFile "%Exchangepath%\bin\exshell.psc1" -command ". '%SPLUNK_HOME%\etc\apps\%SplunkApp%\bin\powershell\%1'"

This script first looks up where Exchange Server 2010 is installed and then starts PowerShell with the appropriate Exchange cmdlets preloaded. Now that we have the right scripts, we can run the following:

splunk cmd exchangepowershell.cmd get-inboxrules.ps1

It should produce the same results as when running the powershell script in the Exchange Command Shell. Our final piece of the puzzle is to actually grab the data. For this, we use a scripted input, defined in inputs.conf, where we tell Splunk to run our script on a daily basis.

[script://.\bin\exchangepowershell.cmd get-inboxrules.ps1]
index=msexchange
source=Powershell
sourcetype=MSExchange:2010:InboxRule
interval=86400

The magic in grabbing the .NET data is in utilizing PowerShell for the heavy lifting. This same magic is used in the Splunk App for Exchange and the Splunk App for Active Directory – both are free downloads from splunkbase.com.
With these simple techniques, you can pull data from the internal .NET data structures of any of your Windows applications – SQL Server, SharePoint, System Center and Lync are all within your reach. It really gives you transparency into your Windows environment.

If you happen to be at the Microsoft Exchange Conference, drop by Booth #18 and ask me how you can get better data from your Windows systems. There is more useful data than just the logs.

Splunk App for Active Directory and the Top 10 Issues


I work a lot with the various people who plan, deploy and support the Splunk App for Active Directory. Some issues come up quite frequently and I thought it would be a good idea to give you a roadmap of things to check as you deploy your environment. I’ll go through the issue and how to check for it so that you can make your roll-out as smooth as possible.

10. Audit Logging is not turned on

A lot of the apps distributed through Splunkbase have no external configuration – just install them on a Splunk instance where the data is being produced and you are done. The Splunk App for Active Directory is not one of those. In particular, domain controllers don’t produce audit logs by default – you need to turn the audit policies on. If you have installed the technology add-ons (Splunk_TA_windows and TA-DomainController-NT*) on a Splunk Universal Forwarder on each domain controller, but you still are not seeing data in the security and change audit dashboards, then this is likely the reason. You can verify that the Windows Security event logs are producing appropriate events with the search:

eventtype=msad-successful-user-logons

Turning on auditing is fairly simple – create a Group Policy Object (GPO) in each domain that configures the audit policies, then apply that GPO to the domain controllers. This needs to be done within each and every domain that you run; you can read about the process in our documentation. A new GPO will need to be pushed out using the GPUPDATE command. Once the Group Policy is applied, you will see the data flowing into the Splunk indexer, and this will drive the dashboards.
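You can also confirm the effective audit policy directly on a domain controller from an elevated prompt. A quick, hedged check using the standard Windows tools:

# Pull down the new GPO immediately instead of waiting for the refresh interval.
gpupdate /force
# List the effective audit policy; the logon and account-management
# categories should report auditing once the GPO has applied.
auditpol /get /category:*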

9. PowerShell is not enabled

The other data input that requires a little bit of configuration is the health scripts. Active Directory stores a lot of its health information in in-memory data structures rather than in the directory, so we have to access .NET libraries to retrieve it. If the domain selector is not working, then it is likely that you are not receiving this health data. You can further check this by executing the following search:

eventtype=msad-dc-health

If you log on to a domain controller, you can run the health script manually with the following command:

CD "C:\Program Files\SplunkUniversalForwarder\etc\apps\TA-DomainController-NT6\bin"
"C:\Program Files\SplunkUniversalForwarder\bin\splunk" cmd runpowershell.cmd ad-health.ps1

If PowerShell script execution is turned off, the error message will tell you that scripts are disabled on this host. You can repair this by enabling PowerShell script execution within the same GPO you use to alter the audit settings, or you can create a new GPO for this purpose. As with the audit settings GPO, it needs to be attached to the domain controllers in each domain, and you can read about this process in our documentation.

8. Lookup Tables are not Created

Once you have the health data flowing into the indexer, you need to generate the default domain lookup tables. There are two of them, and they drive the drop-downs and domain selectors on the dashboards. These lookup tables are generated each night, but you can generate them yourself to expedite the process. If you have data when you run the following search:

eventtype=msad-dc-health

but your domain selectors are still not working, then you have this issue. To generate the lookup tables, run the following searches over the last 24 hours:

`domain-list`|dedup host|outputlookup DomainList.csv
`domain-selector-search`|outputlookup DomainSelector.csv

Once these searches finish (and they should not take a lot of time), your dashboard selectors will start working.

7. Group Information is Not Available in the Security Reports

Active Directory associates groups with lists of users by listing each user by Distinguished Name (DN). When we retrieve these groups from Active Directory and expand the membership, we need to associate each DN with a domain. That activity uses something called the RootDSE within Active Directory. The RootDSE is stored in the Global Catalog on the holder of the Domain Naming FSMO role, and replicated to all Global Catalogs within the forest root domain. That’s a mouthful. If you only have one domain, then it’s the Global Catalog on any one of your domain controllers. If you have more than one domain in your forest, then you need to be a little more careful picking it. You need to configure the chosen server in the ldap.conf file within the SA-ldapsearch app on your search heads. If you don’t, then group expansion does not happen, and the dashboards that deal with group expansion (including all the Group reports within the Security Reports) will not bring up data.
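If you’re not sure which server to point at, a hedged ADSI one-liner from any domain-joined PowerShell prompt shows which domain controller answered and what the forest root naming context is:

# Inspect the RootDSE: dnsHostName is the DC that answered, and
# rootDomainNamingContext identifies the forest root domain.
$rootDse = [ADSI]"LDAP://RootDSE"
$rootDse.dnsHostName
$rootDse.rootDomainNamingContext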

To configure the RootDSE server, add a default stanza to ldap.conf:

[default]
server=dc1.domain.com

You don’t need to restart the server when changing the ldap.conf file – changes take effect immediately.

6. Domains Have Three Names

Active Directory provides a storage container, called a domain, for users, groups, computers and other objects. Some domain names are easy to find – the DNS domain name, for example, is configured when you create the domain and is well understood. Many security events include the NetBIOS name – a legacy name from the NT5 days, but still in use today. However, there is a third name – the distinguished name – which specifies the location within the Active Directory tree where the data for the domain is stored. When Windows produces event logs for the domain, it can use any of the three depending on what it is logging, so we need to know about all three names. Specify the DNS domain name with all the details of the domain in the ldap.conf file:

[domain.com]
server=dc1.domain.com;dc2.domain.com
port=389
ssl=false
basedn=DC=domain,DC=com
binddn=CN=Splunk,OU=Managed Service Accounts,DC=domain,DC=com
password=changeme

Once you have set up the DNS domain name reference you can specify the other two domain references as aliases:

[DOMAIN]
alias=domain.com

[DC=domain,DC=com]
alias=domain.com

5. Attribute Names in the Base DN are Not Capitalized

We mentioned the chained lookups we do for group expansion earlier. That’s not the only place we use chained lookups – in fact, you can try chained lookups yourself by using the new custom LDAP commands we provide with the SA-ldapsearch app. However, sometimes you need to specify the right capitalization. You will note in the previous issue that we capitalized the Domain Component (DC) attribute – that was very deliberate.

You can check the capitalization by looking at the distinguishedName of a record within the domain. Open Active Directory Users & Computers and expand the target domain until you get to a user or computer record. Enable the Advanced View, then right-click on the record and select Properties. There will be an Attribute Editor tab – within that tab, find the distinguishedName attribute and take a look. It will show you the correct capitalization.
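If you would rather not click through the GUI, a hedged PowerShell alternative performs the same check:

# Find any one user and print the distinguishedName exactly as Active
# Directory stores it; the capitalization shown is what ldap.conf needs.
$searcher = [ADSISearcher]"(&(objectCategory=person)(objectClass=user))"
$searcher.FindOne().Properties["distinguishedname"]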

4. Using bindas instead of binddn

In the very first version of the Splunk App for Active Directory, we used a file called activedirectory.conf to configure the Active Directory connectivity. This very first version only supported one domain, but in that file you configured a single user for searching with the bindas parameter. When customers upgraded to the multi-domain version (the very next release), we switched to the more normal binddn parameter. Some people cut-and-paste the old value into the new file without changing the parameter name.

This will manifest itself (as with all the top 4 elements) with LDAP searches not working. You can execute the following search:

|ldapsearch domain=DOMAIN search="(cn=Administrator)"

You should get a red bar across the search app that tells you binddn is not set. This is a clear indication that you have not configured the ldap.conf domain record properly.

3. SSL is not enabled within Active Directory

Microsoft can sometimes be a little sneaky, and this is one of those times. In the default configuration of Active Directory, the SSL port (TCP port 636) is enabled, but the server does not accept authentication through this channel until you install a certificate and enable authentication over SSL. Microsoft has a support article (KB 321051) on this very topic.

If you have enabled SSL support within the ldap.conf file and you receive red bars indicating that the system could not establish a secure connection or authentication over the secure channel was not accepted, then try disabling SSL within ldap.conf to see if that fixes the issue.

2. Java SE 1.7 is not installed

Do you have a red bar when running LDAP-related dashboards that tells you an Invalid Version of Java is installed?

Many of our customers run Splunk on a Linux system from one of the major distributions like Red Hat, CentOS or Ubuntu. All of these Linux distributions currently ship with Java 1.6 installed. Even our Windows customers may have a down-revved Java installation. The custom LDAP commands distributed with the SA-ldapsearch app (which the Splunk App for Active Directory relies on) require newer features of Java 1.7. As a result, Java 1.7 is a requirement for running the SA-ldapsearch commands. You can easily test which version of Java you have installed by opening a command prompt and running ‘java -version’. This works on both Windows and Linux, although you may have to specify the full path to Java on Windows.

$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

If you see 1.7.0 in the first line, then you are good to go. If you have multiple versions of java installed on your system and you need to select the right one, then you may have to alter the scripts that load java, located in the Splunk_for_ActiveDirectory/bin directory. However, most of the time this will not be necessary.

1. The Wrong Password

Yes, the #1 issue for the Splunk App for Active Directory is the age-old problem of the wrong password. Honestly, it’s remarkable how many people swear the password is right, then reset the password to something simple for a test and the whole system works. If you have seen a red bar stating a DSID error code of 525 or 52E, then this is your problem. A good test is to just reset the password, store it in clear text in the ldap.conf file, and use cut-and-paste to set the same password on the account you bind with.

The Support Process

So how do I go about supporting the Splunk App for Active Directory with individual customers? Once I see which dashboard is not working, that generally directs me to a line of investigation, and a lot of the issues manifest as red error bars that do the same. I start at the top and take Splunk out of the equation. Using standard tools like ldapsearch (on Linux) and LDAPBrowser (on Windows), I go through each of these ten elements until I find the one that is at the root of the support call. Hopefully, knowing these issues will help you as you deploy the Splunk App for Active Directory. If not, our Support group is always standing by, ready to assist.

Splunking Exchange in a Simple XML World


With the release of Splunk 5.0, the Simple XML language we use to define the dashboards and forms for an app was greatly extended. So we were given a challenge: could a reasonably complex app, such as the Splunk App for Microsoft Exchange, be represented using only Simple XML?

Most apps that are developed outside of the point and click interface use Advanced XML. This is a more complex definition language that allows for flexibility and extensibility – things that are generally important. Modules such as Google Maps and Sideview Utils rely on this extensibility to handle complex cases outside the bounds of the core language.

Simple XML is, well, simpler, and gives even the newest of Splunk professionals the ability to develop an interesting set of dashboards for Splunk. However, as a result of this simplification, extensibility is sacrificed. Generally, this is a good thing: it makes app development more accessible. Seasoned veterans of Splunk app development can always move to Advanced XML later on, and particularly complex views can be written in Advanced XML while leaving the bulk of the app in Simple XML.

App development got a boost in Splunk 5.0 with the addition of two important features – Report Acceleration and native PDF Generation. However, these are only available when developing apps in Simple XML. In addition, when the dashboard is written in Simple XML, you – the Splunk Administrator – can edit the dashboard using the web-based dashboard editor.

So, how did we do with the Splunk App for Microsoft Exchange? It is one of the larger apps, with over 150 panels across 50 views. I’m happy to say that of all the panels that were developed in Advanced XML, only one could not be converted to Simple XML. That one? It’s the one that uses an extension of Advanced XML to visualize your data on a Google Maps view.

While we were doing this, we also implemented new routines to support the Windows performance gathering that was introduced in Splunk 5.0, and we added new functionality to support additional features in Microsoft Exchange 2013 and Windows Server 2012.

This does mean that you need to be running Splunk v5.0.2 on your search heads, or central Splunk instance, and that you need to be running Splunk Universal Forwarder v5.0.2 on your Microsoft Exchange servers. We’ve gone to some length to ensure the data dictionary doesn’t change, however, so you can upgrade the search head independently from the Exchange hosts.

I encourage you to download Splunk App for Microsoft Exchange and let us know what you think, what you would improve, and what you would change.

Detecting iOS 6.1 with the Splunk App for Exchange


If you are an Exchange administrator, you might have heard this one. Basically, if you upgrade your iPhone or iPad to iOS 6.1 and then accept a calendar invitation under certain (unfortunately common) circumstances, your phone starts generating excessive traffic to the Exchange server. This fills up the logs on your Exchange Client Access and Mailbox servers with unnecessary and irrelevant information. Many articles have been written about how to ban the offending users, but that defeats the purpose of getting work done.

Fixing the issue short term is relatively easy – institute a throttling policy for these ActiveSync users. I’ll leave that to the Exchange MVP bloggers (you can find information on this here, though). The next step is to get a list of users who have not upgraded yet. Can we do that with Splunk? Yes, we can, and we already have the data – it’s in the IIS logs.

Let’s start by looking at a typical IIS log for an ActiveSync connection. You can get these by searching for eventtype=client-activesync-usage.

2013-02-28 15:11:05 172.16.70.7 POST /Microsoft-Server-ActiveSync/default.eas User=zane&DeviceId=ApplNX1LEBOU4YYE&DeviceType=iPhone&Cmd=Sync&Log=V121_Fc1_Fid:5_Ty:Em_Filt2_Sr:S_Sk:2387060_Sst1_LdapC0_LdapL0_RpcC19_RpcL31_Ers1_Pk1115773768_S1_ 443 zane@spl.com 75.32.103.212 Apple-iPhone3C3/805.401 200 0 0 55

As you can see, nothing that looks like an iOS version appears in there. The Splunk App for Microsoft Exchange decodes this for us, and the string we want is stored in the cs_user_agent field. In this particular event, that’s Apple-iPhone3C3/805.401. So we know it’s an iPhone, and we use that fact to give you a chart of phone types in the ActiveSync dashboard, but let’s look closer at this string.

The first three characters after “Apple-iPhone” (or “Apple-iPad”) identify the model. Here is a short table:

 Identifier   Real Model 
 Apple-iPhone3C1/   iPhone 4 
 Apple-iPhone3C3/   iPhone 4 CDMA 
 Apple-iPhone4C1/   iPhone 4S 
 Apple-iPhone5C1/   iPhone 5 GSM 
 Apple-iPhone5C2/   iPhone 5 CDMA 
 Apple-iPad3C1/   iPad 3 WiFi Only 
 Apple-iPad3C2/   iPad 3 WiFi + 4G Verizon / International 
 Apple-iPad3C3/   iPad 3 WiFi + 4G AT&T / International 

You can also get these strings using the PowerShell cmdlet Get-ActiveSyncDeviceStatistics on a DeviceId (which you can get with Get-ActiveSyncDevice, which lists the devices known for a user) – a sketch follows the next table. So, what about the numbers? These are build numbers, which can be translated directly into a version number with specific major and minor parts. Here are a few in common circulation:

 Build Number   iOS Version 
 1001.405   iOS 6.0 
 1001.523   iOS 6.0.1 
 1002.141   iOS 6.1 
 1002.146   iOS 6.1.2 
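As promised, here is a hedged sketch of pulling those user-agent strings straight out of Exchange from the Exchange Command Shell – the mailbox address is a placeholder:

# List a user's ActiveSync devices with the user agent each one last reported.
Get-ActiveSyncDeviceStatistics -Mailbox "user@spl.com" |
    Select-Object DeviceType, DeviceUserAgent, LastSuccessSync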

Note that the tables above are not exhaustive. Sometimes two builds are released with slightly different numbers to cover the differences between CDMA and GSM networks. However, this gives us a good idea of how to determine which users have not upgraded yet. We start by working out the latest cs_user_agent for each user. We then extract the model and version of the iOS devices, and finally we work out which ones are above 1001.xxx and below our target of 1002.146. Here is my search string:

eventtype="client-activesync-usage" cs_user_agent="Apple-*"|stats latest(cs_user_agent) as cs_user_agent by User,DeviceId|rex field=cs_user_agent "Apple-iPhone(?<model>[^/]+)/(?<version>.*)"|lookup ad_username cs_username as User|table user_subject,DeviceId,cs_user_agent,model,version|where version>1001.000 AND version<1002.146

Simply run this search and create a custom report from it; then your entire IT group can work with the affected users. This is the sort of ad-hoc querying that Splunk makes possible. Sure, the dashboards (especially the new ones) are nice and give you lots of ongoing information about the habits of your users, but the ability to respond to new threats like this iOS bug is just another reason to use Splunk.

Enabling Splunk as a Windows Domain User with Group Policy


Many times, we develop Windows-based apps (for example, the Splunk App for Exchange or the Splunk App for Active Directory) without special privileges. We recommend installing the Universal Forwarder on the target system with system-level privileges, which gives us all the rights we need. Sometimes, though, we come across situations where we need to install Splunk with domain privileges. If you have set up WMI-based remote audit log collection, then this applies to you. Recently, we found that some of the upcoming apps needed domain privileges, so we set about researching exactly how this could be accomplished through the application of Group Policy in an Active Directory domain. We learned that, although the process is long-winded, it is possible, and it makes many domain-enabled Splunk systems easier to manage.

We start logged in to a domain controller in the relevant domain as a Domain Admin.

  1. Open up Administrative tools -> Active Directory Users & Computers.
  2. Open the domain in question and select an appropriate place to put Groups.
  3. Right-click on this area and select New -> Group
  4. Enter the name “Splunk Service Accounts”, then click on OK
  5. Right-click on the groups area again and select New -> Group
  6. Enter the name “Splunk Enabled Computers”, then click on OK

You have now created two groups – one for computers that will have Splunk running as a Domain User, and the other for the service accounts that Splunk will run as. The next step is to create a suitable group policy object:

  1. Open up Administrative Tools -> Group Policy Management
  2. Open up domains and then your domain, then the Group Policy Object folder
  3. Right-click the Group Policy Object folder and select “New”
  4. Enter a new name for the group policy, for example “Splunk Permissions”, then click on OK
  5. Right-click on the newly created policy object and select “Edit…”
  6. Browse to Computer Configuration -> Policies -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment
  7. Double-click on “Act as part of the operating system”
  8. Check “Define these policy settings”
  9. Click on “Add User or Group”
  10. Click on “Browse”
  11. Enter “Splunk Service Accounts”, then click on “Check Names” (the name will be underlined)
  12. Click on OK
  13. Double-click on “Bypass traverse checking” and repeat steps 8-12
  14. Double-click on “Log on as a batch job” and repeat steps 8-12
  15. Double-click on “Log on as a service” and repeat steps 8-12
  16. Double-click on “Replace a process-level token” and repeat steps 8-12
  17. Browse to Computer Configuration -> Policies -> Windows Settings -> Security Settings -> Restricted Groups
  18. Right-click and select “Add Group…”
  19. Click on the “Add” button next to “Members of this group.”, then click on “Browse”.
  20. Enter “Splunk Service Accounts”, then click on “Check Names” (the name will be underlined)
  21. Enter “Domain Admins”, then click on “Check Names” (that name will also be underlined)
  22. Click on OK twice to close the dialog, then close the Group Policy Editor console

You can add other permissions to this group policy. For example, if you wish to use the SA-ModularInput-PowerShell add-on, then you will want to set the execution policy for PowerShell to RemoteSigned, and this can be done within this GPO (a quick way to verify the setting is sketched after the following steps). Now that we have our Group Policy object, we need to ensure it is only applied to those systems and accounts that we need. To do this, we use Security Filtering.

  1. Find the group policy object you just created in the Group Policy Management console
  2. Click on the group policy object to select it – the Security Filtering portion will fill the lower half of the panel.
  3. Click on the “Add” button beneath “Security Filtering”
  4. Enter “Splunk Enabled Computers”, then click on “Check Names” (the name will be underlined)
  5. Enter “Splunk Service Accounts”, then click on “Check Names” (that name will also be underlined)
  6. Click on “OK”
  7. Highlight the “Authenticated Users” setting
  8. Click on “Remove”
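Once the GPO has applied to a target machine, a quick hedged spot-check confirms that the PowerShell execution policy is now coming from machine Group Policy:

# On PowerShell 3.0 and later, -List shows every scope; after the GPO
# applies, the MachinePolicy scope should read RemoteSigned.
Get-ExecutionPolicy -List
# On PowerShell 2.0, check the single effective policy instead.
Get-ExecutionPolicy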

Our final preparatory step is to apply the group policy you just created to the domain. We could also place all the computers in a specific organizational unit and apply the group policy to that organizational unit, but we have already limited the application of this GPO to specific computers, so we don’t need to put in artificial organization.

  1. Within the Group Policy Management console, right click on the domain and select “Link an Existing GPO…”
  2. Select your group policy object and click on OK
  3. Wait for the group policy to replicate everywhere

This final step – replication – can take some time depending on your Active Directory configuration. Replication needs to happen before we continue because we cannot be certain which domain controller the target systems are bound to. Normally, replication will complete within 24 hours, and there are ways to force a replication if your topology is small enough. That’s it for the one-off steps. We can now move on to installing a new server with domain privileges. This is a three-step process. Let’s assume the computer in question is already bound to the domain. First, we need to create a service account for running Splunk.

  1. Log on to a domain controller as a Domain Admin
  2. Open Active Directory Users & Computers
  3. Browse to the domain and open a suitable organizational unit for the account (I use “Managed Service Accounts”)
  4. Right-click on the container and select New -> User
  5. Enter a suitable login ID, full name and click on Next
  6. Enter a suitable password and click on Next
  7. Click on Finish
  8. Make a note of the username and password – you will need them later

Our second step is to add the service account and the computer to the groups that enable the GPO we created earlier.

  1. Right-click on the service account you just created and select “Add to a group…”
  2. Enter “Splunk Service Accounts” and click on “Check Names” (the name will be underlined)
  3. Click on OK, then on OK again when the success dialog appears
  4. Find the computer account for the system you are installing Splunk on. It will normally be under the “Computers” container
  5. Right-click on the computer account and select “Add to a group…”
  6. Enter “Splunk Enabled Computers”, then click on “Check Names” (the name will be underlined)
  7. Click on OK, then on OK again when the success dialog appears

We have now added the appropriate permissions in Active Directory. All that remains is installing Splunk on the target system. I recommend rebooting the target system at this point, as this causes the computer to pick up the new permissions when it binds to the domain. When you log back in to the system, log in as a local Administrator – you can use the Splunk service account you just created if you like. Finally, let’s install Splunk:

  1. Use the Start->Search for the Command Prompt (enter “cmd” in the search box)
  2. Right-click on the command prompt and select “Run as Administrator”
  3. Change drive and directory to where the MSI installer is located
  4. Run “msiexec /i splunkforwarder-*.msi” – if you need to get more specific to run the correct version, then do so (an unattended alternative is sketched after this list).
  5. When the installer asks “Local System Data” or “Remote Data”, select “Remote Data”
  6. Enter the username and password for the service account you created earlier
  7. Continue with the rest of the installation as normal.
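For repeat installs, an unattended variant of steps 4 through 6 is possible. A hedged sketch – verify the MSI property names against the Splunk installation documentation for your forwarder version, and note that the filename, account and password below are placeholders:

# Unattended install of the Universal Forwarder running as a domain user.
# All values below are placeholders; check the property names for your version.
msiexec.exe /i splunkforwarder-5.0.2-x64-release.msi `
    LOGON_USERNAME="SHELL\splunksvc" LOGON_PASSWORD="changeme" `
    AGREETOLICENSE=Yes /quiet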

Once the installation completes, the SplunkForwarder service will be running with domain credentials, and there should not be any errors. If you get an error at the end of the Splunk install, it will have an “ExitCode” – a sure-fire indication that you skipped a step, such as not waiting long enough for replication.

Of course, this isn’t the whole story. There are multiple methods of getting to this same point, but this is the one we use in our labs now. It works with an Active Directory environment based on Windows Server 2008 R2 or Windows Server 2012, and we currently use Splunk Universal Forwarder version 5.0.2 internally for our development (we always use the latest version). This is a major piece of the Splunk-on-Windows story, and I’ll be covering other parts in future blog posts.


Are all my Microsoft Servers being Splunked?


I recently got asked a question: how can I tell if all my Microsoft servers are being Splunked? It’s an interesting question, and one that takes a little bit of effort to answer. But we have all the bits, so let’s take a look at what it would take. First off, let’s assume that by “Is a server being Splunked?” we mean that the server in question has a universal forwarder on it, is hooked into a deployment server, and is sending events to an indexer – with all the events landing in the same environment.

To answer this question, we need three pieces of information:

  • A list of all servers within the Microsoft Active Directory domain
  • The last time each server sent an event to an indexer
  • The last time each server checked into the deployment server

Let’s tackle each one in turn, starting with the last time each server sent us an event. We have a search command for that called metadata. This search command returns a table of information about each host in an index. If we use the _internal index, which contains the Splunk logs, then we can get a good approximation of the last time something happened. Our command is:

| metadata type=hosts index=_internal | table host,lastTime

We can also utilize the _internal logs for the deployment server. The deployment server writes a log entry to a component called DeploymentMetrics every time a client checks in with it. We can use this to find out all sorts of useful information – most of it is available when you run splunk list deploy-clients on your deployment server. However, the data is indexed, so we just need a simple search and stats command:

index=_internal sourcetype=splunkd component=DeploymentMetrics | stats latest(_time) as lastPollTime,latest(status) as status,latest(ip) as ip, latest(build) as build by hostname

The status will normally be ok unless an error occurred. The build is a six-digit number that is specific to the build of Splunk being used and is embedded in the filename of the Splunk Universal Forwarder that you download.

Our final piece of information needed is the list of Microsoft servers. We have a Splunk Addon for querying Active Directory called SA-ldapsearch. Once configured, it can provide the results of any LDAP search that we want to execute against Active Directory. In this case, we want to get a list of all computers that have been bound to the domain and have Server in their operating system field:

|ldapsearch domain=SHELL search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="CN,operatingSystem"

In this case, SHELL is my domain, so make sure you replace that with your domain. Now, let’s put it all together:

| ldapsearch domain=SHELL search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="CN,operatingSystem" | join type=outer [| metadata index=_internal type=hosts | table host,lastTime | rename host as cn ] | join type=outer [ search index=_internal sourcetype=splunkd component=DeploymentMetrics | stats latest(status) as status,latest(ip) as ip,latest(build) as build by hostname | rename hostname as cn ]

I normally run this over the last 24 hours, but results can be correct in as little as 15 minutes. I also put this into a macro so that I can run it easily. Note that when you put this search in a macro, you need to remove the first pipe from the macro (so your macro starts with ldapsearch...), then add the initial pipe back into the search command you enter, like those I’ve provided below.

What can we do with this? How about finding out which Microsoft servers do not have a Splunk Universal Forwarder installed?

| `SplunkServerCoverage` | where isnull(guid)

Microsoft servers with a Splunk universal forwarder that is not hooked into a deployment server?

| `SplunkServerCoverage` | where isnotnull(guid) AND isnull(status)

Microsoft servers that have an error on the deployment client?

|`SplunkServerCoverage` | where isnotnull(status) AND status!="ok"

Finally, servers with everything working, but no events in the last 15 minutes:

|`SplunkServerCoverage` | eval td=time()-lastTime | where td>900

By combining the power of Microsoft Active Directory and some simple Splunk search skills, you can manage your environment easily.

Developing Modular Inputs in C# – Part 1


One of the cool new features of Splunk 5.0 is modular inputs, and we’ve already seen some great examples of this, such as the built-in perfmon gathering modular input and the Splunk Addon for PowerShell. However, the examples that are provided in the documentation are in Python. When I started writing my own modular input, I saw that much of the process of writing a modular input is scaffolding and repeatable. Thus I set out to write an SDK that would alleviate much of the scaffolding and provide a good framework for writing modular inputs. This multi-part series will cover the same process by writing a C# version of the Twitter example from the documentation.

The first part of writing a modular input is to implement the introspection scheme. When Splunk starts up, it searches for defined modular inputs and runs each one with the --scheme parameter. Splunk expects back an XML document that defines the parameters and configuration of the modular input. This is the first part I thought I could improve with some of the scaffolding. Rather than embed the XML in the program, why not produce a definition of the scheme programmatically and then serialize it with the standard C# XML serialization library?

Let’s look at my base program:

namespace Splunk.Twitter
{
    class Twitter
    {
        static Twitter twitterApp = new Twitter();

        static void Main(string[] args)
        {
            if (args.Length > 0 && args[0].ToLower().Equals("--scheme"))
            {
                twitterApp.Scheme();
                Environment.Exit(0);
            }
            else
            {
                Console.WriteLine("ERROR Not Implemented");
                Environment.Exit(1);
            }
        }

        public Twitter()
        {
        }
    }
}

Our program is a standard console application that looks for when Splunk feeds us the --scheme parameter and runs the Scheme() method. Our Scheme() method will construct the introspection scheme programmatically and output it to Console.Out (the .NET equivalent of stdout):

public void Scheme()
{
    Scheme s = new Scheme
    {
        Title = "Twitter",
        Description = "Get data from Twitter",
        UseExternalValidation = true,
        StreamingMode = StreamingMode.SIMPLE
    };
    s.Endpoint.Arguments.Add(new EndpointArgument
    {
       Name = "username",
       Title = "Twitter ID/Handle",
       Description = "Your Twitter ID."
    });
    s.Endpoint.Arguments.Add(new EndpointArgument
    {
        Name = "password",
        Title = "Password",
        Description = "Your Twitter password."
    });
    Console.WriteLine(s.Serialize());
}

This is all fairly basic object-creation stuff. There are a couple of enumerations that are important. Most notably in this code segment, the StreamingMode can be SIMPLE (a simple line-based output similar to a log file) or XML (where each event is encapsulated in XML before being transmitted to the Splunk server for indexing). We also define the endpoint, which drives the Splunk UI when defining the new data input within Splunk Manager. In this case, the Splunk UI will ask for two parameters – a username and a password.

Compile and run the Twitter.exe application with the --scheme argument and you will see the XML introspection scheme.

<?xml version="1.0" encoding="utf-16"?>
<scheme xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <title>Twitter</title>
  <description>Get data from Twitter</description>
  <use_external_validation>true</use_external_validation>
  <use_single_instance>false</use_single_instance>
  <endpoint>
    <args>
      <arg name="username">
        <title>Twitter ID/Handle</title>
        <description>Your Twitter ID.</description>
        <required_on_edit>false</required_on_edit>
        <required_on_create>false</required_on_create>
      </arg>
      <arg name="password">
        <title>Password</title>
        <description>Your Twitter password.</description>
        <required_on_edit>false</required_on_edit>
        <required_on_create>false</required_on_create>
      </arg>
    </args>
  </endpoint>
</scheme>

Compare this to the XML embedded in the Python version of the Twitter app and you will see that this version is a more compliant XML document (something that isn’t required by Splunk), but it is otherwise identical.

Next week, we will move on to instantiating the modular input and parsing the parameters you have configured in inputs.conf. Until then, you can follow my progress on GitHub at http://github.com/adrianhall/splunk-csharp-modinputs-sdk.

Microsoft Patch Tuesday! Are your servers patched?


It’s my most favorite time of the month – Patch Tuesday! Ok, I might be slightly exaggerating there. Let’s face it. It’s a pain in the neck. I have to go around to every server in my development environment and ensure that all the critical patches have been taken care of. Usually, this means a trip to Windows Update, or checking the logs of the Windows Server Update Services (WSUS) server. Today, I woke up and decided Splunk was going to assist with this.

Last week, I noted that you can easily get a list of all the servers in your Active Directory environment with the custom command ldapsearch, which you can find in the SA-ldapsearch app. The hardest part of this command is the configuration for your domain; once you have that out of the way, it’s incredibly useful.

If you are like me, you also have the Splunk Technology Add-on for Windows installed on all your servers. One of the inputs available (but disabled by default) is the Windows Update Log. You can enable it by going into the Splunk_TA_windows and altering local\inputs.conf to read:

[monitor://$WINDIR\WindowsUpdate.log]
disabled = 0

Push the updated Splunk_TA_windows to your clients and within a few minutes you will get details of all the Windows Update activity that has happened recently. Now, let’s take a look at the logs that we get. The important one looks like this:

2013-05-15	03:26:08:868	 844	e5c	Report	REPORT EVENT: {4BCB468C-170E-4BA8-8C2E-99AAE4CD853A}	2013-05-15 03:26:04:914-0600	1	190	101	{DFAA6388-FE05-49D7-A410-71B92D1C1B37}	202	0	AutomaticUpdates	Success	Content Install	Installation Successful: Windows successfully installed the following update: Update for Windows Server 2008 R2 x64 Edition (KB2798162)

The Splunk_TA_windows extracts certain information from this. The most notable are the Common Information Model compatible signature_id and status fields. These tell us which patches have been installed and the status. Try this search:

sourcetype=WindowsUpdateLog "REPORT EVENT:" "Content Install" | chart latest(status) by host,signature_id

Now, back to the task at hand. Microsoft releases security bulletins on Patch Tuesday, and commentators generally take note of which ones are critical and which ones have exploits in the wild. This month there are three that are notable. MS13-038 is an Internet Explorer vulnerability that has exploit code in the wild and allows the execution of arbitrary code on the server; it covers Windows Server 2003 through Windows Server 2008 R2 and is fixed by patch KB2847204. Other critical patches include MS13-040, fixed by patch KB2836440, and MS13-037, fixed by patch KB2829530. You should, of course, understand the effects of each patch and the risks posed by the vulnerabilities in each case before patching. Never patch a server blindly.

Putting the various search bits together, we can construct a search that tells us which servers are at risk. Here is my search:

| ldapsearch domain=SHELL search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="CN,operatingSystem"
| table cn,operatingSystem
| join cn [search sourcetype=WindowsUpdateLog "REPORT EVENT:" "Content Install"
    | chart latest(status) by host,signature_id
    | table host,KB2836440,KB2847204,KB2829530
    | fillnull value="not installed" KB2836440 KB2847204 KB2829530 | rename host as cn]
 | fillnull value="not installed" KB2836440 KB2847204 KB2829530
| where KB2836440!="installed" OR KB2847204!="installed" OR KB2829530!="installed"

You will need to replace the domain name in the ldapsearch command with your domain name. In addition, find a blog that reviews the monthly patches from Microsoft so you know which ones are important to you. Then let Splunk do the work of analyzing your servers.

Running as a Windows Service


There are some things that are just plain difficult on a Windows box. Take, for example, debugging Splunk scripted inputs. It seems simple enough. But Splunk runs as a Windows Service and is usually running with the “NT AUTHORITY\SYSTEM” ID – a privileged account on the local machine, but with no privileges on the network. In the Linux world, you can use su to become the user that Splunk is running as. But Windows? That’s a bit harder. You used to be able to use RUNAS, but that has been closed off as insecure. SCHTASKS was another technique – gone on the latest Windows platforms.

How does one become NT AUTHORITY\SYSTEM? Well, it takes a download and some explaining. First of all, you need Microsoft Sysinternals – a suite of utilities that Microsoft provides for all sorts of things. I download the whole lot and put them somewhere in my PATH on any development machine; they are just plain useful. For instance, the BgInfo utility produces a great desktop background with all sorts of useful information about the machine you are on. This is a wonderful utility in a virtualized environment, and well recommended.

Another of the utilities is PsExec. This command allows you to run commands on remote systems. This seems reasonable enough, but how is that going to help us? Well, if you run the following command:

psexec -i -s cmd.exe

You get a system prompt. Do a whoami and you will see you are running as NT AUTHORITY\SYSTEM. The -i switch tells psexec that this is an interactive session, and the -s switch tells it that we want the system account. Yes, it works with any interactive console, including PowerShell.
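For example, to get a SYSTEM PowerShell console and prove who you are before debugging a scripted input:

# Launch an interactive PowerShell running as the system account...
psexec -i -s powershell.exe
# ...then, in the new console, confirm the identity:
[System.Security.Principal.WindowsIdentity]::GetCurrent().Name
# NT AUTHORITY\SYSTEM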

Now that you have a tool to execute as the NT AUTHORITY\SYSTEM, you can easily debug those pesky permissions issues simply by running your script with the same permissions as Splunk.

Importing SharePoint ULS Logs

$
0
0

We like logs – no shock there. However, system administrators also like logs, and some of the most difficult logs to work with come from the Microsoft world. I’ve seen DNS debug logs in Active Directory, IIS and Message Tracking logs in Exchange, and Windows Event Logs from just about everything else. My current focus is on Unified Logging System (ULS) logs. These trace logs are produced by several packages, including Microsoft SharePoint and Project Server, and they are among the formats I am most often asked about. They are useful when it comes to diagnosing problems in your Microsoft IT infrastructure, since they get down to the individual .NET calls that happen under the covers, but they are far from easy to understand. They are so problematic that Microsoft released ULSViewer to assist in their understanding. Of course, it only handles one file at a time and has rudimentary search and no statistical analysis – but that’s why you use Splunk. Splunk offers the SharePoint administrator some really cool features, but the main one we need here is that basic function of Splunk: getting your logs into one place for troubleshooting.

Let’s start by taking a look at a typical log entry.

06/11/2013 15:15:49.90  PSConfigUI.exe (0x0624)                         0x04D0  SharePoint Foundation           Upgrade                        fbv7    Medium          [psconfigui] [SPDelegateManager] [DEBUG] [6/11/2013 3:15:49 PM]: Waiting for mutex to initialize type dictionary

This is actually from my test SharePoint 2010 environment and shows a debug message. There isn’t anything special about it – it tells you the executable, the PID, the session ID, the product, the severity and a message – all good stuff. We can decode this relatively easily with some regular expression magic. Here is a typical multi-line log entry.

06/11/2013 15:15:49.87  PSConfigUI.exe (0x0624)                         0x04D0  SharePoint Foundation           Topology                        8xqz    Medium          Updating SPPersistedObject SPFarm Name=SharePoint_Config. Version: -1 Ensure: False, HashCode: 37121646, Id: 19eb72de-6a41-4c79-9210-1b4ae749c790, Stack:    at Microsoft.SharePoint.Administration.SPPersistedObject.BaseUpdate()     at Microsoft.SharePoint.Administration.SPFarm.Update()     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.RegisterDefaultDatabaseServices(SqlConnectionStringBuilder connectionString)
  at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Provision(SqlConnectionStringBuilder connectionString)    at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnection StringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString...
06/11/2013 15:15:49.87* PSConfigUI.exe (0x0624)                         0x04D0  SharePoint Foundation           Topology                        8xqz    Medium          ... masterPassphrase)     at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, String farmUser, SecureString farmPassword, SecureString masterPassphrase)     at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb()     at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run()     at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()     at System.Threading.ExecutionContext.runTryCode(Object userData)     at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)     at System.Threading.ExecutionContext.Run(ExecutionContext execu...
06/11/2013 15:15:49.87* PSConfigUI.exe (0x0624)                         0x04D0  SharePoint Foundation           Topology                        8xqz    Medium          ...tionContext, ContextCallback callback, Object state)     at System.Threading.ThreadHelper.ThreadStart()

Notice how the embedded continuation token (three dots in a row on either side) is in the middle of the line. Before the three dots, the information is the same except for an asterisk next to the date. It’s relatively easy to handle events that have the continuation token at the beginning of the line, but in the middle? Aside from this continuation aspect (about which – yes – I’m just a little bitter), we have a header to contend with and line-breaking issues because of the asterisk.

Of course Splunk can handle all this. We have the following steps we need to get through:

  1. Read the data through a tailing file monitor
  2. Remove or ignore the header area we don’t need
  3. Separate the data into events through a custom line breaker
  4. Remove the data continuation so the event is “correct”

Let’s take a look at the first bit. I’m receiving my ULS logs from my Microsoft SharePoint 2010 farm servers, so I will tag the data with the MSSharePoint:2010:ULSAudit source type. You will need to install a Splunk Universal Forwarder on each SharePoint server and create an app to read the data into your indexer. Then add the following to the inputs.conf:

[monitor://C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS]
whitelist=.*-\d+-\d+\.log$
sourcetype=MSSharePoint:2010:ULSAudit
queue=parsingQueue
disabled=false

Note that we need a white list because Microsoft places other types of data in this directory, known as the “14 Hive”. The white list restricts the data we are reading to just the ULS log files. Our second step is to create a transforms.conf entry that discards the header information. The header line contains the field names and begins with “Timestamp”, whereas normal lines begin with a date-time stamp, so we can safely drop any line starting with “Timestamp”:

[uls_remove_comments]
REGEX=^Timestamp
DEST_KEY=queue
FORMAT=nullQueue

This is linked into the events using a props.conf entry:

[MSSharePoint:2010:ULSAudit]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
TRANSFORMS-ulscomment=uls_remove_comments

We next need to set up an appropriate line breaker. The date format is always the same, so we can break on that:

LINE_BREAKER=([\r\n]+)\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{2}\s

The final piece of this caused me the most concern. How do I make the ULS log events look like they should, without the continuation tokens? Fortunately, I was in our European headquarters recently, and I tackled one of our senior professional services guys, who (after just a little bit of work) came up with the following props.conf addition:

SEDCMD-cleanup=s/(\.\.\.([^\*]+).*?\.\.\.)//g

Our completed props.conf entry looks like this:

[MSSharePoint:2010:ULSAudit]
SHOULD_LINEMERGE=false
CHECK_FOR_HEADER=false
LINE_BREAKER=([\r\n]+)\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\.\d{2}\s
TRANSFORMS-ulscomment=uls_remove_comments
SEDCMD-cleanup=s/(\.\.\.([^\*]+).*?\.\.\.)//g

You will still need to do field extractions on the resultant events, but the heavy lifting of getting the events into Splunk is now done. You will see that each event is complete and does not have the intervening additions from the logs. So, as a bonus, the events are actually smaller than the original log.
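As a starting point for those extractions, here is a sketch of a search-time rex based on the sample entry above – the field names are mine, and you may need to tune the whitespace handling against your own logs:

sourcetype=MSSharePoint:2010:ULSAudit
| rex "^\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+(?<process>\S+)\s+\((?<pid>0x[0-9A-Fa-f]+)\)\s+(?<tid>0x[0-9A-Fa-f]+)\s+(?<product>\S.*?)\s{2,}(?<category>\S.*?)\s{2,}(?<event_id>\S+)\s+(?<level>\S+)\s+(?<message>.*)$"

Once you are happy with the pattern, move it into props.conf as an EXTRACT so the fields appear automatically.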

Now, what will you do with SharePoint ULS logs?

Monitoring Processes on Windows

We get a lot of questions here at the Splunk Microsoft Practice – not just on our apps (which are awesome starting points for common Microsoft workloads), but also on how to do specific things in Windows. One of the things I recently got asked was “how do I get a top-10 type report of processes on a system and who is running them?” This should be fairly straightforward. After all, Microsoft provides a perfmon object called “Process” – maybe I can just monitor that. Unfortunately, the owner is not available. Ditto with WMI. Once I’ve exhausted the built-in methods of getting information, I turn to my favorite tool – PowerShell.

There are two methods of getting the list of processes on a system. Get-Process is the de facto standard for getting a process list from PowerShell, but I prefer the WMI approach – Get-WmiObject -class win32_process. The reason for the choice is that the objects you get back have a bunch of useful methods on them, one of which is GetOwner(), which retrieves the owner of the process – just what we are looking for. You can always get the list of things you can do by piping the command to Get-Member. For example:

Get-WmiObject -class win32_process | Get-Member

In order to get the owner information into the objects, we have to do a little work. Joel Bennett assisted with this small scriptlet:

Get-WmiObject -class win32_process |
    Add-Member -MemberType ScriptProperty -PassThru -Name Username -Value {
        $ud = $this.GetOwner();
        $user=$ud.Domain+"\"+$ud.User;
        if ($user -eq "\") { "SYSTEM" } else { $user }
    }

Although I have split this over multiple lines for readability, you should type it all on one line. What this does is add a “Username” property to each object in the pipeline; the value is obtained by calling GetOwner() on the object. There is a special case when the process does not have an owner, in which case we set the owner to “SYSTEM”.

You will notice an awful lot of properties being returned when you run this command. We will fix that when we start importing it into Splunk. Speaking of which, how do we do that? We turn to one of my favorite addons – SA-ModularInput-PowerShell. You can download it from Splunkbase. This addon persists a PowerShell scripting host for running scripts and gathering the results. Any objects that are output by our script are converted into key-value pairs and sent on to Splunk. You need to install the .NET 4.5 framework and WinRM 3.0 as well as the Splunk Universal Forwarder for Windows.

Since the SA-ModularInput-PowerShell addon does not define any scripts, you need to add your script to the inputs.conf of an app. Our script would appear like this:

[powershell://Processes]
script = Get-WmiObject -class win32_process | Add-Member -MemberType ScriptProperty -PassThru -Name Username -Value { $ud = $this.GetOwner();  $user=$ud.Domain+"\"+$ud.User;  if ($user -eq "\") { "SYSTEM" } else { $user } }|select ProcessId, Name, Username, Priority, ReadOperationCount, WriteOperationCount, CreationDate, Handle, VirtualSize, WorkingSetSize, UserModeTime, ThreadCount
schedule = 0,15,30,45 * * ? * *
source = PowerShell
sourcetype = PowerShell:Process

Our script is fairly self-evident, but we have added a Select to limit the properties that are sent on to Splunk. I’ve picked some interesting ones around memory usage, thread counts and IOPS. The schedule will be recognizable as a cron-style scheduler; SA-ModularInput-PowerShell is based on Quartz.NET – a well-known open-source scheduling system for the .NET framework.

Once the data is flowing into Splunk (check the splunkd.log file if it isn’t), we need a search that will get us the processes at any given time. Here is my search:

sourcetype=PowerShell:Process |
    stats count as Polls,
        latest(Name) as Name,
        latest(Username) as Username,
        latest(Priority) as Priority,
        max(ReadOperationCount) as ReadOperationCount,
        max(WriteOperationCount) as WriteOperationCount,
        latest(Handle) as Handle,
        max(VirtualSize) as VirtualSize,
        latest(WorkingSetSize) as WorkingSetSize,
        latest(UserModeTime) as UserModeTime,
        max(ThreadCount) as ThreadCount by host,ProcessId,CreationDate

Again, run this all together on one line – it’s just split up for readability. We need the CreationDate field because a ProcessId can be recycled on a given host. By using host, ProcessId and CreationDate together, we get a unique key that identifies each process. I normally place useful searches like this in a macro – either by editing my macros.conf file or in the Manager. I’ve named my macro “all-windows-processes”.
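For reference, the macros.conf entry would look something like this (the definition must be on a single line):

[all-windows-processes]
definition = sourcetype=PowerShell:Process | stats count as Polls, latest(Name) as Name, latest(Username) as Username, latest(Priority) as Priority, max(ReadOperationCount) as ReadOperationCount, max(WriteOperationCount) as WriteOperationCount, latest(Handle) as Handle, max(VirtualSize) as VirtualSize, latest(WorkingSetSize) as WorkingSetSize, latest(UserModeTime) as UserModeTime, max(ThreadCount) as ThreadCount by host,ProcessId,CreationDate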

So, what about that top-ten report? Well, it depends on how you measure the top ten. Here are some interesting searches using that macro:

Top 10 Processes run by users that have the largest virtual memory footprint

`all-windows-processes` | search Username!="SYSTEM" | top VirtualSize

Top 10 Processes that have the largest amount of disk activity

`all-windows-processes` | eval DiskActivity = ReadOperationCount + WriteOperationCount | top DiskActivity

Top 10 Users that are running the most processes

`all-windows-processes` | stats count by Username,host | top count

Top 10 longest running user processes

`all-windows-processes` | search Username!="SYSTEM" | top Polls

Hopefully, this gives you some ideas on what you can do to monitor processes on your Windows systems. If you are wondering how to monitor something else on your Windows systems, let us know at microsoft@splunk.com or use Ask an Expert – just look for my picture.

Catching Errors in PowerShell

I’ve recently been writing a lot of PowerShell for the SA-ModularInput-PowerShell addon. It’s amazingly flexible at capturing data that is embedded in the .NET framework, and many Microsoft products provide convenient access to their monitoring counters via PowerShell. This modular input can replace perfmon, regmon, WMI and all the other things we used to use for monitoring Windows boxes. However, sometimes bad things happen. Scripts don’t work as expected. In the Splunk world, permissions, connectivity and other problems make diagnosing scripted inputs difficult. I can run the script myself and get the right results, but when I put it in an inputs.conf file, it breaks.

One way to get some diagnostics in there is to ensure the script throws exceptions when necessary and then use a wrapper script to capture those exceptions and produce log events from them. We use this a lot within new apps, and if you have signed up for the Splunk App for SQL Server Beta Program, you will know that all our PowerShell scripts are wrapped in this manner. You can download and view the script on Github, so I am not going to reproduce it here.

This script traps errors. Along the way, it writes out two events for you. The first (with sourcetype=PowerShell:ScriptExecutionSummary) contains Identity (more on that later), InvocationLine and TerminatingError fields. The more important one from a diagnostics point of view is the second (with sourcetype=PowerShell:ScriptExecutionErrorRecord), which has a ParentIdentity field (matching the Identity field from the first event so you can correlate the two) and all the error information as fields. Just in case that wasn’t enough, it adds timing information to the ScriptExecutionSummary so you can see how long your script is running.
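As a sketch of how you might stitch the two events together in a search (using the field names described above):

sourcetype=PowerShell:ScriptExecutionErrorRecord
| join ParentIdentity
    [ search sourcetype=PowerShell:ScriptExecutionSummary | rename Identity as ParentIdentity ]
| table _time, ParentIdentity, InvocationLine, TerminatingError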

Using this script is easy. In your addon, create a bin directory for your PowerShell scripts, and place the script above in that directory as “Invoke-MonitoredScript.ps1”. Let’s take a look at the normal running of a script and the wrapped version. Here is our normal inputs.conf stanza for a typical script, taken from the addon for Microsoft SQL Server:

[powershell://DBInstances]
script = & "$SplunkHome\etc\apps\TA-SQLServer\bin\dbinstances.ps1"
schedule = 0 */5 * ? * *
index = mssql
sourcetype = MSSQL:Instance:Information
source = Powershell

Now let’s take a look at the modified version for producing the error information:

[powershell://DBInstances]
script = & "$SplunkHome\etc\apps\TA-SQLServer\bin\Invoke-MonitoredScript.ps1" -Command ".\dbinstances.ps1"
schedule = 0 */5 * ? * *
index = mssql
sourcetype = MSSQL:Instance:Information
source = Powershell

The script you want to run is not affected – only the execution of the script is adjusted. Now you will be able to see any errors that are produced within the monitored script. I have added an Errors dashboard that shows the errors I get combined with the parent invocation information to show timing as well.


Audit File Access and Change in Windows

One of the bigger problems that we come across is auditing of file systems – specifically, you want to know who read, modified, deleted or created files in a shared area. This is not an unreasonable task, but it is different in every single operating system. Windows has built-in facilities for doing this. We just need to do a few things to get the information into Splunk.

• Object Access Auditing needs to be turned on
• The Shared Folder needs to have auditing enabled
• You need to collect and interpret events from the system

To turn on object access auditing, you need to alter the local security policy. This can be done centrally via a group policy object or on the local machine. You may even have it turned on already. To turn on object access auditing using the local security policy, follow this process:

1. Open up Administrative Tools -> Local Security Policy, or run secpol.msc
2. Open Local Policies -> Audit Policy
3. Right-click on “Audit object access” and select Properties
4. Ensure “Success” and “Failure” are both checked
5. Click on OK, then close the Local Security Policy window.

You can do a similar thing in group policy – create a new group policy object, edit it, open Computer Configuration and find the Local Security Policy, then adjust as described above, save it and apply it to some machines in the normal manner. Once it is distributed (which happens roughly every 4 hours by default), your selected systems will have auditing forced on.
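To verify that the policy actually took effect on a given machine, you can check it from an elevated prompt on Windows Server 2008 and later:

auditpol /get /category:"Object Access"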

The next piece is to turn on auditing for a specific folder (and all its sub-folders and files). You normally do this for only a select few places and users, since the information generated is very chatty. For each folder, follow this process:

1. Open up the File Explorer by right-clicking and selecting Run As Administrator.
2. Browse to the folder you want to turn auditing on.
3. Right-click on the folder and select Properties.
4. Select the Security Tab.
5. Click on Advanced, then Auditing.
6. Click on Add
7. Enter the names of the users you wish to audit (Everyone is usually a good choice!), click on Find Now to ensure the name resolves, then click on OK
8. Check the Successful and Failed boxes, then click on OK
9. Close the windows by clicking OK

Remember that the exact process changes slightly between versions of Windows Server, so be aware that the exact paths may be slightly modified, but they will be called the same thing.

You should be able to see audit information in your Security event log. The final step is to make that information appear in a Splunk instance. I generally install the Splunk Universal Forwarder on the host and deploy the Splunk_TA_windows to the host. This is an essential add-on that collects the Windows Security Event Log by default for you.
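If you want to confirm (or trim) what is being collected, the relevant stanza in the add-on’s inputs.conf looks something like this:

[WinEventLog://Security]
disabled = 0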

Once you are gathering the data, you will see four distinct event codes produced. On NT5 systems (Windows Server 2003 and prior), event codes 560 (open object) and 562 (close object) are produced. On NT6 systems (Windows Server 2008 and later), codes 4656 (open object) and 4658 (close object) are created. Here is an example of Event Code 4656:

A handle to an object was requested.
Subject:
   Security ID:  SHELL\ahall
   Account Name:  ahall
   Account Domain:  SHELL
   Logon ID:  0x1ff76
Object:
   Object Server:  Security
   Object Type:  File
   Object Name:  C:\Finance\Accounts.xlsx
   Handle ID:  0x994678
Process Information:
   Process ID:  0xff1
   Process Name:  C:\Program Files\Microsoft Office\Office15\EXCEL.EXE
Access Request Information:
   Transaction ID:  {00000000-0000-0000-0000-000000000000}
   Accesses:  READ_CONTROL
     SYNCHRONIZE
     ReadData
     ReadEA
     ReadAttributes
   Access Mask:  0x120089
   Privileges Used for Access Check: -
   Restricted SID Count: 0

The person who is accessing the resource, the resource itself and the program used to access it are all available. In addition, the Logon ID is available. If you have Account Logon Audit turned on, then a logon EventCode (528, 540, 4624) will have been logged from the same machine with the same Logon ID. You can also see how long the file was open by looking for a corresponding close from the same host with the same Handle ID.

On my search head, I have defined a new event type called “windows-fileaudit” – this is defined in eventtypes.conf, but you can also define it in the Manager. Add this to your eventtypes.conf:

[windows-fileaudit]
search = sourcetype=WinEventLog:Security (EventCode=560 OR EventCode=562 OR EventCode=4656 OR EventCode=4658)

As an example, let’s find all the accesses to the C:\Finance area on host FINANCE, who opened the files and how long they had them open for.

eventtype=windows-fileaudit host=FINANCE Object_Type="File" Object_Name="C:\\Finance\\*" | eval CodeType=if(EventCode==560 OR EventCode==4656,"Open","Close") | transaction host Handle_ID startswith="CodeType=Open" endswith="CodeType=Close" | table _time Security_ID Object_Name Process_Name duration

One word of warning in closing. The object access audit security log events are extremely chatty, so you may want to look at methods of controlling what gets indexed, and perhaps set up a small free version of Splunk to allow you to discover how much data will be logged before moving the data over to your main Splunk index.
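As a sketch of that kind of control, you can route the noisiest audit events to the nullQueue before they are indexed. For example, to drop audit events for a path you don’t care about (the path here is purely illustrative), add to props.conf:

[WinEventLog:Security]
TRANSFORMS-dropfilenoise = drop_temp_fileaudit

and to transforms.conf:

[drop_temp_fileaudit]
REGEX = Object Name:\s+C:\\Windows\\Temp
DEST_KEY = queue
FORMAT = nullQueue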

Statistics and Windows Perfmon

Sometimes, things that you expect to be trivial are less so, and you learn by experience about the pitfalls you may fall into. One such thing is Windows Perfmon. In order to save valuable license space, the Splunk Perfmon implementation squashes zero values. In other words, a zero value is not logged. This is normally not a big deal – after all, if you are recording a time chart of the % Processor Time, you might do something like this:

index=perfmon counter="% Processor Time" instance="_Total" | timechart avg(Value) by host

When you turn this into a chart, you can specify that null values be rendered as zero and you have a nice chart.

But is it correct?

Let’s say you are monitoring your perfmon counters every minute. At each minute interval, the splunk-perfmon.exe process wakes up and polls for the counter value. If it is not zero, it emits an event with the value. Let’s look at an example. Let’s say your counter is the number of current connections to the IIS process. We monitor this every 60 seconds and get our results. Now, when we do our timechart, we put these values into time-span buckets. Maybe our bucket is every 5 minutes. If the bucket is full, then the value reported by avg(Value) is correct. Similarly, if the bucket is empty, then the value is null, which is handled properly by the nullValueMode on the chart. But what if the bucket is partially full?

In our example, let’s say we get the following samples: {0,3,2,0,0} for our five one-minute intervals. The average for this set is 1 (a total of 5 connections divided by the 5 sample entries). But zero-values are squashed (i.e. not emitted), so what the timechart sees in the bucket is {3,2} for an average of 2.5. This is way off what we expect. The unfortunate thing here is that we don’t know why the zero is squashed – it could be because the value is zero, but it could also be because the server is down. When doing statistical analysis, zero is relevant, so we need to fix that.

Fortunately, we have a good way to correct this. Go on to your Splunk Universal Forwarder and edit the file %SPLUNK_HOME%\bin\scripts\splunk-perfmon.path. This is a text file and contains the following:

$SPLUNK_HOME\bin\splunk-perfmon.exe

Basically, our path file executes the normal executable. However, splunk-perfmon.exe can also take arguments, and one of them is of interest to us. Change this file to:

$SPLUNK_HOME\bin\splunk-perfmon.exe -showzero

The -showzero argument tells splunk-perfmon.exe to emit zero values. Now you can do statistical analysis on your perfmon data – not only averages, but statistics like the 95th percentile.
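For example, with zeroes flowing in, percentile analysis becomes meaningful (following the search conventions used earlier in this post):

index=perfmon counter="% Processor Time" instance="_Total"
| timechart span=5m avg(Value) as AverageCPU, perc95(Value) as Percentile95CPU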

There are a couple of obvious caveats here:
1. This is a system-wide change. All the perfmon data from all apps will now record zero values.
2. This will increase your license usage. How much? That all depends on how many zero values you are getting.

Ultimately, the decision rests with you – do you do statistics on your perfmon data? If so, you need to make this change. If your needs are a little less statistical (maybe correlation with the Windows Event Logs), then you probably don’t need this change.

Developing Modular Inputs in C#: Part 2

I’m annoyed at our engineering team, but I’ll get over it. You see, just hours after I posted my first blog post on writing modular inputs in C#, the team up in Seattle released the latest edition of the C# SDK. Within that SDK is a bunch of class libraries that do a much better job of the scaffolding needed to produce a modular input than my earlier effort. I highly recommend you go over to their site and dig into this. Within this blog post, I’m going to adjust my code to use the new scaffolding and take a look at actually running the modular input. Let’s start with the framework. Here is a starting recipe for a modular input:

using System;
using Splunk.ModularInputs;
using System.Collections.Generic;

namespace Splunk.Twitter
{
    internal class Twitter : Script
    {
        public override Scheme Scheme
        {
            get {
                throw new NotImplementedException();
            }
        }

        public static int Main(string[] args)
        {
            return Run(args);
        }

        public override void StreamEvents(InputDefinition inputDefinition)
        {
            throw new NotImplementedException();
        }
    }
}

As you can see, there isn’t much to it – we have a property that returns our Scheme. This is basically the same Scheme class that we used in part 1, but we implement it as a property now. We also need to implement a StreamEvents() method. This is the new method that is called to actually gather events. Let’s take a look at our new Scheme implementation:

        public override Scheme Scheme
        {
            get {
                return new Scheme
                {
                    Title = "Twitter",
                    Description = "Get data from twitter",
                    StreamingMode = StreamingMode.Simple,
                    Endpoint =
                    {
                        Arguments = new List<Argument> {
                            new Argument {
                                Name = "username",
                                Title = "Twitter ID/Handle",
                                Description = "Your Twitter ID"
                            },
                            new Argument {
                                Name = "password",
                                Title = "Twitter Password",
                                Description = "Your Twitter Password"
                            },
                        }
                    }
                };
            }
        }

Notice that it’s pretty much the same as before – just formatted differently. I like this one better – I don’t have to parse command line arguments, serialize the XML data or understand that the Scheme is returned from a --scheme command. It just happens for me. Now, on to the meat of today’s post – actually dealing with the data. I’m not going to tell you how to connect to Twitter and pull data – there are better blog posts than mine on this subject. However, let’s explore what happens when Splunk starts a modular input to receive data. Splunkd runs the modular input with no arguments and feeds it an XML document via stdin. This is captured by the Splunk C# framework, which turns it into an InputDefinition object and then calls StreamEvents(). Your StreamEvents() method should never end (unlike mine) and can access the parameters that the modular input was configured with. You will need a sample XML document to fully test this. Here is an example:

<?xml version="1.0" encoding="utf-8" ?>
<input>
  <server_host>DEN-IDX1</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>C:\Program Files\SplunkUniversalForwarder\var\lib\splunk\modinputs\twitter</checkpoint_dir>
  <configuration>
    <stanza name="twitter://aaa">
      <param name="username">ahall</param>
      <param name="password">mypwd</param>
      <param name="disabled">0</param>
      <param name="index">default</param>
    </stanza>
  </configuration>
</input>

This is actually generated from the information you enter into the inputs.conf file or through the Manager. However, we need to hand-craft this when we are testing. My StreamEvents() method looks like this:

        public override void StreamEvents(InputDefinition inputDefinition)
        {
            Console.Out.WriteLine("# stanzas = " + inputDefinition.Stanzas.Count.ToString());
            foreach (string st in inputDefinition.Stanzas.Keys) {
                Console.Out.WriteLine(st + ":");
                Console.Out.WriteLine("\tUsername = " + inputDefinition.Stanzas[st].Parameters["username"]);
                Console.Out.WriteLine("\tPassword = " + inputDefinition.Stanzas[st].Parameters["password"]);
            }
            throw new NotImplementedException();
        }

I’m still throwing the NotImplementedException(), but first I’m printing some of the data we got from the input definition. Now you can use this to configure your modular input and start gathering data. From PowerShell, I can run this with the following command:

Get-Content MyXMLFile.xml | .\Twitter.exe
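Once the debugging output looks right, StreamEvents() becomes your event loop. In Simple streaming mode, each line the process writes to stdout becomes an event. Here is a minimal sketch, with a placeholder where the real Twitter connection would go:

        public override void StreamEvents(InputDefinition inputDefinition)
        {
            while (true)
            {
                // A real input would read from the Twitter stream here;
                // this placeholder just emits a timestamped heartbeat.
                Console.Out.WriteLine(DateTime.UtcNow.ToString("o") + " message=\"heartbeat\"");
                Console.Out.Flush();
                System.Threading.Thread.Sleep(60000);
            }
        }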

There are some great examples of modular inputs out there, including modular inputs for PowerShell execution and SNMP. Modular Inputs are a powerful method of gathering hard-to-get data, and I encourage you to explore your systems like they’ve never been explored before.

Thoughts from Microsoft TechEd North America

Splunk was an exhibitor at this year’s TechEd North America in New Orleans, and was lucky enough to not only hand out t-shirts, but also to give numerous demos, talk to some extraordinary customers and spend time with industry experts covering the most commonly deployed Microsoft technologies, including Windows Server, SQL Server, SharePoint, Lync and Exchange. At times, our booth was 3 to 4 people deep listening in on our demos of Exchange, Active Directory and Enterprise Security, and many attendees were interested in our take on Business Intelligence in a Big Data world. We also got many enquiries about development opportunities and SDKs, and I spoke to several people about semantic logging. Some highlights for me:

• One of our customers described how they monitor the Windows Active Directory domain controllers using ETW tracing, and wondered if we could incorporate that into a modular input
• Another was taking a fresh look at security in a SQL Server environment and liked our approach of looking for exceptions to the rule in SQL Server audit
• Yet another wondered how he could find users that are locked out due to mobile device passwords not being reset (common in Exchange ActiveSync environments)

In addition, our partners and customers were sponsoring or exhibiting in force – AppDynamics, Centrify, Cisco, EMC, ExtraHop, F5, NetApp, Hortonworks and Thycotic Software joined us to make up a very Splunky TechExpo. SendGrid had the booth opposite ours and talked to us about how they use Splunk to monitor their service. Radiant Logic had an engineer developing a Splunk app to use their technology. I probably forgot some (and I apologize for that – there were that many!)

I also spent time attending the sessions. Microsoft announced a whole slew of new products – Windows 8.1 had been well reported, but who saw Windows Server 2012 R2 coming? Cloud services based on Windows Azure got a major push, and PowerShell v4 was everywhere. We heard in detail about the new features of 2012 R2 coming down the road and learned about DSC (Desired State Configuration) in PowerShell v4 from the masters, Jeffrey Snover and Kenneth Hansen. PowerShell v4 will be the default in Windows Server 2012 R2. I managed to squeeze in a session from Todd Klindt and Shane Young on the final day talking about SharePoint 2013. Our own Hal Rottenberg had to repeat his PowerShell beginners session because over 100 people were turned away from the first one.

As important as the sessions were, the networking opportunities and discussions with the people that literally “wrote the book” were enlightening – our own Hal Rottenberg was joined by Don Jones, Todd Klindt, Mark Minasi, Aaron Nelson, Denny Cherry, Allen White and Argenis Fernandez for an evening. We also spent time with a lot of the Microsoft product managers discussing futures and where their individual products are going.

When one goes to technical conferences, one cannot help but come away encouraged and energized by what one sees. I will admit to a little dread as well, since I see a lot of work ahead as we ensure Splunk is ready to take in the data and provide the operational intelligence that makes IT operations and security operations work more intelligently.

But watch us take on the challenge!

Windows, Perfmon and Internationalization

When we write apps within Splunk, we are generally working with a US English focus. People don’t write logs in multiple languages, after all, so we rarely have to worry about multiple languages in the core applications that we write. Except, that is, for Windows. Specifically, perfmon data is delivered localized for the various languages that Windows runs under. (Windows Event Logs are also delivered localized, but this post is specifically about perfmon data.) If you have a US English version of Windows and you want to do a time chart of the percentage of the processor used over the last 24 hours, you might do a search like this:

index=perfmon object=Processor counter="% Processor Time"
| timechart span=10m avg(Value) by host

However, when you are using a French version of Windows, you need to do this:

index=perfmon object=Processeur counter="% Temps Processeur"
| timechart span=10m avg(Value) by host

Same thing – different language. How are we meant to deal with the same thing in multiple languages? The best method I have come up with involves a two-step process:

  1. Convert the inputs.conf so that it is retrieving the localized version of the perfmon counters (a sketch follows this list)
  2. Adjust the searches to do a lookup based on what I want
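For the first step, the sketch is an inputs.conf stanza on the forwarder that names the objects and counters exactly as the localized operating system reports them – for example, on a French system (the stanza name is arbitrary):

[perfmon://CPU]
object = Processeur
counters = % Temps Processeur
instances = _Total
interval = 60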

With the first step sketched out, I’m going to focus on the second in this article. My method is to use a lookup on the object and counter. First of all, I set up a lookup table – a CSV file that looks like this:

object,counter,l_object,l_counter
Processeur,% Temps Processeur,Processor,% Processor Time

Add a line for each combination of object and counter that you want to handle. Note that the object and counter that we are receiving are on the left and the non-localized versions are on the right. We set up the lookup in transforms.conf:

[TranslatePerfmon]
filename = TranslatePerfmon.csv
max_matches = 1

Now we can apply the lookup automatically to all Perfmon data with a props.conf entry:

[Perfmon:*]
LOOKUP-perf = TranslatePerfmon object counter OUTPUT l_object l_counter

Now, instead of using the object and counter fields, we can use the l_object and l_counter fields, so our search becomes:

index=perfmon l_object=Processor l_counter="% Processor Time"
|timechart span=10m avg(Value) by host

Note that this only works if the specific combination of object and counter are available in our lookup file. What about the ones that aren’t? In this case, we need to correct with an eval statement. In version 5.0 of Splunk, we can create evaluated fields to create a copy of the object and counter into l_object and l_counter. Since this is done prior to the lookup, the lookup will overwrite our evaluated fields later on. Our new props.conf entry looks like this:

[Perfmon:*]
EVAL-l_object = object
EVAL-l_counter = counter
LOOKUP-perf = TranslatePerfmon object counter OUTPUT l_object l_counter

Now every single perfmon event will have an l_object and l_counter. Of course, you still have to produce the localization file – TranslatePerfmon.csv – for every language you want to support, but you can build a common file that translates all the languages at once. For instance, you could use the following as the contents of the CSV file:

language,object,counter,l_object,l_counter
en_US,Processor,% Processor Time,Processor,% Processor Time
fr_FR,Processeur,% Temps Processeur,Processor,% Processor Time

Here you can see I am supporting both English and French together. I could easily add German, Italian, Spanish and Portuguese to this list. I could also add other objects like Memory, Network Utilization, Logical Disk and Physical Disk. You just need to add appropriate entries to the CSV file.

If you use this technique on one of the Splunk apps – Exchange, Active Directory or Windows – note that you will need to go through several files, including macros.conf, eventtypes.conf, savedsearches.conf and each view in order to change all the references.

Fortunately, most Windows Server applications that introduce new perfmon counters do not localize the counters, so you really only need to support the base Windows counters. Unfortunately, there are a lot of them!

Care to assist? We won’t be able to produce every single language ourselves. If you want to help, then send us your counters. You can obtain a counters.txt file by executing the following PowerShell command on a suitable Windows Server:

(Get-Counter -ListSet *).Counter | Out-File counters.txt

Then send the counters.txt file to microsoft@splunk.com – don’t forget to tell us what language the counters are in! I will compile all the responses we get and publish them in a Splunkbase app in the future.
