Channel: Thoughtsofanidlemind's Blog » Get-Mailbox

Musing on searching


The publication of the post by the Exchange team revealing the secret registry setting that allows multi-mailbox searches to interrogate more than 25,000 mailboxes got me thinking. At first, I thought that the era of registry hacks was over for Exchange. But on reflection I don’t think that we are on our way back to the bad old days of Exchange 2000 and Exchange 2003, when Microsoft published copious registry hacks to influence the way that the software operated and figuring out just what had been changed on a server became a real problem for support professionals.

Of course, these weren’t the first versions of the product to use secret registry settings and the standard was set by the famous “Squeaky Lobster” hack that you had to input to reveal advanced performance counters on an Exchange 5.5 server. Exchange 2000 introduced a huge variety of new features. The administration interface lagged somewhat and, unlike today, the developers were not allowed to introduce new UI in a service pack. So they enabled features and tweaks through registry hacks. The disease rapidly spread throughout Exchange to a point where I doubt that even the most devoted Exchange nerd could keep up.

The regime of a new Vice President of the Exchange group changed things and we don’t have so many registry settings to tweak today. Of course, you can argue that registry settings have been replaced by obscure XML-formatted configuration files such as those used by the Mailbox Replication Service (MRS) or the transport service. This is true, and XML configuration files suffer from the same fatal flaw as registry settings in terms of being server-specific and not friendly to the needs of a distributed environment. They also suffer from the problem of language and debugging in that it is all too easy to make a mistake when you edit one of Exchange’s configuration files. The product doesn’t include an intelligent editor for these files, possibly because it’s the developers’ way of saying “hands off – don’t edit this”, so most administrators resort to Notepad and make changes on the “suck it and see” principle. Sounds very much like editing the registry…

In any case, returning to multi-mailbox discovery searches, it’s a nice thing to know that administrators in large organizations can bring servers to their knees by launching searches that span 100,000 mailboxes and gather tens of gigabytes of data, possibly even dragging all that data across the network to the default discovery mailbox that’s still located on the first Exchange 2010 mailbox server installed into the organization. Clearly this is not a good thing to do, and it shows the need for planning before the deployment and use of multi-mailbox searches.

What other issues might affect these searches? Here are a number of tips that you might like to bear in mind.

  • The UI for discovery searches is not revealed by the Exchange Control Panel (ECP) unless your account is a member of the Discovery Management RBAC role group. Obvious, but often overlooked… There’s no way to execute searches from the Exchange Management Console (EMC), so this is one of the items of functionality that is unique to ECP. If you don’t like using ECP, you can work from the shell instead: use New-MailboxSearch to create new searches, Get-MailboxSearch to return details of searches, Set-MailboxSearch to update search criteria, Start-MailboxSearch to start a search, and Remove-MailboxSearch to remove the search criteria from the arbitration mailbox (see below).
  • Mailbox searches depend on the content indexes that Exchange populates as items arrive into mailbox databases. Even though Exchange 2007 supports content indexes, you can only search data hosted on Exchange 2010 mailbox servers. This means that you have to complete your migration from Exchange 2003 or Exchange 2007 before discovery searches are really feasible. Of course, you can short-circuit the process by moving the mailboxes that are involved in a discovery action to Exchange 2010 servers.
  • Discovery searches can find items in the Recoverable Items folder (aka the “dumpster”) or those on retention or legal hold because these items are held in folders that are invisible to users but are indexed.
  • Exchange can search message properties (for example, subject, addressee list) very effectively because these data are available in the mailbox databases. Attachments have to be made discoverable to Exchange before their content can be incorporated into the indexes. Microsoft makes the Office 2010 filter pack available to allow you to install the IFilters necessary to index Word, Excel, PowerPoint, Visio, and so on and the pack must be installed on all mailbox servers (for content indexing) and transport servers (to allow transport rules to examine content in en-route messages). These filters cover the vast bulk of documents circulating in corporate environments with the glaring exception of PDF. Adobe has an IFilter available for PDF but some have reported better results with the version available from Foxit Software. You know you have problems with IFilters when searches report a high number of unsearchable items (the properties of these items will be searched – the item is unsearchable if its content is inaccessible). Of course, in this context, a high number is linked to the total number of items searched. If you search 10,000 mailboxes it’s probably acceptable to have 250 unsearchable items (but still a good idea to understand what these items are) while 2,500 unsearchable items might be problematic.
  • Determining the effectiveness of your search parameters is not easy. Exchange will report the mailboxes that it scanned and the number of hits that it generated but it’s hard to understand whether you have found the desired information until you look through the captured items. Clearly you need to experiment with search criteria (Exchange uses the AQS syntax for searches so you can construct very complex and precise searches) to home in on the right material and it may take several attempts until you know you have the right search. Exchange allows you to test search criteria without capturing any data and that’s absolutely the way to proceed until you know you’re looking in the right place. After that, you can decide to capture either deduplicated or all data. A deduplicated search captures only the first instance of an item no matter how many mailboxes it is found in. An “all-in” search captures each and every instance of an item. Obviously, it’s the nature of email that many items occur in multiple mailboxes so a deduplicated search (introduced in Exchange 2010 SP1) captures far less data.
  • As mentioned above, the first Exchange 2010 mailbox server installed into the organization hosts the default discovery mailbox. The mailbox is disabled but visible through the admin tools with the name “Discovery Search Mailbox”. This mailbox is used to store the copies of items recovered by searches so it has a large 50GB quota. It can be moved to another server if appropriate or you can create additional search mailboxes for use with specific investigations. To create a new discovery mailbox, use a command like this:

New-Mailbox -Name 'Discovery Mailbox for ABC Investigation' -Discovery -UserPrincipalName 'ABCDiscoveryMailbox@contoso.com' -Database 'MB2'

Note that I’m careful to place the new discovery mailbox in a specific mailbox database. Ideally, this database should be close (in network terms) to the databases that contain the mailboxes that will be searched to minimize the amount of network traffic generated when discovered items are captured and stored in the discovery mailbox. Remember that if the discovery mailbox is in a database that has copies, Exchange will need to replicate the search results to all servers that host database copies, so a big search can have a very real impact on many aspects of system performance.
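Pulling the earlier tips together, the test-then-capture cycle for a search that targets this mailbox might look something like the sketch below. It assumes Exchange 2010 SP1 and an account that belongs to Discovery Management. The search name, the source mailboxes, and the query are all hypothetical, so treat it as an outline rather than a recipe.

```powershell
# All names below are hypothetical; substitute your own.
$searchName = 'ABC Investigation'
$target     = 'Discovery Mailbox for ABC Investigation'

# 1. Test the criteria first. -EstimateOnly reports how many items match
#    without copying anything into the discovery mailbox.
New-MailboxSearch -Name $searchName `
    -SourceMailboxes 'kim.akers', 'david.jones' `
    -TargetMailbox $target `
    -SearchQuery 'subject:"Project ABC"' `
    -EstimateOnly

# 2. Check the status and the estimated number and size of hits.
Get-MailboxSearch -Identity $searchName |
    Format-List Name, Status, ResultNumberEstimate, ResultSizeEstimate

# 3. Once the criteria look right, switch to a deduplicated capture
#    and run the search again so that items are copied to the target.
Set-MailboxSearch -Identity $searchName -EstimateOnly:$false -ExcludeDuplicateMessages:$true
Start-MailboxSearch -Identity $searchName
```

Expect to repeat steps 1 and 2 with a refined -SearchQuery several times before moving to step 3, as only the final step moves any real data across the network.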

New discovery mailboxes are immediately available as a target for search results but they are not automatically accessible to the members of the Discovery Management role group. This is by design as the intention is to allow for the separation between the work done by the people who create and execute searches and those who review the gathered results. You have to specifically change the permissions on the newly-added discovery mailbox to make it available to those who have the authority to review the material captured there. Discovery searches can turn up huge masses of confidential business and personal data so it’s obviously critical to keep close control over the users who can access discovery mailboxes. It’s also a good idea to agree guidelines with your legal advisors as to how long the results of discovery searches should be kept as obviously you don’t want confidential material being kept for longer than it should be.
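One way to open a new discovery mailbox to reviewers is to grant full access to a dedicated security group, as in this minimal sketch. The group name 'ABC Legal Reviewers' is hypothetical, and your organization may prefer to grant access to named individuals instead.

```powershell
# 'ABC Legal Reviewers' is a hypothetical security group that holds only
# the people authorized to read the captured items.
Add-MailboxPermission -Identity 'Discovery Mailbox for ABC Investigation' `
    -User 'ABC Legal Reviewers' `
    -AccessRights FullAccess

# Confirm who can now open the mailbox.
Get-MailboxPermission -Identity 'Discovery Mailbox for ABC Investigation' |
    Format-Table User, AccessRights, IsInherited -AutoSize
```

Using a group keeps the access list in one place, which makes it easier to remove access when the investigation ends.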

Exchange 2010 stores the metadata (the parameters used to describe the search) for searches in a hidden system mailbox called “SystemMailbox{e0dc1c29-89c3-4034-b678-e6c29d823ed9}”. Thankfully, you won’t have to type that name too often. You can see this mailbox listed with this command:

Get-Mailbox -Arbitration
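If the default output is too terse, a variation along these lines shows where each arbitration mailbox lives and which searches are currently stored. It is a sketch that uses only standard mailbox and search properties.

```powershell
# Show the name and hosting database of each arbitration mailbox.
Get-Mailbox -Arbitration |
    Format-Table Name, DisplayName, Database -AutoSize

# List the searches whose metadata is held in the arbitration mailbox.
Get-MailboxSearch |
    Format-Table Name, Status, TargetMailbox -AutoSize
```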

Overall, I like the structure that Microsoft has established in Exchange 2010 for multi-mailbox searches. I don’t like the tools available to analyze the effectiveness of searches or to review the results that are captured in the discovery mailboxes. Hopefully Microsoft will improve matters in future releases.

- Tony

For more information about multi-mailbox discovery searches, read chapter 15 of Microsoft Exchange Server 2010 Inside Out (pages 1033-1049), also available at Amazon.co.uk. The book is also available in a Kindle edition.


