<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PerformancePoint Blog &#187; SQL Analysis Services</title>
	<atom:link href="http://performancepointblog.com/category/sqlanalysisservices/feed/" rel="self" type="application/rss+xml" />
	<link>http://performancepointblog.com</link>
	<description>A Blog about PerformancePoint and Microsoft BI technologies. Your host is Russell Christopher</description>
	<lastBuildDate>Thu, 30 Jun 2011 11:14:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Visualizing Microsoft Market Basket Analysis with Tableau</title>
		<link>http://performancepointblog.com/2011/02/visualizing-market-basket-analysis-with-tableau/</link>
		<comments>http://performancepointblog.com/2011/02/visualizing-market-basket-analysis-with-tableau/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 19:29:08 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Tableau]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/2011/02/visualizing-market-basket-analysis-with-tableau/</guid>
		<description><![CDATA[In this post we explore different ways you can use Tableau to explore the output of the Microsoft Association Rules algorithm.]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">Over the past couple of weeks I’ve shared my thoughts on using Tableau to visualize Microsoft’s <a href="http://performancepointblog.com/2011/02/visualizing-sql-server-analysis-services-data-mining-output-with-tableau/" target="_blank">Time Series</a> and <a href="http://performancepointblog.com/2011/02/the-microsoft-clustering-algorithm-tableau-and-more-cowbell/" target="_blank">Clustering</a> algorithms. Today, I’ll finish off with Association Rules.</p>
<p>The Association Rules algorithm focuses on <em>itemsets, </em>which you can think of as a bundle of “things” (attributes) which frequently go together. For example, a popular itemset might be {Peanut Butter, Jelly, Bread}. Itemsets don’t have to be related to shopping, however. A perfectly valid itemset might be {Red Sox Fan, Cretin, Convict}: We’re just creating groups of attributes which go together.</p>
<p>Given many <em>itemsets</em>, the Association Rules algorithm discovers the most frequently occurring ones. Then, it creates association rules using those frequently occurring itemsets. Here’s what a rule looks like:</p>
<p>{Peanut Butter = Existing, Jelly = Existing -&gt; Bread = Existing}</p>
<p>If we see Peanut Butter and Jelly in someone’s shopping cart, they are likely to buy Bread, as well.</p>
<p>Three measures describe each rule: support (how many times we see an item or itemset), probability (the confidence we have in the rule), and importance (the “lift” we might expect from the rule).</p>
<p>If you’d like to understand more about these concepts, <a href="http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/a1b1f0e4-0e8c-42f0-bf0b-cff176019e03" target="_blank">read this thread</a> by Jamie Maclennan.</p>
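<p>To make those three measures concrete, here’s a minimal Python sketch that computes them for the peanut butter rule over a handful of made-up baskets. The basket data and the <code>rule_metrics</code> helper are hypothetical, and the log of classic lift is only a stand-in for the idea behind importance (positive means the rule helps); the exact score SSAS reports may be calculated differently.</p>

```python
import math

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"Peanut Butter", "Jelly", "Bread"},
    {"Peanut Butter", "Jelly", "Bread"},
    {"Peanut Butter", "Jelly", "Bread"},
    {"Peanut Butter", "Jelly"},
    {"Bread"},
    {"Bread", "Milk"},
    {"Milk"},
    {"Peanut Butter"},
]


def rule_metrics(baskets, lhs, rhs):
    """Return support, probability and lift for the rule lhs -> rhs."""
    total = len(baskets)
    lhs_count = sum(1 for basket in baskets if lhs.issubset(basket))
    rhs_count = sum(1 for basket in baskets if rhs in basket)
    both_count = sum(1 for basket in baskets if lhs.issubset(basket) and rhs in basket)
    if lhs_count == 0 or rhs_count == 0:
        raise ValueError("rule items never appear in the baskets")
    probability = both_count / lhs_count      # confidence in the rule
    lift = probability / (rhs_count / total)  # classic lift, not the SSAS score
    return {
        "support": both_count,                # baskets containing the whole itemset
        "probability": probability,
        "lift": lift,
        "log_lift": math.log10(lift),         # above 0 when the rule helps
    }


metrics = rule_metrics(baskets, {"Peanut Butter", "Jelly"}, "Bread")
print(metrics)
```

<p>With these toy baskets, three of the four baskets holding both peanut butter and jelly also hold bread, so the rule’s probability is higher than the overall share of baskets with bread and the log of the lift comes out positive.</p>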
<p>Here’s a basic query to return information from our data mining model:</p>
<p><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">SELECT * FROM OpenQuery (SSAS,'SELECT NODE_CAPTION as [Rule], NODE_SUPPORT AS [Support], MSOLAP_NODE_SCORE AS [Importance], </span></p>
<p><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">NODE_PROBABILITY AS [Probability] from</span></p>
<p><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">[Association].content WHERE NODE_TYPE=8')</span></p>
<p>Executing this puppy will return all our rules and their support, probability &amp; importance values. We’re looking for rules with relatively high importance (&gt; 0), which indicates that the existence of peanut butter and jelly will impact bread buying positively:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/1-Orginal-Query.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Original Query" src="http://performancepointblog.com/wp-content/uploads/2011/02/1-Orginal-Query.gif" border="0" alt="Original Query" width="557" height="435" /></a></p>
<p>If you open the image above, you’ll see that each rule looks pretty much like a “formula”. To make these formulae useful, we’re going to need to parse them into individual fields that can be used as labels on an axis. To do that, we write lots of kludgy string parsing code:</p>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">SELECT </span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    [Rule], </span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    -- Cast NTEXT to nvarchar then determine number of items in itemset by keying on number of = symbols before the -&gt; characters</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    -- Divide by 2 since DATALENGTH against NVARCHAR will be twice the string length</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    (DATALENGTH(LEFT(CAST([rule] as nvarchar(300)),CHARINDEX('-&gt;',cast([RULE] as nvarchar(300)))))- DATALENGTH (REPLACE(LEFT(CAST([rule] as NVARCHAR(300)),CHARINDEX('-&gt;',[RULE])),'=','')))/2   as [ItemSet Size],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    [Support],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    [Importance],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    [Probability],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    -- Grab first item in itemset</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    LEFT(CAST([rule] as nvarchar(300)), CHARINDEX('=', CAST([rule] as nvarchar(300))) -1) as [Basket Item 1],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    -- If two items exist in itemset, grab second using kludgy string parsing</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    CASE (DATALENGTH(LEFT(CAST([rule] as nvarchar(300)),CHARINDEX('-&gt;',cast([RULE] as nvarchar(300)))))- DATALENGTH (REPLACE(LEFT(CAST([rule] as NVARCHAR(300)),CHARINDEX('-&gt;',[RULE])),'=','')))/2 </span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">        WHEN 2 Then SUBSTRING(CAST([rule] as NVARCHAR(300)),CHARINDEX(',',CAST([rule] as nvarchar(300)))+2, ((CHARINDEX('-&gt;',cast([RULE] as nvarchar(300)))-12) - (CHARINDEX(',',CAST([rule] as nvarchar(300)))+2) ))</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    END as [Basket Item 2],</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    -- Recommended Item</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">    REPLACE(RIGHT(CAST([rule] as NVARCHAR(300)), (DATALENGTH(CAST([rule] as NVARCHAR(300)))/2) - (CHARINDEX('-&gt;',cast([RULE] as nvarchar(300)))+2)),' = Existing','') as [Recommended Item]</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">FROM </span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">OpenQuery (SSAS,'SELECT NODE_CAPTION as [Rule], NODE_SUPPORT AS [Support], MSOLAP_NODE_SCORE AS [Importance], </span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">NODE_PROBABILITY AS [Probability] from</span></h6>
<h6><span style="font-family: Courier New; font-size: x-small; font-weight: normal;">[Association].content WHERE NODE_TYPE=8')</span></h6>
<p>Writing the SQL above was by far the ugliest bit. On to the results:<br />
<a href="http://performancepointblog.com/wp-content/uploads/2011/02/2-Updated-Query.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Updated Query" src="http://performancepointblog.com/wp-content/uploads/2011/02/2-Updated-Query.gif" border="0" alt="Updated Query" width="557" height="435" /></a></p>
<p>The resultset above adds “Basket Item 1”, “Basket Item 2”, and “Recommended Item” fields by parsing the rule column.</p>
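<p>If the string parsing is easier to follow outside of T-SQL, here is the same idea as a rough Python sketch. It is not what the query above runs; the <code>parse_rule</code> helper is hypothetical, and it assumes each rule caption has the shape “Item = Existing, Item = Existing -&gt; Item = Existing” with no commas, equals signs or arrows inside the item names.</p>

```python
def parse_rule(rule):
    """Split a rule caption into its basket items and the recommended item."""
    # Everything left of the arrow is the itemset; the right side is the recommendation.
    lhs_text, rhs_text = rule.split("->", 1)
    # Each left-hand entry looks like "Item = Existing"; keep only the item name.
    basket_items = [part.split("=")[0].strip() for part in lhs_text.split(",")]
    recommended = rhs_text.split("=")[0].strip()
    return {
        "itemset_size": len(basket_items),
        "basket_items": basket_items,
        "recommended_item": recommended,
    }


print(parse_rule("Peanut Butter = Existing, Jelly = Existing -> Bread = Existing"))
```

<p>Unlike the SQL version, this handles any number of basket items, because the left-hand side is simply split on commas rather than picked apart by character position.</p>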
<p>Now for the fun stuff. First, let’s look at our rules by Importance and Probability:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/3-Association-Rules.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="3 - Association Rules" src="http://performancepointblog.com/wp-content/uploads/2011/02/3-Association-Rules_thumb.gif" border="0" alt="3 - Association Rules" width="557" height="435" /></a></p>
<p>Next, let’s look at the actual products which are being recommended:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/4-Recommendation.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="4 - Recommendation" src="http://performancepointblog.com/wp-content/uploads/2011/02/4-Recommendation_thumb.gif" border="0" alt="4 - Recommendation" width="557" height="433" /></a></p>
<p><em>Support</em> is being used to determine our bubble size, so we can see below that Mountain Bottle Cages are recommended by 5 different rules:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/5-Recommednations.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="5 - Recommendations" src="http://performancepointblog.com/wp-content/uploads/2011/02/5-Recommednations_thumb.gif" border="0" alt="5 - Recommendations" width="557" height="433" /></a></p>
<p>By highlighting a specific item (product), we can see that while we generated lots of rules which recommend it, those rules aren’t very important – I’d probably ignore these.</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/6-Highligt.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="6 - Highlight" src="http://performancepointblog.com/wp-content/uploads/2011/02/6-Highligt_thumb.gif" border="0" alt="6 - Highlight" width="557" height="433" /></a></p>
<p>Finally, let’s see which items  tend to go together in the same basket. Here we can see that the Mountain Bottle Cage gets grouped with a lot of other items.</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/7-Recommendations-per-Product.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="7 -Recommendations per Product" src="http://performancepointblog.com/wp-content/uploads/2011/02/7-Recommendations-per-Product_thumb.gif" border="0" alt="7 -Recommendations per Product" width="557" height="423" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2011/02/visualizing-market-basket-analysis-with-tableau/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Microsoft Clustering Algorithm, Tableau, and More Cowbell</title>
		<link>http://performancepointblog.com/2011/02/the-microsoft-clustering-algorithm-tableau-and-more-cowbell/</link>
		<comments>http://performancepointblog.com/2011/02/the-microsoft-clustering-algorithm-tableau-and-more-cowbell/#comments</comments>
		<pubDate>Sun, 13 Feb 2011 23:27:07 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[Feature]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[The Competition]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Tableau]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=603</guid>
		<description><![CDATA[Using Tableau to visualize the output of the Microsoft Clustering algorithm...More cowbell.]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to my <a href="http://performancepointblog.com/2011/02/visualizing-sql-server-analysis-services-data-mining-output-with-tableau/">work with the Time Series algorithm</a>, I’ve been playing with the SSAS clustering algorithm over several cups of <a href="http://www.baldguybrew.com/">Bald Guy</a> coffee.</p>
<p>One will generally use the clustering algorithm to segment “things” (customers into “customer clusters”, for example) based on similarities between attributes that describe the “things” in question.</p>
<p>The Adventure Works sample cube has a ready-made data mining structure which contains a customer-focused clustering model. It’s called <strong>Customer Clusters</strong> and takes the following inputs:</p>
<ul>
<li>Commute Distance</li>
<li>Education</li>
<li>Gender</li>
<li>Home Owner</li>
<li>Marital Status</li>
<li>Number of Cars Owned</li>
<li>Number of Children at Home</li>
<li>Occupation</li>
<li>Total Children</li>
<li>Yearly Income</li>
</ul>
<p>Rather than reinvent the wheel, I’ll use the sample.</p>
<p>The first thing we’ll probably want to understand is how different combinations of attributes and attribute values combine to create distinct clusters of customers – otherwise known as a “cluster profile”. SQL Server Analysis Services has a quick but unsupported system stored procedure named GetClusterProfiles() which will give us this information, and since it’s the weekend, I don’t care if it’s unsupported. I’m going to execute this sproc against my data mining model:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/1-System-Sproc.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="1 - System Sproc" src="http://performancepointblog.com/wp-content/uploads/2011/02/1-System-Sproc_thumb.gif" border="0" alt="1 - System Sproc" width="557" height="384" /></a></p>
<p>Click and view the image above as there’s a lot of good information to be found. The columns <strong>001-010</strong> represent 10 distinct clusters of customers that were discovered based on the data the model was trained on. <strong>Attribute Name</strong> and <strong>Attribute Value</strong> show the inputs and potential input values that were found in the data. For each attribute name/value pair, you’ll see a <strong>support</strong> and <strong>probability</strong> value. <strong>Support</strong> represents the number of cases (rows) we found that fit the criteria in question. <strong>Probability</strong> tells us the probability a particular attribute name/value pair will show up in the customer cluster in question.</p>
<p>For example in Customer Cluster 001, there is a ~ 42% chance the customer lives less than a mile away from work, about a 1% chance the person lives 10+ miles away from work, and a ~25% chance they live within 1-2 miles, etc.</p>
<p>As an aside, it’s normal to actually rename clusters from 001, 002, 003 to something more meaningful like “Soccer Moms”, “Yuppies” or “Hipsters” based on our read of the primary drivers of each cluster. I haven’t bothered, so we’ll refer to our clusters as 001-010.</p>
<p>I initially wanted to use the output of GetClusterProfiles() as my data source, but found that I couldn’t use OPENQUERY and the Microsoft MSOLAP provider to execute it. I’m guessing that the fact we return an unknown number of columns (based on the number of clusters we find) could be causing the problem.</p>
<p>Instead, I had to use multiple DMX statements “UNIONed” together. Each DMX SELECT returns cluster attribute name/value pairs for a single attribute. Here’s one for Commute Distance:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/1.5-DMX-for-Single-Attribute.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="1.5 - DMX for Single Attribute" src="http://performancepointblog.com/wp-content/uploads/2011/02/1.5-DMX-for-Single-Attribute_thumb.gif" border="0" alt="1.5 - DMX for Single Attribute" width="557" height="414" /></a></p>
<p>There’s something important to note here, by the way. Notice the WHERE clause where I filter out any attribute values supported by fewer than 500 cases. Choosing 500 was an arbitrary decision on my part, but it impacted my results in a big way. In the output above, there are now only 3 Commute Distance nodes returned for customer cluster 001, while there were 5 when I executed GetClusterProfiles() in the first screenshot. Is this good or bad? Hard to say – there were only 35 cases supporting the “1% probability of living 10+ miles away” node in cluster 001. Is 35 statistically significant enough to include?</p>
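<p>Here is a small Python sketch of what that support cutoff does to a cluster profile. Only the 500-case threshold and the 35-case “10+ Miles” node come from the discussion above; the other support counts, the value labels and the <code>filter_by_support</code> helper are made up for illustration.</p>

```python
MIN_SUPPORT = 500  # the arbitrary cutoff discussed above

# (cluster, attribute name, attribute value, support)
# Support counts other than 35 are hypothetical placeholders.
profile_rows = [
    ("001", "Commute Distance", "0-1 Miles", 1500),
    ("001", "Commute Distance", "1-2 Miles", 900),
    ("001", "Commute Distance", "2-5 Miles", 650),
    ("001", "Commute Distance", "5-10 Miles", 300),
    ("001", "Commute Distance", "10+ Miles", 35),
]


def filter_by_support(rows, min_support):
    """Keep only attribute values backed by at least min_support cases."""
    return [row for row in rows if row[3] >= min_support]


kept = filter_by_support(profile_rows, MIN_SUPPORT)
for cluster, name, value, support in kept:
    print(cluster, name, value, support)
```

<p>Lowering <code>MIN_SUPPORT</code> brings the sparse nodes back, which is exactly the trade-off: more complete profiles versus nodes resting on very few cases.</p>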
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/2-OpenQuery.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="2 - OpenQuery" src="http://performancepointblog.com/wp-content/uploads/2011/02/2-OpenQuery_thumb.gif" border="0" alt="2 - OpenQuery" width="557" height="396" /></a></p>
<p>(By the way, after the whole discussion around SUPPORT vis-à-vis “Customer Commute” I managed to completely FORGET to add it to the final query above! Since this is just an example I decided not to go back and correct my oversight, however – don’t be surprised when you don’t see the Commute Distance attribute in any of the following vizes.)</p>
<p>After dropping our final SELECT statement into a Tableau “Custom SQL” data connection, we’re ready to build a Viz:</p>
<ul>
<li>The <strong>Node_Name</strong> (The name of the cluster: 001, 002, etc.) pill lands on <strong>Columns</strong></li>
<li><strong>Attribute_Name</strong> and the <strong>Probability</strong> measure get dropped in <strong>Rows</strong></li>
<li><strong>Attribute_Value</strong> lives in the <strong>Color</strong> shelf</li>
</ul>
<p>You can now easily see how each attribute value drives the composition of different customer clusters (001-010, on columns).</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/3-Cluster-Profile.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="3- Cluster Profile" src="http://performancepointblog.com/wp-content/uploads/2011/02/3-Cluster-Profile_thumb.gif" border="0" alt="3- Cluster Profile" width="557" height="428" /></a></p>
<p>I found there were mixed text and number values in the <strong>Attribute_Value</strong> column, and this was painful. For example, I had to deal with values like “Married” and “Single” (Marital Status) <em>and</em> .056365 (Number of Cars Owned in cluster 010) in the same field. I used Groups to simplify things a bit, but I’d probably massage the data model / query a bit more if I were doing this “for real” rather than “for fun”.</p>
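<p>One way to massage the data would be to split that mixed column into a text field and a numeric field before it reaches Tableau. The Python sketch below shows the idea only; the <code>split_attribute_value</code> helper is hypothetical and it assumes the values arrive as strings.</p>

```python
def split_attribute_value(value):
    """Return (text, number): exactly one of the two is filled in."""
    try:
        # Values such as ".056365" parse cleanly as numbers.
        return (None, float(value))
    except ValueError:
        # Anything else, such as "Married", stays as text.
        return (value, None)


for raw in ["Married", "Single", ".056365"]:
    print(raw, split_attribute_value(raw))
```

<p>With the two kinds of value in separate fields, the text side can go on the Color shelf while the numeric side can be treated as a measure.</p>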
<p>After we understand how customer clusters are composed, we can really have some fun.</p>
<p>The Adventure Works sample SSAS database contains a second cube called “Mined Customers”. Mined Customers combines two data mining models (<strong>Customer Clusters</strong> and an <strong>Association Rules</strong> model for market basket analysis) with lots of other metrics like sales and profit. The mash-up of our “default” metrics and dimensions with the “mining insights” gives us some exciting possibilities for visualization.</p>
<p>In the viz below, we’re able to look at average sales and profit margin by our ten customer clusters. One can see that Cluster 5 drives the highest profit margin with a relatively large number of customers:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/4-Clustered-Profit-1.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="4  - Clustered Profit 1" src="http://performancepointblog.com/wp-content/uploads/2011/02/4-Clustered-Profit-1_thumb.gif" border="0" alt="4  - Clustered Profit 1" width="557" height="428" /></a></p>
<p>And the fun continues! By dropping one of the “demographic inputs” on the <strong>Level of Detail shelf</strong>, we can begin exploring within individual clusters. I’m going to drill in by <strong>Occupation</strong>.</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/5-Clustered-Profit-2.gif" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="5-Clustered Profit 2" src="http://performancepointblog.com/wp-content/uploads/2011/02/5-Clustered-Profit-2_thumb.gif" border="0" alt="5-Clustered Profit 2" width="557" height="456" /></a></p>
<p>It now becomes apparent that skilled manual workers in Cluster 6 are more profitable than Cluster 5 as a whole. By putting together my Quick Filters and Demographic fields on the Level of Detail shelf I also have the ability to completely explore the sales and profitability of my customer clusters.</p>
<h4>More Cowbell</h4>
<p>And I have an admission to make: I’m obsessed. Since Gartner released its <a href="http://www.gartner.com/technology/media-products/reprints/microsoft/vol2/article15/article15.html">2011 Magic Quadrant for Business Intelligence Platforms</a> report a few weeks ago, I’ve had MQ on the brain. I’ve been thinking about MQs non-stop – it’s my cowbell. In fact, I got a fever, and the only prescription&#8230;<em><strong>is more cowbell!!</strong></em></p>
<p>So, I decided I’d take the previous viz and turn it into an MQ. I essentially zoomed in a tad, dropped in four annotation areas with no labels, and played around with background shading. Voila, more cowbell!</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/6-Clustered-Profit-MQ.png" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="6 - Clustered Profit MQ" src="http://performancepointblog.com/wp-content/uploads/2011/02/6-Clustered-Profit-MQ_thumb.png" border="0" alt="6 - Clustered Profit MQ" width="557" height="430" /></a></p>
<p>Next time, Association Rules – otherwise known as market basket analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2011/02/the-microsoft-clustering-algorithm-tableau-and-more-cowbell/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Visualizing SQL Server Analysis Services Data Mining Output with Tableau</title>
		<link>http://performancepointblog.com/2011/02/visualizing-sql-server-analysis-services-data-mining-output-with-tableau/</link>
		<comments>http://performancepointblog.com/2011/02/visualizing-sql-server-analysis-services-data-mining-output-with-tableau/#comments</comments>
		<pubDate>Mon, 07 Feb 2011 20:35:34 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[The Competition]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Tableau]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=577</guid>
		<description><![CDATA[I’m constantly amazed that such a small portion of people who own SQL Server are using the fantastic data mining capabilities they have access to. One thing that has occurred to me as a potential bump in the road towards wider adoption  is visualization. Can Tableau help here?]]></description>
			<content:encoded><![CDATA[<p>I’m constantly amazed that such a small portion of people who own SQL Server are using the fantastic data mining capabilities they have access to. The (free) <a href="http://www.microsoft.com/downloads/en/details.aspx?FamilyId=896A493A-2502-4795-94AE-E00632BA6DE7&amp;displaylang=en">Microsoft SQL Server 2008 Data Mining Add-ins</a> and (not free but nonetheless awesome) <a href="https://www.predixionsoftware.com/predixion/PredixionProducts.aspx">Predixion Insight</a> solution make this stuff really easy to do while still offering industrial-strength capabilities.</p>
<p>One thing that has occurred to me as a potential bump in the road towards wider adoption is visualization. Microsoft offers some sample (and admittedly long-in-the-tooth) <a href="http://msdn.microsoft.com/en-us/library/ms160727(SQL.90).aspx">data mining visualizations</a>, but perhaps users still find it too hard to get their newly won insights out of SQL and/or Excel and present them to the rest of the world? Maybe users want more control over how the visualizations look and feel and how one interacts with the results?</p>
<p>To that end, I thought it would be fun to work with a product from Microsoft gold certified partner, <a href="http://www.tableausoftware.com/">Tableau Software</a>. The folks at Tableau make a pretty awesome piece of BI goodness named Tableau Desktop that has really, really nice visualization capabilities. Tableau Desktop is a bit like my old favorite ProClarity, except that it can consume data from SSAS to SQL to Teradata to Netezza to PowerPivot and back to text files. Did I mention the tool has awesome visualization capabilities?</p>
<p>What I’m going to do over the next few weeks or as time allows is to drop Tableau on top of SQL Server Analysis Services data mining output. I’ll try to come up with meaningful examples of how one might visualize the output of all the major data mining algorithms SSAS supports. Sounds fair?</p>
<h2>Time Series</h2>
<p>I’ll start with the Time Series algorithm, which we’ll use for predicting values based on numbers tied to slices of time in the past. One essentially feeds a Time Series model a bunch of numbers which are each associated with a timestamp, and SSAS can come back to you with predicted values for other time periods.</p>
<p>In PerformancePoint 2007, we took advantage of this functionality directly and had a really cool report type which one could nest in a dashboard. Unfortunately, the report type leveraged the deprecated Office Web Components, so we don’t have it in PPS 2010. In honor of my lost friend, we start with the Time Series:</p>
<p>I’m going to assume you know something about Microsoft data mining. If you don’t, I advise you to spend some time with the <a href="http://technet.microsoft.com/en-us/library/ms167167.aspx">fine tutorial</a> and <a href="http://technet.microsoft.com/en-us/library/ms174949.aspx">books online</a>. As much as I can, I’ll actually base my work off these walkthroughs and examples in the sample AdventureWorks cube.</p>
<p>So, we’ll start off in BIDS with a simple DMX query which predicts the next 6 periods (months) of sales based on history:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/1-DM-Query.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="1 - DM Query" src="http://performancepointblog.com/wp-content/uploads/2011/02/1-DM-Query_thumb.jpg" border="0" alt="1 - DM Query" width="554" height="426" /></a></p>
<p>I’m re-using the “Forecasting” model that is already waiting for us inside the SQL Server Analysis Services sample cube <strong>AdventureWorks DW 2008R2</strong>. The query will return the model of item we’re selling, the region it’s being sold in and 6 months’ worth of predictions.</p>
<p>The “as-is” query has a problem in that the results come nested as distinct datasets. This is pretty hard to use:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/2-DM-Query-Result.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="2 - DM Query Result" src="http://performancepointblog.com/wp-content/uploads/2011/02/2-DM-Query-Result_thumb.jpg" border="0" alt="2 - DM Query Result" width="554" height="671" /></a></p>
<p>We’ll fix this annoyance by leaning on the DMX FLATTENED keyword. One can flip from “Designer” to “SQL” view and modify the query itself:</p>
<h5><span style="font-family: Courier New;">SELECT FLATTENED</span></h5>
<h5><span style="font-family: Courier New;">  [Forecasting].[Model Region],</span></h5>
<h5><span style="font-family: Courier New;">  (PredictTimeSeries([Forecasting].[Amount],6)) as [Amount],</span></h5>
<h5><span style="font-family: Courier New;">   True as [Predicted]</span></h5>
<h5><span style="font-family: Courier New;">From</span></h5>
<h5><span style="font-family: Courier New;">  [Forecasting]</span></h5>
<p>Note how I’ve also added a hard-coded field – “Predicted”; I’ve set the value to “True” to indicate that the values in question are predictions, not historical values.</p>
<p>The look that we’re ultimately going for in our visualization is a single, unbroken line or bar chart which gradually transitions from plotting historical values to predicted values. The bit above gets us our “in the future” numbers.</p>
<p>Getting the historical numbers is pretty easy because there happens to be a SQL Server view which provides them to the data mining algorithm for training purposes. The view is named vTimeSeries, and I’ll just re-purpose it:</p>
<h5><span style="font-family: Courier New;">SELECT ModelRegion,</span></h5>
<h5><span style="font-family: Courier New;">   TimeIndex,</span></h5>
<h5><span style="font-family: Courier New;">   Amount, </span></h5>
<h5><span style="font-family: Courier New;">   0 as [Predicted] -- 'False': historical values</span></h5>
<h5><span style="font-family: Courier New;">FROM vTimeSeries </span></h5>
<p>How do we merge these two distinct sets of data? UNION:</p>
<h5><span style="font-family: Courier New;">SELECT * FROM OpenQuery (SSAS,'SELECT FLATTENED</span></h5>
<h5><span style="font-family: Courier New;">  [Forecasting].[Model Region],</span></h5>
<h5><span style="font-family: Courier New;">  (PredictTimeSeries([Forecasting].[Amount],6)) as [Amount],</span></h5>
<h5><span style="font-family: Courier New;">   True as [Predicted]</span></h5>
<h5><span style="font-family: Courier New;">From</span></h5>
<h5><span style="font-family: Courier New;">  [Forecasting]')</span></h5>
<h5><span style="font-family: Courier New;">UNION ALL</span></h5>
<h5><span style="font-family: Courier New;">SELECT </span></h5>
<h5><span style="font-family: Courier New;">   ModelRegion,</span></h5>
<h5><span style="font-family: Courier New;">   TimeIndex,</span></h5>
<h5><span style="font-family: Courier New;">   Amount, </span></h5>
<h5><span style="font-family: Courier New;">   0 as [Predicted]</span></h5>
<h5><span style="font-family: Courier New;">FROM vTimeSeries</span></h5>
<p>Note that we use OpenQuery to make this all work, so we need a linked server pointing at SQL Server Analysis Services plugged into our SQL Server. I’ve named it SSAS, and here’s the property dialog:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/3-Linked-Server.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="3 - Linked Server" src="http://performancepointblog.com/wp-content/uploads/2011/02/3-Linked-Server_thumb.jpg" border="0" alt="3 - Linked Server" width="554" height="494" /></a></p>
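<p>If you’d rather script the linked server than click through the dialog, something along these lines should work. Treat it as a sketch: the name SSAS is the one used in this post and MSOLAP is the usual OLE DB provider for Analysis Services, but the data source (localhost) and the catalog name are assumptions, so swap in your own instance and database.</p>
<pre>-- Sketch only: @datasrc and @catalog are assumed values, not taken from the dialog above
EXEC master.dbo.sp_addlinkedserver
    @server     = N'SSAS',      -- linked server name referenced by OpenQuery
    @srvproduct = N'',          -- product name is not needed for MSOLAP
    @provider   = N'MSOLAP',    -- OLE DB provider for Analysis Services
    @datasrc    = N'localhost', -- assumption: Analysis Services instance to connect to
    @catalog    = N'Adventure Works DW 2008R2'; -- assumption: database holding the mining model</pre>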
<p>Once the linked server is there, we can test our query:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/4-Union-Results.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="4 - Union Results" src="http://performancepointblog.com/wp-content/uploads/2011/02/4-Union-Results_thumb.jpg" border="0" alt="4 - Union Results" width="554" height="604" /></a></p>
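<p>One thing to keep in mind: a UNION takes its column names from the first SELECT. If you want Tableau to see the names from vTimeSeries (ModelRegion, TimeIndex, Amount, Predicted), a variation like the one below puts the historical branch first. This is a sketch rather than the query in the screenshot, and it assumes the flattened DMX result comes back with its four columns in the same order as the view’s.</p>
<pre>-- Historical rows first, so the result set uses the view's column names
SELECT
   ModelRegion,
   TimeIndex,
   Amount,
   0 as [Predicted]            -- 'False': historical values
FROM vTimeSeries
UNION ALL
-- Predicted rows: columns are matched by position, not by name
SELECT * FROM OpenQuery (SSAS,'SELECT FLATTENED
  [Forecasting].[Model Region],
  (PredictTimeSeries([Forecasting].[Amount],6)) as [Amount],
   True as [Predicted]
From
  [Forecasting]')</pre>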
<p>Now that the query is done, the real fun can begin. In a new Tableau workbook, we’ll need to create a new Data Connection pointing at SQL Server and using the <strong>Custom SQL</strong> option. Paste the query above into the Custom SQL dialog, and you’re pretty much ready to roll.</p>
<ul>
<li>Drop your <strong>Time</strong> dimension on <strong>Columns</strong></li>
<li>Drop <strong>ModelRegion</strong> and <strong>Amount</strong> on <strong>Rows</strong></li>
<li>Place the “new” <strong>Predicted</strong> dimension on the <strong>Color</strong> shelf.</li>
</ul>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/5-Tableau.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="5 - Tableau" src="http://performancepointblog.com/wp-content/uploads/2011/02/5-Tableau_thumb.jpg" border="0" alt="5 - Tableau" width="554" height="426" /></a></p>
<p>If you click the image above, you’ll see that the <em>Predicted</em> field drives the historical/predicted value transition via a color change.</p>
<p>I also added a few Calculated Fields to break out “Region” and “Model” into two distinct dimensions and added each as a filter. For kicks, I changed my marks from bars to squares and circles, with the shape showing whether a value is a prediction or not:</p>
<p><a href="http://performancepointblog.com/wp-content/uploads/2011/02/6-Tableau-More.jpg" target="_blank"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="6 - Tableau More" src="http://performancepointblog.com/wp-content/uploads/2011/02/6-Tableau-More_thumb.jpg" border="0" alt="6 - Tableau More" width="554" height="455" /></a></p>
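<p>The Region/Model split could also be pushed down into the custom SQL instead of living in Tableau calculated fields. Here is a minimal sketch, assuming ModelRegion values look like “M200 Europe” (a model code, a space, then the region). The Model and Region column names are just illustrative.</p>
<pre>-- Sketch: split ModelRegion at the first space
-- Appending ' ' before CHARINDEX keeps LEFT from failing when a value has no space
SELECT
   ModelRegion,
   LEFT(ModelRegion, CHARINDEX(' ', ModelRegion + ' ') - 1) as [Model],
   LTRIM(SUBSTRING(ModelRegion,
                   CHARINDEX(' ', ModelRegion + ' ') + 1,
                   LEN(ModelRegion))) as [Region],
   TimeIndex,
   Amount
FROM vTimeSeries</pre>
<p>A value with no space ends up entirely in Model, with an empty Region.</p>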
<p>That’s it!</p>
]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2011/02/visualizing-sql-server-analysis-services-data-mining-output-with-tableau/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Persisted file cannot be found&#8221; when executing DMX via OpenQuery</title>
		<link>http://performancepointblog.com/2011/02/persisted-file-cannot-be-found-when-executing-dmx-via-openquery/</link>
		<comments>http://performancepointblog.com/2011/02/persisted-file-cannot-be-found-when-executing-dmx-via-openquery/#comments</comments>
		<pubDate>Mon, 07 Feb 2011 16:46:08 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[SQL Server Analysis Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=558</guid>
		<description><![CDATA[What does "Persisted file cannot be found" mean when you're executing a DMX query? I didn't know and couldn't find anything on the internets.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m about to start playing around with custom visualization of data mining output, and am doing some basic setup work on my servers.</p>
<p>While testing that I could return results from DMX via OpenQuery, I ran a basic DMX statement to return results from a time series model:</p>
<pre>SELECT * FROM OpenQuery (SSAS,'SELECT FLATTENED
  [Forecasting].[Model Region],
  (PredictTimeSeries([Forecasting].[Amount],3)) as [Amount],
   True as [Predicted]
From
  [Forecasting]')</pre>
<p>Doing so returned this error message:</p>
<pre>   OLE DB provider "MSOLAP" for linked server "SSAS" returned message "Error (Data mining): The '\\?\C:\Program Files\Microsoft SQL Server\MSAS10_50.MSSQLSERVER\OLAP\Data\Adventure Works DW 2008R2.0.db\Forecasting.0.dms\Forecasting.0.dmm\0.Forecasting.cnt.bin' persisted file cannot be found.".
Msg 7321, Level 16, State 2, Line 1
An error occurred while preparing the query "SELECT FLATTENED
  [Forecasting].[Model Region],
  (PredictTimeSeries([Forecasting].[Amount],3)) as [Amount],
   True as [Predicted]
From
  [Forecasting]" for execution against OLE DB provider "MSOLAP" for linked server "SSAS".</pre>
<p>Problem? Stupid user trick &#8211; I hadn&#8217;t processed the model.  I was surprised to find that this error message couldn&#8217;t be found via Bing/Google, however. So here&#8217;s a little entry to save someone else some time.</p>
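<p>If you want to check for this up front, the mining model schema rowset should tell you whether a model has been processed. The query below is a sketch that assumes the same SSAS linked server and that DMV queries pass through OpenQuery in your setup. As far as I know, IS_POPULATED comes back false for a model that hasn’t been processed yet.</p>
<pre>-- Sketch: list mining models and whether they have been processed
SELECT * FROM OpenQuery (SSAS,'SELECT MODEL_NAME, IS_POPULATED, LAST_PROCESSED
From
  $system.DMSCHEMA_MINING_MODELS')</pre>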
<p>Cheers.</p>
]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2011/02/persisted-file-cannot-be-found-when-executing-dmx-via-openquery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a data-bound, drillable heat map with Silverlight</title>
		<link>http://performancepointblog.com/2010/11/creating-a-data-bound-drillable-heat-map-with-silverlight/</link>
		<comments>http://performancepointblog.com/2010/11/creating-a-data-bound-drillable-heat-map-with-silverlight/#comments</comments>
		<pubDate>Wed, 24 Nov 2010 16:02:04 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[ProClarity]]></category>
		<category><![CDATA[Silverlight]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=294</guid>
		<description><![CDATA[There are four big reasons I will always love ProClarity: It writes MDX for me The Decomposition tree The Heat/Performance map It writes MDX for me Since SharePoint/PerformancePoint 2010 now includes a decomp tree, I generally don]]></description>
			<content:encoded><![CDATA[<p>There are four big reasons I will always love ProClarity:</p>

<ul>
<li>It writes MDX for me</li>
<li>The Decomposition tree</li>
<li>The Heat/Performance map</li>
<li>It writes MDX for me</li>
</ul>

<p>Since SharePoint/PerformancePoint 2010 now includes a decomp tree, I generally don]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2010/11/creating-a-data-bound-drillable-heat-map-with-silverlight/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Re-creating the PowerPivot Management Dashboard’s “Sliding Bubble Chart”</title>
		<link>http://performancepointblog.com/2010/08/re-creating-the-powerpivot-management-dashboard%e2%80%99s-%e2%80%9csliding-bubble-chart%e2%80%9d/</link>
		<comments>http://performancepointblog.com/2010/08/re-creating-the-powerpivot-management-dashboard%e2%80%99s-%e2%80%9csliding-bubble-chart%e2%80%9d/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 19:41:36 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[Feature]]></category>
		<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[Silverlight]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[SQL Reporting Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=254</guid>
		<description><![CDATA[I love PowerPivot. And I really, really love the visualization in the management dashboard that allows one to see which reports are active across time. I’ve thought to myself many a time, “Self, I sure like that visualization. I wish I could show]]></description>
			<content:encoded><![CDATA[I love PowerPivot. And I really, really love the visualization in the management dashboard that allows one to see which reports are active across time. I’ve thought to myself many a time, “Self, I sure like that visualization. I wish I could show]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2010/08/re-creating-the-powerpivot-management-dashboard%e2%80%99s-%e2%80%9csliding-bubble-chart%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Creating Pivot collections via Reporting Services &amp; SQL Analysis Services: Challenges and Solutions!</title>
		<link>http://performancepointblog.com/2010/06/creating-pivot-collections-via-reporting-services-sql-analysis-services-challenges-and-solutions/</link>
		<comments>http://performancepointblog.com/2010/06/creating-pivot-collections-via-reporting-services-sql-analysis-services-challenges-and-solutions/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 12:35:32 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[Feature]]></category>
		<category><![CDATA[Pivot]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[SQL Reporting Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=178</guid>
		<description><![CDATA[While watching the keynote from the recent BI Conference, I saw a demo of the PivotViewer Extensions for Reporting Services. This is an interesting tool that will help automate creating Pivot collections. Unfortunately, even as a Microsoft FTE I can]]></description>
			<content:encoded><![CDATA[While watching the keynote from the recent BI Conference, I saw a demo of the PivotViewer Extensions for Reporting Services. This is an interesting tool that will help automate creating Pivot collections. Unfortunately, even as a Microsoft FTE I can]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2010/06/creating-pivot-collections-via-reporting-services-sql-analysis-services-challenges-and-solutions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PowerPivot, DAX and Semi-additive measures</title>
		<link>http://performancepointblog.com/2010/01/powerpivot-dax-and-semi-additive-measures/</link>
		<comments>http://performancepointblog.com/2010/01/powerpivot-dax-and-semi-additive-measures/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 16:45:29 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[PowerPivot]]></category>
		<category><![CDATA[Project Gemini]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=149</guid>
		<description><![CDATA[Over the weekend I was doing some analysis on SQL Server disk usage, and wanted to be able to display current disk usage by database. Up to this point, I'd mainly been doing a SUM over my measures. Well, that would make no sense in this scenario -]]></description>
			<content:encoded><![CDATA[Over the weekend I was doing some analysis on SQL Server disk usage, and wanted to be able to display current disk usage by database. Up to this point, I'd mainly been doing a SUM over my measures. Well, that would make no sense in this scenario -]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2010/01/powerpivot-dax-and-semi-additive-measures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Creating useful PowerPivot data models for public consumption via Reporting Services</title>
		<link>http://performancepointblog.com/2009/11/creating-useful-powerpivot-data-models-for-public-consumption-via-reporting-services/</link>
		<comments>http://performancepointblog.com/2009/11/creating-useful-powerpivot-data-models-for-public-consumption-via-reporting-services/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 20:13:29 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[Project Gemini]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=124</guid>
		<description><![CDATA[If you've played with PowerPivot at all, it's pretty obvious how flexible the tool is in terms of creating data models. After you initially create a model, you may want to spend some additional time with it to make sure users can easily leverage what]]></description>
			<content:encoded><![CDATA[If you've played with PowerPivot at all, it's pretty obvious how flexible the tool is in terms of creating data models. After you initially create a model, you may want to spend some additional time with it to make sure users can easily leverage what]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2009/11/creating-useful-powerpivot-data-models-for-public-consumption-via-reporting-services/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CTP3 Setup Error: Could not load file or assembly &#8216;Microsoft.AnalysisServices.SharePoint.Integration&#8217;</title>
		<link>http://performancepointblog.com/2009/11/ctp3-setup-error-could-not-load-file-or-assembly-microsoft-analysisservices-sharepoint-integration/</link>
		<comments>http://performancepointblog.com/2009/11/ctp3-setup-error-could-not-load-file-or-assembly-microsoft-analysisservices-sharepoint-integration/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 21:38:17 +0000</pubDate>
		<dc:creator>Russell</dc:creator>
				<category><![CDATA[ProClarity]]></category>
		<category><![CDATA[SQL Analysis Services]]></category>
		<category><![CDATA[PowerPivot]]></category>

		<guid isPermaLink="false">http://performancepointblog.com/?p=98</guid>
		<description><![CDATA[I had a bear of a time getting SQL Server 2008 R2 November CTP's Integrated PowerPivot feature installed.  It looks like several other people in the Twitter/Blogosphere are running into the same issue, but for potentially different reasons. For m]]></description>
			<content:encoded><![CDATA[I had a bear of a time getting SQL Server 2008 R2 November CTP's Integrated PowerPivot feature installed.  It looks like several other people in the Twitter/Blogosphere are running into the same issue, but for potentially different reasons.

For m]]></content:encoded>
			<wfw:commentRss>http://performancepointblog.com/2009/11/ctp3-setup-error-could-not-load-file-or-assembly-microsoft-analysisservices-sharepoint-integration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

