Saturday, November 21, 2015


I attended the recent Tableau Conference in Vegas where the Tableau team debuted new features and functionalities in Tableau 9. Since I was still using Tableau 8 at that time, the new features warranted a download of the new version 9. That said, I'm currently using both versions since the company still hasn't integrated Tableau 9 server functionality and the only way to access dashboards built using the new version would be through Tableau reader/desktop.

Recently, I had a request from a client to build a dashboard that they are currently receiving from one their partners - a Demand Side Platform (DSP) and the client loves the functionality there. I set out to build the dashboard on my end and created a look-a-like in a few hours.

Here's how it looks (click to enlarge):

The main functionality of the dashboard would be to provide rough estimates at where the optimal spend might be for the account. This rough estimation is achieved through displaying a time-series line chart with 2 lines (commission and spend). There are a few filters in this dashboard to select one or a few campaigns, date range and most importantly an attribution methodology of fixed percentages on commissions through either post-click or post-view.

Overall, I would honestly say that this isn't the best dashboard used for the sole business purpose of determining optimal spend. However, an added benefit is that this dashboard might double as a tool to investigate outliers in the data, for instance the huge spike in the commissions chart which might be a result of the client running a promotional event.

Moving away from the above dashboard, I was working on another 3 dashboards today as part of an interview exercise and I found the experience quite fulfilling - particularly around the insights that could be derived from the given dataset.

Here are the dashboards (click to enlarge):

I was provided with a csv file and with that I ingested the data into Tableau to create the above dashboards. But prior to this there was another exercise where I had to script a SQL query given some information about the database schema in order to generate the same dataset.

The first dashboard (Bike Usage) was mainly created to look at the granularity of a bike level the number of trips that were taken and the number of days used. I had to do a simple count for the former and a simple calculated formula for the latter to convert data from seconds into days. The premise of the 3 views in this single dashboard is to inform when a bike needs to be replaced because of being worn out either from the number of trips taken or the actual usage of the bike. I left it as-is and did not go ahead with blending the two sets of data in trying to create a 'bike-replacement' prediction since there really isn't an accurate way to model this out without data for the dependent variable.

The second dashboard (Bike Movement) is probably the most informative dashboard out of the three dashboards created. This dashboard would inform where there are bike shortages and excesses at respective stations shown in the top two packed bubble charts and the view in the bottom half. The views are quite self-explanatory, which in essence is the crux of creating a good dashboard. Top-right of this dashboard is a view of a geographic map of North America showing the number of bike sharers in a specific zip code, and allows filtering by subscribers/customers. This geographic view in my opinion serves no real purpose other than to show where users have come from and I have only included it as an accessory to the dashboard.

The third dashboard (Platform Popularity) shows the growth trend in terms of the number of trips that were taken over time. One thing that stands out is the more or less consistent drop off in trips taken during weekends (troughs in the area chart) which would mean downtime and lower revenue generated. Another interesting finding is through the filter by bike station where you will be able to dive into which stations are heavily utilized and which are not. From there you will then be able to decide whether to expand the bike station to accommodate more bikes or to demolish under-utilized stations if there is focus on improving the bottom-line.

Overall a great exercise to work although it did take up quite a fair amount of time.

Wednesday, December 17, 2014

Visualizations on Tableau

I have been exploring Tableau recently and I found that it's actually quite a powerful data visualization tool. I've attached a few screenshots of the dashboards I drafted and I'm still working on improving them right now.

While using Tableau, I realized that a downside is that online documentation and tutorials of how to use Tableau features might be rather limited. I had to spend a few hours just to find out how to use simple features. In addition, I realized that some things that can be easily done in excel can't be done in Tableau (for example - side-by-side stacked charts). Nonetheless, Tableau in my opinion is still a great tool to use for data visualizations and connecting to data sources (where in Excel you would have to meddle around with ODBC connections).

Here are some screenshots of some dashboards I recently created (click to enlarge):


Wednesday, June 18, 2014

Using Treemaps and Heatmaps to visualize data

Recently I tried exploring the functionality and use of Treemaps in helping me manage my list of campaigns. I used R in Excel and hard coded the data into the template diagram, after running a MySQL script to extract the data from my company's database. What I found is that while it helps me visualize which campaigns are under-performing, what I really needed is a BI tool that would allow me to bucket certain campaigns by geographic split/account managers/size of campaigns etc.

The limitation of what I had tried was that it allowed me only 2 variables - I chose performance to be color coded into a scale of 100% (red) to 0% (black) to 100% (green) and the revenue spend over the campaign's lifetime as the size of each rectangle, in an attempt to understand the big picture of my campaigns per account manager to see which campaign I should look at and optimize right away. All in all, I think a tool like this would be useful for campaign analysts, and it would probably take a few software engineers around 6-8 months to build out a BI tool with this functionality. In the meantime, writing a macro with integration to the MySQL database would temporarily give us this functionality - with a lot of query tweaking for different sets of data.

Performance Treemap across a region

Performance Treemap subset

These were the links I found particularly useful when creating Treemaps using R:


For Heatmaps, i find it only particularly useful in the context of showing distribution of a particular variable (in this case - Gas Prices) on an actual geographic map.

In the context of analytics in online advertising, it is commonplace to generate insights for the client that includes a map of the world or targeted locality (e.g. by country - US) and have heat spots on a varying color scale showing states or dmas with varying levels of performance (e.g. clicks/conversions or their respective click/conversion lifts).


Other common uses of Heatmaps would be to visualize values or a data set in tabular format but I don't find this extremely useful as of now. If I were to utilize this, it would most likely be a supplementary diagram since the functionality of this visualization is rather limited.

You can easily create something like that below in excel using conditional formatting on the entire table of values. For help creating this diagram, please visit this link:


Wednesday, February 26, 2014

Representing data in a dashboard

I have been working on a dashboard lately and I feel that the choice of tools and diagrams that are used in building the dashboard are extremely crucial to effectively deliver the dashboard's message to the end user. Since I'm most familiar with excel, it naturally became my tool of choice for building the dashboard. Other analysts or data scientists might be more familiar with other tools such as Tableau, R, etc. but the principles are the same in creating an effective dashboard.

Wikipedia describes a dashboard as "an easy to read, often single page, real-time user interface, showing a graphical presentation of the current status (snapshot) and historical trends of an organization's key performance indicators to enable instantaneous and informed decisions to be made at a glance". 

As per the Wikipedia description above - many businesses try to compact all their data into a single page dashboard, and a clear disadvantage is that the dashboard might become too abstract that it makes it difficult for the end user to interpret the data. However, if a single page dashboard is still required, it might be useful to add in appendices to support and explain complex representations.

I'm currently working on a dashboard to address a need to establish a macro-level view of the health of a particular agency's campaigns. We would frequently choose to ignore the use of tools such as dashboards and continue micro-managing each campaign - which isn't particularly efficient especially when you have a ton of campaigns to manage. For an analyst, we often have to take charge of a particular group of clients - whether they are grouped into a particular geographic region, type of industry or type of channel. By having a macro-level view of all the campaigns you are managing, it will eliminate the resources wasted on micro-managing campaigns and allow for better management of campaigns on a daily basis.

The dashboard that I have created focuses on 3 main variables - performance, client happiness and advertising campaign spend. To visualize all these variables into a single diagram - I used the donut graph functionality available in Excel. And on the left side of the graph, I included a raw performance stats table that was sorted by highest performing to the lowest and conditionally formatted to make the table easier to understand. I also included speed meter gauges to help summarize performance and client happiness findings.

click for larger image

I felt that the main dashboard wasn't sufficient to explain my message to the intended user, and thus I created 2 other pages in the appendix. The 1st appendix page has sparkline charts to show in detail trend data alongside the data set that was used to populate these charts since the intended users were technical and more data focused.

click for larger image

The 2nd page in the appendix focuses mainly on underperforming vs performing campaigns and also includes the data set and sparkline charts highlighting the highest and lowest performance within the 3 month period. Another important aspect was the inclusion of a table illustrating the campaign's goals - which are used as a benchmark to determine performance numbers.

click for larger image

Visually the dashboard looks more or less pleasant, but it can definitely be worked on further to remove clutter and repetition. Also, if the agency finds this prototype to be really useful and valuable for their business we can consider putting together a team of engineers to build a BI tool having similar functions.