This Week in Data Reading

My input to This Week’s data reading on data.blog

Data for Breakfast

This week Boris shares a piece on the persuasive power of data visualization. Read something provocative in the field of data science? Be sure to share your links in the comments.

Boris Gorelik

In “The Persuasive Power of Data Visualization,” a group of New York University researchers reports an experimental assessment of the claim that data visualization is indeed an effective tool for conveying a message.

The 2014 study claims that despite the fact that “…data visualization has been used extensively to inform users…little research has been done to examine the effects of data visualization in influencing users or in making a message more persuasive.”

To make their point, the researchers presented a series of questions to 150 Amazon Mechanical Turk users. (Such a setup isn’t flawless, of course, but is a common practice in many perception studies.)

Indeed, graphical representation was effective, at least for…


Welcoming New Colleagues — a Data-Based Story

My latest post on data.blog

Data for Breakfast

With over 545 employees spread over more than 50 countries, Automattic is one of the largest distributed companies in the world.

Being distributed means that we, as Automatticians, work on our common goal to democratize publishing from wherever we wish. It also means that we heavily rely on online communication.

Besides providing flexibility, working in a distributed company brings challenges when you meet in person. Does X from team I/O appreciate informal humor? What is the hobby of Y, a member of the Happiness Engineering team? Does Z, our team’s HR person, smile a lot or only when taking a Gravatar picture? The answers to these questions are trivial to get in a “traditional” company but not in a distributed environment.

However, this information is critical. It helps us relate to one another and form strong relationships that nourish creativity and cooperation. If the challenge doesn’t seem hard enough to…


Chart legends and the Muttonchops

Adding legends to a graph is easy. With matplotlib, for example, you simply call plt.legend() and voilà, you have your legends. The fact that any major or minor visualization platform makes it super easy to add a legend doesn’t mean that it should be added. At least, not in graphs that are supposed to be shared with the public.

Take a look at this interesting graph taken from Reddit:

The chart provides fascinating information. However, to “decipher” it, the viewer needs to constantly switch between the chart and the legend to the right. Moreover, having to encode eight different categories resulted in colors that are hard to distinguish. And if you happen to be colorblind, your chances of getting the colors right are significantly lower.

What is the solution to this problem? Let’s reduce the distance between the labels and the data by putting the labels and the data together.

Notice the multiple advantages of the “after” version. First, the viewer doesn’t need to jump back and forth to decide which segment represents which data series. Second, by moving the labels inside the graph, we freed up valuable real estate. But that’s not all. The new version is readable by colorblind viewers. Plus, the slightly bigger letters make reading easier for the visually impaired. The graph also remains readable and understandable when printed on a black-and-white printer.
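Here is a minimal matplotlib sketch of the same idea: instead of calling plt.legend(), each series gets its name written right next to its last data point. The function name plot_with_direct_labels and the numbers are made up for illustration; they are not the data behind the Reddit chart.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_with_direct_labels(years, series):
    """Draw one line per series and write its name next to its last point."""
    fig, ax = plt.subplots(figsize=(7, 4))
    for name, values in series.items():
        (line,) = ax.plot(years, values)
        # The label sits beside the data, so no legend lookup is needed.
        ax.annotate(
            name,
            xy=(years[-1], values[-1]),
            xytext=(5, 0),
            textcoords="offset points",
            va="center",
            color=line.get_color(),
        )
    # Drop the top and right frame lines to leave room for the labels.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    return fig, ax


# Made-up numbers, for illustration only.
years = [1850, 1900, 1950]
series = {
    "Beard": [40, 30, 5],
    "Moustache": [10, 45, 25],
    "Clean shaven": [20, 15, 65],
}
fig, ax = plot_with_direct_labels(years, series)
```

Because each label is attached to its own line, the graph stays readable even if the colors are lost. Saving with fig.savefig(..., bbox_inches="tight") should keep the labels that extend past the right edge of the axes.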

“Wait a minute,” you might say, “there’s not enough space for all the labels! We’ve lost some valuable information. After all,” you might say, “we now only have four labels, not eight.” Here’s the thing. I think that losing four categories is an advantage. By imposing restrictions, we are forced to decide what it is that we want to say, what is important and what is not. By forcing ourselves to only label larger chunks, we are forced to ask questions. Is the distinction between “Moustache with Muttonchops” and “Moustache with Sideburns” THAT important? If it is, make a graph about Muttonchops and Sideburns. If it’s not, combine them into a single category. Even better, combine them with “Mustache”.

Muttonchops. By Flickr user GSK

Having the ability to add a legend with any number of categories using a single line of code is super convenient and useful, especially during data exploration. However, graphs shared with the public should contain as few legends as practically possible. Remove the legends and place the labels close to the data. If doing so results in unreadable, overlapping labels, refine the graph, rethink your message, or combine categories. This may take time and cause frustration, but the result might surprise you. If none of these is possible, put the legend back. At least you tried.

Chart legends are like Muttonchops — the fact that you can have them doesn’t mean you should.

Evolution of a Plot: Better Data Visualization, One Step at a Time

My latest post on data.blog

Data for Breakfast

The goal of data visualization is to transform numbers into insights. However, default data visualization output often disappoints. Sometimes, the graph shows irrelevant data or misses important aspects; sometimes, the graph lacks context; sometimes, it’s difficult to read. Often, data practitioners “feel” that something isn’t right with the graph, but cannot pinpoint the problem.

In this post, I’ll share the process of visualizing a complex issue using a simple plot. Despite the fact that the final plot looks elementary and straightforward, it took me several hours and trial-and-error attempts to achieve the result. By sharing this process, I hope to accomplish two goals: to offer my perspectives and approaches to data visualization and to learn from other options you suggest. You’ll find the code and the data used in this post here.

Plotting power distribution in the Knesset

This post is devoted to a graph I created to explore…


16-day work month: The joys of the Hebrew calendar

Tishrei is the seventh month of the Hebrew calendar, and it starts with Rosh-HaShana, the Hebrew New Year*. It is a 30-day month that usually falls in September-October. One interesting feature of Tishrei is that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), and the first and last days of Sukkot (Feast of Tabernacles)**. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 rest days on top of the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the days between the first and the last days of Sukkot are mostly treated as half working days. Also, the children are at home since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.
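The counting rules above can be sketched in a few lines of Python. This is only an illustration of the rules, not the code behind the analysis: the function name count_working_days and its parameters are hypothetical, the holiday dates have to be supplied by the caller (the standard library has no Hebrew calendar), and the example week below uses made-up dates rather than real holidays.

```python
from datetime import date, timedelta


def count_working_days(start, n_days, rest_days=(), half_days=(), weekend=(4, 5)):
    """Count working days in a window of n_days starting at start.

    weekend uses date.weekday() numbering (Monday is 0), so the default
    (4, 5) is the Friday/Saturday pair.
    """
    rest = set(rest_days)
    half = set(half_days)
    total = 0.0
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        if day.weekday() in weekend or day in rest:
            continue  # weekend, holiday, or holiday eve: no work
        total += 0.5 if day in half else 1.0
    return total


# Illustrative week only; these are NOT real holiday dates.
week_start = date(2024, 1, 1)  # a Monday
print(count_working_days(week_start, 7))  # 5.0
print(count_working_days(
    week_start, 7,
    rest_days=[date(2024, 1, 2)],
    half_days=[date(2024, 1, 3)],
))  # 3.5
```

For the period discussed here, one would pass the day before Rosh-HaShana as start, 31 as n_days, the holidays and their eves as rest_days, and the days between the first and last days of Sukkot as half_days.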

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 1993 and 2020 CE, and this is what we get:

 

[Figure: number of working days in the 31-day Tishrei period, 1993 to 2020]

Overall, this period contains between 15 and 17 non-working days in a single month (31 days, mind you). This is what the working/non-working time during this month looks like:

[Figure: working and non-working time by week during this month]

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to a constantly interrupted workday, but at a different scale.

So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

 

(*) New Year starts in the seventh month? I know this is confusing. That’s because we number Nissan, the month of the Exodus from Egypt, as the first month.
(**) If you are an observant Jew, you should add the Fast of Gedalia to this list, but we will omit it from this discussion.

 

Interview with a WordPress.com data scientist

Two weeks ago, I gave an interview to Matthew Kaboomis Loomis from http://www.buildyourownblog.net. This was my first time, and I was pretty nervous. During the interview, Matthew and I talked about the recent findings that I have published in my previous post. Surprisingly, I really enjoyed the interview.

Click on the image below to see the interview on Matthew’s blog.

 

[Image linking to the interview]