In this period of “Star Wars fever”, the post “The Star Wars social network”, which recently appeared on Evelina Gabasova’s blog, attracted my attention. Evelina did a great job parsing the transcripts of the six previous episodes of Star Wars and building the social networks of interacting characters. I refer to her great post for details about the data, while here I will focus on an alternative analysis.
In fact, if some characters appear and interact with each other in more than one episode, it is natural (for my biased mind) to identify each episode with a layer of a multiplex network. The “allInteractions” data from Evelina are in JSON format: I post-processed them a bit to convert them into a format suitable for muxViz, using the same unique IDs for all episodes (the converted data are available on muxViz’s GitHub repository).
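The conversion step can be illustrated with a minimal Python sketch. It is not the script actually used for the post: the JSON layout (a "nodes" list with a "name" field and a "links" list with "source", "target" and "value" fields, where source and target are positions in the node list) is an assumption about the input, the helper names are hypothetical, and the toy data are invented. The "node layer node layer weight" row layout is also only an assumed target format and should be checked against the muxViz documentation.

```python
import json

def build_global_ids(episodes):
    """Assign one integer ID per character name, shared by all episodes."""
    ids = {}
    for episode in episodes:
        for node in episode["nodes"]:
            ids.setdefault(node["name"], len(ids) + 1)
    return ids

def to_edge_rows(episodes, ids):
    """Return rows (node, layer, node, layer, weight), one per interaction."""
    rows = []
    for layer, episode in enumerate(episodes, start=1):
        # Links refer to positions in this episode's own node list (assumed).
        names = [node["name"] for node in episode["nodes"]]
        for link in episode["links"]:
            u = ids[names[link["source"]]]
            v = ids[names[link["target"]]]
            rows.append((u, layer, v, layer, link["value"]))
    return rows

# Invented toy data: two tiny "episodes" sharing one character.
first = json.loads(
    '{"nodes": [{"name": "R2-D2"}, {"name": "QUI-GON"}],'
    ' "links": [{"source": 0, "target": 1, "value": 3}]}'
)
second = json.loads(
    '{"nodes": [{"name": "LUKE"}, {"name": "R2-D2"}],'
    ' "links": [{"source": 0, "target": 1, "value": 5}]}'
)
episodes = [first, second]
ids = build_global_ids(episodes)
rows = to_edge_rows(episodes, ids)
for row in rows:
    print(*row)
```

The important point is that R2-D2 receives the same ID in both layers, even though it sits at a different position in each episode's node list.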
Analysis of Star Wars multiplex network data
First of all, let us import the data as undirected and weighted networks. Here the weight encodes the number of interactions. If everything goes fine, the following should appear:
Centrality analysis: who is the most important character across the whole saga?
While it is possible to analyze the importance of characters for each episode separately, the multilayer framework allows us to quantify their importance across the whole series of episodes. Although the concept of centrality as “importance” is debated and strongly depends on the context, here we use three simple descriptors (see this paper and this paper for mathematical details):
- Character’s Multiplexity: this quantifies the participation of the character in multiple episodes (i.e., the fraction of layers where the node appears)
- Character’s PageRank: introduced by Google’s founders; roughly speaking, it ranks characters on the assumption that important characters tend to interact with other important characters.
- Character’s Degree: this just quantifies the number of different interactions of each character.
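To make the first and last descriptors concrete, here is a minimal Python sketch on invented toy data. Each layer is a list of (character, character, weight) tuples. Summing the per-layer degrees into a single "multiplex degree" is one simple convention chosen for illustration; the definition used by muxViz may differ, and the function names are hypothetical.

```python
def multiplexity(layers):
    """Fraction of layers in which each character has at least one interaction."""
    n_layers = len(layers)
    counts = {}
    for edges in layers.values():
        present = {name for u, v, _ in edges for name in (u, v)}
        for name in present:
            counts[name] = counts.get(name, 0) + 1
    return {name: count / n_layers for name, count in counts.items()}

def layer_degree(edges):
    """Number of distinct partners of each character within one layer."""
    partners = {}
    for u, v, _ in edges:
        partners.setdefault(u, set()).add(v)
        partners.setdefault(v, set()).add(u)
    return {name: len(others) for name, others in partners.items()}

def multiplex_degree(layers):
    """Sum of per-layer degrees (an assumed, simplified convention)."""
    total = {}
    for edges in layers.values():
        for name, k in layer_degree(edges).items():
            total[name] = total.get(name, 0) + k
    return total

# Invented toy data with two layers.
layers = {
    1: [("R2-D2", "QUI-GON", 3), ("QUI-GON", "ANAKIN", 7)],
    2: [("R2-D2", "ANAKIN", 2), ("ANAKIN", "OBI-WAN", 4)],
}
print(multiplexity(layers))
print(multiplex_degree(layers))
```

In this toy example R2-D2 and Anakin appear in both layers, so their multiplexity is 1, while Qui-Gon and Obi-Wan appear in only one.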
The top 20 characters ranked by their multiplexity are shown below, with the two droids and Obi-Wan being the first ones, followed by the Emperor, Yoda, Anakin and Darth Vader (here treated separately, because they interact in different ways with other characters, as if they were different individuals).
The results of the PageRank ranking are shown below and suggest that Anakin is the most central character of the saga according to this centrality descriptor (remember that this is the same one used by Google to rank websites!). Interestingly, Obi-Wan and R2-D2 are almost as important as Anakin. The results for each episode separately, shown in the stacked histogram, reveal that the most important characters per episode (listed in order, from Episode 1 to Episode 6) are:
- Qui-Gon and Anakin;
- Anakin and Obi-Wan;
- Anakin and Obi-Wan;
- Luke and Leia;
- Luke and Darth Vader;
- Luke and R2-D2.
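For a single episode, the idea behind PageRank can be sketched with a plain power iteration on a weighted, undirected network. This is the classic single-layer version, not the multiplex formulation used for the saga-wide ranking; the damping factor of 0.85, the number of iterations and the toy interactions are illustrative assumptions.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=100):
    """Single-layer weighted PageRank by power iteration (positive weights)."""
    # Undirected network: each interaction is stored in both directions.
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = neighbours[u].get(v, 0) + w
        neighbours[v][u] = neighbours[v].get(u, 0) + w
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / n for name in nodes}
        for u in nodes:
            total = sum(neighbours[u].values())
            # Each character passes its rank to partners, in proportion
            # to the number of interactions with each of them.
            for v, w in neighbours[u].items():
                new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank

# Invented toy layer: Anakin interacts with everybody else.
edges = [("ANAKIN", "OBI-WAN", 5), ("ANAKIN", "PADME", 4), ("ANAKIN", "R2-D2", 1)]
ranks = pagerank(edges)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Because every character listed here has at least one interaction, the sketch does not need special handling for isolated nodes; a general implementation would.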
For completeness, I also included the result from the aggregate network, obtained by summing up all interactions across the whole saga while neglecting the layered structure (i.e., losing any memory of the episodes where the interactions happened). In this case (which is a bad approximation of the correct approach), R2-D2 would be the most central character.
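The aggregation itself is easy to sketch in Python: the weights of the same pair of characters are summed over all layers, and the layer labels are discarded. The data below are invented and the function name is hypothetical.

```python
from collections import Counter

def aggregate(layers):
    """Sum interaction weights over all layers, forgetting the layer labels."""
    total = Counter()
    for edges in layers.values():
        for u, v, w in edges:
            # Sort the pair so that (A, B) and (B, A) are the same edge.
            total[tuple(sorted((u, v)))] += w
    return dict(total)

# Invented toy data: Luke and R2-D2 interact in both layers.
layers = {
    4: [("LUKE", "R2-D2", 5)],
    5: [("R2-D2", "LUKE", 2), ("LUKE", "LEIA", 3)],
}
aggregated = aggregate(layers)
print(aggregated)
```

After this step it is no longer possible to tell in which episode an interaction took place, which is exactly the information the multiplex representation keeps.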
PageRank can be difficult to interpret in this context, and further results come from the Degree ranking, with the two droids again at the top, together with Anakin Skywalker, Obi-Wan, Padmé Amidala (Anakin’s wife) and Luke Skywalker:
The analysis of strength, i.e. the total number of interactions per character (a kind of weighted degree), reveals that the two droids and Han are the characters with the most interactions, followed by Obi-Wan and Anakin.
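As a minimal Python sketch of this weighted degree, the snippet below sums, for each character, the weights of all its interactions over all layers. Summing over layers is an assumed convention, and the toy numbers are invented.

```python
def strength(layers):
    """Total interaction weight of each character, summed over all layers."""
    totals = {}
    for edges in layers.values():
        for u, v, w in edges:
            totals[u] = totals.get(u, 0) + w
            totals[v] = totals.get(v, 0) + w
    return totals

# Invented toy data: few partners, but many interactions with each.
layers = {
    4: [("HAN", "C-3PO", 6), ("HAN", "LUKE", 4)],
    5: [("HAN", "C-3PO", 9)],
}
strengths = strength(layers)
print(strengths)
```

A character can therefore have a high strength with a small degree, if it interacts many times with only a few partners.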
Finally, I was wondering about a possible relationship between the three centrality descriptors, summarized in the plot below:
where circles indicate characters and their radii are proportional to the characters’ multiplexity. An evident pattern is present, with higher-degree nodes also having higher PageRank, regardless of their participation in multiple episodes.
Annular visualization to summarize the results
A nice (alternative) way to visualize the results from multiple centrality analyses is the annular visualization developed ad hoc for muxViz. Each visualization consists of sectors (indicating characters) arranged into concentric rings (indicating measures or layers), ordered and sized to maximize readability (details about algorithms used for this purpose can be found in this paper).
Below, I consider all multiplex centrality descriptors described so far. Each ring encodes a centrality measure:
Alternatively, one can focus on a centrality descriptor and visualize how it varies when calculated in the multiplex network, in its aggregation or in each layer separately. Below I show the cases for Degree, Strength and PageRank previously discussed.
The visual inspection reveals that some patterns are preserved across episodes and that degree and strength in the multiplex network are correlated with degree and strength in the aggregate.
Layer-layer correlations: uncovering the relationships between episodes
Here, we are interested in relationships between different episodes. In practice, we can obtain a lot of information by looking at different layer-layer correlations.
The chart below shows how episodes are clustered together when looking at characters’ multiplexity. Darker colors indicate a higher number of shared characters. Two clusters of layers, corresponding to the two different trilogies, are clearly evident, with more characters shared among episodes 4, 5 and 6. In the other trilogy, episode 1 is more isolated, sharing fewer characters with episodes 2 and 3.
A similar analysis, this time focused on interactions (i.e., edge overlap), reveals the same division into two clusters: here we look at which interactions are preserved in different episodes.
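Both kinds of overlap can be sketched with a few lines of Python. For each pair of layers, the snippet compares the sets of characters (node overlap) and the sets of interacting pairs (edge overlap). Normalizing by the size of the union (the Jaccard index) is my assumption for this illustration and is not necessarily the normalization used to draw the charts; the data are invented.

```python
from itertools import combinations

def node_set(edges):
    """Characters that take part in at least one interaction in a layer."""
    return {name for u, v, _ in edges for name in (u, v)}

def edge_set(edges):
    """Unordered pairs of interacting characters in a layer."""
    return {frozenset((u, v)) for u, v, _ in edges}

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def layer_overlap(layers, key):
    """Overlap for every pair of layers, using node_set or edge_set as key."""
    sets = {layer: key(edges) for layer, edges in layers.items()}
    return {(a, b): jaccard(sets[a], sets[b]) for a, b in combinations(sorted(sets), 2)}

# Invented toy data for three layers.
layers = {
    1: [("QUI-GON", "ANAKIN", 7), ("ANAKIN", "OBI-WAN", 2)],
    2: [("ANAKIN", "OBI-WAN", 9), ("ANAKIN", "PADME", 8)],
    4: [("LUKE", "LEIA", 5), ("LUKE", "OBI-WAN", 3)],
}
node_overlap = layer_overlap(layers, node_set)
edge_overlap = layer_overlap(layers, edge_set)
print(node_overlap)
print(edge_overlap)
```

In the toy data, layers 1 and 2 share both characters and one interaction, while layer 4 shares a single character with them and no interaction at all.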
The following two charts reveal correlations by comparing the number of different interactions per character: are characters who are important in one episode generally important in other episodes as well? With some differences, in general the answer is YES, and this pattern allows us to cluster, again, the episodes into two evident groups, corresponding to the two trilogies.
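One way to quantify this kind of comparison is a correlation coefficient between the degrees of the same characters in two layers. The Python sketch below uses the Pearson coefficient and assigns degree 0 to characters absent from a layer; both choices are assumptions made for illustration and may differ from what was used for the charts. The data are invented.

```python
from math import sqrt

def layer_degree(edges):
    """Number of distinct partners of each character within one layer."""
    partners = {}
    for u, v, _ in edges:
        partners.setdefault(u, set()).add(v)
        partners.setdefault(v, set()).add(u)
    return {name: len(others) for name, others in partners.items()}

def degree_correlation(edges_a, edges_b):
    """Pearson correlation between the degrees of characters in two layers."""
    deg_a, deg_b = layer_degree(edges_a), layer_degree(edges_b)
    names = sorted(set(deg_a) | set(deg_b))
    # Characters missing from a layer count as degree 0 (assumption).
    x = [deg_a.get(name, 0) for name in names]
    y = [deg_b.get(name, 0) for name in names]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all degrees are equal
    return cov / sqrt(var_x * var_y)

# Invented toy layers in which Luke is the hub in both.
layer_4 = [("LUKE", "LEIA", 5), ("LUKE", "HAN", 4)]
layer_5 = [("LUKE", "LEIA", 3), ("LUKE", "HAN", 2), ("LUKE", "VADER", 6)]
r = degree_correlation(layer_4, layer_5)
print(round(r, 3))
```

A value close to 1 means that the characters with many partners in one layer also tend to have many partners in the other.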
The analysis of communities of nodes, i.e. how characters (and not layers!) are clustered together, can be performed in the multiplex social network as well. Here, I use the Multiplex Infomap algorithm, recently developed with Martin Rosvall, Andrea Lancichinetti and Alex Arenas (see the paper for further details and the mapequation.org website for a standalone implementation). The results will be used later for the visualization, whereas below I just report the size of communities per layer, suggesting the presence of two big clusters of characters, corresponding to key cores in each trilogy.
Visualizing the Star Wars Multiplex Social Network
We have almost completed the overview of this nice multiplex network: let us conclude with a visualization of the previous results. In the following, we color nodes by their community assignment and we size them by their degree, obtaining the layered visualization below:
I guess that the rankings proposed by the centrality analysis might be the source of infinite debates among Star Wars enthusiasts: for this reason, I prefer to avoid discussing the multiplex communities identified by our algorithm!