Creating a network using R

On January 10, 2016, David Bowie left this earthly realm. Last month I decided to create a network of his collaborations, and here is how to do it.

Required packages

You need jsonlite, igraph, network, data.table and plyr, in addition to base R.

Other tools

I also used D3Plus, by Alex Simoes and Dave Landry, and Google Sheets.

My data is here.

Loading packages

# 1: define the libraries to use
libraries <- c("jsonlite","igraph","network", "data.table", "plyr")

# 2: this is the function to download and or load libraries on the fly
download_and_or_load <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

# 3: use the function from step 2
download_and_or_load(libraries)

Building the network

D3Plus needs three files to visualize a network: data, edges and nodes.

Data

This is the easy part. I downloaded the sheet named "data" from my spreadsheet in CSV format. Then I converted the CSV to JSON with these lines:

data <- read.csv("data.csv")
data <- toJSON(data, pretty = TRUE)
write(data, file = "bowie_data.json")
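
To see what this conversion produces, here is a minimal sketch with made-up rows. The column names Artist and Genre are borrowed from the D3Plus template at the end of this post; the values are placeholders, not my actual data.

```r
library(jsonlite)

# Made-up rows: the values are placeholders, not the real spreadsheet
toy_data <- data.frame(Artist = c("Artist A", "Artist B"),
                       Genre  = c("Genre 1", "Genre 2"),
                       stringsAsFactors = FALSE)

# By default toJSON() writes a data frame as an array of row records
toy_json <- toJSON(toy_data, pretty = TRUE)
print(toy_json)

# Reading the JSON back recovers the same table
round_trip <- fromJSON(toy_json)
```

Each row of the data frame becomes one JSON object, which is the layout written to bowie_data.json above.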

Edges

This part is a bit trickier.

I downloaded the sheet named "collaborations" from my spreadsheet in CSV format. It contains a matrix \(M\) whose entries are defined as follows:

$$ m_{ij} = \begin{cases} 1 & \text{if artist } i \text{ and artist } j \text{ collaborated with each other} \\ 0 & \text{otherwise} \end{cases} $$

Then I fix the row and column names of the matrix:

bowie_collaborations <- read.csv("collaborations.csv")
rownames(bowie_collaborations) <- bowie_collaborations[,1]
bowie_collaborations <- bowie_collaborations[,-1]
colnames(bowie_collaborations) <- rownames(bowie_collaborations)
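
To make the matrix definition concrete, here is a minimal sketch with a toy matrix. The three artist labels are hypothetical placeholders, not rows from my spreadsheet. It uses only base R to list each collaborating pair once.

```r
# Toy collaboration matrix; the labels are hypothetical placeholders
artists <- c("Artist A", "Artist B", "Artist C")
toy <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(artists, artists))

# A collaboration matrix is symmetric: m_ij equals m_ji
stopifnot(isSymmetric(toy))

# Keep only the upper triangle so each pair appears once
idx <- which(toy > 0 & upper.tri(toy), arr.ind = TRUE)
toy_edges <- data.frame(source = artists[idx[, "row"]],
                        target = artists[idx[, "col"]],
                        stringsAsFactors = FALSE)
print(toy_edges)
```

Because the matrix is symmetric, scanning the full matrix would list every pair twice, once as (i, j) and once as (j, i).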

With the matrix ready I can create the network. You can try the different layouts explained in the igraph documentation. This is the code to create the network and display a static version of it:

bowie_gr <- matrix(unlist(bowie_collaborations), ncol = nrow(bowie_collaborations), byrow = TRUE)
rownames(bowie_gr) <- rownames(bowie_collaborations)
colnames(bowie_gr) <- colnames(bowie_collaborations)

bowie_gr <- which(bowie_gr > 0, arr.ind=TRUE)
# mst() and graph_from_data_frame() are the current names of
# minimum.spanning.tree() and graph.data.frame()
bowie_gr.graph <- mst(graph_from_data_frame(bowie_gr, directed = FALSE))
bowie_gr.names <- colnames(bowie_collaborations)[as.numeric(V(bowie_gr.graph)$name)]
bowie_gr.graph <- simplify(bowie_gr.graph, remove.multiple = TRUE, remove.loops = TRUE)

set.seed(1234)
bowie_gr.layout <- layout_with_fr(bowie_gr.graph)
plot(bowie_gr.graph, edge.arrow.size=.3, vertex.label=bowie_gr.names, layout=bowie_gr.layout)
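
Keep in mind that the graph plotted above is a minimum spanning tree, so it keeps every artist but not every collaboration. The sketch below illustrates this on a hypothetical four-vertex graph (the vertex names A to D are made up) and uses the current igraph function names.

```r
library(igraph)

# Hypothetical toy edge list: a triangle (A, B, C) plus one extra vertex D
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to   = c("B", "C", "C", "D"),
                        stringsAsFactors = FALSE)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# A spanning tree keeps all n vertices but only n - 1 edges
toy_tree <- mst(toy_graph)
vcount(toy_tree)   # 4 vertices
ecount(toy_graph)  # 4 edges before
ecount(toy_tree)   # 3 edges after

# The layout is a matrix with one (x, y) row per vertex
set.seed(1234)
toy_layout <- layout_with_fr(toy_tree)
dim(toy_layout)
```

If you want every collaboration drawn, one option is to skip the spanning-tree step and lay out the full graph, at the cost of a denser picture.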

Now I save the edges (names and ids) and the network layout:

write_graph(bowie_gr.graph, "exported_edges_bowie.csv", format = "pajek")
write.csv(bowie_gr.names, "exported_names_bowie.csv")
write.csv(bowie_gr.layout, "exported_coordinates_bowie.csv")

Finally I rearrange the edges to display names instead of numeric ids and save the result in JSON format:

network_names <- read.csv("exported_names_bowie.csv")
setnames(network_names, colnames(network_names), c("source_num","source"))
network_names$target_num <- network_names$source_num
network_names$target <- network_names$source

network_edges <- read.csv("exported_edges_bowie.csv", sep = " ")
network_edges <- network_edges[-1,]
setnames(network_edges, colnames(network_edges), c("source_num","target_num"))
network_edges <- join(network_edges, network_names[,c("source","source_num")], by = "source_num")
network_edges <- join(network_edges, network_names[,c("target","target_num")], by = "target_num")
network_edges <- network_edges[,c("source","target")]
source <- as.data.frame(network_edges$source)
colnames(source) <- "source"
target <- as.data.frame(network_edges$target)
colnames(target) <- "target"

network_edges <- data.frame(matrix(ncol = 1, nrow = nrow(network_edges)))
network_edges$source <- source
network_edges$target <- target
colnames(network_edges$source) <- "Artist"
colnames(network_edges$target) <- "Artist"

network_edges_json <- toJSON(network_edges, pretty = TRUE)
write(network_edges_json, "bowie_edges.json")
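
The nested data frames above are there so that, as far as I understand how jsonlite handles them, each edge is written as a record whose source and target are small objects keyed by "Artist". As a sketch of that shape, assuming a made-up edge list with placeholder names, the same structure can be built from plain lists with jsonlite's auto_unbox option:

```r
library(jsonlite)

# Hypothetical edge list with artist names already resolved
toy_edges <- data.frame(source = c("Artist A", "Artist A"),
                        target = c("Artist B", "Artist C"),
                        stringsAsFactors = FALSE)

# One list per edge; source and target are objects with an "Artist" key
edge_list <- lapply(seq_len(nrow(toy_edges)), function(i) {
  list(source = list(Artist = toy_edges$source[i]),
       target = list(Artist = toy_edges$target[i]))
})

# auto_unbox = TRUE writes length-one vectors as scalars, not arrays
toy_edges_json <- toJSON(edge_list, auto_unbox = TRUE, pretty = TRUE)
print(toy_edges_json)

# Parse it back as nested lists to inspect the structure
parsed <- fromJSON(toy_edges_json, simplifyVector = FALSE)
```

This is an alternative way to reach the same layout, not the method used for the files in this post.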

Nodes

This is easier than the edges part. The code to save the nodes in JSON format with names instead of numeric ids is:

network_nodes <- read.csv("exported_coordinates_bowie.csv")
setnames(network_nodes, colnames(network_nodes), c("target_num","x","y"))
network_nodes <- join(network_nodes, network_names[,c("target","target_num")], by = "target_num")
network_nodes <- network_nodes[,c("target","x","y")]
setnames(network_nodes, colnames(network_nodes), c("Artist","x","y"))

network_nodes_json <- toJSON(network_nodes, pretty=TRUE)
write(network_nodes_json, "bowie_nodes.json")

Put your files in a D3Plus network template

In my case I decided to use bl.ocks.org to show my network. Use this template and edit the links to the data, edges and nodes files to make it work.

<!doctype html>
  <meta charset="utf-8">
  <script src="https://d3plus.org/js/d3.js"></script>
  <script src="https://d3plus.org/js/d3plus.js"></script>
          <div id="network"></div>
          <script>
          var visualization = d3plus.viz()
          .container("#network")
          .data("bowie_data.json")
          .edges("bowie_edges.json")
          .nodes("bowie_nodes.json")
          .type("network")
          .resize(true)
          .id(["Genre","Artist"])
          .font({"family": "Lato"})
          .size(1)
          .depth(1)
          .color("Color")
          .title("David Bowie Collaborations")
          .tooltip({"value": ["Genre","Collaboration"],"size": false})
          .legend({"size": 32})
          .draw()
          </script>
<link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css">

You can also use Roboto, another Google Font, or any other typeface you want.

Final result

After some manual editing of the edges in Atom (just aesthetic changes to put some edges closer to similar artists), the result is here.