Showing posts with label R and Python. Show all posts

Sunday, May 30, 2021

Bar Chart Animation with R

In this post, I show how to build a bar chart race in R, a chart used to visualize the evolution of a variable over time. I will show the evolution of public debt in the top 25 countries in the world over the period 2007 to 2020. The dataset is taken from the IMF WEO April 2021 database, available here.

The dataset and the R script files are available here.

The first step is to import the dataset in R.

rm(list=ls())
setwd("C:/Users/SIDDHARAJ BHATTA/Desktop/R for Data Analysis")
debt <- read.csv("debt.csv")

In the second step, we convert the data from wide to long format by using the tidyr package.

library(tidyr)
names(debt)<-as.character(2003:2020)
debt_long <- gather(data=debt, key=Year, value=debt, 4:18, factor_key=TRUE)
debt_long$Year <- as.numeric(as.character(debt_long$Year))
debt_long$country<-debt_long$`2005`
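
To see what the wide-to-long step does, here is a minimal sketch on a made-up three-country table. The toy_wide data and its column names are my own illustration, not the IMF file, and the sketch uses pivot_longer(), which tidyr now recommends over gather(); both give the same long shape.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
toy_wide <- data.frame(
  country = c("A", "B", "C"),
  `2019`  = c(50, 80, 120),
  `2020`  = c(60, 85, 150),
  check.names = FALSE   # keep "2019" and "2020" as column names
)

# Stack the year columns into Year/debt pairs
toy_long <- pivot_longer(toy_wide, cols = c("2019", "2020"),
                         names_to = "Year", values_to = "debt")

# The year arrives as text, so convert it to a number for the animation
toy_long$Year <- as.numeric(toy_long$Year)
toy_long
```

Each country now appears once per year, which is the layout the ranking and plotting steps expect.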

In the third step, we rank the countries by debt-to-GDP ratio within each year and keep the top 25 countries using the dplyr package.

library(dplyr)
names(debt_long)
data<-debt_long %>%
  group_by(Year) %>%
  arrange(Year, -debt) %>%
  mutate(rank = 1:n()) %>%
  filter(rank <= 25)
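
The ranking logic is easier to check on a small example. The sketch below uses a made-up table (toy, with invented debt values) and keeps the top 2 per year instead of the top 25; desc() and row_number() are used in place of -debt and 1:n(), which should behave the same way here.

```r
library(dplyr)

# Hypothetical long data: two years, three countries
toy <- data.frame(
  Year    = c(2019, 2019, 2019, 2020, 2020, 2020),
  country = c("A", "B", "C", "A", "B", "C"),
  debt    = c(50, 80, 120, 160, 85, 150)
)

top2 <- toy %>%
  group_by(Year) %>%
  arrange(Year, desc(debt)) %>%    # largest debt first within each year
  mutate(rank = row_number()) %>%  # 1 = highest debt in that year
  filter(rank <= 2) %>%            # keep the top 2 of each year
  ungroup()
top2
```

Note that the ranks restart at 1 in every year because the data is grouped by Year before mutate() runs.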

We will use the ggplot2 and gganimate packages to plot the bar chart and animate it.

library(ggplot2)
library(gganimate)

data %>%  
  ggplot() +  
  aes(xmin = 0 ,  
      xmax = debt) +  
  aes(ymin=rank-0.45,
      ymax=rank+0.45,
      y = rank) +  
  facet_wrap(~ Year) +  
  geom_rect(alpha = .7, fill="darkblue") +  
  scale_x_continuous(  
    limits = c(0, 500),  
    breaks = c(0, 100, 300, 500)) +  
  geom_text(col = "darkblue",  
            hjust = "right",  
            aes(label = country),  
            x = -50, size=6) +  
  scale_y_reverse() +  
  labs(fill = NULL) +  
  labs(x = 'Debt GDP Ratio') +  
  labs(y = "") +
  theme_classic()->  
  my_plot

plot<-my_plot +
  theme_classic()+
  facet_null() +  
  scale_x_continuous(  
    limits = c(-500, 600),  
    breaks = c(0, 200, 400, 600)) +  
  geom_text(x = 500 , y = -2,  
            family = "Times",  
            aes(label = as.character(Year)),  
            size = 14, col = "darkblue") +
  geom_text(col = "white",  
            hjust = "right",  
            aes(label = debt),  
            x = 80, size=5) +
  ggtitle("Outstanding Public Debt  as percent of GDP")+
  xlab("Debt GDP Ratio")+
  ylab("Top 25 Countries")+
  theme(
    panel.background = element_rect(fill='lightblue', size=0.5,
                                    linetype=0, colour='lightblue'),
    plot.title = element_text(color="red", size=16, hjust=0.5, face="bold.italic"),
    axis.title.x = element_text(color="darkblue", size=13, face="bold"),
    axis.title.y = element_text(color="darkblue", size=13, face="bold")
  ) +
    gganimate::transition_time(Year)

animate(plot, nframes=14, fps=1, width=800)
 

The following lines of code save the output in GIF format.

p1<-animate(plot, nframes=14,  fps = 1,  width = 800,
            renderer = gifski_renderer())

anim_save("animation.gif", animation = p1 )

The video explanation of this process is available here. 

I can be reached at siddhabhatta@gmail.com.


Friday, December 18, 2020

Bar Chart Race/Bar Chart Animation in R

R is a very powerful tool for data visualization. In this post, I present a simple example of a bar chart race in R. I use COVID-19 case data by country and show the evolution of cumulative cases in the 10 most affected countries over the last 350 days.

rm(list=ls())  # removes the existing objects from the environment.  

# library used

library(tidyverse)
library(readxl)
library(dplyr)
library(gganimate) 

# setting working directory
setwd('C:/Users/siddhabhatta/Desktop/October31')
# data source : https://www.ecdc.europa.eu/en/covid-19/data

# importing data 

The data and R script can be downloaded from the link below:

https://drive.google.com/drive/folders/1HkPFE7v4fIx2rOOJnhE1E52rCQlGkigq?usp=sharing

 data=read_excel('coviddec14.xlsx')
 
# first few observations

head(data)

# creating a new date variable with standard date format

 data$date<-as.Date(data$dateRep, format="%m/%d/%y")
head(data$date)
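
The format string has to match how the dates are written in the file, otherwise as.Date() returns NA. Here is a small sketch with made-up date strings (raw is hypothetical, not taken from the ECDC file) showing two common layouts and a quick check for failed conversions.

```r
# Hypothetical date strings in month/day/two-digit-year form
raw <- c("12/14/20", "01/05/20")
parsed <- as.Date(raw, format = "%m/%d/%y")
parsed

# The same kind of date written day-first with a four-digit year
# needs a different format string
day_first <- as.Date("14/12/2020", format = "%d/%m/%Y")
day_first

# Count the values that failed to parse; this should be zero
sum(is.na(parsed))
```

If the count of NA values is not zero, the format string probably does not match the file.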
 
# Making the country names short

data$country[data$country=="United_States_of_America"]<-"USA"
data$country[data$country=="United_Kingdom"]<-"UK"
data$country[data$country=="Cases_on_an_international_conveyance_Japan"]<-"Intl_CV_Center_Japan"   

# grouping the data by country and date and computing the cumulative total of cases per day
datanew<-data %>% # %>%  can be read as then   
  select(country, cases, date, continent) %>%  
   group_by(continent, country, date) %>%
  summarise(total=sum(cases)) %>%
  mutate(cumtotal=cumsum(total))
 # prepare data by ranks and filter the top 10 countries
 data2=datanew %>%
   group_by(date) %>%
   arrange(date, -cumtotal) %>%  
   mutate(rank = 1:n()) %>%  
  filter(rank <= 10)
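
A running total only makes sense if the rows are in date order within each country. The sketch below shows the idea on a made-up table (toy and its case counts are invented). The explicit sort is a safeguard I have added for the illustration; in the script above, summarise() should already return the rows ordered by the grouping columns.

```r
library(dplyr)

# Hypothetical daily cases for two countries, deliberately out of order
toy <- data.frame(
  country = c("B", "A", "A", "B", "A", "B"),
  date    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01",
                      "2020-01-03", "2020-01-03", "2020-01-02")),
  cases   = c(10, 2, 1, 5, 3, 0)
)

toy_cum <- toy %>%
  group_by(country) %>%
  arrange(date, .by_group = TRUE) %>%   # sort by date inside each country
  mutate(cumtotal = cumsum(cases)) %>%  # running total per country
  ungroup()
toy_cum
```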
# producing the 350 static plots (one facet per date)

data2 %>%  
  ggplot()+  
  aes(xmin = 0 ,  
      xmax = cumtotal) +  
  aes(ymin = rank - 0.45,  
      ymax = rank + 0.45,  
      y = rank) +  
  facet_wrap(~ date) +  
  geom_rect(alpha = .7) +  
  aes(fill = continent) +  
  scale_fill_viridis_d(option = "magma",  
                       direction = -1) +  
  scale_x_continuous(  
    limits = c(-5000000, 16000000),  
    breaks = c(-5000000, 0, 4000000, 8000000, 12000000, 16000000)) +  
  geom_text(col = "darkblue",  
            hjust = "right",  
            aes(label = country),  
            x = -100) +
  geom_text(col = "darkblue",  
            hjust = "right",  
            aes(label = paste(cumtotal), x=12000000)) +
    scale_y_reverse() +  
  labs(fill = NULL) +
  ggtitle("Evolution of Covid-19 Cases")+
  labs(x = "Covid Cases") +  
  labs(y = "Top 10 Countries") +  
  theme_classic() ->  
  my_plot
# saves the plot in the object my_plot

# animate the 350 frames by date and save it as p

 p<-my_plot +  
  facet_null() +  
  geom_text(x = 8000000 , y = -10,  
            family = "Times",  
            aes(label = as.character(date)),  
            size = 12, col = "green") +
    aes(group = country) +  
 transition_time(date)

#Animate p with total 350 frames and 5 frames per second

 animate(p, nframes=350, fps=5, width=1000)

Saving the result in GIF format:

 gif<- animate(p, fps = 5,  width = 1000, height = 700,
        renderer = gifski_renderer("gganim.gif"), end_pause = 15, start_pause =  15)
anim_save("gganim.gif", animation = gif )
 

 Here is the output.


 And here is the video explanation.

Wednesday, December 9, 2020

Line Chart Animation in R

R is a powerful software environment for graphics. In this post, I illustrate how to produce a line chart animation in R. I will use Nepal Stock Exchange (NEPSE) data with 2205 daily observations.

The data and R script can be downloaded from here.

# It uses the following packages in R 

library(ggplot2)
library(lubridate)
library(dplyr)
library(gganimate)
library(tidyr)

 # First set the working directory 

 setwd("C:/Users/siddhabhatta/Desktop/October31")
# read the data by using the 'readxl' package.

library(readxl)
nepse=read_excel('nepse.xlsx')

head(nepse)

# save date as standard date format
nepse$new_date<-as.Date(nepse$date, format="%m/%d/%y")
head(nepse$new_date)
summary(nepse)

# produce a static line plot
ggplot(data=nepse, aes(x=new_date, y=close))+
  geom_line(color="blue",  size=1.0)+
  theme_classic()+
  ggtitle("Nepse Index Movement in Nepal")

# add aesthetics and labels to the plot  and save it as an object (p here) 
p<-nepse %>%
  ggplot(aes(x=new_date, y=close))+
  geom_line(color="blue",  size=1.0)+
  geom_point(size=5, color="green")+
  geom_text(aes(label=new_date),color="darkblue", fontface="bold", vjust=-2)+
  geom_text(aes(label=close),color="red",fontface="bold", vjust=-4)+
  theme_classic()+
  theme(plot.title = element_text(hjust = 0.5))+
  ggtitle("NEPSE Index Movement of the Past 2205 Days")+

  transition_reveal(new_date) # this last line produces the animation by date
animate(p, fps=2, nframes=500, width=1200) # number of frames 500 and frame per second is 2

# you can save the animation in GIF format by using the following lines of code
p1<-animate(p, nframes=500,  fps = 2,  width = 1200,
        renderer = gifski_renderer())
anim_save("animation.gif", animation = p1 )

Here is the output.

Here is the video explanation in my YouTube Channel.


 



Tuesday, November 24, 2020

Treemap in R

A treemap is a hierarchical chart in which higher values are represented by bigger rectangles. Such a chart can be created in R by using the treemap package.

For instance, we have the COVID-19 cases in Nepal by province and gender. The data looks like this:

SN  province      gender  cases
1   Province_I    Male    15614
2   Province_II   Male    16066
3   Province_III  Male    70088
4   Province_IV_  Male     8705
5   Province_V    Male    16597
6   Province_VI   Male     4480
7   Province_VII  Male     8314
8   Province_I    Female   8741
9   Province_II   Female   3326
10  Province_III  Female  44503
11  Province_IV_  Female   3732
12  Province_V    Female   6997
13  Province_VI   Female   1470
14  Province_VII  Female   2842

We can make a simple treemap by creating this data in R, saving it as an object called 'datafile', and running the commands below.
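
One possible way to create the 'datafile' object is to type the table in directly as a data frame. This is only a sketch: the helper vector provinces is my own name, and the values are copied from the table above.

```r
# Province names exactly as they appear in the table
provinces <- c("Province_I", "Province_II", "Province_III", "Province_IV_",
               "Province_V", "Province_VI", "Province_VII")

# 14 rows: the seven provinces for males, then the same seven for females
datafile <- data.frame(
  province = rep(provinces, times = 2),
  gender   = rep(c("Male", "Female"), each = 7),
  cases    = c(15614, 16066, 70088, 8705, 16597, 4480, 8314,
               8741, 3326, 44503, 3732, 6997, 1470, 2842)
)

str(datafile)   # check the column types
head(datafile)  # first few rows
```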

library(treemap)

treemap(datafile,
        index="province",
        vSize="cases",
        title="Distribution of COVID-19 Cases in Nepal",
        type="index"
)

It will produce a chart like below :

 

 

To plot the treemap with value labels, we can use a simple trick: find the totals by province, paste them onto the province names, and then plot the treemap.

 library(dplyr)
new<-datafile%>%
  group_by(province)%>%
  summarise(p_total=sum(cases))%>%
  mutate(newlab=paste(province, p_total, sep ="\n"))%>%
  treemap(index="newlab", vSize="p_total",
          title="Distribution of COVID-19 Cases in Nepal",
          palette = "Reds",      

            fontsize.title=12                                 
           )
It will produce the map as shown below : 

Finally, we can group the chart with gender and show the map with province and gender. This can be done by : 

treemap(datafile, index=c("province","gender"),
        vSize="cases", type="index",
        fontsize.labels=c(15,12),                # label sizes: province, then gender
        fontcolor.labels=c("white","blue"),      # label colours: province, then gender
        fontface.labels=c(2,1),                  # 2 = bold, 1 = normal
        title="Distribution of COVID-19 Cases in Nepal",
        bg.labels=c("transparent"),              # no background behind the labels
        align.labels=list(
          c("center", "center"),                 # province labels: centred
          c("right", "bottom")),                 # gender labels: bottom right of the rectangle
        overlap.labels=0.5,                      # tolerated overlap between labels (0 to 1)
        inflate.labels=F  )                      # If true, labels are bigger when rectangle is bigger.
     
It will produce the following map in R .

The R script file is available here.

 



Friday, November 20, 2020

Visualizing Data in Map with R

R is a powerful software environment for graphics and data visualization. Here I use R to visualize data on a map of Nepal. I have plotted the poverty rate by district on the map, as shown below:

 

Here is the video for creating the data visualization explained below.

Here is the link to YouTube : https://youtu.be/f26U2kwAWkQ

The datafiles and R script can be downloaded from here

The first step is to install the necessary R packages. This can be done by running the following code in R or RStudio:

install.packages("cartography")
install.packages("sf")
install.packages("tidyverse")

Now let us bring the packages into the memory of R by running :

library(cartography)
library(sf)
library(tidyverse)

The second step is to set the working directory. Mine is the 'map' folder on my desktop.
setwd("C:/Users/siddhabhatta/Desktop/map")

The third step is to download the shape files for Nepal and extract them into the working directory. The shape files can be found with a simple web search.

 # I found them for Nepal at https://codefornepal.carto.com/tables/shape_files_of_districts_in_nepal/public

Alternatively, the files can be downloaded from here.

# Simply download the files and put them in the working directory. There may be several files with the extensions .cpg, .dbf, .shp, .prj and .shx. st_read() is pointed at the .shp file, but keep the accompanying files (in particular .dbf and .shx) in the same folder.

The next step is to import the shape file and save it as an object, 'data' in my case.

 data<-st_read("shape_files_of_districts_in_nepal.shp")

The above data contains all the information for drawing a map of Nepal.

Next, we need the data that we want to visualize on the map. The following lines import the district-wise poverty rates into R. The poverty data file can be downloaded by clicking here.

library(haven)
poverty <- read_dta("poverty.dta")
View(poverty)
poverty$pov<-poverty$poverty*100 # converts the poverty rate into percentages
View(poverty)

Now we combine the poverty data with the data set imported from the shape file of Nepal by running the following command. The combined data is saved as 'mapdata'.
mapdata<-merge(data, poverty, by="dist_name")
View(mapdata)
names(mapdata)
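
One thing worth checking after the merge: merge() keeps only the rows whose key matches in both tables, so a district whose name is spelled differently in the two files silently drops off the map. Here is a small sketch of that check with made-up tables (shapes_toy, poverty_toy and the district names are hypothetical).

```r
# Hypothetical stand-ins for the shape-file table and the poverty table
shapes_toy  <- data.frame(dist_name = c("Alpha", "Beta", "Gamma"))
poverty_toy <- data.frame(dist_name = c("Alpha", "Beta", "Gama"),  # note the misspelling
                          pov       = c(10, 20, 30))

# By default merge() keeps only the keys present in both tables
merged <- merge(shapes_toy, poverty_toy, by = "dist_name")
nrow(merged)

# Districts in the shape table that found no match in the poverty table
unmatched <- setdiff(shapes_toy$dist_name, poverty_toy$dist_name)
unmatched
```

If unmatched is not empty, fix the spelling in one of the files before plotting.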

Now, the map can be produced with the following command. 

 plot(st_geometry(mapdata)) # plots the map of Nepal
choroLayer(x=mapdata, var="pov", method="quantile", nclass=5) # fills the districts with color as per the poverty rate.
layoutLayer(title="Poverty by Districts: Nepal ", tabtitle = TRUE, frame=TRUE, scale=6)
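
The method="quantile" option with nclass=5 asks for five classes with roughly the same number of districts in each. The idea can be sketched in base R with made-up values (vals is hypothetical). This is only an illustration of quantile classes; the exact break points that cartography computes may differ.

```r
# Hypothetical values standing in for the poverty rates
vals <- 1:100

# Six break points give five classes
breaks <- quantile(vals, probs = seq(0, 1, length.out = 6))
breaks

# Assign each value to its class and count the members
classes <- cut(vals, breaks = breaks, include.lowest = TRUE)
table(classes)
```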

Here is the same plot with minor changes: eight classes instead of five and a legend title.

 plot(st_geometry(mapdata))
choroLayer(x=mapdata, var="pov", method="quantile", nclass=8, legend.title.txt = "Poverty")
layoutLayer(title="Poverty in Nepal by Districts", tabtitle = TRUE, frame=TRUE, scale=6)

#using ggplot package, the graph can be enhanced.

ggplot(data=mapdata)+geom_sf(aes(fill=pov), color="white")+
  scale_fill_viridis_c(option = "viridis", trans = "sqrt")+
  xlab("Longitude")+ ylab("Latitude")+
  ggtitle("Poverty Rate of Nepal by Districts")

#To plot the labels, we extract the centroids and save them as X and Y by the following command.

 points<-cbind(mapdata, st_coordinates(st_centroid(mapdata$geometry)))
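
To see what the centroid step returns, here is a minimal sketch on a single made-up square (the toy object is hypothetical and assumes the sf package is installed). st_coordinates() gives a matrix with columns X and Y, which cbind() attaches to the sf object as ordinary columns.

```r
library(sf)

# Hypothetical district: a 2 x 2 square with its corner at the origin
sq  <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
toy <- st_sf(dist_name = "A", geometry = st_sfc(sq))

# Centroid of each polygon, returned as X and Y coordinates
xy <- st_coordinates(st_centroid(st_geometry(toy)))

# Attach the coordinates so they can be used as label positions
pts <- cbind(toy, xy)
pts
```

For this square the centroid sits in the middle, so the X and Y columns are what geom_text() would use to place the label there.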

library(ggthemes) # provides extra themes for the background, title and other elements
ggplot(data = points) +
  geom_sf(aes(fill=pov), color="black", size=0.2) +
  scale_fill_viridis_c(option = "viridis", trans = "sqrt")+
  geom_text(data= points,aes(x=X, y=Y, label=paste(dist_name)),
            color = "darkblue", size=2.5, fontface = "italic", angle=0, vjust=-1, check_overlap = FALSE) +
  geom_text(data= points,aes(x=X, y=Y, label=paste(pov)),
            color = "white", size=2.0, fontface = "bold", angle=0, vjust=+1, check_overlap = FALSE) +
  ggtitle("Poverty Rate of Nepal by Districts")+xlab("Longitude")+ylab("Latitude")+

  theme(
    panel.background = element_rect(fill='lightblue', size=0.5,
                                        linetype=0, colour='lightblue'),
  plot.title = element_text(color="red", size=16, hjust=0.5, face="bold.italic"),
  axis.title.x = element_text(color="blue", size=10, face="bold"),
  axis.title.y = element_text(color="red", size=10, face="bold")
)

We can save the plot by using the ggsave command.

ggsave("map.png", width=6, height=6, dpi='screen')

ggsave("map.pdf", width=6, height=6, dpi='screen')

...............................................................................................................................................................

If you are interested in creating such a map with Stata, here is the link.