Give a chestnut! Tableau Tips (106): Use R to implement cluster analysis
About cluster analysis: What is clustering? Clustering is a way to aggregate or group data. It lets you use multiple variables at once to form groups (for example, with the k-means clustering model). How can you implement cluster analysis in Tableau? You can try R.
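To make the k-means idea concrete, here is a minimal sketch in R using base R's kmeans() from the stats package. The data are synthetic points invented for illustration, not the yearbook figures: two well-separated groups in two variables, which k-means should recover as two clusters.

```r
# Minimal k-means sketch on synthetic data (not the yearbook data).
set.seed(1)  # k-means starts from random centers, so fix the seed for reproducibility

# Two well-separated groups of made-up points in two variables (10 rows each)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)
colnames(x) <- c("var1", "var2")

# Partition the 20 rows into 2 clusters; nstart = 10 keeps the best of 10 random starts
fit <- kmeans(x, centers = 2, nstart = 10)

print(fit$cluster)   # cluster id (1 or 2) assigned to each row
print(fit$centers)   # mean of each variable within each cluster
```

k-means assigns each row to the nearest cluster center and moves the centers until the within-cluster sum of squares stops improving, so rows with similar values across all variables end up together.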
Through R, Tableau can perform advanced analyses such as principal component analysis, factor analysis, cluster analysis, and classification analysis. For related content, see: Learn to use third-party tools in Tableau through examples.
Here is an example scenario for cluster analysis: as the living standards of Chinese residents continue to improve, consumer demand keeps growing, yet the consumption structure still differs from region to region. If cities with similar consumption levels are grouped into one category, the similarities and differences between cities become easy to see.
We therefore use cluster analysis to study the consumption structure of urban residents in China's 31 provincial-level regions (provinces, municipalities, and autonomous regions) and identify how it differs across regions, providing a more effective basis for decisions by local governments.
In this issue of "Give a Chestnut", the Tableau technique Ada wants to share is: using R to implement cluster analysis in Tableau, illustrated with the consumption levels of urban residents.
For ease of learning, we use data from the 2012 China Statistical Yearbook (as shown in the figure below). If you would like this data source for practice, please send me a private message.
Tip: To remove inherent differences in regional area, population, and so on, and make the results more reasonable, the indicators used here are the average annual per capita consumption expenditure of urban households in each region, that is, per capita values.
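The per capita adjustment is plain division: total expenditure divided by population. A tiny R sketch with two hypothetical regions and made-up numbers (not yearbook data) shows why it matters: the region with the larger total can still have the lower per capita value.

```r
# Hypothetical totals for two made-up regions (not real yearbook figures)
total_spend <- c(RegionA = 2.0e9, RegionB = 6.0e9)  # total annual urban consumption spending
population  <- c(RegionA = 1.0e5, RegionB = 4.0e5)  # urban population

# Per capita value = total / population, which removes the effect of region size
per_capita <- total_spend / population
print(per_capita)  # RegionB spends more in total but less per person
```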
Specific steps are as follows:
1. Install R and connect to R in Tableau
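Download and install R, then run the following in the R console to install and start Rserve, the package Tableau uses to communicate with R: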
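install.packages("Rserve")  # one-time install from CRAN (straight quotes are required in R)
library(Rserve)             # load the package
Rserve()                    # start the Rserve server (default port 6311)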
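In Tableau, open the R/external service connection dialog (under Help > Settings and Performance; the exact menu name varies by version), enter the server (for example, localhost) and the port number (Rserve listens on 6311 by default), and click OK.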
2. Create a calculated field
Next, create the calculated field Cluster (as shown in the figure below). This field groups the 31 provinces into 6 clusters based on eight indicators: transportation and communication; medical care; household equipment, supplies and services; residence; education, culture and entertainment services; miscellaneous goods and services; clothing; and food.
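The exact formula for Cluster is in the figure and is not reproduced here. As a hedged sketch of the kind of R logic such a field runs: in Tableau, an R expression is typically passed through SCRIPT_INT, with each aggregated measure arriving in R as .arg1, .arg2, and so on, and R must return one integer per row. The block below mimics that in standalone R, using random placeholder numbers for 31 hypothetical regions instead of the real data; the column names are illustrative assumptions.

```r
# Sketch of the R logic behind a Cluster calculated field.
# In Tableau the eight measures would arrive as .arg1 ... .arg8; here they are
# simulated with random placeholder numbers (not yearbook data) for 31 regions.
set.seed(42)
n_regions <- 31
spend <- data.frame(
  transport_comm = runif(n_regions, 1000, 4000),
  medical        = runif(n_regions, 500, 1500),
  household      = runif(n_regions, 500, 1500),
  residence      = runif(n_regions, 800, 2500),
  education      = runif(n_regions, 1000, 3500),
  misc           = runif(n_regions, 300, 1200),
  clothing       = runif(n_regions, 1000, 2500),
  food           = runif(n_regions, 4000, 9000)
)

# Group the regions into 6 clusters; several random starts give a more stable result
fit <- kmeans(spend, centers = 6, nstart = 25)

# One integer cluster id per row: the shape SCRIPT_INT expects back from R
cluster <- fit$cluster
print(table(cluster))  # number of regions in each cluster
```

Because k-means starts from random centers, setting a seed (as here) keeps the cluster numbers from changing each time Tableau recomputes the field.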
3. Create a chart
Drag "Transportation and communication", "Household equipment, supplies and services", "Education, culture and entertainment services", and "Clothing" to the Columns shelf;
Drag "Medical care", "Residence", "Miscellaneous goods and services", and "Food" to the Rows shelf;
Drag the calculated field Cluster to Color and Label on the Marks card;
Drag Region to Tooltip and adjust the colors, as shown in the figure below (Figure 1).
Drag Region to Columns, and drag the calculated field Cluster to Rows, Color, and Text, as shown in the figure below (Figure 2).
From these two charts, we can see:
➤ Guangdong, Shanghai, Beijing, and Zhejiang: spending on "Medical care", "Education, culture and entertainment services", "Residence", "Food", "Miscellaneous goods and services", and "Transportation and communication" is generally high, so these four provinces and municipalities form the high-consumption group.
➤ Tibet, Yunnan, Guizhou, Hainan, Anhui, and other regions: spending on "Clothing", "Miscellaneous goods and services", "Residence", "Education, culture and entertainment services", and "Medical care" is generally low, and their overall consumption level is low.
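Conclusions like the two above come from comparing each cluster's average spending with the overall average. A minimal R sketch of that profiling step, assuming placeholder random data for 31 hypothetical regions and only three of the eight indicators for brevity: a ratio above 1 means the cluster spends more than average on that indicator.

```r
set.seed(7)
# Placeholder data: 31 hypothetical regions, 3 of the 8 indicators (not yearbook data)
spend <- data.frame(
  medical   = runif(31, 500, 1500),
  residence = runif(31, 800, 2500),
  food      = runif(31, 4000, 9000)
)
fit <- kmeans(spend, centers = 3, nstart = 25)

# Average spending on each indicator within each cluster
profile <- aggregate(spend, by = list(cluster = fit$cluster), FUN = mean)

# Ratio of each cluster's average to the all-region average (>1 means above average)
relative <- sweep(as.matrix(profile[, -1]), 2, colMeans(spend), FUN = "/")
rownames(relative) <- paste0("cluster_", profile$cluster)
print(round(relative, 2))
```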
Have you picked up this issue's Tableau skill? Give it a try!