A research team was investigating the key factors that enable Rotating Savings and Credit Associations (ROSCAs) to function efficiently in developed economies with advanced financial systems.
The team needed an analysis of data collected from participants who completed a survey.
The analysis had to identify correlations between the survey's subject areas and its individual questions.
The results needed to be presented to stakeholders in a readable format without excessive technical jargon.
The survey consisted of 26 multiple-choice questions, some worded with positive sentiment and others with negative sentiment. Each question was assigned a sentiment and categorised under one of 5 variables:
Demographic
Regulation
Technology
Risk
Product
Principal Component Analysis (PCA) was chosen as the method of analysis because it can measure the degree of correlation between multiple variables and summarise it in a small number of components, which can be plotted on a conventional 2-D graph.
The answers for each aspect of the survey were assigned ranks, as shown in the image to the right. Each completed survey was then updated with these ranks for processing in the PCA model.
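The ranking step can be sketched in Python as follows. This is only an illustration: the 5-point scale, the reverse-scoring of negatively worded questions, and the sentiments attached to the two question codes are all assumptions, since the actual ranks used in the study appear only in the image.

```python
# Hypothetical 5-point scale; the ranks actually used are defined in the image.
LIKERT_RANKS = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical sentiment assignments for two of the survey's question codes.
QUESTION_SENTIMENT = {"T1": "positive", "Ri7": "negative"}


def encode_answer(answer: str, sentiment: str) -> int:
    """Map an answer to a rank, reverse-scoring negatively worded questions."""
    rank = LIKERT_RANKS[answer]
    if sentiment == "negative":
        # Assumption: flip the scale so a higher rank always means the same direction.
        return len(LIKERT_RANKS) + 1 - rank
    return rank


def encode_survey(responses: dict) -> dict:
    """Replace every answer in one completed survey with its rank."""
    return {
        code: encode_answer(answer, QUESTION_SENTIMENT[code])
        for code, answer in responses.items()
    }


encoded = encode_survey({"T1": "Agree", "Ri7": "Strongly agree"})
print(encoded)  # {'T1': 4, 'Ri7': 1}
```

Encoding every survey this way yields a table of numbers with one row per respondent and one column per question, which is the shape of input a PCA model expects.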
The processed data can then be used to build a picture of the aggregated relationships between survey questions.
In the image on the left, the blue/green squares show positively correlated questions, while the gold/brown squares show negatively correlated ones. Each square compares exactly 2 survey questions.
The diagonal squares from the top left to the bottom right of the image show perfect positive correlations, which is expected because each one compares a question with itself.
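A minimal sketch of how such a correlation matrix could be computed with NumPy is shown here. The data is randomly generated for illustration and is not the survey data; the fourth column is deliberately built as the reverse of the first so that one strongly negative pair is visible.

```python
import numpy as np

# Synthetic stand-in for ranked answers: 50 respondents, 3 questions (ranks 1 to 5).
rng = np.random.default_rng(seed=0)
ranks = rng.integers(1, 6, size=(50, 3))

# Add a fourth question that is the exact reverse of the first one.
reversed_question = 6 - ranks[:, [0]]
ranks = np.hstack([ranks, reversed_question])

# rowvar=False treats each column (question) as a variable.
corr = np.corrcoef(ranks, rowvar=False)

print(np.round(corr, 2))
```

Each entry of `corr` compares one pair of questions. The diagonal is 1 because every question is perfectly correlated with itself, and the entry for the first and fourth questions is -1 because one is the mirror image of the other.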
On closer inspection, patterns emerge that highlight clusters of questions with positive and negative correlations. These can be seen in the blue and red regions.
Because the survey contains many questions, we must identify themes in the responses. Themes are essential for uncovering the respondents' collective understanding and for highlighting the underlying relationships among the 5 variables.
The image on the right shows the resulting PCA circle, in which the answers across all surveys are aggregated and each question is plotted as a directional arrow. The longer the arrow, the greater that question's influence on the dataset.
Although slightly complex, this view reveals the clusters of codes that are positively and negatively correlated with each other, extending the analysis from the heatmap.
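The arrows in a PCA circle can be sketched as follows, assuming the common convention that each arrow's coordinates are the correlations between a question and the first two principal components. The function name `correlation_circle` and the synthetic data are illustrative assumptions, not the tooling used in the study.

```python
import numpy as np


def correlation_circle(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return one arrow per variable: its correlation with each leading component."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    # Scaling eigenvectors by sqrt(eigenvalue) turns them into correlations.
    return eigvecs[:, order] * np.sqrt(eigvals[order])


# Synthetic ranked answers: 60 respondents, 4 questions, plus a reversed copy of the first.
rng = np.random.default_rng(seed=1)
ranks = rng.integers(1, 6, size=(60, 4))
ranks = np.hstack([ranks, 6 - ranks[:, [0]]])

arrows = correlation_circle(ranks)
arrow_lengths = np.linalg.norm(arrows, axis=1)
print(np.round(arrows, 2))
print(np.round(arrow_lengths, 2))
```

No arrow can be longer than 1, which is why the plot is drawn inside a unit circle. Arrows pointing in similar directions indicate positively correlated questions, while arrows pointing in opposite directions (such as the first and last rows here) indicate negatively correlated ones.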
PCA Circle Correlations
Positive
Cluster 1, consisting of T1, T2, Ri5 and R8, reveals that participants who believed technology can play a crucial role in enhancing efficiency, transparency and accountability also believed that credit risk would be a key component of ROSCA operations that would need to be regulated for ROSCAs to function in developed economies.
Cluster 2, comprising T3, Ri1, T4 and R1, indicates that ROSCAs are not included within the scope of regulation in the participants' regions. However, participants believed that risk management strategies can be implemented to protect contributions, and that technology could be leveraged to identify and mitigate risk while supporting the scalability and growth of ROSCAs.
Cluster 3 reveals the relationship between Ri2, R2, R7, R6, Ri3, R3, R5 and P2: participants believed that ROSCAs could be integrated into the financial system and would benefit from clear regulatory frameworks. Furthermore, participants agreed that risk mitigation measures such as transparency and accountability would improve socially responsible standards within the ROSCA. Finally, participants highlighted that ROSCAs provide individuals with access to credit and savings vehicles that they would otherwise not obtain from financial institutions in their country.
Negative
Cluster 4 shows the mix of negatively correlated codes, where the demographic variables D1 and D2 have a strong negative correlation with P3, P4 and R4.
Intriguingly, apart from the demographic variables, all variables are located toward the right side of the PCA circle, with the exception of Ri7. Comparing this with the heatmap, a pattern emerges: Ri7 tends to correlate negatively with all other variables, which explains why it is separated from the rest of the projections.
PCA Biplot
The biplot to the left provides the final part of the analysis. It plots each participant's overall response as a black point, overlaid with the projections from the PCA circle. This shows how much each variable contributes to the observations.
This visual makes it easier to spot outliers for further inspection, or to assess the clusters for relationships and patterns. For instance, observations 43, 26 and 15 appear to be outliers because they lie farther from the main cluster of observations.
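The points in a biplot, and a simple numeric check for outliers, can be sketched as follows. In the study the outliers were identified by eye; the distance rule used here (mean distance plus two standard deviations) is a hypothetical stand-in, and the data is synthetic with one extreme respondent planted at index 7.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project standardised observations onto the leading principal components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return z @ eigvecs[:, order]


def flag_outliers(scores: np.ndarray, n_std: float = 2.0) -> np.ndarray:
    """Return indices of points unusually far from the centre of the score plot."""
    distance = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    # Assumed rule of thumb, not the method used in the study.
    threshold = distance.mean() + n_std * distance.std()
    return np.flatnonzero(distance > threshold)


# Synthetic data: 40 respondents, 5 questions, with respondent 7 made extreme.
rng = np.random.default_rng(seed=0)
answers = rng.normal(size=(40, 5))
answers[7] += 8.0

scores = pca_scores(answers)
outliers = flag_outliers(scores)
print(outliers)
```

Each row of `scores` gives the coordinates of one black point in the biplot. A flagged index is only a prompt to inspect that respondent's answers; whether it is a genuine outlier remains a judgement call.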