2020 & 2021: Factbook Automation Project
#R #MS Excel #ggplot2 #automation #data visualization
§ Section 1. Project Title and Overview
Factbook Automation Project
§ Section 2. Purpose and Need
Research and develop an automated program that reproduces the CSSEA Factbook using R and ggplot2.
§ Section 3. Business Drivers and Significance
Each year, one of CSSEA's tasks is to produce several Factbooks from the data it collects. Because a Factbook contains a large number of charts and figures, an automated program saves CSSEA considerable time and reduces errors caused by manual work.
Producing charts and graphs in R is also more reliable and efficient than the traditional Excel workflow, because the charts are generated from code rather than rebuilt by hand each year. R can also process larger amounts of data relatively faster than Excel.
I also wrote a user guide to help others use the program in the future.
§ Section 4. Benefits and Costs
Benefits:
Saves labor on repetitive work each year
The same program can be reused in future years
Increases efficiency and reliability
Produces the Factbook quickly
Costs:
My time developing the automation program
One analyst to maintain and modify the program
§ Section 5. Implementation Method
I built the program mainly in R, using the ggplot2 package for the charts. I first reproduced the previous year's Factbook charts and graphs from that year's dataset, then ran the same program on the latest dataset to produce the current year's Factbook. Given a dataset, the program runs automatically: it cleans the data, selects the variables it needs, renames them, produces the corresponding charts and graphs, arranges the charts onto pages, and assembles all the pages into one complete, print-ready PDF.
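The clean, select, rename, chart, page, PDF pipeline described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the original program. The raw column names (`emp_type`, `fte_cnt`), the `rename_map`, the helper functions (`clean_data`, `make_bar_chart`, `draw_page`, `build_factbook`), the two-column page layout and the US Letter page size are all hypothetical. Pages are laid out with base R's grid package to avoid extra dependencies, which may differ from what the original program used.

```r
library(ggplot2)
library(grid)

# Hypothetical mapping from raw column names to readable names;
# the real Factbook variables are not shown in the original.
rename_map <- c(emp_type = "employment_type", fte_cnt = "fte")

# Clean: keep only the needed variables, drop incomplete rows, then rename.
clean_data <- function(raw, map = rename_map) {
  needed <- names(map)
  df <- raw[complete.cases(raw[, needed]), needed, drop = FALSE]
  names(df) <- unname(map[names(df)])
  df
}

# Chart: one bar chart; geom_col() uses the fte values as bar heights.
make_bar_chart <- function(df) {
  ggplot(df, aes(x = employment_type, y = fte)) +
    geom_col() +
    labs(x = "Employment type", y = "FTE")
}

# Page: arrange a list of ggplot objects in a grid layout on a new page.
draw_page <- function(plots, ncol = 2) {
  n_rows <- ceiling(length(plots) / ncol)
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(n_rows, ncol)))
  for (i in seq_along(plots)) {
    row <- (i - 1) %/% ncol + 1
    col <- (i - 1) %% ncol + 1
    print(plots[[i]], vp = viewport(layout.pos.row = row, layout.pos.col = col))
  }
  popViewport()
}

# Book: write every page into one multi-page PDF.
build_factbook <- function(pages, file) {
  pdf(file, width = 8.5, height = 11)  # US Letter; the page size is an assumption
  on.exit(dev.off())
  for (plots in pages) draw_page(plots)
  invisible(file)
}

# Usage with made-up data (not CSSEA data)
raw <- data.frame(
  emp_type = c("Full-time", "Part-time", "Casual", NA),
  fte_cnt = c(120, 45, 30, 10),
  region = c("A", "B", "C", "D")
)
df <- clean_data(raw)
p <- make_bar_chart(df)
out <- build_factbook(list(list(p, p), list(p)), file.path(tempdir(), "factbook_sketch.pdf"))
```

Each inner list passed to `build_factbook()` becomes one PDF page. Because the rename map sits in one place, a new year's dataset with changed column names would only require updating that map rather than every chart.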
§ Section 6. Timeline
I worked on the project with my supervisor, who helped me better understand the charts and graphs and supplied the correct datasets. I wrote the entire automation program independently, working from home; developing and testing it took one month.
Power in Numbers
60 pages
250 charts
3,500 lines