
Day 5 - File Management and Workflow
Software Session
R Projects
Until now, we have used the setwd() command to point R to the files we need. As your work expands, projects will have multiple datasets to load, several subsidiary scripts to run, and multiple outputs to save.
A first-order problem for both file management and reproducibility of code is the use of file paths. Using absolute paths, like ~/User/MyName/Documents/....., becomes cumbersome and also hinders reproducibility. Every time someone else runs the script, they will have to change the file paths in every instance in the R scripts or .qmd files to locate the datasets and other objects. Saving objects to new locations raises similar issues. A partially efficient approach we have used so far is to call setwd() to direct R to a new working directory and then refer to files relative to that directory; this is the use of relative paths. The weakness is that the setwd() call itself usually still contains an absolute path that is specific to one computer.
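To make the setwd() approach concrete, here is a minimal sketch in base R. It builds a throwaway copy of part of the project folder inside a temporary directory, so the toy dataset and its values are made up purely for illustration; only the folder and file names are borrowed from the structure used in this session.

```r
# Build a throwaway project folder in a temporary location (illustration only).
project_dir <- file.path(tempdir(), "govt-8001-dataessay")
dir.create(file.path(project_dir, "Data", "Raw", "Dataset1"),
           recursive = TRUE, showWarnings = FALSE)

# Write a tiny stand-in dataset so that there is something to read.
toy <- data.frame(id = 1:3, score = c(10, 20, 30))
write.csv(toy,
          file.path(project_dir, "Data", "Raw", "Dataset1", "dataset1.csv"),
          row.names = FALSE)

# setwd() approach: point R at the project folder once ...
old_wd <- setwd(project_dir)   # setwd() returns the previous working directory

# ... and then refer to files relative to that folder.
relative_path <- file.path("Data", "Raw", "Dataset1", "dataset1.csv")
dat <- read.csv(relative_path)

# Restore the original working directory.
setwd(old_wd)
```

The relative path works on any computer, but the setwd() line has to be edited by whoever runs the script on a different machine.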
R Projects is a built-in mechanism in RStudio for seamless file management and usage of relative paths.
Let's start by creating a new project. Click File > New Project. Name the new project govt-8001-dataessay.

here package
An efficient file and folder management system is going to be crucial as we move into working with serious projects. As stressed earlier, R Projects make it easier to keep and use all the files associated with a project in a comprehensible folder system. You would ideally want to create your own template for folder management that you follow across projects. For starters, the folder structure below is the one created for your data essay assignment in Govt 8001 or Quant 1.
You can use the point-and-click functionality on your computer to create this structure. Later today, we will briefly go through an R script that does this programmatically.
📦 govt-8001-dataessay
├─ govt-8001-dataessay.RProj
├─ 000-setup.R
├─ 001-eda.qmd
├─ 002-analysis.qmd
├─ 003-manuscript.qmd
├─ Data
│  ├─ Raw
│  │  ├─ Dataset1
│  │  │  ├─ dataset1.csv
│  │  │  └─ codebook-dataset1.pdf
│  │  └─ Dataset2
│  │     ├─ ...dta
│  │     └─ codebook-dataset2.pdf
│  └─ Clean
│     └─ Merged-df1-df2.csv
├─ Scripts
│  ├─ R-scripts
│  │  ├─ plotting-some-variable.R
│  │  └─ exploring-different-models.R
│  ├─ Stata-Scripts
│  │  └─ seeing-variable-labels.do
│  └─ Python-Scripts
│     └─ scraping-data-from-website.py
└─ Outputs
   ├─ Plots
   │  ├─ ...jpeg
   │  └─ ...png
   ├─ Tables
   │  └─ .csv
   └─ Text
      └─ ...txt
Suggested folder structure for a Quant-1 project
We have learnt how to create an .RProj file or associate one with a folder. Combining it with the here() function from the here package makes things even smoother. Let's do it with the following exercise.
Make it a habit to use R Projects and the here() function in your scripts to write portable code.
You can read this quick and informative blogpost on using these two here.
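As a rough sketch of how the two fit together, the snippet below builds file paths with here(). It assumes the here package is installed and that the script is run from inside the govt-8001-dataessay project; the file names come from the suggested folder structure, and the read and write calls are left commented out because those files may not exist on your computer yet.

```r
library(here)  # assumes install.packages("here") has been run once

# With no arguments, here() reports the project root it has detected,
# typically the folder that contains the .RProj file.
project_root <- here()

# Build paths by naming each folder level as a separate argument.
# here() joins them onto the project root, so no setwd() call is needed.
raw_path   <- here("Data", "Raw", "Dataset1", "dataset1.csv")
clean_path <- here("Data", "Clean", "Merged-df1-df2.csv")

# Typical use once the files exist:
# dat <- read.csv(raw_path)
# write.csv(dat, clean_path, row.names = FALSE)
```

Because every path is built from the project root, the same lines should work unchanged for anyone who opens the project on their own machine.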
Folder Structure
Let's look at the other open RStudio window. This is the one associated with govt-8001-dataessay.
We ideally want a folder structure that is easily understandable to us and others.
We can create the folder structure suggested above by pointing and clicking on our laptops. But since we might want to reuse the same folder structure repeatedly, it makes sense to be lazy and do it programmatically.
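One way to do this in base R is sketched below. It is not the course's 000-setup.R script, only an illustration of the idea: the helper name create_project_folders() is hypothetical, the folder names are taken from the suggested structure, and the demo writes into a temporary directory so that nothing on your computer is touched.

```r
# Hypothetical helper: create the suggested sub-folders under a given root.
create_project_folders <- function(root = ".") {
  folders <- c(
    "Data/Raw", "Data/Clean",
    "Scripts/R-scripts", "Scripts/Stata-Scripts", "Scripts/Python-Scripts",
    "Outputs/Plots", "Outputs/Tables", "Outputs/Text"
  )
  for (folder in folders) {
    # recursive = TRUE also creates missing parent folders;
    # showWarnings = FALSE keeps re-runs quiet if a folder already exists.
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Return the full paths invisibly so they can be inspected if needed.
  invisible(file.path(root, folders))
}

# Demo in a temporary directory (illustration only).
demo_root <- file.path(tempdir(), "govt-8001-dataessay-demo")
created <- create_project_folders(demo_root)
```

Inside a real project you would call the helper with the project folder as root, for example the path returned by here::here() if the here package is installed.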
Takeaways
Here's a quick workflow for starting a new project, assignment, or paper.
1. Make a new folder on your computer with an apt name. Ideally, govt-<coursecode>-<project>.
2. Start RStudio.
3. Create a new RStudio Project by clicking File > New Project. Name it govt-<coursecode>-<project>.
4. Check that your RStudio window now shows the project name in the top right corner. If not, go to the folder and double-click the .RProj file.
5. Paste the 000-setup.R file in the main project folder. Open it in the same RStudio window as the project and run the complete file. Your folder structure is created.
6. Copy your raw data into the Data/Raw folder. Similarly, put your scripts in the Scripts/R-scripts folder.
7. Start your new .qmd file and save it in the main folder.
8. Remember to use the here() function from the here package extensively in both scripts and Quarto files when loading or saving data.
9. You can always zip the whole project folder for sharing. The receiver will just need to unzip it and run the code after opening the associated .RProj file, without changing file paths on their computer.