Session 1 - \(\LaTeX\) and Quarto

Suggested citation: Parushya. (2024). “Employing the Olympic Shooter Meme for Quarto Adoption”. In Hashem and Parushya (Eds.), Math Camp 2024

Today’s Lab

  1. Good Coding

  2. \(\LaTeX\)

  3. Quarto

     • YAML
     • Code Chunks
     • Markdown text

  4. R Projects

     • here package

  5. Folder Structure

Good Coding

Good programming or coding is closely related to the idea of Literate Statistical Programming. As Donald Knuth (1984) defines it, literate programming is a way of writing programs that focuses on explaining to human readers what we want the computer to do, rather than just instructing the computer to do it.

Statistical programming, hence, is about formalizing your thinking about how you treat the data and using functional programming to automate such formalized tasks so that they can be done repeatedly. It improves efficiency, enhances reproducibility, and boosts creativity when it comes to finding new patterns in your data.

Guidelines for data and statistical analyses:¹

  1. Accuracy: Write code that reduces the chances of making an error and lets you catch one if it occurs.
  2. Efficiency: If you are doing something twice, see the pattern in your decision-making and formalize it in your code. This is a key difference between working in Excel and writing code.
  3. Replicate-and-Reproduce: Your code should let anyone repeat the computational process, which reflects your thinking and the decisions you took along the way. This improves transparency and forces you to be deliberate and responsible about choices during analyses.
  4. Human Interpretability: Writing code is not just about analyzing data but about allowing yourself, and then others, to understand your analytic choices.
  5. Public Good: Research is a public good, and the code allows your research to be truly accessible. This means you write code that anyone else who understands the language can read, reuse, and recreate without you being present. We essentially ensure that by writing readable and, ideally, publicly accessible code.
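
To make the Efficiency guideline concrete, here is a minimal sketch of turning a repeated step into a function. The helper name rescale01 and the toy data frame are illustrative assumptions, not part of the lab materials.

```{r}
# Hypothetical helper: rescale a numeric vector to the 0-1 range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data, made up for illustration
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6))

# Apply the same formalized step to every column instead of copy-pasting it
df[] <- lapply(df, rescale01)
df
```

Writing the step once means a mistake only has to be fixed in one place, which also serves the Accuracy guideline.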

Further, writing good code could also benefit from some common guidelines used across coders. A good starting point is the tidyverse style guide.

\(\LaTeX\)

\(\LaTeX\) (pronounced “LAY-tek” or “LAH-tek”) is a typesetting tool for preparing high-quality professional documents. It is the preferred typesetting tool for high-end scientific documentation tasks. It is not a word processor. It is a simple tool without too many priors about how the document should look.

\(\LaTeX\) gives you superior control over how your document looks, has enhanced capabilities for writing technical material (maths, stats, proofs, etc.) and including code, and produces readily editable back-end documents.

There are many interfaces that allow you to work with \(\LaTeX\). Overleaf is a widely used online platform and Texmaker is a popular offline application.

However, RStudio has a built-in capability to double as a \(\LaTeX\) editor. Previously RMarkdown and now Quarto have capabilities that you can harness to produce professional, beautifully typeset documents.

Think of writing an equation like:

\[ Violence_{i,j} = \beta_0 + \beta_1EthnicFractionalization_i + \gamma_j + \epsilon_i \]

In \(\LaTeX\), using Quarto, you have to write something like the following:

$Violence_{i,j} = \beta_0 + \beta_1EthnicFractionalization_i + \gamma_j + \epsilon_i$

For inline math within a line of text, we enclose the \(\LaTeX\) code in single $ signs.

For display equations, which sit on their own line and can span multiple lines, we use $$.
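
As an illustration of the $$ form, the same violence equation can be written as a display equation and split over two lines. This is a sketch that assumes the aligned environment, which should be available in Quarto's default HTML and PDF output; the line break position is an arbitrary choice.

```latex
$$
\begin{aligned}
Violence_{i,j} &= \beta_0 + \beta_1 EthnicFractionalization_i \\
               &\quad + \gamma_j + \epsilon_i
\end{aligned}
$$
```

The & marks the alignment point and \\ ends a line inside the environment.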

Read more about \(\LaTeX\) here

The box folder has some detailed resources for helping with typesetting in \(\LaTeX\).

To Do

Follow these instructions to install library(tinytex).
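
If you prefer to do this from the R console, the usual route is roughly as follows. This is a sketch rather than a substitute for the linked instructions: the wrapper name setup_tinytex is hypothetical, while install_tinytex() and is_tinytex() are functions from the tinytex package. Running it needs an internet connection.

```{r}
# Hypothetical wrapper (not part of the lab materials): installs the tinytex
# R package if it is missing, then the TinyTeX LaTeX distribution itself.
setup_tinytex <- function() {
  if (!requireNamespace("tinytex", quietly = TRUE)) {
    install.packages("tinytex")
  }
  if (!tinytex::is_tinytex()) {
    tinytex::install_tinytex()
  }
  # Report whether TinyTeX is now the LaTeX distribution in use
  invisible(tinytex::is_tinytex())
}

# Call it once per machine, from the Console rather than inside a .qmd file:
# setup_tinytex()
```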

This can also happen, btw!

Quarto

Quarto is a literate statistical programming tool.

Quarto can include code from not just R, but also Python, Julia, Stata and many other languages/tools.

Quarto lets you put the good coding guidelines discussed above into practice. It gives you the capability to write code and perform data analysis using R, write the text that is part of any professional communication, and include mathematical symbols and equations in a well-typeset format. Essentially, it allows you to work on a manuscript and its data analysis in one place.

Here is some cool stuff that you can do with Quarto.

Exercise 1

  1. Open a new Quarto document via File > New File > Quarto Document.
  2. Use the Render button at the top of the script panel to save the file and get a .pdf output.

A Quarto document is saved as a .qmd file. You can edit this file in two ways: programmatically, by choosing the Source button, or visually, by choosing the Visual button. Both buttons are in the top left corner of the .qmd window. More details about working with Quarto can be found on the Quarto website here.

There are three building blocks in a .qmd file:

YAML

Short for “YAML Ain’t Markup Language” (originally “Yet Another Markup Language”).

This is the part we see sandwiched between two --- lines at the start of the .qmd file. Here we define global settings for the particular document.

Currently, we see

---
title: "Untitled"
format: html
---

We can add many more options here to modify the details to appear at the start of the document. Here’s an example from quarto reference site

---
title: "Toward a Unified Theory of High-Energy Metaphysics"
date: 2008-02-29
author:
  - name: Josiah Carberry
    id: jc
    orcid: 0000-0002-1825-0097
    email: josiah@psychoceramics.org
    affiliation: 
      - name: Brown University
        city: Providence
        state: RI
        url: www.brown.edu
abstract: > 
  The characteristic theme of the works of Stone is 
  the bridge between culture and society. ...
keywords:
  - Metaphysics
  - String Theory
license: "CC BY"
copyright: 
  holder: Josiah Carberry
  year: 2008
citation: 
  container-title: Journal of Psychoceramics
  volume: 1
  issue: 1
  doi: 10.5555/12345678
funding: "The author received no specific funding for this work."
---

Or, global settings for different formats of outputs like html or pdf, as follows

---
title: "My Document"
format: 
  html:
    fig-width: 8
    fig-height: 6
  pdf:
    fig-width: 7
    fig-height: 5
---

Code Chunks

You can start a new R code chunk by pressing cmd + option + I or ctrl + alt + I.

You can also do this with the Insert button icon in the editor toolbar or by manually typing the chunk delimiters ```{r} and ```.

Try to use the keyboard shortcut more often as it will save you a ton of time later.

R code chunks are surrounded by ```{r} and ```.

You can run each code chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Cmd/Ctrl + Shift + Enter.

#| eval: true # Do evaluate this chunk
#| echo: true # Do show this chunk in the final rendered document
#| output: true # Do show the output / results of this chunk in the rendered document

print("Dont run this code")

RStudio executes the code and displays the results below the code.

If you don’t like seeing your plots and output in your document and would rather make use of RStudio’s Console and Plot panes, you can click on the gear icon next to “Render” and switch to “Chunk Output in Console”.

A chunk should be relatively self-contained, and focused around a single task.

Exercise

  1. Add a code chunk at the bottom of the .qmd file you created.

  2. Add some simple mathematical operations.

  3. Run the code chunk separately, and then the whole file by pressing Render button from the top.

Code chunk options are included in a special comment at the top of the block (lines at the top prefaced with #| are considered options). More on code chunk options here

Options available for customizing output include:

| Option | Description |
|---|---|
| eval | Evaluate the code chunk (if false, just echoes the code into the output). |
| echo | Include the source code in the output. |
| output | Include the results of executing the code in the output (true, false, or asis to indicate that the output is raw markdown and should not have any of Quarto’s standard enclosing markdown). |
| warning | Include warnings in the output. |
| error | Include errors in the output (note that this implies that errors executing code will not halt processing of the document). |
| include | Catch-all for preventing any output (code or results) from being included (e.g. include: false suppresses all output from the code block). |
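
As a minimal sketch of how these options combine in practice, the chunk below hides its source code and any warnings but still shows the result in the rendered document. The vector x is made up for illustration.

```{r}
#| echo: false
#| warning: false

x <- c(2, 4, 6)
mean(x)
```

Changing echo to true would show the two lines of code above the result.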

You can also add these options as global options in the YAML by writing them under execute option like:

---
execute: 
  echo: true
  include: false
---

The following table summarizes which types of output each option suppresses:²

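| Option | Run code | Show code | Output | Plots | Messages | Warnings |
|---|---|---|---|---|---|---|
| eval: false | X |  | X | X | X | X |
| include: false |  | X | X | X | X | X |
| echo: false |  | X |  |  |  |  |
| results: hide |  |  | X |  |  |  |
| fig-show: hide |  |  |  | X |  |  |
| message: false |  |  |  |  | X |  |
| warning: false |  |  |  |  |  | X |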

Inline code

We can also embed R code directly into the text of a Quarto document, with: `{r} <code>`.

For example: `{r} 2 + 2` renders as 4.

Markdown Text

Markdown text is like any other text just with some special considerations.

You can see the help section from R to see some of the basic formatting tips.

R Markdown Help

These are some of the regularly used formatting options in RMarkdown/Quarto:

Titles and subtitles
------------------------------------------------------------

# Title 1

## Title 2

### Title 3


Text formatting 
------------------------------------------------------------

*italic*  

**bold**   

`code`

Lists
------------------------------------------------------------

* Bulleted list item 1
* Item 2
  * Item 2a
  * Item 2b

1. Item 1
2. Item 2

Links and images
------------------------------------------------------------

<http://example.com>

[linked phrase](http://example.com)

![optional caption text](figures/img.png)

Practice

Let’s try all that we learnt

Exercise

  1. Delete the existing code, except the YAML at the top, in the .qmd file that we created today.

  2. Add some simple mathematical operations like addition, subtraction, or multiplication. Now set the chunk options differently. You can play with the different options that we learnt above and their values. Use the <TAB> key to see the values that you can provide to chunk options.

```{r}
#| echo: true
#| output: asis

1 + 1
```

[1] 2

  3. Add two separate R chunks. In the first, load the dataset from the paper that you want to replicate. In the second, add a simple select or filter operation.
```{r}
#| echo: false 
#| message: false
#| warning: false

# Loading packages
library(tidyverse) # For tidyverse
library(janitor) # For Janitor

# Loading Dataset
vdem_df <- readRDS("Datasets-mathcamp/V-Dem-CY-Full+Others-v12.rds") %>% 
clean_names() 

# ` %>% ` is the piping operator from tidyverse universe

# `clean_names` cleans the names of columns and standardizes them | from Janitor package
```
```{r}
#| echo: false  # Toggle the options as well as their parameters, like true/false
#| message: false
#| warning: false
vdem_2021 <- vdem_df %>% 
  filter(year == 2021) %>%  # To filter values according to one variable
  select(year, country_name, v2x_libdem, e_gdppc, e_pop) # To select particular variables
```
  4. Write the model specification that is mentioned in your paper in \(\LaTeX\) in your Quarto document. An example is given below. You can use the resources on \(\LaTeX\) from the course Canvas page.

\(LiberalDemocracy_i = \alpha + \beta_1GDPpc_i + \epsilon_i\)

$LiberalDemocracy_i = \alpha + \beta_1GDPpc_i + \epsilon_i$


  5. Render the whole document into a .pdf with your name and today’s date in the YAML.


Below is a YAML header that you can use, with modifications, in your assignments and documents.

---
title: My First Latex Document
subtitle: Govt 8003
author: <Your Name>
date: today
format:
  pdf:
    highlight-style: kate
    cite-method: natbib
  docx: default
always_allow_html: true
geometry: margin=1.2in
fontsize: 12pt
linestretch: 1.5
linkcolor: blue
toc: true
link-citations: true
editor_options: 
  chunk_output_type: console
execute: 
  fig-height: 6
  fig-width: 8.5
  fig-pos: "!t"
keep-tex: true
whitespace: small
---

“If you think your thought is not making sense, write it in \(\LaTeX\).

It will now not make sense in a beautiful way.”

-Buddha (500 B.C.E.)

R Projects

We often use the setwd() command to locate the files we need in our work. As work expands, projects will have multiple datasets to be loaded, different subsidiary scripts to be used, and multiple outputs to be saved.

A first-order problem related to both file management and reproducibility of code is the usage of file paths. Using absolute paths, like ~/User/MyName/Documents/....., becomes cumbersome and also hampers reproducibility. Every time someone else runs the script, they will have to change the file paths in every instance in the R scripts or .qmd file to locate the related datasets and other objects. Similarly, there would be issues with saving objects in new places. A partially efficient workaround is to use setwd() to point R to a new working directory and then refer to files relative to that directory; this is the use of relative paths.
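
The difference between the two kinds of path can be sketched in a few lines of base R. The folder and file names below are hypothetical and only mirror the layout used later in this lab.

```{r}
# Absolute path: tied to one machine's folder layout (hypothetical example).
absolute_path <- "/Users/MyName/Documents/govt-8003/Data/Raw/dataset1.csv"

# Relative path: resolved against whatever the working directory is.
relative_path <- file.path("Data", "Raw", "dataset1.csv")

# Where R would actually look for the relative path right now
full_path <- file.path(getwd(), relative_path)

print(relative_path)
print(full_path)
```

The relative path stays the same on every computer; only the working directory in front of it changes.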

R Projects is a built-in mechanism in RStudio for seamless file management and usage of relative paths.

Let’s start by creating a new project. Click File > New Project. Name the new project govt-8003.

Figure 1: To create new project: (top) first click New Directory, then (middle) click New Project, then (bottom) fill in the directory (project) name, choose a good subdirectory for its home and click Create Project. source

Exercise

  1. Close the new project just created

  2. Go to the folder on your system, and click the .RProj file.

  3. Start a new qmd file like we did before. Delete existing code except for YAML. Run getwd() command in console and see the difference.

  4. Start a new R code chunk (cmd + option + I) and load replication dataset. Notice the change in behavior when you press TAB inside the quotes for selecting path.

here package

An efficient file and folder management system is going to be crucial as we move into working with serious projects. As stressed earlier, keeping and using all the files associated with a project in a comprehensible folder system is facilitated by R Projects. You would ideally want to create your own template for folder management that you follow across projects. For starters, the folder structure below is the one created for your data essay assignment in Govt 8001 or Quant 1.

You can use the point-and-click functionality on your computer to create this structure. Later today, we will briefly go through an R script that does this programmatically.

📦 govt-8003
├─ govt-8003.RProj
├─ 000-setup.R
├─ 001-eda.qmd
├─ 002-analysis.qmd
├─ 003-manuscript.qmd
├─ Data
│  ├─ Raw
│  │  ├─ Dataset1
│  │  │  ├─ dataset1.csv
│  │  │  └─ codebook-dataset1.pdf
│  │  └─ Dataset2
│  │     ├─ ...dta
│  │     └─ codebook-dataset2.pdf
│  └─ Clean
│     └─ Merged-df1-df2.csv
├─ Scripts
│  ├─ R-scripts
│  │  ├─ plotting-some-variable.R
│  │  └─ exploring-different-models.R
│  ├─ Stata-Scripts
│  │  └─ seeing-variable-labels.do
│  └─ Python-Scripts
│     └─ scraping-data-from-website.py
└─ Outputs
   ├─ Plots
   │  ├─ ...jpeg
   │  └─ ...png
   ├─ Tables
   │  └─ .csv
   └─ Text
      └─ ...txt

Suggested folder structure for a new academic project

While we learnt how to create or associate an .RProj with a folder, integrating it with the here() function from the here package makes things even smoother. Let’s do it with the following exercise.

Exercise

  1. Go to the RStudio window with the govt-8003 project. Check the extreme upper left corner to see if you are in the correct window.

  2. In the qmd file we were working in, add an R chunk.

  3. Load the here library with the following code. Run the code line by line.

library(here)


 # See the output for each of the following lines | Use your own datasets
here()

# Make modification here after copying your dataset to this folder

here("Datasets-mathcamp","V-Dem-CY-Full+Others-v12.rds")

# syntax is

# here("First subfolder from the root folder", "second subfolder",...., "file")


vdem_new <- readRDS(here("Datasets-mathcamp","V-Dem-CY-Full+Others-v12.rds"))

This is a cleaner syntax which when coupled with usage of R projects saves time in typing file paths and avoids issues when the project is run on some other computer system.

Note: here() always notes the path from the main folder or the root directory where your .RProj file is located.

Save the files and close the govt-8003 project window.

Make it a habit to use R Projects and the here() function in your scripts for writing portable code.

You can read this quick and informative blogpost on using these two here.

Folder Structure

We ideally want a folder structure that is easily understandable to us and others.

📦 govt-8003
├─ govt-8003.RProj
├─ 000-setup.R
├─ 001-eda.qmd
├─ 002-analysis.qmd
├─ 003-manuscript.qmd
├─ Data
│  ├─ Raw
│  │  ├─ Dataset1
│  │  │  ├─ dataset1.csv
│  │  │  └─ codebook-dataset1.pdf
│  │  └─ Dataset2
│  │     ├─ ...dta
│  │     └─ codebook-dataset2.pdf
│  └─ Clean
│     └─ Merged-df1-df2.csv
├─ Scripts
│  ├─ R-scripts
│  │  ├─ plotting-some-variable.R
│  │  └─ exploring-different-models.R
│  ├─ Stata-Scripts
│  │  └─ seeing-variable-labels.do
│  └─ Python-Scripts
│     └─ scraping-data-from-website.py
└─ Outputs
   ├─ Plots
   │  ├─ ...jpeg
   │  └─ ...png
   ├─ Tables
   │  └─ .csv
   └─ Text
      └─ ...txt

We can create this structure by pointing and clicking on our laptops. But since we might want to use the same folder structure repeatedly, it makes sense to be lazy and do it programmatically.

Exercise

  1. Download the 000-setup.R from here

  2. Place it in the govt-8003 folder.

  3. Open it in the RStudio window that is already open.

```{r}
# Name: 000-setup.R
# Author: Parushya
# Purpose: Creates main folders, subfolders in the main project directory
# Will also ensure that you have basic packages required to run the repository
# Date Created: 2020/10/07



# Checking if packages are installed and installing


# check.packages function: install and load multiple R packages.
# Found this function here: https://gist.github.com/smithdanielle/9913897 on 2019/06/17
# Check to see if packages are installed. Install them if they are not, then load them into the R session.

check.packages <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) {
    install.packages(new.pkg, dependencies = TRUE)
  }
  sapply(pkg, require, character.only = TRUE)
}

# Check if packages are installed and loaded:
packages <- c("janitor",  "tidyverse", "utils", "here")
check.packages(packages)


# Setting Directories and creating subfolders


# Creating Sub Folders

## Data
dir.create(file.path(paste0(here("Data")))) # Data Folder
dir.create(file.path(paste0(here("Data","Raw")))) # Raw Data sub-folder
dir.create(file.path(paste0(here("Data","Clean")))) # Clean Data sub-folder


# Scripts
dir.create(file.path(paste0(here("Scripts")))) # Scripts Folder
dir.create(file.path(paste0(here("Scripts","RScripts")))) # RScripts  sub-folder
dir.create(file.path(paste0(here("Scripts","Stata-Scripts")))) # Stata Scripts sub-folder
dir.create(file.path(paste0(here("Scripts","Python-Scripts")))) # Python Scripts sub-folder


# Output
dir.create(file.path(paste0(here("Outputs")))) # Outputs Folder
dir.create(file.path(paste0(here("Outputs","figures")))) # Figures sub-folder
dir.create(file.path(paste0(here("Outputs","tables")))) # Tables sub-folder
dir.create(file.path(paste0(here("Outputs","text")))) # Text sub-folder

```
  4. Run the file line-by-line. See the folder structure created in your main folder.
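
The folder-creation part of the script can also be written more compactly. The sketch below is an alternative under stated assumptions, not the contents of 000-setup.R: the function name create_project_folders and its root argument are hypothetical, and the folder names are copied from the script above. It is demonstrated in a temporary directory so that nothing in your project is touched.

```{r}
# Hypothetical helper, not part of 000-setup.R: create every folder in one pass.
create_project_folders <- function(root = ".") {
  folders <- c(
    file.path("Data", c("Raw", "Clean")),
    file.path("Scripts", c("RScripts", "Stata-Scripts", "Python-Scripts")),
    file.path("Outputs", c("figures", "tables", "text"))
  )
  paths <- file.path(root, folders)
  for (p in paths) {
    # recursive = TRUE also creates the parent folder (e.g. Data) when missing;
    # showWarnings = FALSE keeps re-runs quiet if the folder already exists.
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstration in a temporary directory
demo_root <- file.path(tempdir(), "govt-8003-demo")
created <- create_project_folders(demo_root)
all(dir.exists(created))
```

Inside an R Project you could pass the project root instead, for example create_project_folders(here::here()), assuming the here package is installed.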

Plan Concept of a Plan

Here’s a quick workflow for starting a new project or assignment or paper.

  1. Create a new RStudio Project by clicking File > New Project. Name it govt-<coursecode>-<project>.

  2. Check that your RStudio window now shows the project name in the top right corner. If not, go to the folder and double-click the .RProj file.

  3. Paste the 000-setup.R file in the main project folder. Open it in the same RStudio window as the project and run the complete file. Your folder structure is created.

  4. Copy your raw data into the Data/Raw folder. Similarly, copy your scripts into the Scripts/RScripts folder.

  5. Start your new .qmd file and save it in the main folder.

  6. Remember to use the here() function extensively in both scripts and Quarto files when loading or saving data.

  7. You can always zip the whole project folder for sharing. The receiver will just need to unzip and run the code after starting the associated .RProj file, without changing file paths on their computer.


  1. Inspired by the summary provided in Prof Aaron Williams’ course on Data Analysis offered at the McCourt School. Strongly recommended for learning good coding using R.

  2. This section is copied from the R4DS book.