Introduction to Exploratory Data Analysis (EDA) using R
UGC - Malaviya Mission
Teacher Training Center
GJU of S&T, Hisar, Haryana
✅ R Software, https://cloud.r-project.org/
✅ RStudio IDE, https://posit.co/products/open-source/rstudio/?sid=1
✅ Quarto, https://quarto.org/docs/get-started/
✅ Create an RStudio Project on your computer
✅ Create a Quarto document
✅ Create a folder named data and save the file penguins.csv in it
The penguins live on three islands: Biscoe, Dream, and Torgersen.


“how to use visualization and transformation to explore your data in a systematic way”
Install R Packages only once:
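A minimal sketch of a one-time install, assuming the session uses tidyverse (which includes ggplot2) and palmerpenguins. This package list is an assumption, not taken from the slides, and `missing_packages()` is a hypothetical helper. The `interactive()` guard is a design choice so that rendering a Quarto document never triggers a download.

```r
# Assumed package list; replace with the packages your session actually needs.
pkgs <- c("tidyverse", "palmerpenguins")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# Install only what is missing, and only from the console (not during rendering).
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

After installation, load a package in each new session with `library()`, for example `library(ggplot2)`.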
You can import from CSV, Excel, or other sources.
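A minimal sketch of a CSV import with base R. To keep the example runnable anywhere, it first writes a tiny stand-in file with made-up values; in the workshop project you would instead pass the path "data/penguins.csv". The column names here are assumptions about that file.

```r
# Tiny stand-in for data/penguins.csv (illustrative values, not the real data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_length_mm,body_mass_g",
  "Adelie,Torgersen,39.1,3750",
  "Adelie,Torgersen,NA,NA",
  "Gentoo,Biscoe,46.1,4500",
  "Chinstrap,Dream,46.5,3500"
), csv_path)

# In the project: penguins <- read.csv("data/penguins.csv")
penguins <- read.csv(csv_path, stringsAsFactors = FALSE)

str(penguins)   # column types and a preview of the values
head(penguins)  # first rows
```

For Excel files, `readxl::read_excel()` is a common choice, and `readr::read_csv()` is the tidyverse counterpart of `read.csv()`.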

Missing values
Duplicate rows
Inconsistent column names
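The three checks above can be sketched in base R as follows. The data frame and the `tidy_names()` helper are hypothetical and only illustrate the idea; `janitor::clean_names()` is a widely used alternative for the renaming step.

```r
# Illustrative messy data (made-up values).
raw <- data.frame(
  "Species"       = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  "Body Mass (g)" = c(3750, NA, 5000, 5000),
  check.names = FALSE
)

# 1. Missing values: count NAs per column.
na_counts <- colSums(is.na(raw))

# 2. Duplicate rows: count them, then keep only the first occurrence.
n_dupes <- sum(duplicated(raw))
deduped <- raw[!duplicated(raw), , drop = FALSE]

# 3. Inconsistent column names: lower case, underscores, no stray symbols.
tidy_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}
names(deduped) <- tidy_names(names(deduped))

na_counts
n_dupes
names(deduped)
```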
“[Variation] is the tendency of the values of a variable to change from measurement to measurement or across different subjects or at different times.”
“Every variable has its own pattern of variation, which can reveal interesting information about how it varies between measurements on the same observation as well as across observations.”
What type of variation occurs within my variables?
What type of co-variation occurs between my variables?
Explore each variable individually
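A minimal univariate sketch in base R, assuming a numeric column `body_mass_g` and a categorical column `species` (made-up values stand in for the real file).

```r
# Illustrative data (not the real penguins values).
penguins <- data.frame(
  species     = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"),
  body_mass_g = c(3750, 3800, NA, 3500, 4500, 5700)
)

# Numeric variable: five-number summary, mean, and a histogram.
summary(penguins$body_mass_g)
mass_mean <- mean(penguins$body_mass_g, na.rm = TRUE)
hist(penguins$body_mass_g,
     main = "Body mass", xlab = "Body mass (g)")

# Categorical variable: frequency table and a bar chart.
species_counts <- table(penguins$species)
species_counts
barplot(species_counts, main = "Penguins per species")
```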
Use ggplot2 for more control
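A minimal ggplot2 sketch, assuming the package is installed and that the data has `body_mass_g` and `species` columns (made-up values below). The bin width, transparency, and theme are arbitrary choices for illustration.

```r
library(ggplot2)

# Illustrative data (not the real penguins values).
penguins <- data.frame(
  species     = rep(c("Adelie", "Gentoo"), each = 5),
  body_mass_g = c(3400, 3550, 3700, 3750, 3900,
                  4700, 4950, 5100, 5400, 5700)
)

# Overlaid histograms, one colour per species.
p <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250, alpha = 0.7, position = "identity") +
  labs(title = "Distribution of body mass",
       x = "Body mass (g)", y = "Count", fill = "Species") +
  theme_minimal()

print(p)
```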
Explore relationships between variables
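A minimal bivariate sketch in base R: one numeric-numeric pair and one numeric-categorical pair. Column names and values are assumptions for illustration.

```r
# Illustrative data (not the real penguins values).
penguins <- data.frame(
  species           = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  flipper_length_mm = c(181, 186, 195, 211, 215, 230),
  body_mass_g       = c(3750, 3800, 3250, 4500, 5700, 5400)
)

# Numeric vs numeric: Pearson correlation and a scatter plot.
r <- cor(penguins$flipper_length_mm, penguins$body_mass_g)
plot(penguins$flipper_length_mm, penguins$body_mass_g,
     xlab = "Flipper length (mm)", ylab = "Body mass (g)")

# Numeric vs categorical: group means and side-by-side box plots.
mass_by_species <- aggregate(body_mass_g ~ species, data = penguins, FUN = mean)
boxplot(body_mass_g ~ species, data = penguins,
        xlab = "Species", ylab = "Body mass (g)")

r
mass_by_species
```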
An outlier is a data point that differs markedly from the rest of the data.
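One common convention for flagging outliers, not the only one, is the 1.5 × IQR rule used by box plots. A minimal sketch with made-up values:

```r
# Illustrative body masses in grams (made-up values).
body_mass_g <- c(3400, 3550, 3700, 3750, 3800, 3900, 4050, 4200, 6300)

# Quartiles and interquartile range.
q   <- quantile(body_mass_g, c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]

# Fences: points beyond 1.5 * IQR from the quartiles are flagged.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

is_outlier <- body_mass_g < lower | body_mass_g > upper
body_mass_g[is_outlier]
```

A flagged point is a candidate for closer inspection, not automatically an error; check how it was measured before deciding what to do with it.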
Handle outliers (if needed)

925 315 2024
SARA Institute of Data Science,
Dr. Ambedkar Bhawan, Kakroi Road, Near Dayanand Hospital,
Sonipat - 131001, Haryana, India.
Thank you!
