Introduction
Datasets you work with in R often have missing values. In this post, we’ll show you how to check an R dataset for missing values. We’ll also show you how to use DataEditR if you need to fill in missing data.
Install Libraries
You can begin by installing these two packages if you haven’t already:
install.packages("ggplot2")
install.packages("DataEditR")
Access Dataset
Let’s call up the mammalian sleep dataset, msleep, which ships with ggplot2:
library(ggplot2)
View(msleep)
In RStudio, the dataset will now appear at the top left of your screen. If you want to edit the data interactively, and you already have DataEditR installed, try:
library(DataEditR)
# Assign the result so that your edits are kept in a new object
msleep_edited <- data_edit(msleep)
Check for Missing Values
A good first step is to count how many values are missing in each variable:
colSums(is.na(msleep))
The result is a named vector with one count per variable, so you can see at a glance which columns have gaps. You can also count the total number of missing values in the dataset by trying:
sum(is.na(msleep))
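Beyond raw counts, it can help to see the share of missing values per variable and how many rows are affected. The sketch below assumes a small made-up data frame, toy, as a stand-in for msleep so that it runs without any packages; the same calls should work on msleep itself.

```r
# Hypothetical toy data standing in for msleep (values are made up)
toy <- data.frame(
  name    = c("Cow", "Dog", "Pig", "Goat"),
  vore    = c("herbi", "carni", NA, "herbi"),
  brainwt = c(0.423, 0.070, NA, NA)
)

# Share of missing values per column
# (the mean of a TRUE/FALSE column is the proportion of TRUEs)
na_share <- colMeans(is.na(toy))

# complete.cases() is TRUE for rows with no NA in any column,
# so negating it flags the rows that have at least one gap
n_incomplete <- sum(!complete.cases(toy))
incomplete_rows <- toy[!complete.cases(toy), ]

print(round(na_share, 2))
print(incomplete_rows)
```

Proportions are often easier to compare than counts when you are deciding which variables are usable, and the incomplete rows show which observations you would need to fill in.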
Finally, you might want to check specific values. Let’s say that you suspect the 16th and 17th values of brainwt are missing. The following returns TRUE for each value that is missing and FALSE otherwise:
is.na(msleep$brainwt[16:17])
In the same way, you can check whether the 20th value of sleep_cycle is missing:
is.na(msleep$sleep_cycle[20])
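If you do not know in advance which positions to check, you can ask R to list them. This is a minimal sketch using a made-up vector, brainwt, as a stand-in for a column such as msleep$brainwt; the values are hypothetical.

```r
# Hypothetical vector standing in for a column such as msleep$brainwt
brainwt <- c(0.423, NA, 0.070, NA, NA)

# Quick yes/no check: is anything missing at all?
any_missing <- anyNA(brainwt)

# Positions (indices) of the missing values
missing_positions <- which(is.na(brainwt))

print(any_missing)
print(missing_positions)
```

With the real data, which(is.na(msleep$brainwt)) should return the row numbers you would then look up or fill in with DataEditR.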
As you can see, R has many options for listing missing values.
BridgeText can help you with all of your statistical analysis needs.