When working with datasets in R, you will often need to create a new dataset from an existing one.
This can be useful in cases such as when you want to filter out specific data, create subsets for testing and training, or simply modify the structure of the data for better analysis. In this tutorial, we will demonstrate how to create a new dataset from an existing one using R.
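As one concrete example of the training/testing use case, the sketch below splits mtcars into random training and test sets. It is a minimal sketch built on our own assumptions: an 80/20 split, an arbitrary seed of 42, and illustrative object names (train_idx, train_set, test_set). It uses sample() with bracket indexing rather than subset(), because the split is by random row position rather than by a condition on the data.

```r
# Minimal sketch: random 80/20 train/test split of the built-in mtcars data.
# The ratio, the seed, and the object names are illustrative choices.
data(mtcars)
set.seed(42)                                             # make the split reproducible
n <- nrow(mtcars)                                        # 32 rows
train_idx <- sample(seq_len(n), size = floor(0.8 * n))   # 25 random row positions
train_set <- mtcars[train_idx, ]                         # rows drawn into the sample
test_set  <- mtcars[-train_idx, ]                        # all remaining rows
c(train = nrow(train_set), test = nrow(test_set))        # train 25, test 7
```

Because negative indexing keeps every row not drawn for training, the two sets never overlap and together cover all 32 cars.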
Step 1: Loading the Dataset
First, let’s load an example dataset: mtcars, which ships with R. It contains data on 32 cars and 11 variables, including miles per gallon (mpg), number of cylinders (cyl), and horsepower (hp).
```r
data(mtcars)
```
Next, let’s look at the first six rows of the dataset using the head function:
```r
head(mtcars)
```
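Before subsetting, it can also help to check the size and structure of the data. The following sketch uses the base R functions dim(), str(), and table(). The table of cyl shows that only three cylinder counts (4, 6, and 8) occur in mtcars, which is useful to know when writing filter conditions.

```r
data(mtcars)
dim(mtcars)         # rows and columns: 32 11
str(mtcars)         # each column's name, type, and first few values
table(mtcars$cyl)   # cars per cylinder count: 4 -> 11, 6 -> 7, 8 -> 14
```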
Step 2: Selecting Specific Rows
One common way to create a new dataset from an existing one is to select specific rows or columns. In R, you can do this with the subset function.
For example, let’s create a new dataset containing only cars with 6 or more cylinders. Here’s how to do it:
```r
dataset_cyl6 <- subset(mtcars, cyl >= 6)
head(dataset_cyl6)
```
Here we keep every row where the number of cylinders (cyl) is 6 or more.
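The same row filter can also be written with base R bracket indexing. R’s own help page for subset() describes it as a convenience function for interactive use and recommends [ for programming. The sketch below compares the two forms. It also shows one behavioral difference: subset() drops rows where the condition evaluates to NA, while [ returns an all-NA row for them. The object names (dataset_cyl6_bracket, na_demo) are illustrative.

```r
data(mtcars)

# subset() version from the step above
dataset_cyl6 <- subset(mtcars, cyl >= 6)

# Equivalent bracket indexing: logical row index, empty column index (all columns)
dataset_cyl6_bracket <- mtcars[mtcars$cyl >= 6, ]

identical(dataset_cyl6, dataset_cyl6_bracket)  # TRUE: mtcars has no missing cyl values
nrow(dataset_cyl6)                             # 21 cars (7 with 6 cylinders, 14 with 8)

# The two approaches differ when the condition contains NA
na_demo <- data.frame(x = c(1, NA, 3))
subset(na_demo, x > 1)                      # one row: x == 3
na_demo[na_demo$x > 1, , drop = FALSE]      # two rows: an all-NA row plus x == 3
```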
Step 3: Selecting Specific Columns
To create a new dataset with only specific columns, use the select argument of the subset function. For instance, let’s create a dataset that contains only the mpg, cyl, and hp columns.
```r
dataset_select_cols <- subset(mtcars, select = c(mpg, cyl, hp))
head(dataset_select_cols)
```
This will create a new dataset containing only the selected columns – miles per gallon, number of cylinders, and horsepower.
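The select argument accepts more than a plain list of names. This sketch shows three variations that base R’s subset.data.frame supports: a row condition combined with a column selection in one call, a range of adjacent columns (mpg:hp), and a negative selection that drops columns. The object names are illustrative.

```r
data(mtcars)

# Row condition and column selection in a single call
big_engines <- subset(mtcars, cyl >= 6, select = c(mpg, cyl, hp))

# A range of adjacent columns, in mtcars column order: mpg, cyl, disp, hp
first_four <- subset(mtcars, select = mpg:hp)

# Negative selection: keep every column except qsec and vs
without_qsec_vs <- subset(mtcars, select = -c(qsec, vs))

names(first_four)       # "mpg" "cyl" "disp" "hp"
ncol(without_qsec_vs)   # 9
```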
Step 4: Filtering Rows Based on Multiple Conditions
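You can also filter rows on several conditions at once by combining them with & (and). For example, let’s create a new dataset containing only cars that have 4, 6, or 8 cylinders and an MPG of at least 20. The %in% operator checks whether each cyl value is one of the listed values. Every car in mtcars has 4, 6, or 8 cylinders, so this particular %in% condition removes no rows; it is included here to show how the operator works.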
```r
dataset_multi_cond <- subset(mtcars, (cyl %in% c(4, 6, 8)) & (mpg >= 20))
head(dataset_multi_cond)
```
This new dataset contains only the cars that meet both conditions.
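Conditions can also be combined with | (or), and %in% can keep a narrower set of values. The sketch below shows both. It also confirms that, because every car in mtcars has 4, 6, or 8 cylinders, the two-condition filter returns the same rows as filtering on mpg alone. The thresholds (wt < 2.5, mpg >= 25) and object names are illustrative choices.

```r
data(mtcars)

# & keeps rows where BOTH conditions hold
efficient <- subset(mtcars, cyl %in% c(4, 6, 8) & mpg >= 20)

# The %in% test removes no rows in mtcars, so this matches filtering on mpg alone
identical(efficient, subset(mtcars, mpg >= 20))   # TRUE

# | keeps rows where EITHER condition holds
light_or_efficient <- subset(mtcars, wt < 2.5 | mpg >= 25)

# %in% with a shorter set keeps only 6- and 8-cylinder cars
six_or_eight <- subset(mtcars, cyl %in% c(6, 8))
nrow(six_or_eight)                                # 21
```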
Full Code
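```r
# Step 1: load and inspect the dataset
data(mtcars)
head(mtcars)

# Step 2: keep rows with 6 or more cylinders
dataset_cyl6 <- subset(mtcars, cyl >= 6)
head(dataset_cyl6)

# Step 3: keep only the mpg, cyl, and hp columns
dataset_select_cols <- subset(mtcars, select = c(mpg, cyl, hp))
head(dataset_select_cols)

# Step 4: filter rows on multiple conditions
dataset_multi_cond <- subset(mtcars, (cyl %in% c(4, 6, 8)) & (mpg >= 20))
head(dataset_multi_cond)
```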
Conclusion
In this tutorial, we covered how to create a new dataset from an existing one in R using the subset function. We showed how to select specific rows and columns, and how to filter rows on multiple conditions.
With these techniques, you can easily create new datasets from existing ones, tailored to your specific analysis needs.