How To Create A New Dataset From An Existing Dataset In R

When working with datasets in R, it is common to need to create a new dataset based on an existing one.

This is useful when you want to filter out specific data, create subsets for training and testing, or restructure the data for easier analysis. In this tutorial, we will demonstrate how to create a new dataset from an existing one using R.

Step 1: Importing the Dataset

First, let’s load an example dataset – the mtcars dataset that comes with R, so no external file is needed. This dataset contains information about 32 car models, including their miles per gallon (mpg), number of cylinders (cyl), and more.
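As a minimal sketch of this step: because mtcars is bundled with base R, a call to data() is enough to make it available. The dim() check is an addition for illustration and simply confirms the size of the data frame.

```r
# mtcars ships with base R (in the datasets package), so no file import is needed.
data(mtcars)

# Check the size of the data frame: number of rows (cars) and columns (variables).
dim(mtcars)
```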

Next, let’s have a look at the dataset using the head function as shown below.
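A short sketch of the call, assuming the default of six rows is what is wanted. The second call, with n = 3, is an illustrative variation.

```r
# Show the first six rows (the default for head()).
head(mtcars)

# Pass n to preview a different number of rows, for example the first three.
head(mtcars, n = 3)
```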

Output:

Step 2: Selecting Specific Rows

One common way to create a new dataset from an existing dataset is by selecting specific columns or rows. To do this in R, you can use the subset function.

For example, let’s create a new dataset containing only cars with 6 or more cylinders. Here’s how to do it:

In this case, we are selecting all rows where the number of cylinders (cyl) is 6 or more.
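A minimal sketch of this row filter. The variable name six_plus_cyl is an assumption chosen for illustration.

```r
# six_plus_cyl is an illustrative name for the new dataset.
# subset() keeps only the rows for which the condition is TRUE.
six_plus_cyl <- subset(mtcars, cyl >= 6)

# Inspect the result: row count and the cylinder values that remain.
nrow(six_plus_cyl)
table(six_plus_cyl$cyl)
```

The same rows can also be selected with bracket indexing, mtcars[mtcars$cyl >= 6, ], which is often preferred inside functions because subset() evaluates its condition non-standardly.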

Output:

Step 3: Selecting Specific Columns

To create a new dataset with only specific columns, use the select argument of the subset function. For instance, let’s create a dataset that contains only the mpg, cyl, and hp columns.

This will create a new dataset containing only the selected columns – miles per gallon, number of cylinders, and horsepower.
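A minimal sketch of the column selection. The variable name selected_cols is an assumption chosen for illustration.

```r
# selected_cols is an illustrative name for the new dataset.
# select takes unquoted column names; all rows are kept.
selected_cols <- subset(mtcars, select = c(mpg, cyl, hp))

# Confirm which columns remain.
names(selected_cols)
head(selected_cols)
```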

Output:

Step 4: Filtering Rows Based on Multiple Conditions

You can also filter the dataset based on multiple conditions. For example, let’s create a new dataset containing only cars with 4, 6, or 8 cylinders and with MPG greater than or equal to 20.

This new dataset contains only the cars that meet both of the specified conditions.
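A minimal sketch of combining two conditions with the & operator. The variable name efficient_cars is an assumption chosen for illustration.

```r
# efficient_cars is an illustrative name for the new dataset.
# %in% tests membership in a set of values; & requires both conditions to hold.
efficient_cars <- subset(mtcars, cyl %in% c(4, 6, 8) & mpg >= 20)

# Inspect the result.
nrow(efficient_cars)
range(efficient_cars$mpg)
```

Note that every car in mtcars has 4, 6, or 8 cylinders, so in this particular dataset the cylinder condition keeps all rows and the mpg condition does the actual filtering. Narrowing the set, for example to c(4, 6), would make both conditions matter.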

Output:

Full Code
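The steps from this tutorial can be collected into one script along the following lines. This is a sketch under the same assumptions as above: all variable names other than mtcars are illustrative.

```r
# Step 1: load the built-in dataset and preview it.
data(mtcars)
head(mtcars)

# Step 2: keep only rows where the car has 6 or more cylinders.
six_plus_cyl <- subset(mtcars, cyl >= 6)

# Step 3: keep only the mpg, cyl, and hp columns.
selected_cols <- subset(mtcars, select = c(mpg, cyl, hp))

# Step 4: filter rows on two conditions combined with &.
efficient_cars <- subset(mtcars, cyl %in% c(4, 6, 8) & mpg >= 20)

# Preview each new dataset.
head(six_plus_cyl)
head(selected_cols)
head(efficient_cars)
```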

Conclusion

In this tutorial, we covered how to create a new dataset from an existing dataset in R using the subset function. We demonstrated how to select specific rows, how to select specific columns, and how to filter the dataset based on multiple conditions.

With these techniques, you can easily create new datasets from existing ones, tailored to your specific analysis needs.