How To Extract Data From An XML File Using Python

In this tutorial, we’ll learn how to extract data from an XML file using Python. XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

XML is widely used to exchange data between applications and systems, including over the internet. Let’s explore how to extract data from an XML file using ElementTree (xml.etree.ElementTree), the XML module built into Python’s standard library.

Example XML file:

<movies>
  <movie id="1">
    <title>The Godfather</title>
    <year>1972</year>
    <director>Francis Ford Coppola</director>
  </movie>
  <movie id="2">
    <title>Pulp Fiction</title>
    <year>1994</year>
    <director>Quentin Tarantino</director>
  </movie>
</movies>

Step 1: Import Required Libraries

First, we’ll import the ElementTree module (xml.etree.ElementTree), which we’ll use to parse the XML file and extract the required data. It ships with Python’s standard library, so there is nothing to install.
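The import itself is a single line. Importing the module under the short alias ET is a widespread convention, not a requirement.

```python
# ElementTree is part of Python's standard library, so nothing needs installing.
# The alias ET is a common convention, not a requirement.
import xml.etree.ElementTree as ET

# Module-level functions such as parse() and fromstring() are now reachable via ET.
print(ET.__name__)  # xml.etree.ElementTree
```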

Step 2: Load and Parse the XML File

In this step, we will load the XML file and parse its contents using the ElementTree library.

The parse() function reads and parses the XML file and returns an ElementTree object; that object’s getroot() method returns the root element of the tree, which here is <movies>.
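A minimal sketch of this step. So that it runs on its own, it first writes the sample document above to a temporary file; the file name movies.xml is a hypothetical choice, and with a real file you would simply pass its path to parse().

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The sample document from the top of this tutorial.
SAMPLE_XML = """<movies>
  <movie id="1">
    <title>The Godfather</title>
    <year>1972</year>
    <director>Francis Ford Coppola</director>
  </movie>
  <movie id="2">
    <title>Pulp Fiction</title>
    <year>1994</year>
    <director>Quentin Tarantino</director>
  </movie>
</movies>
"""

# Save it under a hypothetical name in a fresh temporary directory.
xml_path = os.path.join(tempfile.mkdtemp(), "movies.xml")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write(SAMPLE_XML)

# parse() reads and parses the whole file, returning an ElementTree object.
tree = ET.parse(xml_path)

# getroot() returns the top-level element of that tree.
root = tree.getroot()
print(root.tag)   # movies
print(len(root))  # 2 (one child element per <movie>)
```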

Step 3: Extract Data from the XML File

Now that we have the root element of the XML tree, we can iterate through its child elements and extract the data.

An element’s attrib attribute holds its XML attributes as a dictionary; for the first movie it is {'id': '1'}. The find() method returns the first child element with the given tag, or None if there is no such child. The text attribute holds the text content of an element, such as The Godfather for the first <title>.
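The snippet below tries these tools on the sample data. To stay self-contained, it parses the XML with fromstring(), which takes XML text and returns the root element directly rather than an ElementTree object. The tag rating is deliberately absent from the sample to show what find() does when nothing matches; findtext(), a related Element method, returns a default in that case instead.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<movies>
  <movie id="1">
    <title>The Godfather</title>
    <year>1972</year>
    <director>Francis Ford Coppola</director>
  </movie>
  <movie id="2">
    <title>Pulp Fiction</title>
    <year>1994</year>
    <director>Quentin Tarantino</director>
  </movie>
</movies>"""

# fromstring() parses text and returns the root element (<movies>) directly.
root = ET.fromstring(SAMPLE_XML)

# Iterating over an element yields its child elements in document order.
movie_ids = [movie.attrib["id"] for movie in root]
print(movie_ids)  # ['1', '2']

first = root[0]  # the first <movie> element

# attrib is a plain dict of the element's XML attributes.
print(first.attrib)  # {'id': '1'}

# find() returns the first child with the given tag; .text is its text content.
print(first.find("title").text)  # The Godfather

# With no matching child, find() returns None, so calling .text on it would fail.
print(first.find("rating"))  # None

# findtext() returns the child's text, or the given default if it is missing.
print(first.findtext("rating", default="unknown"))  # unknown
```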

Full Code:
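A minimal version of the complete script, combining the steps above. As a sketch, it writes the sample document to a temporary file with the hypothetical name movies.xml so that it runs on its own; to process your own file, pass its path to ET.parse() instead. Running it prints the two lines shown under Output.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The sample document from the top of this tutorial.
SAMPLE_XML = """<movies>
  <movie id="1">
    <title>The Godfather</title>
    <year>1972</year>
    <director>Francis Ford Coppola</director>
  </movie>
  <movie id="2">
    <title>Pulp Fiction</title>
    <year>1994</year>
    <director>Quentin Tarantino</director>
  </movie>
</movies>
"""

# Write the sample to a temporary "movies.xml" (hypothetical name).
xml_path = os.path.join(tempfile.mkdtemp(), "movies.xml")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write(SAMPLE_XML)

# Step 2: load and parse the file, then get the root <movies> element.
tree = ET.parse(xml_path)
root = tree.getroot()

# Step 3: visit each <movie> child and pull out its attribute and subelements.
lines = []
for movie in root:
    movie_id = movie.attrib["id"]
    title = movie.find("title").text
    year = movie.find("year").text
    director = movie.find("director").text
    line = f"Movie ID: {movie_id}, Title: {title}, Year: {year}, Director: {director}"
    lines.append(line)
    print(line)
```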

Output:

Movie ID: 1, Title: The Godfather, Year: 1972, Director: Francis Ford Coppola
Movie ID: 2, Title: Pulp Fiction, Year: 1994, Director: Quentin Tarantino

Conclusion

In this tutorial, we’ve learned how to extract data from an XML file using ElementTree from Python’s standard library. The same approach works for other XML files: change the tag names passed to find() and the attribute keys looked up in attrib to match each file’s structure.
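As a sketch of that idea, the hypothetical helper extract_records() below avoids hard-coding field names: it collects each record element’s attributes plus the text of its direct children into a dict. It assumes flat, record-style XML like the movie example (the snippet uses a trimmed copy of it); nested data would need more work, and a child tag that repeats within one record would overwrite the earlier value.

```python
import xml.etree.ElementTree as ET

def extract_records(root, record_tag):
    """Hypothetical helper: one dict per `record_tag` element under `root`.

    Each dict merges the element's attributes with the text of its direct
    children, keyed by child tag, so no field names are hard-coded.
    """
    records = []
    for elem in root.iter(record_tag):
        record = dict(elem.attrib)
        for child in elem:
            record[child.tag] = child.text
        records.append(record)
    return records

# A trimmed copy of the sample document.
root = ET.fromstring("""<movies>
  <movie id="1"><title>The Godfather</title><year>1972</year></movie>
  <movie id="2"><title>Pulp Fiction</title><year>1994</year></movie>
</movies>""")

for record in extract_records(root, "movie"):
    print(record)
# {'id': '1', 'title': 'The Godfather', 'year': '1972'}
# {'id': '2', 'title': 'Pulp Fiction', 'year': '1994'}
```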