{ "cells": [ { "cell_type": "markdown", "id": "96468265", "metadata": {}, "source": [ "# File Handling\n", "\n", "In this episode we will cover some aspects of working with files in Python.\n", "\n", "\n", "## File\n", "\n", "A computer file is a collection of data that is stored as a single unit and is identified by a filename. The data within a file is organized in a specific format or structure, and this organization is defined by a file format.\n", "\n", "The term \"byte\" refers to a unit of digital information that consists of 8 bits. Files are composed of bytes, and the arrangement and interpretation of these bytes are dictated by the file format. Different types of files (such as text files, image files, or executable files) have different rules and structures for how the bytes are organized.\n", "\n", "I will categorize files into two distinct groups:\n", "1. Text files\n", "2. Binary files\n", "\n", "The key distinction between these two types lies in how the data is represented and stored within the files.\n", "\n", "\n", "1. Text Files:\n", " * **Content:** Text files contain human-readable text and are often used to store plain text documents.\n", " * **Character Encoding:** Text files use character encoding (e.g. ASCII, UTF-8) to represent characters.\n", " * Examples: `.txt`, `.html` `.css`\n", "2. Binary files:\n", " * **Content:** Binary files contain non-human readable data, which may include images, audio, video, or program executables.\n", " * **Representation:** Binary files store data in a format not directly interpretable by humans. They may include a combination of text and binary data.\n", " * **Examples:** `.jpg`, `.mp3`, `.exe`\n", " \n", "\n", "## File Path\n", "\n", "To work with files in Python, it is important to have some basic knowledge about paths within the operating system. Assuming that you are already familiar with this concept, let's proceed.\n", "\n", "\n", "\n", "## Opening and closing files in Python\n", "\n", "To open file inside of Python program we can use built-in function `open`.\n", "Let's create the text file `quote.txt` with following content:\n", "\n", "```{admonition} File Content\n", "It never ceases to amaze me:\\\n", "we all love ourselves more than other people,\\\n", "but care more about their opinion than our own.\n", "```\n", "\n", "Now to work with this file inside of Python we can use the following syntax:" ] }, { "cell_type": "code", "execution_count": null, "id": "079d7442-0c08-4ab8-8a2e-8202662c2a10", "metadata": {}, "outputs": [], "source": [ "file = open(\"./quote.txt\", \"r\")" ] }, { "cell_type": "markdown", "id": "7132bbb4", "metadata": {}, "source": [ "Here, the first argument specifies the relative path to the file, and the second argument specifies the mode used for reading the file.\n", "Here are the different modes available in Python:\n", "\n", "1. `r` - read file\n", "2. `w` - write file\n", "3. `rb` - read binary file\n", "4. `wb` - write binary file\n", "5. `a` - append to file\n", "\n", "\n", "```{warning}\n", "We've covered the process of opening files, and it's important to note that, before extracting data, understanding how to properly close files is essential. Closing files in Python is vital for efficient resource management. When a file is opened, system resources are allocated, and neglecting to close it can result in leaks and unpredictable behavior. It is crucial to adhere to proper file-handling practices to ensure optimal resource utilization and maintain the integrity and reliability of the code. Here you can read more about why it is important to close files.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "8e4158f7", "metadata": {}, "outputs": [], "source": [ "file.close()" ] }, { "cell_type": "markdown", "id": "0efddd70", "metadata": {}, "source": [ "## `with` statement\n", "However, in Python there's a cool trick that lets you forget about remembering to close files.\n", "It is the `with` statement that provides a concise and clean way to work with external resources, such as files. \n", "Its primary purpose is to simplify resource management, ensuring that acquired resources are properly and automatically released, even if an exception occurs during the execution of the code block. \n", "When it comes to file handling, the with statement is particularly advantageous." ] }, { "cell_type": "code", "execution_count": null, "id": "167e4b85", "metadata": {}, "outputs": [], "source": [ "with open(\"./quote.txt\", \"r\") as inp_f:\n", " print(inp_f)" ] }, { "cell_type": "markdown", "id": "25ff3885", "metadata": {}, "source": [ "## Working with text files in Python\n", "\n", "### Reading\n", "\n", "Excellent! With our understanding of file manipulation in Python, we're ready to delve into extracting data from text files within the Python environment. Let's begin by exploring the methods available to achieve this goal:\n", "- The `read()` method returns the specified number of bytes from the file. Default is -1 which means the whole file.\n", "- The `readline()` method returns one line from the file. You can also specified how many bytes from the line to return, by using the size parameter.\n", "\n", "Let's take a closer look on both of them" ] }, { "cell_type": "code", "execution_count": null, "id": "37269cb8", "metadata": {}, "outputs": [], "source": [ "with open(\"./quote.txt\", \"r\") as inp_f:\n", " print(inp_f.read())" ] }, { "cell_type": "markdown", "id": "27ded4ca", "metadata": {}, "source": [ "In this instance, we utilize the `read` method to access the entire file. Feel free to experiment by specifying the size of the file in bytes that you wish to read, rather than reading the entire file." ] }, { "cell_type": "code", "execution_count": null, "id": "5086c0b1", "metadata": {}, "outputs": [], "source": [ "with open(\"./quote.txt\", \"r\") as inp_f:\n", " line_1 = inp_f.readline().strip()\n", " line_2 = inp_f.readline().strip()\n", " print(f\"line_1: {line_1}\")\n", " print(f\"line_2: {line_2}\")" ] }, { "cell_type": "markdown", "id": "08e4ecd3", "metadata": {}, "source": [ "Here I used the `strip()` method to trim the new line symbol which is by default returned in the `readline()` method.\n", "\n", "If you prefer to iterate over each line in your file, you can achieve this using a `for` loop in the following manner:" ] }, { "cell_type": "code", "execution_count": null, "id": "860e31ed", "metadata": {}, "outputs": [], "source": [ "with open(\"./quote.txt\", \"r\") as inp_f:\n", " for line in inp_f:\n", " print(f\"current line: {line.strip()}\")" ] }, { "cell_type": "markdown", "id": "de926790", "metadata": {}, "source": [ "```{note}\n", "Using a `for` loop to iterate over each line of a file is advantageous for several reasons. It promotes memory efficiency by processing one line at a time, making it suitable for large files. The lazy loading approach allows incremental processing without loading the entire file into memory. This method is more performant for certain operations and supports a streaming approach, enabling real-time data processing. In contrast, the `read` method loads the entire content into memory, which can be inefficient for large files. Therefore, when dealing with files, particularly those with substantial data, the `for` loop provides a more resource-friendly and responsive solution.\n", "```\n", "\n", "### Writing\n", "\n", "Now, let's explore the process of creating a new file and adding content to it:\n", "```python\n", "\"\"\"\n", "Embrace what you cannot change, \n", "and focus your energy \n", "on mastering what lies within your control.\n", "\"\"\"\n", "```\n", "\n", "```python\n", "text = \"\"\"\n", "Embrace what you cannot change, \n", "and focus your energy \n", "on mastering what lies within your control.\n", "\"\"\"\n", "\n", "with open(\"./out.txt\", \"w\") as out_f:\n", " out_f.write(text.strip())\n", "```\n", "After executing this code snippet, you will discover that a new file has been successfully created in your current working directory, named `out.txt`. The main distinction from previous examples lies in the use of the argument `\"w\"`, which signifies that the file is opened in write mode. This allows you write content in the file as needed.\n", "It's crucial to note that if the file already existed, executing this code will result in the loss of any information it previously contained. To demonstrate this, let's rerun our program with a slightly modified variable, `'text'`\n", "\n", "```python\n", "text = \"\"\"\n", "Have I been made for this, to lie under the blankets and keep myself warm?\n", "\"\"\"\n", "\n", "with open(\"./out.txt\", \"w\") as out_f:\n", " out_f.write(text.strip())\n", "```\n", "\n", "As you can see we completly lost our previous text.\n", "In case you want to append new information to existed file you can use the `\"a\"` argument.\n", "\n", "```python\n", "text = \"\"\"\n", "Embrace what you cannot change, \n", "and focus your energy \n", "on mastering what lies within your control.\n", "\"\"\"\n", "\n", "with open(\"./out.txt\", \"a\") as out_f:\n", " out_f.write(text.rstrip())\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }