Python Load CSV Files As Namedtuple



Notes on how to turn CSV rows into namedtuple models when reading them with Python, so that you get IntelliSense and a modular way to manage your data types.


Description

Prepare a Data folder and a Models folder, then put every CSV file you want to convert and use into the Data folder.

After you run the script below, it generates one namedtuple in the Models folder for each CSV file.

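The idea is to load the header row of each CSV file and pass it to namedtuple as field_names. Because the data will later be accessed with dot notation, the script automatically replaces symbols that cannot appear in a field name (for example #$%^) with _.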
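
The header clean-up described above can be tried in isolation. The following is a minimal sketch, assuming a made-up header (`raw_header`) and a made-up model name (`Record`); neither comes from the original data. It also shows that characters the replacement does not cover, such as spaces, would still be rejected by namedtuple unless its `rename=True` option is used.

```python
import re
from collections import namedtuple

# Hypothetical raw header, as it might come back from csv.reader
raw_header = ['\ufeffStation', 'Open-Time', 'Price$', 'Area#']

# Drop a leading byte order mark, then replace the same symbols the script replaces
cleaned = [name.replace('\ufeff', '') for name in raw_header]
cleaned = [re.sub(r'[\\/\-@#$%^]', '_', name) for name in cleaned]
print(cleaned)

# The cleaned names are valid identifiers, so they work as field names
Record = namedtuple('Record', cleaned)
row = Record('Taipei Main Station', '06:00', '20', 'Zhongzheng')
print(row.Open_Time)

# A name with a space is not covered by the replacement above.
# rename=True makes namedtuple swap such names for positional ones.
Loose = namedtuple('Loose', ['valid', 'has space'], rename=True)
print(Loose._fields)
```

Positional names such as `_1` are less readable than a cleaned-up column name, so extending the replacement pattern may be the better choice when the headers are under your control.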

generate_model.py

import os
import re
import csv

template = """from collections import namedtuple

Columns = {columns}
{fileName} = namedtuple('{fileName}', Columns)
"""

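def read_csv_with_encoding(data_folder, filename, encoding):
    """Return the cleaned header row of a CSV file, or None if it cannot be read."""
    try:
        with open(os.path.join(data_folder, filename), 'r', encoding=encoding, newline='') as f:
            csv_reader = csv.reader(f)
            first_row = next(csv_reader, None)
            if first_row is None:
                # Empty file: there is no header to turn into field names
                return None
            # Strip the byte order mark that some UTF-8 files start with
            first_row = [name.replace('\ufeff', '') for name in first_row]
            # Replace symbols that are not valid in a field name with _
            first_row = [re.sub(r'[\\/\-@#$%^]', '_', name) for name in first_row]
    except Exception as e:
        print(f"An error occurred: {e}")
        first_row = None

    return first_row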

# Step 1: Define the data folder path
data_folder = 'Data'  # Replace with your actual data folder path

# Step 2: Create a "Models" folder if it doesn't exist
models_folder = 'Models'
if not os.path.exists(models_folder):
    os.makedirs(models_folder)

# Step 3: Read all CSV files from the data folder and generate Python files
for filename in os.listdir(data_folder):
    if filename.endswith('.csv'):
        # Generate a Python file name based on the CSV file name
        python_filename = os.path.splitext(filename)[0] + '.py'

        # Try UTF-8 first, then fall back to Big5
        first_row = read_csv_with_encoding(data_folder, filename, 'utf8')
        if first_row is None:
            first_row = read_csv_with_encoding(data_folder, filename, 'big5')
        if first_row is None:
            # Neither encoding worked, so there is no header to generate from
            print(f"Skipped {filename}: could not read a header row")
            continue

        # Keep the model name separate so the CSV file name is not overwritten
        model_name = re.search(r'(\w*)\.py', python_filename).group(1)

        # Create and open the Python file in write mode
        with open(os.path.join(models_folder, python_filename), 'w', encoding='utf8') as f:
            f.write(template.format(columns=first_row, fileName=model_name))

        print(f"Generated {python_filename} from {filename}")

print("All CSV files processed and Python files generated.")
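
To see what the script writes into the Models folder, the template can be filled in by hand. This is only a sketch: the header list below is an assumption (of these column names, only 地點 appears in this post), and `exec` is used here purely to check that the generated text is valid Python.

```python
# Same template text as in generate_model.py
template = """from collections import namedtuple

Columns = {columns}
{fileName} = namedtuple('{fileName}', Columns)
"""

# Hypothetical cleaned header for Data/MRT_Activity.csv
header = ['站名', '地點', '活動_名稱']

generated_source = template.format(columns=header, fileName='MRT_Activity')
print(generated_source)

# The generated text is plain Python, so running it defines the model class
namespace = {}
exec(generated_source, namespace)
MRT_Activity = namespace['MRT_Activity']
print(MRT_Activity._fields)
```

Because the column list is written out as a Python list literal, the generated file can be read and edited by hand if a column name needs adjusting.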

Once the conversion is done, you can use the generated models as shown below and enjoy VSCode IntelliSense along with modular management of your data types 😀

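from Models.MRT_Activity import MRT_Activity
import csv
import os

# os.path.join avoids backslash escapes in the path and works on any OS
with open(os.path.join('Data', 'MRT_Activity.csv'), 'r', encoding='utf8', newline='') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row so it is not loaded as a record
    activities = [MRT_Activity(*line) for line in reader]

for activity in activities:
    print(f'{activity.地點}')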
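
If you want to try the pattern without the Data and Models folders, here is a self-contained sketch. It assumes a stand-in model with two columns (站名 is a hypothetical column name) and reads from an in-memory string instead of a file. It also uses `_make` and `_asdict`, two helpers that every namedtuple provides.

```python
import csv
import io
from collections import namedtuple

# Stand-in for a generated model; 站名 is a hypothetical column
MRT_Activity = namedtuple('MRT_Activity', ['站名', '地點'])

# In-memory stand-in for Data/MRT_Activity.csv
sample = io.StringIO('站名,地點\n台北車站,B1 大廳\n中山,4 號出口\n')

reader = csv.reader(sample)
next(reader, None)  # skip the header row so it does not become a record
# _make builds a record from any iterable, equivalent to MRT_Activity(*line)
activities = [MRT_Activity._make(line) for line in reader]

for activity in activities:
    print(activity.地點)

# _asdict converts a record to a dict, which is handy for JSON or logging
print(activities[0]._asdict())
```

Note that every field is still a string, because csv.reader does no type conversion; numeric columns need to be converted separately.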

In this blog post, we've explored how to use namedtuples in Python for loading and managing CSV files. Accessing fields by name instead of by index makes the code easier to read, and because namedtuples are immutable, a loaded record cannot be changed by accident.

Generating one model per CSV file also keeps the data types in one place, so editor autocompletion works wherever the data is used.

So, take a deep breath, dive into the world of NamedTuples, and elevate your Python data handling skills to new heights. Happy coding 😉