python批量写入图片到excel中
Published in:2025-02-28 |
Words: 2.8k | Reading time: 15min | reading:

python批量写入图片到excel中

技术

1. pandas

  • 用途:

    • 数据分析和处理的核心库,特别是处理表格型数据。
    • 提供高效、灵活的数据结构,例如 Series (一维数据) 和 DataFrame (二维表格数据)。
    • 支持数据的读取、写入、清洗、转换、聚合、合并等操作。
    • 广泛应用于数据科学、金融分析、统计分析、机器学习等领域。
  • 特点:

    • DataFrame: 类似电子表格的结构,可以存储各种数据类型 (数值、字符串、日期等)。
    • Series: 带有标签 (索引) 的一维数组,可以看作 DataFrame 的一列。
    • 数据对齐: pandas 自动根据索引进行数据对齐,处理缺失值。
    • 灵活的索引和选择: 方便根据标签或位置选择数据。
    • 分组和聚合: 可以轻松地对数据进行分组 (groupby) 并进行聚合计算 (sum, mean, count 等)。
    • 合并和连接: 支持各种数据合并 (merge) 和连接 (join) 操作。
    • 时间序列处理: 专门用于处理时间序列数据的功能。
    • 文件 I/O: 可以读取和写入多种格式的文件,例如 CSV, Excel, SQL 数据库等。
    • 性能优化: 基于 NumPy 构建,性能较高。

    2. Pillow

  • 用途:

    • 图像处理库,是 Python Imaging Library (PIL) 的一个分支。
    • 提供图像的打开、保存、修改、创建等功能。
    • 支持多种图像格式,例如 JPEG, PNG, GIF, TIFF, BMP 等。
    • 广泛应用于图像编辑、图像分析、计算机视觉等领域。
  • 特点:

    • 图像格式支持: 支持多种图像格式的读取和写入。
    • 图像处理操作: 提供丰富的图像处理操作,例如调整大小、裁剪、旋转、颜色转换、滤波等。
    • 绘图功能: 可以在图像上绘制线条、形状、文本等。
    • 图像序列处理: 可以处理动画 GIF 和多页 TIFF 文件。
    • 易于使用: API 设计简洁易懂。

    3. openpyxl

  • 用途:

    • 用于读写 Excel (.xlsx) 文件的库。
    • 可以创建、修改、读取 Excel 文件中的工作簿、工作表、单元格、样式等。
    • 广泛应用于数据报表生成、数据导出、自动化 Excel 处理等领域。
  • 特点:

    • 支持 Excel 文件格式: 完全支持 Excel 2010 及更高版本的 .xlsx 文件格式。
    • 单元格操作: 可以读写单元格的值、格式、公式等。
    • 工作表操作: 可以创建、删除、重命名工作表,设置工作表的属性。
    • 样式设置: 可以设置单元格的字体、颜色、对齐方式、边框等样式。
    • 图表支持: 可以创建各种图表,例如柱状图、折线图、饼图等。
    • 易于使用: API 设计友好,易于学习和使用。

应用

表格图片插入

  • code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def insert_image_with_resize(image_path: str, cell_width=80.79, cell_height=259.8):
"""Insert and resize image, handling errors and returning PIL Image."""
try:
img = Image.open(image_path)
width, height = img.size
width_ratio = (cell_width * 7) / width
height_ratio = cell_height / height
ratio = min(width_ratio, height_ratio)
new_width = int(width * ratio) # Calculate new_width
new_height = int(height * ratio) # Calculate new_height
img = img.resize((new_width, new_height), Image.LANCZOS)

base, ext = os.path.splitext(os.path.basename(image_path))
temp_img_path = os.path.join(TEMP_DIR, f"temp_{base}_{hash(image_path)}{ext}")
img.save(temp_img_path, optimize=True)

return img, temp_img_path
except FileNotFoundError:
logger.error(f"Image file not found: {image_path}")
return None, None
except (UnidentifiedImageError, OSError) as e:
logger.error(f"Image format error or corrupted file: {image_path} - {e}")
return None, None
except Exception as e:
logger.exception(f"Error processing image {image_path}: {e}")
return None, None

图片数据插入前处理

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143

MAX_IMAGE_PIXELS = 178956970 # Define the maximum pixel limit

def rgba_to_rgb_numpy(image, background_color=(255, 255, 255)):
"""使用 NumPy 将 RGBA/LA 图像转换为 RGB 图像,可以指定背景颜色。"""
try:
if image.mode == 'L': # 灰度图像
logger.debug(f"Converting grayscale image to RGB.")
return image.convert('RGB')

if image.mode == 'LA':
logger.debug(f"Converting LA image to RGB (grayscale with alpha).")
img_array = np.array(image)
l_channel = img_array[:, :, 0] # Luminance channel
alpha_channel = img_array[:, :, 1] #Alpha channel

# Create an RGB image where R=G=B=L
rgb = np.stack([l_channel, l_channel, l_channel], axis=2) # Replicate L to R, G, B

background = np.array(background_color, dtype=np.uint8)
new_rgb = (rgb * (alpha_channel / 255.0)[:, :, None] + background * (1 - (alpha_channel / 255.0)[:, :, None])).astype(np.uint8)

return Image.fromarray(new_rgb)

if image.mode == 'RGBA':
logger.debug(f"Converting RGBA image to RGB.")

# 强制加载图像数据
try:
image.load() # 确保图像数据已加载
except Exception as e:
logger.error(f"Failed to load image data: {e}")
return image # 无法加载,直接返回原始图像

img_array = np.array(image)
logger.debug(f"Image array shape: {img_array.shape}, mode: {image.mode}")

if len(img_array.shape) < 3:
logger.error(f"Image has insufficient dimensions (shape: {img_array.shape}), skipping conversion.")
return image

if img_array.shape[2] < 4: # 检查是否有足够的通道
logger.warning(f"Image has only {img_array.shape[2]} channels, assuming no alpha and skipping conversion.")
return image.convert("RGB") # 直接转换为RGB,不做alpha混合

alpha = img_array[:, :, 3] # Alpha channel
rgb = img_array[:, :, :3] # RGB channels

background = np.array(background_color, dtype=np.uint8)
new_rgb = (rgb * (alpha / 255.0)[:, :, None] + background * (1 - (alpha / 255.0)[:, :, None])).astype(
np.uint8)
return Image.fromarray(new_rgb)

else:
logger.debug(f"Converting non-RGBA/LA image to RGB.")
return image.convert("RGB")
except Exception as e:
logger.exception(f"Error during RGBA to RGB conversion: {e}")
return image


def is_image(file_path):
"""
Check if a file is a valid image (and resizes if needed) and its mode is RGB (or can be converted to RGB).

Returns:
True if the file is a valid image,
False if the file is not found or is not a valid image, or exceeds the limit.
"""
global MAX_IMAGE_PIXELS

if not file_path:
logger.error(f"Error: Path empty -{file_path}") # check all path first
return False

try:
if not os.path.exists(file_path):
logger.error(
f"Error: The file was Not found at the location - {file_path}"
) # check if all locations have files.
return False

try:
img = Image.open(file_path)
width, height = img.size
total_pixels = width * height

if total_pixels > MAX_IMAGE_PIXELS:
logger.warning(f"Image {file_path} exceeds pixel limit. Resizing.")
# Calculate new dimensions while preserving aspect ratio
ratio = (MAX_IMAGE_PIXELS / total_pixels) ** 0.5
new_width = int(width * ratio)
new_height = int(height * ratio)
img = img.resize((new_width, new_height), Image.LANCZOS) # Or another resampling filter
logger.info(f"Resized image to {new_width}x{new_height}")

img.load() # Force load to find problems

except (FileNotFoundError, UnidentifiedImageError, OSError) as e:
logger.error(f"Error opening image file {file_path}: {e}")
return False
except Exception as e:
logger.error(
f"Different from other error: it is an Unexpected error opening image: {file_path} {e}"
)
return False # It may have just not passed the test and is just a bad file path that is expected.

try:
# Verify the file and get the image format
# Verify a copy instead
img_copy = img.copy()
img_copy.verify()
img_copy.close() # Release resources

except Exception as e:
logger.error(
f"Error: Image verification failed - {file_path} {e}"
) # This can be all bad names, or some other error with image.
img.close() # Release resources
return False

# If RBGA was never called it was a fail
if img.mode == "RGBA":
# Check the image.
try:
img = rgba_to_rgb_numpy(
img
) # try and convert if RGBA is needed and all cases works
logger.warning(f"img: {file_path} , img: {img}, rgba to rgb...")
except Exception as e:
logger.error(f"convert error, detail: {e}, file path : {file_path}")
img.close() # Release Resources before returning
return False # If the image cannot load and convert, test that this is correct image

img.close() # Close when done if conversion was not needed.
return True # It was tested if all file loads.

except Exception as e:
logger.error(
f"Different from other error: it is an Unexpected error opening image: or image does not pass checks and there may not be image or corrupted with other issues - {file_path} {e}"
) # This case is just a bad path
return False # It may have just not passed the test and is just a bad file path that is expected.

数据批量写入excel表格

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
def process_image_group(identifiers, images_map_dir1, images_map_dir2, instruction_txt_path, start_index):
"""Processes a group of images."""
sheet_data = []
temp_files = []

for index, identifier in enumerate(identifiers, start=start_index + 1):
image1_path = images_map_dir1.get(identifier)
image2_path = images_map_dir2.get(identifier)

if image1_path and image2_path:
try:
original_filename = os.path.basename(image1_path)
processed_filename = os.path.basename(image2_path)
instruction_text = generate_modification_instruction(processed_filename)

logger.info(f"source image: {image1_path} instruction txt: {instruction_text}")

# Extract filename without instruction using a helper function
processed_filename_without_instruction = extract_filename_without_instruction(processed_filename,
instruction_text)

with concurrent.futures.ThreadPoolExecutor() as executor:
future1 = executor.submit(insert_image_with_resize, image1_path)
future2 = executor.submit(insert_image_with_resize, image2_path)
img1, temp_img1_path = future1.result()
img2, temp_img2_path = future2.result()

if temp_img1_path:
temp_files.append(temp_img1_path)
if temp_img2_path:
temp_files.append(temp_img2_path)

original_numbers = extract_numbers_from_filename(original_filename)
processed_numbers = extract_numbers_from_filename(processed_filename)

# logger.info(f"get source number image: {original_numbers}, processed numbers: {processed_numbers}!!!")

# 构建文件名
base_name, ext = os.path.splitext(processed_filename) # 获取原始扩展名
# process_path = os.path.split(image2_path)
new_image1_filename, ext = extract_directory_and_extension(image2_path)
new_image1_filename = os.path.join(new_image1_filename, str(processed_numbers[0]) + str(ext))

parent_dir_name = get_parent_directory_name(image2_path)
parent_dir_name = os.path.join(parent_dir_name, 'copy')

copied_image2_path = copy_and_rename(image2_path, parent_dir_name, new_image1_filename)
# copied_image2_path = copy_and_rename(image2_path, "./copy", new_image2_filename)


# logger.info(f"already copy image2: {image2_path} to new dir: {copied_image2_path}!!! name: {new_image1_filename}.")

if img1 and img2:
sheet_data.append({
'index': index,
'original_filename': original_numbers[0],
'img1': img1,
'temp_img1_path': temp_img1_path,
'processed_filename': processed_numbers[0],
'processed_filename_without_instruction': processed_filename_without_instruction, # Added field
'img2': img2,
'temp_img2_path': temp_img2_path,
'instruction_text': instruction_text
})
logger.info(f"writing image: {original_filename} and {processed_filename} to excel file!!!")
else:
logger.error(f"Skipping row: {original_filename}, {processed_filename}")

except Exception as e:
logger.error(f"Error processing: {image1_path} or {image2_path} - {e}")
else:
logger.warning(f"Error processing: {image1_path} or {image2_path} not get part.")

return sheet_data, temp_files

数据汇总输出Excel

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133

def write_images_to_excel(dir1, dir2, output_excel, progress_var, progress_bar, progress_label, ps_root,
instruction_txt_path):
"""Write images and filenames to Excel."""
files_dir1 = get_all_files(dir1)
files_dir2 = get_all_files(dir2)

images_map_dir1 = {}
images_map_dir2 = {}

for file in files_dir1:
if is_image(file):
identifier = extract_identifier_de(os.path.basename(file)) or os.path.basename(file)
images_map_dir1[identifier] = file
else:
logger.warning(f"file path not image!!! {file}")

for file in files_dir2:
if is_image(file):
identifier = extract_identifier_de(os.path.basename(file)) or os.path.basename(file)
images_map_dir2[identifier] = file
else:
logger.warning(f"file path not image!!! {file}")

if not images_map_dir1 and not images_map_dir2:
logger.warning("No image files found.")
return False

common_identifiers = list(set(images_map_dir1.keys()) & set(images_map_dir2.keys()))
total_images = len(common_identifiers)
logger.info(f"total image: {total_images}!!!")

# logger.info("start output all image content:!!!")
if dev_tools_mode:
logger.info("dev tools mode enable!!!")
compare_and_info_dicts(images_map_dir1, images_map_dir2)
# logger.info("end output all image content:!!!")

if not common_identifiers:
logger.warning("No common identifiers found.")
return False

common_identifiers.sort(
key=lambda x: (int(extract_identifier(x)) if extract_identifier(x).isdigit() else float('inf'), x))

output_dir = os.path.dirname(output_excel)
base_name, ext = os.path.splitext(os.path.basename(output_excel))
os.makedirs(output_dir, exist_ok=True)

with tqdm(total=total_images, desc="Processing Images", unit="pair", dynamic_ncols=True) as pbar:
for start_index in range(0, total_images, 500):
end_index = min(start_index + 500, total_images)
current_identifiers = common_identifiers[start_index:end_index]

sheet_data, temp_files = process_image_group(current_identifiers, images_map_dir1, images_map_dir2,
instruction_txt_path, start_index)

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.title = "Image Comparison"

headers = ["序号", "原图文件名", "原图", "修改图文件名", "修改图", "修改内容"]
for col_num, header in enumerate(headers, 1):
cell = sheet.cell(row=1, column=col_num, value=header)
cell.alignment = Alignment(horizontal='center')

row = 2
original_filenames = []
processed_filenames = []
modification_instructions = []

for item in sheet_data:
try:
sheet.cell(row=row, column=1, value=item['index']).alignment = Alignment(horizontal='center')
original_filenames.append(item['original_filename'])
sheet.cell(row=row, column=2, value=item['original_filename'])

if item['img1']:
excel_img1 = ExcelImage(item['temp_img1_path'])
sheet.add_image(excel_img1, f'C{row}')
sheet.row_dimensions[row].height = 259.8

processed_filenames.append(item['processed_filename'])
sheet.cell(row=row, column=4, value=item['processed_filename'])

if item['img2']:
excel_img2 = ExcelImage(item['temp_img2_path'])
sheet.add_image(excel_img2, f'E{row}')
sheet.row_dimensions[row].height = 259.8

modification_instructions.append(item['instruction_text'])
sheet.cell(row=row, column=6, value=item['instruction_text']).alignment = Alignment(vertical='top')

row += 1
except Exception as e:
logger.exception(f"Error adding row: {e}")

adjust_column_width(sheet, 'B', original_filenames)
adjust_column_width(sheet, 'D', processed_filenames)
adjust_column_width(sheet, 'F', modification_instructions)
adjust_column_width(sheet, 'A', range(start_index + 1, start_index + len(current_identifiers) + 1))
sheet.column_dimensions['C'].width = 80.79
sheet.column_dimensions['E'].width = 80.79

file_counter = start_index // 500 + 1
new_output_excel = os.path.join(output_dir, f"{base_name}_{file_counter}{ext}")

try:
workbook.save(new_output_excel)
logger.info(f"Saved batch: {new_output_excel}")
except Exception as e:
logger.error(f"Failed to save {new_output_excel}: {e}")
return False
finally:
for temp_file in temp_files:
try:
os.remove(temp_file)
except FileNotFoundError:
pass
except Exception as e:
logger.error(f"Error deleting {temp_file}: {e}")

pbar.update(len(current_identifiers))
progress_percentage = int((pbar.n / total_images) * 100)
progress_var.set(progress_percentage)
progress_label.config(text=f"{progress_percentage}%")
ps_root.update_idletasks()

progress_var.set(100)
progress_label.config(text="100%")
ps_root.update_idletasks()
return True

Next:
在macOS使用ntfs格式硬盘方法