python批量写入图片到excel中

技术

1. pandas

用途：
- 数据分析和处理的核心库，特别是处理表格型数据。
- 提供高效、灵活的数据结构，例如 Series (一维数据) 和 DataFrame (二维表格数据)。
- 支持数据的读取、写入、清洗、转换、聚合、合并等操作。
- 广泛应用于数据科学、金融分析、统计分析、机器学习等领域。
特点：
- DataFrame: 类似电子表格的结构，可以存储各种数据类型 (数值、字符串、日期等)。
- Series: 带有标签 (索引) 的一维数组，可以看作 DataFrame 的一列。
- 数据对齐: pandas 自动根据索引进行数据对齐，处理缺失值。
- 灵活的索引和选择: 方便根据标签或位置选择数据。
- 分组和聚合: 可以轻松地对数据进行分组 (groupby) 并进行聚合计算 (sum, mean, count 等)。
- 合并和连接: 支持各种数据合并 (merge) 和连接 (join) 操作。
- 时间序列处理: 专门用于处理时间序列数据的功能。
- 文件 I/O: 可以读取和写入多种格式的文件，例如 CSV, Excel, SQL 数据库等。
- 性能优化: 基于 NumPy 构建，性能较高。
2. Pillow
用途:
- 图像处理库，是 Python Imaging Library (PIL) 的一个分支。
- 提供图像的打开、保存、修改、创建等功能。
- 支持多种图像格式，例如 JPEG, PNG, GIF, TIFF, BMP 等。
- 广泛应用于图像编辑、图像分析、计算机视觉等领域。
特点:
- 图像格式支持: 支持多种图像格式的读取和写入。
- 图像处理操作: 提供丰富的图像处理操作，例如调整大小、裁剪、旋转、颜色转换、滤波等。
- 绘图功能: 可以在图像上绘制线条、形状、文本等。
- 图像序列处理: 可以处理动画 GIF 和多页 TIFF 文件。
- 易于使用: API 设计简洁易懂。
3. openpyxl
用途:
- 用于读写 Excel (.xlsx) 文件的库。
- 可以创建、修改、读取 Excel 文件中的工作簿、工作表、单元格、样式等。
- 广泛应用于数据报表生成、数据导出、自动化 Excel 处理等领域。
特点:
- 支持 Excel 文件格式: 完全支持 Excel 2010 及更高版本的 .xlsx 文件格式。
- 单元格操作: 可以读写单元格的值、格式、公式等。
- 工作表操作: 可以创建、删除、重命名工作表，设置工作表的属性。
- 样式设置: 可以设置单元格的字体、颜色、对齐方式、边框等样式。
- 图表支持: 可以创建各种图表，例如柱状图、折线图、饼图等。
- 易于使用: API 设计友好，易于学习和使用。

应用

表格图片插入

code

def insert_image_with_resize(image_path: str, cell_width=80.79, cell_height=259.8):
    """Insert and resize image, handling errors and returning PIL Image."""
    try:
        img = Image.open(image_path)
        width, height = img.size
        width_ratio = (cell_width * 7) / width
        height_ratio = cell_height / height
        ratio = min(width_ratio, height_ratio)
        new_width = int(width * ratio)  # Calculate new_width
        new_height = int(height * ratio)  # Calculate new_height
        img = img.resize((new_width, new_height), Image.LANCZOS)

        base, ext = os.path.splitext(os.path.basename(image_path))
        temp_img_path = os.path.join(TEMP_DIR, f"temp_{base}_{hash(image_path)}{ext}")
        img.save(temp_img_path, optimize=True)

        return img, temp_img_path
    except FileNotFoundError:
        logger.error(f"Image file not found: {image_path}")
        return None, None
    except (UnidentifiedImageError, OSError) as e:
        logger.error(f"Image format error or corrupted file: {image_path} - {e}")
        return None, None
    except Exception as e:
        logger.exception(f"Error processing image {image_path}: {e}")
        return None, None

图片数据插入前处理


MAX_IMAGE_PIXELS = 178956970  # Define the maximum pixel limit

def rgba_to_rgb_numpy(image, background_color=(255, 255, 255)):
    """使用 NumPy 将 RGBA/LA 图像转换为 RGB 图像，可以指定背景颜色。"""
    try:
        if image.mode == 'L':  # 灰度图像
            logger.debug(f"Converting grayscale image to RGB.")
            return image.convert('RGB')

        if image.mode == 'LA':
            logger.debug(f"Converting LA image to RGB (grayscale with alpha).")
            img_array = np.array(image)
            l_channel = img_array[:, :, 0]  # Luminance channel
            alpha_channel = img_array[:, :, 1] #Alpha channel

            # Create an RGB image where R=G=B=L
            rgb = np.stack([l_channel, l_channel, l_channel], axis=2)  # Replicate L to R, G, B

            background = np.array(background_color, dtype=np.uint8)
            new_rgb = (rgb * (alpha_channel / 255.0)[:, :, None] + background * (1 - (alpha_channel / 255.0)[:, :, None])).astype(np.uint8)

            return Image.fromarray(new_rgb)

        if image.mode == 'RGBA':
            logger.debug(f"Converting RGBA image to RGB.")

            # 强制加载图像数据
            try:
                image.load()  # 确保图像数据已加载
            except Exception as e:
                logger.error(f"Failed to load image data: {e}")
                return image  # 无法加载，直接返回原始图像

            img_array = np.array(image)
            logger.debug(f"Image array shape: {img_array.shape}, mode: {image.mode}")

            if len(img_array.shape) < 3:
                logger.error(f"Image has insufficient dimensions (shape: {img_array.shape}), skipping conversion.")
                return image

            if img_array.shape[2] < 4:  # 检查是否有足够的通道
                logger.warning(f"Image has only {img_array.shape[2]} channels, assuming no alpha and skipping conversion.")
                return image.convert("RGB")  # 直接转换为RGB，不做alpha混合

            alpha = img_array[:, :, 3]  # Alpha channel
            rgb = img_array[:, :, :3]  # RGB channels

            background = np.array(background_color, dtype=np.uint8)
            new_rgb = (rgb * (alpha / 255.0)[:, :, None] + background * (1 - (alpha / 255.0)[:, :, None])).astype(
                np.uint8)
            return Image.fromarray(new_rgb)

        else:
            logger.debug(f"Converting non-RGBA/LA image to RGB.")
            return image.convert("RGB")
    except Exception as e:
        logger.exception(f"Error during RGBA to RGB conversion: {e}")
        return image


def is_image(file_path):
    """
    Check if a file is a valid image (and resizes if needed) and its mode is RGB (or can be converted to RGB).

    Returns:
        True if the file is a valid image,
        False if the file is not found or is not a valid image, or exceeds the limit.
    """
    global MAX_IMAGE_PIXELS

    if not file_path:
        logger.error(f"Error: Path empty -{file_path}")  # check all path first
        return False

    try:
        if not os.path.exists(file_path):
            logger.error(
                f"Error: The file was Not found at the location - {file_path}"
            )  # check if all locations have files.
            return False

        try:
            img = Image.open(file_path)
            width, height = img.size
            total_pixels = width * height

            if total_pixels > MAX_IMAGE_PIXELS:
                logger.warning(f"Image {file_path} exceeds pixel limit. Resizing.")
                # Calculate new dimensions while preserving aspect ratio
                ratio = (MAX_IMAGE_PIXELS / total_pixels) ** 0.5
                new_width = int(width * ratio)
                new_height = int(height * ratio)
                img = img.resize((new_width, new_height), Image.LANCZOS)  # Or another resampling filter
                logger.info(f"Resized image to {new_width}x{new_height}")

            img.load()  # Force load to find problems

        except (FileNotFoundError, UnidentifiedImageError, OSError) as e:
            logger.error(f"Error opening image file {file_path}: {e}")
            return False
        except Exception as e:
            logger.error(
                f"Different from other error: it is an Unexpected error opening image: {file_path}  {e}"
            )
            return False  # It may have just not passed the test and is just a bad file path that is expected.

        try:
            # Verify the file and get the image format
            # Verify a copy instead
            img_copy = img.copy()
            img_copy.verify()
            img_copy.close()  # Release resources

        except Exception as e:
            logger.error(
                f"Error: Image verification failed - {file_path}  {e}"
            )  # This can be all bad names, or some other error with image.
            img.close()  # Release resources
            return False

        # If RBGA was never called it was a fail
        if img.mode == "RGBA":
            # Check the image.
            try:
                img = rgba_to_rgb_numpy(
                    img
                )  # try and convert if RGBA is needed and all cases works
                logger.warning(f"img: {file_path} , img: {img}, rgba to rgb...")
            except Exception as e:
                logger.error(f"convert error, detail: {e}, file path : {file_path}")
                img.close()  # Release Resources before returning
                return False  # If the image cannot load and convert, test that this is correct image

        img.close()  # Close when done if conversion was not needed.
        return True  # It was tested if all file loads.

    except Exception as e:
        logger.error(
            f"Different from other error: it is an Unexpected error opening image: or image does not pass checks and there may not be image or corrupted with  other issues - {file_path}  {e}"
        )  # This case is just a bad path
        return False  # It may have just not passed the test and is just a bad file path that is expected.

数据批量写入excel表格

def process_image_group(identifiers, images_map_dir1, images_map_dir2, instruction_txt_path, start_index):
    """Processes a group of images."""
    sheet_data = []
    temp_files = []

    for index, identifier in enumerate(identifiers, start=start_index + 1):
        image1_path = images_map_dir1.get(identifier)
        image2_path = images_map_dir2.get(identifier)

        if image1_path and image2_path:
            try:
                original_filename = os.path.basename(image1_path)
                processed_filename = os.path.basename(image2_path)
                instruction_text = generate_modification_instruction(processed_filename)

                logger.info(f"source image: {image1_path} instruction txt: {instruction_text}")

                # Extract filename without instruction using a helper function
                processed_filename_without_instruction = extract_filename_without_instruction(processed_filename,
                                                                                              instruction_text)

                with concurrent.futures.ThreadPoolExecutor() as executor:
                    future1 = executor.submit(insert_image_with_resize, image1_path)
                    future2 = executor.submit(insert_image_with_resize, image2_path)
                    img1, temp_img1_path = future1.result()
                    img2, temp_img2_path = future2.result()

                if temp_img1_path:
                    temp_files.append(temp_img1_path)
                if temp_img2_path:
                    temp_files.append(temp_img2_path)

                original_numbers = extract_numbers_from_filename(original_filename)
                processed_numbers = extract_numbers_from_filename(processed_filename)

                # logger.info(f"get source number image: {original_numbers}, processed numbers: {processed_numbers}!!!")

                # 构建文件名
                base_name, ext = os.path.splitext(processed_filename)  # 获取原始扩展名
                # process_path = os.path.split(image2_path)
                new_image1_filename, ext = extract_directory_and_extension(image2_path)
                new_image1_filename = os.path.join(new_image1_filename, str(processed_numbers[0]) + str(ext))

                parent_dir_name = get_parent_directory_name(image2_path)
                parent_dir_name = os.path.join(parent_dir_name, 'copy')

                copied_image2_path = copy_and_rename(image2_path, parent_dir_name, new_image1_filename)
                # copied_image2_path = copy_and_rename(image2_path, "./copy", new_image2_filename)


                # logger.info(f"already copy image2: {image2_path} to new dir: {copied_image2_path}!!! name: {new_image1_filename}.")

                if img1 and img2:
                    sheet_data.append({
                        'index': index,
                        'original_filename': original_numbers[0],
                        'img1': img1,
                        'temp_img1_path': temp_img1_path,
                        'processed_filename': processed_numbers[0],
                        'processed_filename_without_instruction': processed_filename_without_instruction,  # Added field
                        'img2': img2,
                        'temp_img2_path': temp_img2_path,
                        'instruction_text': instruction_text
                    })
                    logger.info(f"writing image: {original_filename} and {processed_filename} to excel file!!!")
                else:
                    logger.error(f"Skipping row: {original_filename}, {processed_filename}")

            except Exception as e:
                logger.error(f"Error processing: {image1_path} or {image2_path} - {e}")
        else:
            logger.warning(f"Error processing: {image1_path} or {image2_path} not get part.")

    return sheet_data, temp_files

数据汇总输出Excel


def write_images_to_excel(dir1, dir2, output_excel, progress_var, progress_bar, progress_label, ps_root,
                          instruction_txt_path):
    """Write images and filenames to Excel."""
    files_dir1 = get_all_files(dir1)
    files_dir2 = get_all_files(dir2)

    images_map_dir1 = {}
    images_map_dir2 = {}

    for file in files_dir1:
        if is_image(file):
            identifier = extract_identifier_de(os.path.basename(file)) or os.path.basename(file)
            images_map_dir1[identifier] = file
        else:
            logger.warning(f"file path not image!!! {file}")

    for file in files_dir2:
        if is_image(file):
            identifier = extract_identifier_de(os.path.basename(file)) or os.path.basename(file)
            images_map_dir2[identifier] = file
        else:
            logger.warning(f"file path not image!!! {file}")

    if not images_map_dir1 and not images_map_dir2:
        logger.warning("No image files found.")
        return False

    common_identifiers = list(set(images_map_dir1.keys()) & set(images_map_dir2.keys()))
    total_images = len(common_identifiers)
    logger.info(f"total image: {total_images}!!!")

    # logger.info("start output all image content:!!!")
    if dev_tools_mode:
        logger.info("dev tools mode enable!!!")
        compare_and_info_dicts(images_map_dir1, images_map_dir2)
    # logger.info("end output all image content:!!!")

    if not common_identifiers:
        logger.warning("No common identifiers found.")
        return False

    common_identifiers.sort(
        key=lambda x: (int(extract_identifier(x)) if extract_identifier(x).isdigit() else float('inf'), x))

    output_dir = os.path.dirname(output_excel)
    base_name, ext = os.path.splitext(os.path.basename(output_excel))
    os.makedirs(output_dir, exist_ok=True)

    with tqdm(total=total_images, desc="Processing Images", unit="pair", dynamic_ncols=True) as pbar:
        for start_index in range(0, total_images, 500):
            end_index = min(start_index + 500, total_images)
            current_identifiers = common_identifiers[start_index:end_index]

            sheet_data, temp_files = process_image_group(current_identifiers, images_map_dir1, images_map_dir2,
                                                         instruction_txt_path, start_index)

            workbook = openpyxl.Workbook()
            sheet = workbook.active
            sheet.title = "Image Comparison"

            headers = ["序号", "原图文件名", "原图", "修改图文件名", "修改图", "修改内容"]
            for col_num, header in enumerate(headers, 1):
                cell = sheet.cell(row=1, column=col_num, value=header)
                cell.alignment = Alignment(horizontal='center')

            row = 2
            original_filenames = []
            processed_filenames = []
            modification_instructions = []

            for item in sheet_data:
                try:
                    sheet.cell(row=row, column=1, value=item['index']).alignment = Alignment(horizontal='center')
                    original_filenames.append(item['original_filename'])
                    sheet.cell(row=row, column=2, value=item['original_filename'])

                    if item['img1']:
                        excel_img1 = ExcelImage(item['temp_img1_path'])
                        sheet.add_image(excel_img1, f'C{row}')
                        sheet.row_dimensions[row].height = 259.8

                    processed_filenames.append(item['processed_filename'])
                    sheet.cell(row=row, column=4, value=item['processed_filename'])

                    if item['img2']:
                        excel_img2 = ExcelImage(item['temp_img2_path'])
                        sheet.add_image(excel_img2, f'E{row}')
                        sheet.row_dimensions[row].height = 259.8

                    modification_instructions.append(item['instruction_text'])
                    sheet.cell(row=row, column=6, value=item['instruction_text']).alignment = Alignment(vertical='top')

                    row += 1
                except Exception as e:
                    logger.exception(f"Error adding row: {e}")

            adjust_column_width(sheet, 'B', original_filenames)
            adjust_column_width(sheet, 'D', processed_filenames)
            adjust_column_width(sheet, 'F', modification_instructions)
            adjust_column_width(sheet, 'A', range(start_index + 1, start_index + len(current_identifiers) + 1))
            sheet.column_dimensions['C'].width = 80.79
            sheet.column_dimensions['E'].width = 80.79

            file_counter = start_index // 500 + 1
            new_output_excel = os.path.join(output_dir, f"{base_name}_{file_counter}{ext}")

            try:
                workbook.save(new_output_excel)
                logger.info(f"Saved batch: {new_output_excel}")
            except Exception as e:
                logger.error(f"Failed to save {new_output_excel}: {e}")
                return False
            finally:
                for temp_file in temp_files:
                    try:
                        os.remove(temp_file)
                    except FileNotFoundError:
                        pass
                    except Exception as e:
                        logger.error(f"Error deleting {temp_file}: {e}")

            pbar.update(len(current_identifiers))
            progress_percentage = int((pbar.n / total_images) * 100)
            progress_var.set(progress_percentage)
            progress_label.config(text=f"{progress_percentage}%")
            ps_root.update_idletasks()

    progress_var.set(100)
    progress_label.config(text="100%")
    ps_root.update_idletasks()
    return True

Zhaoqi.Cao static blog

A static personal blog.

python批量写入图片到excel中

技术

应用

表格图片插入

图片数据插入前处理

数据批量写入excel表格

数据汇总输出Excel

about me 个人微信

wechat offical 微信公众号

你的赏识是我前进的动力