unstructured
db497a71 - fix: remove noops and speed up function `zoom_image` by 12% (#4164)

36 days ago
fix: remove noops and speed up function `zoom_image` by 12% (#4164)

<!-- CODEFLASH_OPTIMIZATION: {"function":"zoom_image","file":"unstructured/partition/utils/ocr_models/tesseract_ocr.py","speedup_pct":"12%","speedup_x":"0.12x","original_runtime":"18.1 milliseconds","best_runtime":"16.1 milliseconds","optimization_type":"memory","timestamp":"2025-12-19T03:24:39.274Z","version":"1.0"} -->

#### 📄 12% (0.12x) speedup for ***`zoom_image` in `unstructured/partition/utils/ocr_models/tesseract_ocr.py`***

⏱️ Runtime : **`18.1 milliseconds`** **→** **`16.1 milliseconds`** (best of `12` runs)

#### 📝 Explanation and details

The optimization removes unnecessary morphological operations (dilation followed by erosion) that were being performed with a 1x1 kernel. Since a 1x1 kernel has no effect on the image during dilation and erosion, these steps were pure computational overhead.

**Key changes:**
- Eliminated the creation of a 1x1 kernel (`np.ones((1, 1), np.uint8)`)
- Removed the `cv2.dilate()` and `cv2.erode()` calls that used this ineffective kernel
- Added explanatory comments about why these operations were removed

**Why this leads to speedup:**
The line profiler shows that the morphological operations consumed 27.7% of the total runtime (18.5% for dilation + 9.2% for erosion). A 1x1 kernel performs no actual morphological transformation - it is equivalent to applying the identity operation. Removing these no-op calls eliminates unnecessary OpenCV function overhead and memory operations.

**Performance impact based on function references:**
The `zoom_image` function is called within Tesseract OCR processing, specifically in `get_layout_from_image()` when text height falls outside optimal ranges. This optimization improves OCR preprocessing performance, which matters because OCR is typically computationally intensive and may be called repeatedly in document processing pipelines.
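The no-op claim can be checked without OpenCV. The following is a minimal pure-Python sketch of grayscale dilation and erosion with a rectangular all-ones kernel; the helper names (`_morph`, `dilate`, `erode`) and the choice to ignore out-of-bounds pixels are assumptions made for illustration and are not the `cv2` implementation. With a 1x1 kernel each window contains only the pixel itself, so both operations return the input unchanged.

```python
from typing import Callable, List

Image = List[List[int]]


def _morph(img: Image, kh: int, kw: int, pick: Callable[[List[int]], int]) -> Image:
    """Apply `pick` (max for dilation, min for erosion) over a kh x kw window
    anchored at its centre. Pixels outside the image are ignored (an
    assumption of this sketch)."""
    h, w = len(img), len(img[0])
    ay, ax = kh // 2, kw // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(y - ay, y - ay + kh)
                for xx in range(x - ax, x - ax + kw)
                if 0 <= yy < h and 0 <= xx < w
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img: Image, kh: int, kw: int) -> Image:
    # Dilation takes the maximum over the kernel window.
    return _morph(img, kh, kw, max)


def erode(img: Image, kh: int, kw: int) -> Image:
    # Erosion takes the minimum over the kernel window.
    return _morph(img, kh, kw, min)


sample = [[0, 10, 0], [10, 255, 10], [0, 10, 0]]

# Dilation followed by erosion, as in the removed code path.
closed_1x1 = erode(dilate(sample, 1, 1), 1, 1)
closed_3x3 = erode(dilate(sample, 3, 3), 3, 3)

print(closed_1x1 == sample)  # the 1x1 kernel leaves the image untouched
print(closed_3x3 == sample)  # a 3x3 kernel does change it
```

Only the 1x1 case is an identity; a larger kernel changes pixel values, so the removal is safe specifically because of the kernel size used in `zoom_image`.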

**Test case analysis:**
Most generated test cases run 7-35% faster, with particularly strong gains for:
- Identity zoom operations (up to 35.8% faster) - the most common case, where zoom=1
- Upscaling operations (typically 21-32% faster) - when OCR requires image enlargement
- Large images (8-22% faster) - where the removed operations had more overhead

A few cases, mostly downscaling, measured flat or slower (between 2.03% faster and 22.9% slower).

The optimization maintains identical visual output since the removed operations were mathematically no-ops, ensuring OCR accuracy is preserved while reducing processing time.

✅ **Correctness verification report:**

| Test | Status |
| --------------------------- | ----------------- |
| ⚙️ Existing Unit Tests | ✅ **27 Passed** |
| 🌀 Generated Regression Tests | ✅ **38 Passed** |
| ⏪ Replay Tests | 🔘 **None Found** |
| 🔎 Concolic Coverage Tests | 🔘 **None Found** |
| 📊 Tests Coverage | 100.0% |

<details>
<summary>⚙️ Existing Unit Tests and Runtime</summary>

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|:---------------------------------------------------|:--------------|:---------------|:----------|
| `partition/pdf_image/test_ocr.py::test_zoom_image` | 707μs | 632μs | 11.9%✅ |

</details>

<details>
<summary>🌀 Generated Regression Tests and Runtime</summary>

```python
from __future__ import annotations

import numpy as np

# imports
from PIL import Image as PILImage

from unstructured.partition.utils.ocr_models.tesseract_ocr import zoom_image

# --------- UNIT TESTS ---------


# Helper function to create a simple RGB PIL image of given size and color
def make_image(size=(10, 10), color=(255, 0, 0)):
    img = PILImage.new("RGB", size, color)
    return img


# ---------------- BASIC TEST CASES ----------------


def test_zoom_identity():
    """Zoom factor 1 should return an image of the same size (but not necessarily the same object)."""
    img = make_image((20, 30), (123, 45, 67))
    codeflash_output = zoom_image(img, 1)
    out = codeflash_output  # 75.0μs -> 55.2μs (35.8% faster)
    # The pixel values may not be identical due to dilation/erosion, but should be very close
    diff = np.abs(np.array(out, dtype=int) - np.array(img, dtype=int))


def test_zoom_upscale():
    """Zoom factor >1 should increase image size proportionally."""
    img = make_image((10, 20), (0, 255, 0))
    codeflash_output = zoom_image(img, 2)
    out = codeflash_output  # 35.2μs -> 29.0μs (21.4% faster)
    # The output image should still be greenish
    arr = np.array(out)


def test_zoom_downscale():
    """Zoom factor <1 should decrease image size proportionally."""
    img = make_image((10, 10), (0, 0, 255))
    codeflash_output = zoom_image(img, 0.5)
    out = codeflash_output  # 25.3μs -> 21.6μs (17.1% faster)
    arr = np.array(out)


def test_zoom_non_integer_factor():
    """Non-integer zoom factors should produce correct output size."""
    img = make_image((8, 8), (100, 200, 50))
    codeflash_output = zoom_image(img, 1.5)
    out = codeflash_output  # 30.2μs -> 22.8μs (32.1% faster)


def test_zoom_no_side_effects():
    """The input image should not be modified."""
    img = make_image((5, 5), (10, 20, 30))
    img_before = np.array(img).copy()
    codeflash_output = zoom_image(img, 2)
    _ = codeflash_output  # 22.9μs -> 18.3μs (25.0% faster)


# ---------------- EDGE TEST CASES ----------------


def test_zoom_zero_factor():
    """Zoom factor 0 should be treated as 1 (no scaling)."""
    img = make_image((7, 13), (50, 100, 150))
    codeflash_output = zoom_image(img, 0)
    out = codeflash_output  # 24.6μs -> 20.0μs (23.2% faster)


def test_zoom_negative_factor():
    """Negative zoom factors should be treated as 1 (no scaling)."""
    img = make_image((12, 8), (200, 100, 50))
    codeflash_output = zoom_image(img, -2)
    out = codeflash_output  # 26.1μs -> 20.0μs (30.4% faster)


def test_zoom_large_factor_on_small_image():
    """Zooming a small image by a large factor should scale up."""
    img = make_image((2, 2), (42, 84, 126))
    codeflash_output = zoom_image(img, 10)
    out = codeflash_output  # 42.8μs -> 33.5μs (27.5% faster)


def test_zoom_non_rgb_image():
    """Function should work with grayscale images (converted to RGB)."""
    img = PILImage.new("L", (5, 5), 128)  # Grayscale
    img_rgb = img.convert("RGB")
    codeflash_output = zoom_image(img, 2)
    out = codeflash_output  # 31.0μs -> 25.7μs (20.8% faster)


def test_zoom_alpha_channel_image():
    """Function should ignore alpha channel and process as RGB."""
    img = PILImage.new("RGBA", (6, 6), (100, 150, 200, 128))
    img_rgb = img.convert("RGB")
    codeflash_output = zoom_image(img, 2)
    out = codeflash_output  # 28.0μs -> 24.9μs (12.6% faster)


def test_zoom_large_image_upscale():
    """Zooming a large image up should work and not crash."""
    img = make_image((500, 500), (10, 20, 30))
    codeflash_output = zoom_image(img, 1.5)
    out = codeflash_output  # 1.23ms -> 1.09ms (12.5% faster)
    # Check a corner pixel is still close to original color
    arr = np.array(out)


def test_zoom_large_image_downscale():
    """Zooming a large image down should work and not crash."""
    img = make_image((800, 600), (200, 100, 50))
    codeflash_output = zoom_image(img, 0.5)
    out = codeflash_output  # 942μs -> 923μs (2.03% faster)
    arr = np.array(out)


def test_zoom_maximum_allowed_size():
    """Test with the largest allowed image under 1000x1000."""
    img = make_image((999, 999), (1, 2, 3))
    codeflash_output = zoom_image(img, 1)
    out = codeflash_output  # 1.47ms -> 1.30ms (13.0% faster)
    arr = np.array(out)


def test_zoom_many_colors():
    """Test with an image with many colors (gradient)."""
    arr = np.zeros((100, 100, 3), dtype=np.uint8)
    for i in range(100):
        for j in range(100):
            arr[i, j] = [i * 2 % 256, j * 2 % 256, (i + j) % 256]
    img = PILImage.fromarray(arr)
    codeflash_output = zoom_image(img, 0.9)
    out = codeflash_output  # 112μs -> 97.0μs (16.3% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
from __future__ import annotations

import numpy as np

# imports
from PIL import Image as PILImage

from unstructured.partition.utils.ocr_models.tesseract_ocr import zoom_image

# --- Helper functions for tests ---


def create_test_image(size=(10, 10), color=(255, 0, 0), mode="RGB"):
    """Create a plain color PIL image for testing."""
    return PILImage.new(mode, size, color)


# --- Unit tests ---

# 1. Basic Test Cases


def test_zoom_identity():
    """Test zoom=1 returns image of same size and content is similar."""
    img = create_test_image((10, 10), (123, 222, 111))
    codeflash_output = zoom_image(img, 1)
    result = codeflash_output  # 57.2μs -> 53.3μs (7.43% faster)
    # The content may not be pixel-perfect due to cv2 conversion, but should be close
    arr_orig = np.array(img)
    arr_result = np.array(result)


def test_zoom_double_size():
    """Test zoom=2 increases both dimensions by 2x."""
    img = create_test_image((10, 5), (10, 20, 30))
    codeflash_output = zoom_image(img, 2)
    result = codeflash_output  # 38.6μs -> 30.6μs (26.3% faster)


def test_zoom_half_size():
    """Test zoom=0.5 reduces both dimensions by half (rounded)."""
    img = create_test_image((10, 6), (200, 100, 50))
    codeflash_output = zoom_image(img, 0.5)
    result = codeflash_output  # 29.6μs -> 25.4μs (16.7% faster)


def test_zoom_arbitrary_factor():
    """Test zoom=1.7 scales image correctly."""
    img = create_test_image((10, 10), (0, 255, 0))
    codeflash_output = zoom_image(img, 1.7)
    result = codeflash_output  # 30.3μs -> 23.8μs (27.3% faster)
    expected_size = (int(round(10 * 1.7)), int(round(10 * 1.7)))


# 2. Edge Test Cases


def test_zoom_zero():
    """Test zoom=0 is treated as 1 (no scaling)."""
    img = create_test_image((8, 8), (50, 50, 50))
    codeflash_output = zoom_image(img, 0)
    result = codeflash_output  # 26.3μs -> 23.1μs (13.7% faster)
    arr_orig = np.array(img)
    arr_result = np.array(result)


def test_zoom_negative():
    """Test negative zoom is treated as 1 (no scaling)."""
    img = create_test_image((7, 9), (100, 200, 50))
    codeflash_output = zoom_image(img, -3)
    result = codeflash_output  # 24.4μs -> 20.4μs (19.6% faster)
    arr_orig = np.array(img)
    arr_result = np.array(result)


def test_zoom_minimal_size():
    """Test 1x1 image with zoom=2 and zoom=0.5."""
    img = create_test_image((1, 1), (0, 0, 0))
    codeflash_output = zoom_image(img, 2)
    result_up = codeflash_output
    codeflash_output = zoom_image(img, 0.5)
    result_down = codeflash_output


def test_zoom_non_rgb_image():
    """Test grayscale and RGBA images."""
    # Grayscale
    img_gray = PILImage.new("L", (10, 10), 128)
    # Convert to RGB for function compatibility
    img_gray_rgb = img_gray.convert("RGB")
    codeflash_output = zoom_image(img_gray_rgb, 2)
    result_gray = codeflash_output  # 41.8μs -> 54.2μs (22.9% slower)
    # RGBA
    img_rgba = PILImage.new("RGBA", (10, 10), (10, 20, 30, 40))
    img_rgba_rgb = img_rgba.convert("RGB")
    codeflash_output = zoom_image(img_rgba_rgb, 0.5)
    result_rgba = codeflash_output  # 22.4μs -> 19.7μs (13.8% faster)


def test_zoom_non_integer_zoom():
    """Test zoom with non-integer floats."""
    img = create_test_image((9, 7), (10, 20, 30))
    codeflash_output = zoom_image(img, 1.333)
    result = codeflash_output  # 26.9μs -> 24.6μs (9.32% faster)
    expected_size = (int(9 * 1.333), int(7 * 1.333))


def test_zoom_unusual_aspect_ratio():
    """Test tall and wide images."""
    img_tall = create_test_image((3, 100), (1, 2, 3))
    codeflash_output = zoom_image(img_tall, 0.5)
    result_tall = codeflash_output  # 31.7μs -> 32.0μs (0.911% slower)
    img_wide = create_test_image((100, 3), (4, 5, 6))
    codeflash_output = zoom_image(img_wide, 0.5)
    result_wide = codeflash_output  # 21.8μs -> 24.0μs (9.20% slower)


def test_zoom_large_zoom_factor():
    """Test very large zoom factor (e.g., 20x)."""
    img = create_test_image((2, 2), (255, 255, 255))
    codeflash_output = zoom_image(img, 20)
    result = codeflash_output  # 33.6μs -> 26.0μs (29.1% faster)


def test_zoom_extreme_color_values():
    """Test image with extreme color values (black/white)."""
    img_black = create_test_image((5, 5), (0, 0, 0))
    img_white = create_test_image((5, 5), (255, 255, 255))
    codeflash_output = zoom_image(img_black, 1)
    result_black = codeflash_output  # 23.6μs -> 21.3μs (10.8% faster)
    codeflash_output = zoom_image(img_white, 1)
    result_white = codeflash_output  # 17.5μs -> 14.9μs (17.9% faster)


# 3. Large Scale Test Cases


def test_zoom_large_image_no_scale():
    """Test zoom=1 on a large image."""
    img = create_test_image((500, 400), (100, 150, 200))
    codeflash_output = zoom_image(img, 1)
    result = codeflash_output  # 300μs -> 274μs (9.51% faster)
    arr_orig = np.array(img)
    arr_result = np.array(result)


def test_zoom_large_image_upscale():
    """Test zoom=2 on a large image."""
    img = create_test_image((200, 300), (10, 20, 30))
    codeflash_output = zoom_image(img, 2)
    result = codeflash_output  # 446μs -> 415μs (7.60% faster)


def test_zoom_large_image_downscale():
    """Test zoom=0.5 on a large image."""
    img = create_test_image((800, 600), (50, 60, 70))
    codeflash_output = zoom_image(img, 0.5)
    result = codeflash_output  # 934μs -> 945μs (1.19% slower)


def test_zoom_large_non_square():
    """Test large non-square image with zoom=1.5."""
    img = create_test_image((333, 777), (123, 45, 67))
    codeflash_output = zoom_image(img, 1.5)
    result = codeflash_output  # 1.51ms -> 1.24ms (21.9% faster)
    expected_size = (int(333 * 1.5), int(777 * 1.5))


def test_zoom_maximum_allowed_size():
    """Test image at upper bound of allowed size (1000x1000)."""
    img = create_test_image((1000, 1000), (222, 111, 0))
    codeflash_output = zoom_image(img, 1)
    result = codeflash_output  # 1.81ms -> 1.66ms (8.62% faster)
    # Downscale
    codeflash_output = zoom_image(img, 0.1)
    result_down = codeflash_output  # 870μs -> 871μs (0.153% slower)
    # Upscale (should not exceed 1000*2=2000, which is still reasonable)
    codeflash_output = zoom_image(img, 2)
    result_up = codeflash_output  # 6.98ms -> 5.98ms (16.7% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

</details>

To edit these changes `git checkout codeflash/optimize-zoom_image-mjcb2smb` and push.

[![Codeflash](https://img.shields.io/badge/Optimized%20with-Codeflash-yellow?style=flat&color=%23ffc428&logo=)](https://codeflash.ai) ![Static Badge](https://img.shields.io/badge/🎯_Optimization_Quality-high-green)

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
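
For readers checking the percentages in this message, the reported figures appear consistent with measuring the time saved relative to the optimized runtime. This is an inference from the numbers above, not a formula taken from Codeflash documentation, and `speedup_pct` is a hypothetical helper name. Per-test comments are computed from unrounded timings, so recomputing from the rounded values shown may differ in the last digit.

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Percent speedup relative to the optimized runtime (assumed formula).

    Positive means the optimized code is faster; negative means slower.
    Both arguments must use the same time unit.
    """
    return (original - optimized) / optimized * 100.0


# Headline figure: 18.1 ms -> 16.1 ms
headline = speedup_pct(18.1, 16.1)

# Existing unit test: 707 μs -> 632 μs
existing_test = speedup_pct(707, 632)

# A regression case that got slower: 41.8 μs -> 54.2 μs
slower_case = speedup_pct(41.8, 54.2)

print(f"{headline:.1f}% {existing_test:.1f}% {slower_case:.1f}%")
```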