test(langchain): fix benchmark quality issues from code review

- Move middleware construction inside benchmarked lambdas for fresh instances
- Rework memory test to observation-only with print output (no hard assertion)
- Add deeply-nested Pydantic schema tool (RouteSchema) to LARGE_TOOLS (15 tools)
- Update docstrings to document '10 accesses per iteration' in schema benchmarks
- Replace throwaway `_ =` assignments in schema benchmarks with bare expressions
- Mark memory test with @pytest.mark.benchmark to exclude from normal runs
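
The fresh-instance change in the first bullet can be sketched roughly as
follows. This is a minimal illustration only, not the benchmark code from
this commit: `CountingMiddleware` and both `run_*` helpers are hypothetical
names, and stdlib `timeit` stands in for the real benchmark harness.

```python
import timeit


class CountingMiddleware:
    """Hypothetical stand-in for a stateful middleware (not a LangChain class)."""

    def __init__(self) -> None:
        self.calls = 0

    def handle(self, payload: str) -> str:
        self.calls += 1
        return payload.upper()


def run_with_fresh_instance() -> int:
    # Construction happens inside the timed callable, so every iteration
    # starts from a clean instance and no state leaks between iterations.
    middleware = CountingMiddleware()
    middleware.handle("ping")
    return middleware.calls


# Shared instance kept only for contrast: its state accumulates across runs.
shared = CountingMiddleware()


def run_with_shared_instance() -> int:
    shared.handle("ping")
    return shared.calls


timeit.timeit(run_with_fresh_instance, number=100)
timeit.timeit(run_with_shared_instance, number=100)
```

One trade-off of this pattern is that construction cost is included in
every measured iteration.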

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>