Elasticsearch 8 · OpenSearch · Vector search · GDPR
Elasticsearch powers search across three of our highest-traffic production systems: ANT's PropTech marketplace with multi-language property search and geo filters, REHAU's B2B portal with a SAP-synced product catalog, and AutoParts' fitment search with cross-store inventory. Faceted search, kNN vector search, ILM cost tiering and zero-downtime reindexing are standard in our Elasticsearch deployments.
We deliver Elasticsearch and OpenSearch engineering for product catalogs and marketplaces requiring faceted search, B2B portals with ERP-synced inventory, observability platforms ingesting structured logs and metrics, and AI teams adding hybrid BM25 + vector search for semantic retrieval. Mapping design, shard strategy, ILM cost tiering and zero-downtime reindexing are part of every engagement.
Challenges
Unbounded dynamic mapping creates thousands of fields, bloats cluster state and degrades performance. We map all fields explicitly and disable dynamic mapping on ingestion indices.
Too many small shards waste heap memory and slow search. We target 20–50GB per shard and use ILM rollover policies to prevent runaway shard proliferation.
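When facet filters sit in the main query, aggregations see only the filtered result set, so selecting one facet value collapses the counts of its siblings. We move the selected facet filters into post_filter, which narrows the hits after aggregations are computed and keeps facet counts accurate.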
Most mapping changes on a live index require a full reindex, which disrupts production traffic when done naively. We reindex into a new index with the reindex API and switch an index alias to it, so search stays online throughout.
Application-level dual-write fails silently and lets the index diverge from PostgreSQL. We replace dual-write with a Debezium CDC pipeline that captures every committed change from the database log.
Logs and analytics indices grow unboundedly without ILM. We implement hot-warm-cold-delete tiering with automated phase transitions.
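The alias-based reindexing mentioned in the challenges above can be sketched as two request bodies: one for `POST /_reindex` and one for `POST /_aliases`. This is a minimal illustration, not a full migration script; the index and alias names are hypothetical, and the new index is assumed to already exist with the corrected mapping.

```python
def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy documents into the new index."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases.

    Both actions travel in one request, so the alias moves from the old
    index to the new one in a single atomic step and clients that query
    the alias never see a gap.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
copy_request = reindex_body("products_v1", "products_v2")
swap_request = alias_swap_body("products", "products_v1", "products_v2")
```

The application only ever queries the alias (`products` here), so the cutover needs no deploy or configuration change on the client side.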
Solutions
Multi-language analysis, facets, geo search, inventory sync and relevance tuning for product catalogs.
ERP-synced product search with part number lookup, multi-tenant ACL filters and structured data extraction.
Lexical and semantic search combined with RRF for RAG retrieval and recommendation systems.
Observability pipelines ingesting structured logs, APM traces and metrics with Kibana dashboards and alerting.
Alias-based reindexing strategy for mapping migrations and analyzer changes without search downtime.
Hot-warm-cold-delete tiering, shard consolidation and force-merge on read-only indices to cut storage costs.
Stack
Elasticsearch 8, OpenSearch 2, Amazon OpenSearch Service, Kibana, OpenSearch Dashboards, Logstash, Beats, Debezium, ILM, kNN vector search, RRF, Terraform.
Compliance
GDPR-aligned · SOC 2-capable · HIPAA-capable · CCPA-acknowledged
Shared: TLS + mTLS, role-based index ACLs, SBOM for client libraries.
Cases

Property marketplace web platform with listing CMS, search and B2B admin console for US and EU operators.

B2B e-commerce and product configurator for a global polymer manufacturer with multi-region pricing, stock and dealer workflows.

Laravel + React auto-parts storefront with real-time 1C ERP inventory sync — fitment search, card payments, personal account, US & EU launch.
Why YuSMP
We design explicit mappings before ingesting data — no dynamic field explosion, no retroactive reindexing surprises.
Hot-warm-cold tiering is wired in at cluster setup, so storage costs do not surprise you after 12 months of index growth.
We implement combined BM25 and kNN pipelines with RRF — better relevance than either approach alone, tuned to your corpus.
FAQ
OpenSearch (AWS fork) for teams on AWS who want to avoid Elastic's proprietary licensing and integrate tightly with Amazon OpenSearch Service. Elasticsearch for teams using Elastic Cloud or the ELK stack where Kibana and APM integration matters. Both are API-compatible for most use cases — we design application code to be portable between them.
We separate keyword fields (exact-match filters, aggregations) from text fields (full-text search with analysis), use nested objects for variant attributes, set appropriate analyzers per language, and define index aliases for zero-downtime reindex operations. Mapping design is the most impactful single decision for search quality.
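A mapping along these lines might look like the sketch below, written as the body of a create-index request. The field names are hypothetical and chosen only to illustrate the pattern. `"dynamic": "strict"` rejects documents that carry unmapped fields; `"dynamic": false` would ignore such fields silently instead.

```python
def product_index_body() -> dict:
    """Body for PUT /<index> with an explicit, strict mapping (illustrative fields)."""
    return {
        "mappings": {
            # Reject documents that contain fields not declared below.
            "dynamic": "strict",
            "properties": {
                # keyword: exact-match filters and aggregations
                "sku": {"type": "keyword"},
                "brand": {"type": "keyword"},
                # text: analyzed full-text search, one analyzer per language
                "title_en": {"type": "text", "analyzer": "english"},
                "title_de": {"type": "text", "analyzer": "german"},
                "price": {"type": "scaled_float", "scaling_factor": 100},
                "location": {"type": "geo_point"},
                # nested: each variant keeps its own attribute combination
                "variants": {
                    "type": "nested",
                    "properties": {
                        "color": {"type": "keyword"},
                        "size": {"type": "keyword"},
                    },
                },
            },
        }
    }


def unmapped_fields(doc: dict, index_body: dict) -> list:
    """Top-level fields of `doc` that a strict mapping would reject."""
    mapped = index_body["mappings"]["properties"]
    return sorted(name for name in doc if name not in mapped)
```

Running `unmapped_fields` over a sample of source documents before the first ingest is one cheap way to find fields the mapping has not accounted for.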
Terms aggregations for discrete facets (brand, category), range aggregations for numeric facets (price, rating), and nested aggregations for variant-aware facets. We apply the selected facet filters as a post_filter, which runs after aggregations are computed, so facet counts reflect the whole query result and not just the current filter selection.
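As a rough sketch of that request shape, the function below builds a search body with terms, range and nested aggregations, and puts the selected brand into `post_filter` rather than into the query. Field names and price buckets are hypothetical.

```python
from typing import Optional


def faceted_search_body(text: str, selected_brand: Optional[str] = None) -> dict:
    """Body for POST /<index>/_search with facets computed before the brand filter."""
    body = {
        "query": {"match": {"title_en": text}},
        "aggs": {
            # Discrete facet
            "brands": {"terms": {"field": "brand", "size": 20}},
            # Numeric facet with illustrative buckets
            "price_ranges": {
                "range": {
                    "field": "price",
                    "ranges": [
                        {"to": 50},
                        {"from": 50, "to": 200},
                        {"from": 200},
                    ],
                }
            },
            # Variant-aware facet over a nested field
            "variants": {
                "nested": {"path": "variants"},
                "aggs": {"colors": {"terms": {"field": "variants.color"}}},
            },
        },
    }
    if selected_brand is not None:
        # post_filter narrows the returned hits only; the aggregations above
        # are still computed over everything the query matched.
        body["post_filter"] = {"term": {"brand": selected_brand}}
    return body
```

With several active facets, a common refinement is to wrap each aggregation in a filter aggregation that applies every selected filter except its own; that is left out here to keep the sketch short.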
ILM (Index Lifecycle Management) tiers indices through hot (SSD), warm (HDD), cold (low-cost storage for rarely queried data) and delete phases, based on query frequency and retention requirements. We tune shard count to roughly 20–50 GB per shard, use rollup indices for long-term aggregations, and force-merge read-only indices. We audit shard counts and index sizes in every engagement.
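A policy of this kind could be expressed roughly as follows, as the body of `PUT /_ilm/policy/<name>` in Elasticsearch. The ages and the retention default are placeholder assumptions, not recommendations, and OpenSearch uses a different policy format (ISM) that this sketch does not cover.

```python
def logs_ilm_policy(delete_after_days: int = 90) -> dict:
    """Body for PUT /_ilm/policy/<name> with hot, warm, cold and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over before a primary shard outgrows the
                        # 20-50 GB target, or after a week at the latest.
                        "rollover": {
                            "max_primary_shard_size": "50gb",
                            "max_age": "7d",
                        }
                    }
                },
                "warm": {
                    "min_age": "7d",
                    "actions": {
                        # Stop writes, then merge down to one segment per shard.
                        "readonly": {},
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                "cold": {
                    "min_age": "30d",
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {
                    "min_age": f"{delete_after_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }
```

Moving data onto cheaper hardware in the warm and cold phases additionally depends on the cluster having nodes assigned to those data tiers, which is a cluster setup concern outside this policy body.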
kNN vector search in Elasticsearch 8+ and OpenSearch is production-ready for semantic search, recommendations and RAG retrieval. We use it alongside BM25 lexical search in a hybrid RRF (Reciprocal Rank Fusion) pipeline for better relevance than either approach alone.
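To show what the fusion step does, here is a small client-side sketch of Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank) over the result lists it appears in. It assumes the BM25 and kNN queries have already run and returned ordered document IDs; k = 60 is the commonly used constant, and the IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list using Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a vector query.
bm25_hits = ["a", "b", "c"]
knn_hits = ["c", "a", "d"]
fused = rrf_fuse([bm25_hits, knn_hits])
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and vector similarities, which is the main reason it is a convenient default for hybrid search.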
CDC via a Debezium and Kafka Connect pipeline for near-real-time sync. The Logstash JDBC input for batch sync where Kafka would be unnecessary overhead. Application-level dual-write for simple cases where the risk of divergence is acceptable. We avoid dual-write in regulated systems and prefer the CDC pipeline.
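The consumer end of such a CDC pipeline can be sketched as a function that turns one Debezium change event into lines for the `_bulk` API. It assumes the event value has already been unwrapped to its payload (with `op`, `before` and `after` keys) and that the table has a single primary key column, here hypothetically named `id`.

```python
import json
from typing import List


def debezium_to_bulk_lines(event: dict, index: str, id_field: str = "id") -> List[str]:
    """Translate one Debezium change event into NDJSON lines for POST /_bulk."""
    op = event["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = event["after"]
        # Using the primary key as _id makes replays idempotent:
        # re-processing the same event overwrites the same document.
        action = {"index": {"_index": index, "_id": str(row[id_field])}}
        return [json.dumps(action), json.dumps(row)]
    if op == "d":  # delete: the row state is only available in "before"
        row = event["before"]
        action = {"delete": {"_index": index, "_id": str(row[id_field])}}
        return [json.dumps(action)]
    raise ValueError(f"unsupported Debezium op: {op!r}")


# Hypothetical events, for illustration only.
update_event = {"op": "u", "before": {"id": 7, "stock": 3}, "after": {"id": 7, "stock": 2}}
delete_event = {"op": "d", "before": {"id": 7, "stock": 2}, "after": None}
update_lines = debezium_to_bulk_lines(update_event, "products")
delete_lines = debezium_to_bulk_lines(delete_event, "products")
```

A real consumer would batch these lines, join them with newlines, end the body with a trailing newline, and inspect the per-item results in the bulk response before committing Kafka offsets.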
Response within 1 business day. NDA on request.