Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

We’re excited to introduce a new enhancement to the search experience in Amazon SageMaker Catalog, part of the next generation of Amazon SageMaker: exact match search using technical identifiers. With this capability, you can run highly targeted searches on technical identifiers such as column names, table names, database names, and Amazon Redshift schema names by enclosing the search term in a qualifier such as double quotes (" "). The search returns only assets whose identifiers match the quoted term, which improves both the speed and the accuracy of data discovery.

In this post, we demonstrate how to streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio.

Solving real-world discovery challenges

In large, enterprise-scale environments, discovering the right dataset often hinges on pinpointing specific technical identifiers. Users frequently search for exact terms like "customer_id" or "sales_summary_2023", but conventional keyword and semantic searches often return loosely related results instead of the exact match.

With the new qualified search capability, entering "customer_id" will surface only those assets whose technical name matches exactly—eliminating noise, saving time, and improving confidence in discovery. Whether you’re a data analyst seeking a specific metric or a data steward validating metadata compliance, this update delivers a more precise, governed, and intuitive search experience.
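To make the difference concrete, here is a minimal local sketch of the behavior described above. It is an illustration only, not how SageMaker Catalog implements search: the in-memory `CATALOG`, the `parse_query` helper, and the `search` function are all hypothetical, and the unquoted branch uses simple substring matching as a stand-in for keyword and semantic search.

```python
# Illustrative model only: not the SageMaker Catalog implementation.
# CATALOG, parse_query, and search are hypothetical names for this sketch.
from typing import List, Tuple

CATALOG = {
    "customer": ["customer_id", "customer_name", "c_login"],
    "orders": ["order_id", "customer_id_hash", "order_total"],
    "web_sessions": ["session_id", "customer_identifier"],
}


def parse_query(query: str) -> Tuple[str, bool]:
    """Return (term, exact); exact is True when the query is wrapped in double quotes."""
    q = query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return q[1:-1], True
    return q, False


def search(query: str) -> List[str]:
    """Return the names of tables that have at least one matching column."""
    term, exact = parse_query(query)
    term = term.lower()
    hits = []
    for table, columns in CATALOG.items():
        cols = [c.lower() for c in columns]
        if exact:
            matched = term in cols  # the whole identifier must match
        else:
            matched = any(term in c for c in cols)  # loose substring match
        if matched:
            hits.append(table)
    return hits


if __name__ == "__main__":
    print(search("customer_id"))    # unquoted: loose matching
    print(search('"customer_id"'))  # quoted: exact identifier only
```

In this toy catalog, the unquoted query also picks up tables whose columns merely contain the term (such as customer_id_hash), while the quoted query returns only the table with a column named exactly customer_id.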

Built for complex, high-scale catalogs

This feature builds on existing keyword and semantic search capabilities in SageMaker Unified Studio and adds an important layer of control for customers managing complex data catalogs with intricate naming conventions. By reducing time spent filtering partial matches and improving the relevance of results, this enhancement streamlines workflows and helps maintain metadata quality across domains.

One such customer is NatWest, a global banking leader operating across thousands of assets:

“In our complex data ecosystem, discovering the right assets quickly is paramount. In a data-driven banking environment, the new exact and partial match search capabilities in SageMaker Unified Studio have been transformative. By enabling precise discovery of critical attributes like loan IDs and party IDs across thousands of data assets, we’ve dramatically accelerated insight generation while strengthening our metadata governance. This feature cuts through complexity, reduces search time, minimizes errors, and fosters unprecedented collaboration across our data engineering, analytics, and business teams.”

— Manish Mittal, Data Marketplace Engineering Lead, NatWest

Key benefits

With this new capability, SageMaker Catalog users can:

Quickly locate precise data assets – Search using known technical names, like "customer_id" or "revenue_code", to immediately surface the right datasets without sifting through irrelevant results.
Reduce false positives and ambiguous matches – Alleviate confusion caused by keyword or semantic searches that return loosely matched results, improving trust in the search experience.
Accelerate productivity across data roles – Analysts, stewards, and engineers can find what they need faster—reducing delays in reporting, validation, and development cycles.
Strengthen governance and compliance – Surface and validate critical naming conventions and metadata standards to support policy enforcement and audit readiness. For example, searching for "pii_" or "audit_" returns all column names that start with pii or audit.

Example use cases

This feature can help the following roles in different use cases:

Data analysts – A business analyst preparing a margin analysis report searches for "profit_margin" to locate the exact field across multiple sales datasets. This reduces time-to-insight and makes sure the right metric is used in reporting.
Data stewards – A governance lead searches for terms like "audit_log" or "classified_pii" to confirm that all required classifications and logging conventions are in place. This helps enforce data handling policies and validate catalog health.
Data engineers – A platform engineer performs a search for "temp_" or "backup_" to identify and clean up unused or legacy assets created during extract, transform, and load (ETL) workflows. This supports data hygiene and infrastructure cost optimization.

Solution demo

To demonstrate exact match search, we ingested individual assets loaded from the TPC-DS tables and also created a data product that bundles assets.

The following screenshot shows an example of the data product.

The following screenshot shows an example of the individual assets.

Next, the data analyst wants to find all assets that contain customer login details. The customer login is stored as the "c_login" field in the assets.

With the technical identifier feature, the data analyst directly searches the catalog with the identifier "c_login" to get the required results, as shown in the following screenshot.

The data analyst can verify that the login information is present in the returned result.
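The same search can in principle be scripted. The following is a minimal sketch, assuming the double-quote qualifier is honored by the Amazon DataZone `SearchListings` API (called here through the boto3 `datazone` client) the same way it is in the Unified Studio search bar; this post only demonstrates the console experience. The helper names `quote_identifier` and `find_listings` are hypothetical, and you would need to supply your own domain ID.

```python
# Sketch only: assumes the double-quote qualifier works through the
# SearchListings API as it does in the Unified Studio search bar.
# quote_identifier and find_listings are hypothetical helpers.


def quote_identifier(identifier: str) -> str:
    """Wrap a technical identifier in double quotes to request an exact match."""
    return '"' + identifier.strip().strip('"') + '"'


def find_listings(domain_id: str, identifier: str, client=None) -> list:
    """Return the names of catalog listings returned for the quoted identifier."""
    if client is None:
        import boto3  # imported lazily so quote_identifier works without AWS access
        client = boto3.client("datazone")

    names = []
    kwargs = {
        "domainIdentifier": domain_id,
        "searchText": quote_identifier(identifier),
    }
    while True:
        page = client.search_listings(**kwargs)
        for item in page.get("items", []):
            # Each item is a one-key dict such as {"assetListing": {...}}.
            for listing in item.values():
                names.append(listing.get("name"))
        token = page.get("nextToken")
        if not token:
            return names
        kwargs["nextToken"] = token  # fetch the next page of results
```

Calling `find_listings(domain_id, "c_login")` with your own domain ID would send the search text `"c_login"`, including the quotes, and collect listing names across all result pages. Passing a `client` argument lets you reuse an existing boto3 client or substitute a stub for testing.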

Conclusion

The addition of precise technical identifier search in SageMaker Unified Studio is a step toward better data discovery and usability in complex data ecosystems. By providing search capabilities based on technical identifiers, this feature addresses the needs of diverse stakeholders, enabling them to efficiently locate the assets they require.

As data continues to grow in scale and complexity, SageMaker Unified Studio remains committed to delivering features that simplify data management, improve productivity, and enable organizations to unlock actionable insights. Start using this enhanced search capability today and experience the difference it brings to your data discovery journey.

Refer to the product documentation to learn more about how to set up metadata rules for subscription and publishing workflows.

About the Authors

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology. Connect with him on LinkedIn.

Pradeep Misra is a Principal Analytics Solutions Architect at AWS. He works across Amazon to architect and design modern distributed analytics and AI/ML platform solutions. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, Pradeep likes exploring new places, trying new cuisines, and playing board games with his family. He also likes doing science experiments, building LEGOs, and watching anime with his daughters.

Rajat Mathur is a Software Development Manager at AWS, leading the Amazon DataZone and SageMaker Unified Studio engineering teams. His team designs, builds, and operates services that make it faster and easier for customers to catalog, discover, share, and govern data. With deep expertise in building distributed data systems at scale, Rajat plays a key role in advancing AWS’s data analytics and AI/ML capabilities.

Jie Lan is a Software Engineer at AWS based in New York, where he works on the Amazon SageMaker team. He is passionate about developing cutting-edge solutions in the big data and AI space, helping customers leverage cloud technology to solve complex problems.