
AI-Readable vs. Machine-Readable Data: Impact on Dataset Creation Practices

February 2, 2026

by Leonardo de Araújo, PhD

Modern AI systems seek to understand context, relationships, and meaning in data – not just retrieve isolated facts. This contrasts with traditional dataset design, which favors rigid, structured fields (e.g., title, author, date, subject) optimized for keyword search. Over-fragmenting information into strict metadata fields can make data machine-readable (easy for databases to filter and sort) without being truly AI-readable. In other words, when narrative context is stripped away, an AI may struggle to interpret the data’s full meaning. Recent discussions [1] highlight how practices optimized for databases can unintentionally hinder AI understanding. This recognition is driving changes in how datasets are created, especially in fields like cultural heritage, and prompting new technical strategies and institutional practices to make data more “AI-friendly.” Key impacts include a greater emphasis on preserving narrative context, using knowledge graphs and linked data, and updating metadata practices to capture richer relationships and meanings.
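
To make the contrast concrete, here is a minimal sketch – with invented field names, not drawn from any particular catalog standard – of the same object described first as a “thin,” field-based record and then as a “thicker” record that keeps its narrative context:

```python
# A hypothetical contrast between a "machine-readable" record and an
# "AI-readable" one. All field names and values are invented.

# Thin record: easy to filter and sort, but the meaning lives elsewhere.
thin_record = {
    "title": "Portrait of a Merchant",
    "author": "Unknown",
    "date": "1532",
    "subject": "portraiture",
}

# Thicker record: the same facts, plus narrative context an AI system
# can actually use to interpret the object.
thick_record = {
    **thin_record,
    "description": (
        "Half-length portrait, likely commissioned to mark the sitter's "
        "admission to a merchant guild; the ring and ledger signal trade."
    ),
    "provenance_narrative": (
        "Acquired in 1904 from a private Antwerp collection; earlier "
        "ownership is disputed among scholars."
    ),
    "related_entities": ["merchant guilds", "Antwerp", "16th-century trade"],
}
```

Both records are trivially machine-readable; only the second gives a language model enough surrounding text and relationships to interpret what the object is and why it matters.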

From “Thin” to “Thick” Information Systems

Traditional catalog databases were designed with a quantitative, efficiency-first mindset, recording isolated facts in standardized fields at the expense of contextual richness. This results in “thin information systems” that lack meaningful relationships, historical depth, and interpretative layers [2]. Today, cultural heritage projects are embracing “thick information systems” that capture not just facts but also context, interpretations, and narratives. For example, the ResearchSpace platform uses knowledge graphs to represent data as a network of meaningful relations, rather than discrete entries. This allows museums and archives to retain scholarly context and even differing interpretations of artifacts in digital form. As one analysis explains, narrative and researchers’ thought processes often get lost when converting rich textual descriptions into structured data, so new tools aim to preserve that complexity in machine-processable ways [2].
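
As a rough illustration of the knowledge-graph idea, the sketch below uses Python’s rdflib to record an artifact as a set of explicit relations, including a contested interpretation, rather than a flat row. The namespace and property names are invented for illustration; ResearchSpace itself builds on a formal ontology (CIDOC CRM) rather than anything this simple.

```python
# A minimal sketch of representing an artifact as a network of relations
# instead of a flat record. The EX namespace and all property names are
# hypothetical, not taken from ResearchSpace or CIDOC CRM.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/heritage/")
g = Graph()
g.bind("ex", EX)

amphora = EX["object/amphora-42"]
g.add((amphora, EX.hasTitle, Literal("Trade amphora")))
g.add((amphora, EX.foundAt, EX["place/ostia"]))
g.add((amphora, EX.associatedWith, EX["workshop/sestius"]))
# Interpretations and open questions live in the graph too, instead of
# being discarded when the catalog entry is "cleaned up."
g.add((amphora, EX.interpretation, Literal(
    "Stamp suggests the Sestius workshop; the attribution is debated."
)))

# Serialize as Turtle: relationships are explicit, not implied by columns.
print(g.serialize(format="turtle"))
```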

Emphasizing Context and Relationships

Cultural heritage institutions are increasingly unifying siloed collections and metadata to highlight connections between people, objects, places, and events. Instead of being locked in disconnected systems or single-item records, data is linked by meaning and narrative context. A knowledge graph approach, for instance, can link an artifact to related historical figures, locations, and themes, providing AI with a web of relationships rather than a flat record. This shift is evident in projects like the Smithsonian’s Revolution Crossroads, which aggregated data from multiple collections and published it on an open platform (Hugging Face) [1]. This moves cultural heritage data out of disconnected institutional silos and into a unified dataset where AI tools can explore cross-collection relationships and context. Such unified data hubs allow researchers and AI to perform cross-referenced knowledge navigation and semantic searches that were not possible with isolated catalogs. The end goal is for cultural data to “deliver meaning” – enabling storytelling, historical insight, and deep discovery – rather than just serve as a lookup table.
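
Here is a sketch of what that cross-collection navigation looks like once records from different silos share one graph. The data, namespace, and property names are again invented, not the Smithsonian’s actual schema:

```python
# Hypothetical records from two separate institutional silos land in one
# graph; a single SPARQL query then traverses relationships that neither
# isolated catalog could express on its own.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/heritage/")
g = Graph()
g.bind("ex", EX)

# One object per (imaginary) institution, both tied to the same person.
g.add((EX["museumA/letter-07"], EX.mentions, EX["person/abigail-adams"]))
g.add((EX["museumB/portrait-19"], EX.depicts, EX["person/abigail-adams"]))

# One query surfaces every object connected to that person, regardless
# of which collection it came from.
results = g.query("""
    PREFIX ex: <http://example.org/heritage/>
    SELECT ?obj ?relation WHERE {
        ?obj ?relation <http://example.org/heritage/person/abigail-adams> .
    }
""")
for obj, relation in results:
    print(obj, relation)
```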

Including Narrative and Qualitative Metadata

There is growing recognition that descriptive narratives and explanations must accompany structured data for it to be truly AI-readable. Cultural heritage metadata practices are beginning to include longer-form descriptions, curatorial notes, and even community-sourced interpretations as part of the dataset. For instance, in addressing biased or outdated terminology in museum records, experts found that simply swapping a term isn’t enough – one must add explanatory context in the metadata about why that term was used and what it means [3]. Workshop participants in a cultural heritage AI forum stressed the need to record explanations and historical nuance (e.g. why a derogatory term appears in an object’s description) alongside the data. By embedding such narratives and context, the dataset preserves cultural complexities and helps AI systems interpret sensitive content more accurately. This approach acknowledges that standardized fields or controlled vocabularies alone can “omit nuances and cultural complexities” [3], especially for artifacts with complicated histories. Thus, an artifact’s digital record might now include its story or provenance narrative, not just an object name and date, giving AI additional text from which to learn context.
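
One way such an annotation might look in practice is sketched below; the schema is invented for illustration and is not a published standard:

```python
# A hypothetical record that explains, rather than silently replaces, an
# outdated term. Every field name here is invented for illustration.
record = {
    "object_id": "1923.118",
    "title": "Ceremonial mask",
    "historic_description": (
        "Original 1923 accession entry, retained verbatim; it contains "
        "terminology now considered derogatory."
    ),
    "term_annotations": [
        {
            "term": "[historic term retained in the source entry]",
            "status": "outdated/derogatory",
            "explanation": (
                "Reflects early-20th-century collecting language; kept "
                "with context so that readers and AI systems can "
                "interpret the record without losing the historical trail."
            ),
            "preferred_term": "community-preferred name, per consultation",
        }
    ],
}
```

The point is that the explanation travels with the data: an AI system encountering the historic term also encounters why it is there and what should be used instead.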

Linked Open Data and External Knowledge

Many cultural institutions are leveraging Linked Open Data (LOD) to enrich context. LOD allows museums to connect their data to external knowledge bases (like Wikidata or Getty vocabularies), effectively linking an object’s metadata to broader concepts and definitions. For example, the Rijksmuseum’s linked data connects a painting to the concept of “oil paint” in the Getty Art & Architecture Thesaurus [3]. This practice means an AI can draw on a wider graph of information (materials, techniques, historical biographies, etc.), rather than a standalone record. While LOD brings standardization and connectivity, professionals note it must be applied thoughtfully to avoid flattening complex historical contexts. Nonetheless, publishing data as LOD is becoming a common practice to make cultural datasets more interoperable and interpretable by AI, since the relationships between entities are explicitly encoded.
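
A minimal sketch of that linking pattern in rdflib follows. The painting URI is invented; dcterms:medium is a standard Dublin Core property; the Getty AAT identifier is shown as an example of the kind of concept URI involved and should be verified against the published thesaurus:

```python
# Hypothetical Linked Open Data pattern: instead of a free-text string
# ("oil on canvas"), the medium is a link into a shared vocabulary that
# AI systems can resolve and traverse for definitions and relations.
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
painting = URIRef("http://example.org/collection/painting-123")  # invented
oil_paint = URIRef("http://vocab.getty.edu/aat/300015050")  # AAT concept for oil paint (illustrative ID)

g.add((painting, DCTERMS.medium, oil_paint))
print(g.serialize(format="turtle"))
```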

In summary, modern AI systems need more than rigid, field-based metadata: they perform best when data preserves narrative context and explicit relationships. Traditional cultural heritage catalogs often produce “thin” records—efficient for sorting and keyword search, but poor at conveying meaning, provenance, interpretation, and historical nuance. In response, institutions are moving toward “thick information systems” that retain scholarly and community context while remaining machine-processable. Key strategies include knowledge graphs and linked data to represent collections as networks of people, objects, places, and events; unifying siloed datasets to enable cross-collection discovery; and expanding metadata to include longer-form descriptions, curatorial notes, and explanations—especially where biased or outdated terminology appears. Linked Open Data further enriches records by connecting them to external vocabularies and knowledge bases, improving interoperability and interpretability. Overall, the shift reframes cultural datasets from lookup tables into meaning-rich resources that better support AI-driven research, storytelling, and semantic exploration.

References

[1] Giving Our Data a Hug (and a Home on Hugging Face)

[2] Bridging Narratives and Data: ResearchSpace 4.0.0 and the Power of Knowledge Graphs

[3] Learnings from ‘AI and heritage’: inclusive metadata requires more than erasing stereotyping terms