Risk Area: Technical
Technological and operational limitations
Many institutions face significant resource limitations, both in staffing and technological infrastructure, which hinder their ability to digitise and manage open access collections. Operational challenges such as inadequate metadata, insufficient procedural workflows, and technical vulnerabilities are commonly cited as barriers to effectively securing and maintaining accessible digital collections.
“Open datasets must be refined from all personal and institutional secret information before publishing. Unrefined datasets will compromise technical-system vulnerabilities (e.g. IP and digital address info). Last but not least, wrong-indexed datasets may cause plagiarism.”
Research Institution, Turkey
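The refinement step described in the quote above could be partly automated before publication. The following is a minimal sketch, assuming records are flat dictionaries of field names to values; the function names and regular expressions are illustrative assumptions, not a complete method for detecting personal data. Patterns this simple will miss some identifiers and may over-match others, so manual review is still needed.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not exhaustive detectors).
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact_text(value: str) -> str:
    """Replace IPv4-like and email-like substrings with placeholders."""
    value = IPV4_PATTERN.sub("[REDACTED-IP]", value)
    return EMAIL_PATTERN.sub("[REDACTED-EMAIL]", value)


def redact_record(record: dict) -> dict:
    """Return a copy of the record with every string field redacted."""
    return {
        key: redact_text(val) if isinstance(val, str) else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    sample = {"title": "Scan 1", "note": "Uploaded from 192.168.0.12", "pages": 3}
    print(redact_record(sample))
```

Running redaction on a copy, as here, leaves the internal master record intact, so only the published version is altered.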
Limited resources for digitisation, digital preservation and open access management, including insufficient funding, staffing, and skills to maintain and provide reliable access to digital collections.
Challenges with metadata quality and technological infrastructure, such as inconsistent or incomplete metadata, legacy systems, limited interoperability, and other technical constraints.
Vulnerabilities in securing and maintaining digital collections including risks related to cybersecurity, data loss, system failures, and the ongoing effort required to ensure continuity over time.
High-volume, automated scraping by AI bots generating sudden, swarming traffic that exceeds system design assumptions, overwhelms servers, degrades performance or causes outages, distorts analytics, and creates escalating infrastructure costs.
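The sudden, swarming traffic described in the last item above can often be spotted in access logs. The following is a rough sketch, assuming log entries have already been parsed into (timestamp in seconds, client identifier) pairs sorted by time; the function name and the default thresholds are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict, deque


def find_bursty_clients(requests, window_seconds=60, max_requests=100):
    """Return clients that exceed max_requests within a sliding time window.

    `requests` is an iterable of (timestamp_seconds, client_id) pairs,
    sorted by timestamp. Thresholds are illustrative assumptions.
    """
    recent = defaultdict(deque)  # client_id -> timestamps still inside the window
    flagged = set()
    for timestamp, client in requests:
        window = recent[client]
        window.append(timestamp)
        # Drop requests that have fallen out of the window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(client)
    return flagged


if __name__ == "__main__":
    log = sorted([(t, "crawler-a") for t in range(10)] + [(0, "visitor-b"), (45, "visitor-b")])
    print(find_bursty_clients(log, window_seconds=60, max_requests=5))
```

Counting per client will not catch scraping spread across many addresses; grouping by user agent or network range instead of a single address is one possible variation.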
Streamline metadata practices: Implement flexible, layered metadata frameworks that accommodate diverse cultural heritage works, support ongoing enhancement, and reflect ethical and community considerations, while applying standardised metadata practices to improve data quality and searchability, even where resources are constrained.
Use open-source software or third-party platforms: Where budget constraints exist, consider open-source solutions or third-party platforms (such as Flickr Commons, Wikimedia Commons, or KC Works for Institutions) for managing and displaying digital collections.
Allocate staff roles for digital collections: Designate specific staff or train existing personnel to manage open access collections, even if as a shared responsibility.
Plan for long-term digital stewardship: Adopt organisation-wide digital strategies that cover collections, platforms, staffing, funding, and partnerships – not just individual projects or systems. Digital preservation and resilience depend on continuity of people, skills, and decision-making as much as on infrastructure.
Ensure ongoing software maintenance: Allocate time and staffing for regular software and security maintenance, not only server patching but also application-level updates and code health. Extended periods without substantive maintenance (e.g. two years or more) should be treated as a risk indicator triggering review, resourcing, or migration planning.
Reduce single-person and single-vendor dependency: Ensure more than one staff member or contractor is familiar with core systems and workflows. Maintain documentation, handover plans, and succession arrangements to reduce vulnerability to staff loss or sudden budget cuts.
Align with Trustworthy Digital Repository principles: Where applicable, follow Trustworthy Digital Repository (TDR) guidance, or community frameworks such as COAR’s Good Practices in Repositories, including organisational infrastructure for maintenance, contingency planning, and continuous monitoring to determine when contingency plans should be executed.
Ensure regular security audits: Schedule regular reviews of digital infrastructure to identify and address technical vulnerabilities.
Treat AI training-related scraping as a persistent operational risk requiring improved monitoring, realistic infrastructure planning, and carefully balanced mitigation strategies, rather than relying on voluntary signals or restrictive access measures that undermine open access goals.
Explore pay-to-crawl technical systems to automate compensation when digital cultural heritage content is accessed by machines.
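The maintenance guidance above treats two years or more without substantive maintenance as a risk indicator. A simple inventory check along those lines might look like the sketch below, assuming the institution keeps a record of when each system last received substantive maintenance; the function name, the inventory structure, and the 730-day approximation of two years are all assumptions made for illustration.

```python
from datetime import date

# "Two years or more", approximated as 730 days (an assumption for this sketch).
STALE_AFTER_DAYS = 2 * 365


def systems_needing_review(last_maintained: dict, today: date) -> list:
    """Return names of systems whose last substantive maintenance is at least two years old."""
    return sorted(
        name
        for name, last in last_maintained.items()
        if (today - last).days >= STALE_AFTER_DAYS
    )


if __name__ == "__main__":
    # Hypothetical inventory: system name -> date of last substantive maintenance.
    inventory = {
        "catalogue": date(2021, 3, 1),
        "image-server": date(2024, 6, 15),
    }
    print(systems_needing_review(inventory, today=date(2025, 1, 1)))
```

A flagged system is a prompt for review, resourcing, or migration planning rather than proof of a problem.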
Training and guidelines on how to build low-cost digitising facilities and how to maintain the data:
Mobile Digitising (MobiDig) project: an online platform offering open and innovative training on the topic for librarians, archivists, managers of small organizations, and Vocational Education and Training (VET) teachers in the field of library science.
Pavis, M., Wallace, A., Saunders, S. (2023) Doing Digitisation on a Budget: A Guide to Low-Cost Digital Projects, supported by The National Lottery Heritage Fund.
Training and guidelines on digital preservation:
Digital Preservation Southampton - Training: regular workshops that provide hands-on experience and introduce attendees to key frameworks across digital preservation, records management, and digital forensics.
Digital Preservation Coalition’s resources for supporting digital preservation activities.
Open-source collection management systems: Several open-source options offer cost-effective ways to manage digital collections.
Access to Memory (AtoM) - web-based, open source application for standards-based archival description and access in a multilingual, multi-repository environment.
Omeka - open-source web publishing platforms for sharing digital collections and creating media-rich online exhibits.
CollectionSpace - web-based, open-source collections management software for cultural heritage organizations, museums & more.
Tainacan - open source, flexible and powerful repository platform for creating digital archives in WordPress.
Thoth - non-profit, open metadata management and dissemination platform.
Metadata standards and principles: Using consistent metadata standards and principles aids in interoperability and usability of online collections.
FAIR principles are a set of guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.
Dublin Core Metadata Terms is a general purpose metadata vocabulary for describing resources of any type. The Dublin Core Metadata Initiative (DCMI) is responsible for maintaining the Dublin Core vocabulary.
International Image Interoperability Framework (IIIF) is a set of open standards for delivering high-quality, attributed digital images and audio/visual files from servers to different environments on the Web where they can then be viewed and interacted with in many ways.
GLAM-E Lab (2024) Image and Metadata Handbook for Wikimedia Commons
GLAM-E Lab (2024) Sandbox Template for Wikimedia Commons Metadata Management
Collections Trust’s Digitising Collections Resources - What information should I record? includes a set of frameworks, guides, and signposts to other resources.
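Inconsistent or incomplete metadata is easier to address once the gaps are visible. The sketch below checks records for a handful of Dublin Core terms, assuming each record is a dictionary keyed by term name. The choice of expected terms is a hypothetical local policy for illustration only: Dublin Core itself does not make any term mandatory, and the function names are not part of any standard tooling.

```python
# Hypothetical minimum set for illustration; Dublin Core does not mandate any term.
EXPECTED_TERMS = ("title", "creator", "date", "rights", "identifier")


def missing_terms(record: dict, expected=EXPECTED_TERMS) -> list:
    """List expected terms that are absent or blank in a record."""
    return [
        term for term in expected
        if not str(record.get(term) or "").strip()
    ]


def completeness_report(records: list) -> dict:
    """Map each incomplete record's identifier (or list position) to its missing terms."""
    report = {}
    for position, record in enumerate(records):
        gaps = missing_terms(record)
        if gaps:
            report[record.get("identifier") or f"record-{position}"] = gaps
    return report


if __name__ == "__main__":
    # Invented sample records, for demonstration only.
    records = [
        {"title": "Harbour view", "creator": "Unknown", "date": "1890",
         "rights": "CC0", "identifier": "obj-1"},
        {"title": "  ", "date": "1901", "identifier": "obj-2"},
    ]
    print(completeness_report(records))
```

A report like this can help prioritise enhancement work where resources are constrained, since it shows which records and which terms need attention first.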
CC Signals: Explore Creative Commons’ preference signals framework (a simple pact between creators and AI developers), review the implementation proposal, and share your feedback.
Pay-to-Crawl: Explore where CC Stands on Pay-to-Crawl.
Low-cost digitisation ideas from real life projects in Pavis, M., Wallace, A., Saunders, S. (2023) Doing Digitisation on a Budget: A Guide to Low-Cost Digital Projects, supported by The National Lottery Heritage Fund.
How this risk area connects to other risk areas
Technical capacity is a foundation for legal compliance, ethical stewardship, financial sustainability, and geopolitical resilience. Weak infrastructure, undocumented systems, or staff dependency increase exposure to data loss, service outages, legal non-compliance, and political disruption. Conversely, investments in open standards, preservation workflows, redundancy, and staff capacity strengthen the institution’s ability to sustain open access even under financial constraints or external pressure.
Use cases
Research data managers don’t make datasets open access, because inadequate or technically misleading data may spread rapidly and be misinterpreted. This can negatively impact the credibility of the research institution and result in difficulties securing future funding.
Digital archivists don’t make raw datasets open access, because unrefined data might contain sensitive personal or institutional information. This could lead to security breaches or exploitation of system vulnerabilities, affecting the safety and privacy of both the institution and its stakeholders.