Currently, many cultural heritage institutions (CHIs) choose not to digitise their copyright works and make them accessible online because the manual searches for rights holders needed to build an evidence base demand too many resources. Some might take a more risk-managed approach, but the risks deter many of them from doing so. The net result is that millions of valuable cultural assets remain warehoused and hidden from the public. The UK Government has estimated that there are approximately 91 million orphan works across CHIs.
‘Diligent searches’ are those performed to a standard at which a rights holder will be identified if reasonably possible. The UK Government’s Intellectual Property Office (UKIPO) describes what it regards as acceptable, listing the resources that an applicant to its Orphan Works Licensing Scheme (OWLS) should have considered searching for each type of work. No two searches are identical, and the UKIPO will assess whether the applicant’s search is satisfactory before awarding a licence. The resources used to trace rights holders include ones where clear positive matches are possible, such as collecting societies and rights holder representatives, as well as reverse image searches, commercial image suppliers and more general information sources such as news archives and Google searches.
The human process involves manually undertaking each of those searches: logging into individual provider accounts, searching their archives, or sifting through internet search results. This manual search for rights holders, together with the costs associated with applying for an OWLS licence or other management arrangements, can cost in the region of £150 per work.
The OWH Search platform uses AI to manage a set of tools (agents) that deliver the orphan works diligent search process across multiple images. These tasks include interrogating the work to extract metadata (dates, locations, names, etc.) for use in subsequent assessments, evaluating whether a work is in copyright, image matching, interrogating partner resources via APIs, and locating any potential rights holders. We are adding tasks and agents as we expand the range of media we can work with (from images and text to, eventually, sound and moving images) and increase the sophistication of searches.
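The task sequence described above can be pictured as a series of steps that each add findings to a shared record for the work. The following is a minimal sketch only, not OWH's implementation: the `WorkRecord` type, the step names and the placeholder logic are all hypothetical, and the steps run in a fixed order here, whereas the platform uses AI to manage which tools run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkRecord:
    """Everything learned about one work during a search (hypothetical shape)."""
    work_id: str
    metadata: Dict[str, object] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)


def extract_metadata(record: WorkRecord) -> None:
    # Placeholder: a real step would read embedded metadata or inspect the image.
    record.metadata.setdefault("creator_name", None)
    record.audit_log.append("extract_metadata")


def assess_copyright_status(record: WorkRecord) -> None:
    # Placeholder: a real step would apply copyright duration rules.
    record.metadata["copyright_status"] = "undetermined"
    record.audit_log.append("assess_copyright_status")


def match_images(record: WorkRecord) -> None:
    # Placeholder: a real step would run image matching against known sources.
    record.metadata["image_matches"] = []
    record.audit_log.append("match_images")


def query_partner_sources(record: WorkRecord) -> None:
    # Placeholder: a real step would call partner APIs to look for rights holders.
    record.metadata["candidate_rights_holders"] = []
    record.audit_log.append("query_partner_sources")


# Fixed order, for illustration only.
STEPS: List[Callable[[WorkRecord], None]] = [
    extract_metadata,
    assess_copyright_status,
    match_images,
    query_partner_sources,
]


def run_search(work_id: str) -> WorkRecord:
    """Run every step in turn and return the accumulated record."""
    record = WorkRecord(work_id=work_id)
    for step in STEPS:
        step(record)
    return record
```

Keeping an audit log of which steps ran is one way such a pipeline could support the evidence base that a diligent search needs.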
There will be cases where a search reports with confidence that no copyright holder can be found, yet one later emerges to assert their rights. We will offset this risk by offering an additional ‘OWH Maintain’ service, under which customers pay for annual repeat searches. Our searches will be better than any human-based search currently available (as well as costing 90% less!), but there will still be claims. The ‘OWH Manage’ aspect of our model is designed to accommodate exactly that scenario: it triages rights assertions and facilitates licence payments when legitimate claims are made. Clients understand that licence fees will sometimes have to be paid, and our ‘OWH Connect’ add-on service can be purchased to connect clients with rights holders.
OWH does not use your images or any of your data to train AI. We don’t retrain it with anything you upload. All we do is give it a bit of extra context when needed (for example, copyright duration rules), but your data never becomes part of the model. We use Azure-hosted LLMs, so your prompts and outputs are isolated to our Azure environment and are not shared with other customers or made available to model providers to train or improve their models.
Security-wise, OWH is hosted on Microsoft Azure, so we benefit from enterprise-grade controls: encryption in transit and at rest, tenant isolation, role-based access controls, monitoring and logging, and a strong compliance posture. We have also deployed in a UK Azure region ensuring that any data processing complies with UK GDPR.
Agentic AI means an LLM-powered application that has agency. The LLM is given an objective (“Is this an orphan work?”) and uses its ‘intelligence’ to determine which tools and agents it needs to use in which order to meet the objective. The tools and agents are code snippets, APIs, databases, text and other LLMs that the ‘orchestrator’ LLM has access to. As it carries out the analysis it reflects on how well it is working and adjusts the process accordingly. In the OWH application, we use the latest and largest available models from OpenAI to maximise the ‘reasoning’ capabilities and ensure it has carried out the necessary diligent search.
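To illustrate the orchestration idea, here is a minimal sketch of an agent loop. It is not the OWH orchestrator: `choose_next_tool` is a hard-coded stand-in for the decision the LLM would make, and the tool names and their return values are hypothetical placeholders.

```python
from typing import Callable, Dict, Optional


def extract_metadata(state: dict) -> dict:
    # Placeholder tool: returns whatever creator name the work already carries.
    return {"creator_name": state["work"].get("creator_name")}


def search_rights_registry(state: dict) -> dict:
    # Placeholder tool: a real one would query a rights registry by name.
    return {"matches": []}


def reverse_image_search(state: dict) -> dict:
    # Placeholder tool: a real one would look for visually similar images.
    return {"matches": []}


TOOLS: Dict[str, Callable[[dict], dict]] = {
    "extract_metadata": extract_metadata,
    "search_rights_registry": search_rights_registry,
    "reverse_image_search": reverse_image_search,
}


def choose_next_tool(state: dict) -> Optional[str]:
    """Stand-in for the orchestrator LLM's decision about what to do next."""
    results = state["results"]
    if "extract_metadata" not in results:
        return "extract_metadata"
    if "search_rights_registry" in results or "reverse_image_search" in results:
        return None  # a search has run, so stop
    # Adapt the plan to what the earlier step found.
    if results["extract_metadata"]["creator_name"]:
        return "search_rights_registry"
    return "reverse_image_search"


def run_agent(objective: str, work: dict, max_steps: int = 5) -> dict:
    """Repeatedly pick a tool, run it, and record the result."""
    state = {"objective": objective, "work": work, "results": {}, "trace": []}
    for _ in range(max_steps):  # hard cap so the loop always terminates
        tool_name = choose_next_tool(state)
        if tool_name is None:
            break
        state["results"][tool_name] = TOOLS[tool_name](state)
        state["trace"].append(tool_name)
    return state
```

In this sketch a work with a known creator name is routed to the registry lookup, while a work without one falls back to reverse image search, which mirrors the idea of adjusting the process as the analysis proceeds.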
At the core of the application are two Large Language Models: the largest available model from OpenAI and its ‘mini’ equivalent. These are not ‘traditional AI’ in the sense of ‘narrow’ machine learning models that predict, cluster, etc, but ‘general-purpose’ models. The largest LLM has excellent vision capabilities, which we are exploiting to the full. The mini LLM is used for some of the simpler text-based prompts as it is quicker and cheaper. The vision capabilities are used, for example, to determine the type of image, the age and the location if there is no associated metadata available.
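The split between the two models can be thought of as a simple routing rule. The sketch below is an assumption about how such routing might look, not a description of the OWH code: the deployment names, the `Task` type and the `needs_reasoning` flag are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical deployment names; the real ones are not stated in this document.
LARGE_MODEL = "large-vision-model"
MINI_MODEL = "mini-model"


@dataclass
class Task:
    prompt: str
    has_image: bool = False
    needs_reasoning: bool = False


def select_model(task: Task) -> str:
    """Send image and reasoning-heavy tasks to the large model, the rest to mini."""
    if task.has_image or task.needs_reasoning:
        return LARGE_MODEL
    return MINI_MODEL
```

For example, a task asking for the approximate age of a photograph would carry `has_image=True` and go to the large model, while a short text clean-up prompt would go to the quicker, cheaper mini model.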
OWH Search is hosted on Microsoft Azure, and it is deployed in the UK to align with UK GDPR and the Data Protection Act 2018. Prompts, files, and results are processed within our Azure environment and are not made available to other customers or to model providers to train or improve their models.
The models are deployed in OWH’s Microsoft Azure environment on a serverless basis. OWH pays for their use per token (a token is roughly a fragment of a word or a portion of an image, so the cost scales with the amount of text and image content processed). As the models are on Azure, there is no data leakage outside of the OWH environment and, importantly, outside of the UK.
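Per-token billing is straightforward arithmetic. The sketch below assumes separate prices for input and output tokens, quoted per million tokens; that pricing structure and every figure in the usage note are illustrative assumptions, not OWH's or Azure's actual rates.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: Decimal,
    output_price_per_million: Decimal,
) -> Decimal:
    """Estimate the cost of one job from token counts and per-million prices."""
    input_cost = Decimal(input_tokens) * input_price_per_million / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * output_price_per_million / TOKENS_PER_MILLION
    return input_cost + output_cost
```

With made-up prices of 2.00 per million input tokens and 8.00 per million output tokens, a batch using 2,000,000 input tokens and 500,000 output tokens would cost 4.00 + 4.00 = 8.00 in whatever currency the prices are quoted in.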
No. Customer content is used only to run your search and generate your report. Under Azure’s enterprise AI data protections, prompts and completions are isolated to the Azure service boundary and are neither shared with other customers nor used to train or improve foundation models.
No. Your data is used by the model only to run your searches and generate your results (for example, producing the diligence report). It is not used to train, fine‑tune, or improve any AI models as part of delivering the service. Any AI processing happens for your request “in the moment” and does not persist as learning that changes the underlying model.
OWH uses enterprise AI services from OpenAI hosted in Microsoft Azure, meaning the models are served within Azure’s controlled environment with strong security and contractual safeguards. In some configurations we may also support alternative model providers, but we do not send your data to public consumer AI services. The third-party models are only used under terms and technical controls that prevent reuse of customer content for training and restrict processing to the agreed service boundary.
Third‑party access is restricted by both technical controls and contract terms. Model providers do not get open access to your files, prompts, or results; processing happens within a tenant‑isolated cloud environment. Access is limited to authorised personnel on a least‑privilege basis, with encryption, monitoring and audit logs in place. Where external lookup tools or APIs are used (for example, rights registries), we limit what is shared to the minimum necessary (and can pseudonymise where appropriate), and we are transparent about what is queried.
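As an illustration of data minimisation and pseudonymisation before an external lookup, here is a minimal sketch. It is an assumption about one possible approach, not the OWH implementation: the field names, the allow-list and the use of a keyed hash for the reference are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical allow-list: only these fields ever leave the environment.
SHAREABLE_FIELDS = ("title", "creator_name", "creation_year")


def pseudonymous_reference(work_id: str, secret_key: bytes) -> str:
    """Derive a stable reference that does not reveal the internal work ID."""
    digest = hmac.new(secret_key, work_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_lookup_payload(record: Dict[str, object], secret_key: bytes) -> Dict[str, object]:
    """Keep only allow-listed fields and swap the work ID for a pseudonym."""
    payload = {name: record[name] for name in SHAREABLE_FIELDS if name in record}
    payload["reference"] = pseudonymous_reference(str(record["work_id"]), secret_key)
    return payload
```

Because the reference is a keyed hash, the same work always maps to the same reference for whoever holds the key, so responses can be matched back to the work without the external service ever seeing the internal identifier.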
Data is encrypted when it moves between systems (in transit) and when stored (at rest) using Azure’s encryption controls. Where needed, we can also support additional customer requirements such as customer-managed keys (so you control the encryption key) depending on the final architecture and services selected.
Access is restricted to authorised OWH personnel on a least-privilege basis. In Azure we use role-based access control, logging/monitoring, and (where appropriate) private networking to limit exposure. Your content is logically isolated from other tenants/customers.
We only retain customer content for as long as needed to deliver the service and meet agreed operational requirements (for example, report delivery, quality assurance, and any contracted support/maintenance). Retention periods can be agreed contractually, and deletion can be requested in line with those terms.
Azure maintains a broad portfolio of independent certifications and compliance offerings (including ISO/IEC 27001 and SOC reports) and provides detailed audit artefacts via the Microsoft Service Trust Portal. This helps customers meet requirements under GDPR/UK GDPR alongside their own organisational controls.
No. Your content is used to provide the service and is not sold or shared. When we use Azure-hosted models, the processing stays within the Azure service boundary; prompts/outputs are not made available to model providers for training. If we integrate any external specialist data sources, we’ll be transparent about what is queried and, where needed, can minimise or pseudonymise what is sent.
We follow an incident response process aligned to cloud security best practices, including investigation, containment, remediation, and customer notification as required by contract and applicable law. Azure provides extensive monitoring, logging, and security tooling that supports rapid detection and response. Our Data Protection Officer will instigate a data breach incident procedure if required.
Yes. Depending on your requirements and the final deployment model, we can support patterns such as private networking, IP allow-lists, role-based access controls, detailed audit logs, and (where supported) customer‑managed encryption keys. We’ll agree the right controls during security review and onboarding.
The models are not trained by OWH. They have been trained by OpenAI. We use a technique called ‘Retrieval Augmented Generation’ to provide the model with the necessary context (copyright regulations, etc) but the model itself remains unchanged. A future iteration may use a fine-tuned version of an LLM (where we change some of the model weights through supplementary training) but this training data will be owned, controlled and generated by OWH.
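The retrieval step can be sketched in a few lines. This is a deliberately simplified illustration: it ranks notes by keyword overlap, whereas production retrieval systems typically use embeddings and a vector index, and the function names, prompt wording and example notes are hypothetical rather than OWH's actual reference material.

```python
import re
from typing import List, Set


def tokenise(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, notes: List[str], k: int = 2) -> List[str]:
    """Return the k notes sharing the most words with the question."""
    question_tokens = tokenise(question)
    ranked = sorted(
        notes,
        key=lambda note: len(question_tokens & tokenise(note)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, notes: List[str], k: int = 2) -> str:
    """Place the retrieved notes in the prompt; the model itself is unchanged."""
    context = "\n".join(f"- {note}" for note in retrieve(question, notes, k))
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Illustrative notes only; not legal advice and not OWH's reference material.
EXAMPLE_NOTES = [
    "Copyright in an artistic work generally lasts for the life of the creator plus 70 years.",
    "A diligent search should consult sources appropriate to the type of work.",
    "An orphan work is a work whose rights holder cannot be identified or located.",
]
```

The key point is that the context travels inside the prompt for a single request, so the model weights are never modified by it.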
We have carried out comprehensive prompt engineering to mitigate hallucinations as far as possible (hallucinations are an inherent challenge in all LLMs and will never be reduced to zero). The RAG approach mentioned above also significantly reduces the risk of hallucinations. We have tested the model’s responses against the domain expertise of Naomi Korn Associates and are satisfied that the model’s answers are acceptable and accurate.