Digital Asset Management (DAM) systems haven’t changed fundamentally over the last 15 years – at their heart they are still databases of digital files and associated metadata. The metadata serves two key purposes: it provides information about the files and it enables them to be found in searches.
Most of that metadata is still entered by humans. For example, almost all DAM applications provide fields such as title, description and keywords, which are populated by users during the upload process or shortly after. Typically, these users look at the images one by one and then type the data into fields on a web form.
It still takes time to add good metadata
Years ago, when I was helping clients plan how they would migrate their existing files into their new DAM application, I used to reckon that on average it took about 30 seconds for an experienced user to enter the metadata for an image. Sure, user interfaces have got better and pages load more quickly, but I’m guessing that the time this takes is still much the same today. When you look at how users interact with a DAM application, entering metadata is still one of the most time-consuming activities.
What if this time could be reduced to zero? That’s the holy grail of the DAM upload process; after all, time is money. So it is no surprise that DAM vendors have been watching the emerging industry of Artificial Intelligence (machine learning) and visual recognition with keen interest.
How can machine learning help?
Visual recognition APIs that can suggest tags for images have been available online for some time but Google caused quite a stir when it announced its beta version of Google Cloud Vision last December. If anyone can produce a step-change in this area it’s Google, right?* Has AI come of age at last, at least in how it can be applied to help digital asset management?
This is the question we wanted to answer when we began a project to investigate how best to apply machine learning to a DAM system. Our vision was to develop a tool that adds machine learning capabilities to any DAM system with a decent REST API (starting with our own Asset Bank), removing the need for humans to enter any keywords manually. We knew this was optimistic, but we thought we would aim high and see how far we could get using existing visual recognition services.
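To make the idea concrete, here is a rough sketch of what the core loop of such a tool might look like. This is not our actual implementation: the asset structure, the injected `suggest_tags` callable and the confidence threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a DAM auto-tagging loop. The asset dictionary
# shape, the callables and the field names are illustrative assumptions,
# not Asset Bank's actual REST API.

def autotag_assets(list_assets, fetch_image, suggest_tags, save_tags,
                   min_confidence=0.8):
    """For each asset with no keywords yet, ask a vision service for tag
    suggestions and write back those above a confidence threshold."""
    tagged = {}
    for asset in list_assets():
        if asset.get("keywords"):          # skip assets a human already tagged
            continue
        image_bytes = fetch_image(asset["id"])
        suggestions = suggest_tags(image_bytes)   # [(tag, confidence), ...]
        keywords = [t for t, c in suggestions if c >= min_confidence]
        if keywords:
            save_tags(asset["id"], keywords)      # e.g. a PUT back to the DAM
            tagged[asset["id"]] = keywords
    return tagged
```

Because the DAM client and the vision service are passed in as functions, the same loop works against any DAM with a REST API and any tagging service.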
Are computer-generated tags good enough?
It quickly became apparent that the quality of the tags provided by the APIs varied widely, depending on the subject domain of the images. As you might expect, the results were impressive for images of generic subjects such as beaches, wildlife and landscapes, but not so good for more complicated scenes, or those containing domain-specific subjects. Until these services provide effective feedback loops, i.e. a way to learn easily from a client’s own set of images and to prioritise its subject domain, they are not going to work well for organisations that want to tag images of specific subjects, such as their products. Hopefully this functionality will appear in these APIs soon, as they are missing a trick: most DAM applications are full of images and high-quality metadata, entered by humans – perfect learning opportunities.
What did we develop?
The first application we developed, Autotagger, automatically adds tags to one of an asset’s attribute fields after it is uploaded. This works well for clients who are generally happy with the suggested tags and who, when they are not, are content to change them by editing the assets in their DAM application.
We developed a second application, Quicktagger, once we realised that although visual recognition technology is not yet at the point where it can be left completely alone to tag images, when combined with human oversight it can speed up tagging dramatically. In Quicktagger we developed a user interface that makes it easy for a user to accept and reject the auto-suggested tags, and that uses the tags to group the images intelligently, so that a user can quickly apply their own tags to multiple images at once.
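The grouping step can be illustrated with a small sketch. The data structure (a mapping from asset id to suggested tags) is my assumption for illustration, not Quicktagger’s actual code:

```python
from collections import defaultdict

def group_by_tag(suggestions):
    """Group asset ids by their suggested tags, so a user can accept or
    reject a tag for many images in one action. `suggestions` maps
    asset id -> list of suggested tags (an illustrative structure)."""
    groups = defaultdict(list)
    for asset_id, tags in suggestions.items():
        for tag in tags:
            groups[tag].append(asset_id)
    # Show the largest groups first: one decision there covers the most images.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
```

Sorting the groups by size means the user’s first few accept/reject decisions deal with the bulk of the batch.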
Both applications can be used with Asset Bank on a regular basis during the upload process, and to speed up initial data migration projects.
Which services did we use?
Both Autotagger and Quicktagger use Clarifai and Google Cloud Vision to provide the tags, as these gave the best results in our tests. Quicktagger has been developed so it can work with any online auto-tagging API, so we can plug in other services such as Imagga and IBM Watson if required.
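One simple way to keep the tagging service pluggable is a small adapter interface along these lines. The class and method names are illustrative assumptions, not the real Clarifai or Google Cloud Vision client code:

```python
from abc import ABC, abstractmethod

class TaggingService(ABC):
    """A common interface so any vision API can be plugged in behind it.
    Each real service (Clarifai, Google Cloud Vision, Imagga, ...) would
    get its own adapter; the stub below stands in for them here."""

    @abstractmethod
    def tags_for(self, image_bytes):
        """Return a list of (tag, confidence) pairs for the image."""

class StubService(TaggingService):
    """Illustrative stand-in that returns canned results."""
    def __init__(self, canned):
        self.canned = canned

    def tags_for(self, image_bytes):
        return self.canned

def merged_tags(services, image_bytes, min_confidence=0.8):
    """Union of tags from several services, keeping each tag's highest
    confidence and dropping anything below the threshold."""
    best = {}
    for service in services:
        for tag, conf in service.tags_for(image_bytes):
            if conf >= min_confidence and conf > best.get(tag, 0.0):
                best[tag] = conf
    return sorted(best, key=best.get, reverse=True)
```

With this shape, swapping in another provider is just a matter of writing one more adapter class.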
Over to you
And what’s next for us?
We are currently investigating whether we can plug the gap we see in the visual recognition APIs by adding a layer of our own machine learning over the top of the auto-suggested tags, so we can provide a service that has learned from all the metadata already in a client’s DAM application.
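One naive way such a layer could work is to learn, from a client’s existing metadata, which human-entered keywords tend to co-occur with each generic auto-tag. Everything below, including the data format, is a hypothetical sketch rather than a description of what we will build:

```python
from collections import Counter, defaultdict

def learn_tag_mapping(history):
    """Count which human-entered keywords co-occur with each generic
    auto-tag in a client's existing metadata. `history` is a list of
    (auto_tags, human_keywords) pairs - an illustrative format."""
    co = defaultdict(Counter)
    for auto_tags, human_keywords in history:
        for tag in auto_tags:
            co[tag].update(human_keywords)
    return co

def domain_tags(mapping, auto_tags, top_n=3):
    """Translate generic auto-tags into the client's own vocabulary by
    voting across the learned co-occurrence counts."""
    votes = Counter()
    for tag in auto_tags:
        votes.update(dict(mapping.get(tag, Counter()).most_common(top_n)))
    return [kw for kw, _ in votes.most_common(top_n)]
```

A real version would need much more than co-occurrence counts, but it shows why a DAM full of human-entered metadata is such a rich training resource.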
We would be delighted to hear from other people who are exploring the potential of using artificial intelligence in DAM, or who have ideas for how we can improve Autotagger or Quicktagger. Please get in touch.
* Unfortunately, our findings are that it hasn’t – at least not yet. Google Cloud Vision doesn’t provide significantly better results than the best of the other APIs. In fact, we think Clarifai’s results are slightly better, at least for the images we and our clients have been uploading. This might change – all these image recognition APIs are continually learning and will change and improve over time.