Event: Hargreaves’ Review – Data Analytics / Text and Data Mining 01/11/11

This event was held in the Jubilee Room with Copyright for Knowledge. MPs, Lords and guests discussed the Hargreaves recommendation to create an exception in law for text and data mining, together with policy makers and interested stakeholders. Six presentations, several with PowerPoint illustrations, were followed by a panel discussion exploring the capabilities and functions of text and data analytics, the legal and copyright environments in which they currently operate, and the arguments for and against the exception proposed by Professor Hargreaves in his recent Intellectual Property review. The speakers were: John McNaught – Deputy Director, National Centre for Text Mining, University of Manchester; Jeff Lynn – CEO and Cofounder of Seedrs and Chairman of the Coalition for the Digital Economy; Philip Ditchfield – Contracts and Licensing Manager, GlaxoSmithKline; Prof Lionel Bently – Centre for Intellectual Property and Information Law, Cambridge University; Peter J. Stretton – European Patent Attorney, IBM; and Stephen Pinfield – Chief Information Officer, Nottingham University.

James Firth from the Open Digital organisation kindly blogged about this event. You can read the full account on his blog, Slightly Right of Centre, but we have included some of his words here.

Last night’s Parliamentary ICT Forum (PICTFOR) event on copyright and data analytics (“mining” published works for trends and other nuggets of information) was one of the most enjoyable and useful events so far for the newly-merged Parliamentary Committee. There’s currently no copyright exemption (e.g. fair dealing-type justification) for computer processing of published works without permission from the copyright owner, and this can seriously impact academic study and areas such as medical research, we heard.

It can also make it very hard, from a legal perspective, for a rival to Google to emerge in the UK, although IBM’s legal advisor was very careful not to mention the ‘G’-word, instead focussing on the legal uncertainty around processing even unprotected web-based content. “We largely advise against doing it, because of the legal risk. We might even be accused of incitement to commit copyright infringement if we provide tools to enable others to analyse online content.” IBM have a fairly balanced view on data analysis, trying to tread a path between the right to analyse openly published online content and the right of publishers to limit access to other content.

Although much of the debate was technically involved, several audience members I spoke to afterwards found this style of debate highly useful. The panel consisted of Cambridge professor of intellectual property law Lionel Bently; Philip Ditchfield, contracts and licensing manager at GlaxoSmithKline; COADEC’s Jeff Lynn; John McNaught from the National Centre for Text Mining; Richard Mollet, chief executive of the Publishers Association; Stephen Pinfield, CIO at the University of Nottingham; and IBM’s legal advisor on intellectual property, Peter Stretton.

There was no argument against the usefulness of data mining and “deep” semantic analysis of published works, especially academic journals, but there’s no consensus or simple categorisation of the types of works it would be useful to analyse, either now or in the future.

For example, linguistics scholars may draw useful conclusions from analysing language used in works of fiction. This got me thinking whether world events such as war, terrorism and recession might influence the mood, themes and language used in fiction published during these times.

The debate, however, focussed mainly on academic research, with a strong emphasis on science. We heard from the National Centre for Text Mining that up to 92% of the content of academic works remained largely invisible to standard academic search tools because subjects and themes were not captured in the abstract.

Even “full text search” was inadequate for many purposes because some words and scientific terms are common, yielding tens of thousands of hits. Analysing the context in which such terms appear can narrow the search and yield useful results, using techniques broadly known as semantic analysis.

GlaxoSmithKline suggested that analysis of trends across hundreds or thousands of medical publications might help direct future research, or even yield a breakthrough. Smarter research would lead ultimately to better medicine.

Richard Mollet of the Publishers Association countered that publishers weren’t averse to allowing controlled access for bona fide research organisations wishing to data-mine their entire works.

Semantic analysis techniques [currently*] require the whole text to be made available so that the analyser can have full control over how the text is interpreted. This brings a risk that unscrupulous organisations might steal whole volumes of text, depriving the publishers of their prime asset.

[* This is not always true. Some databases allow a level of analysis to be performed by third party algorithms without handing over a full copy of the input text.]

“Access needs to be controlled or regulated in some way. Some people like to paint publishers as bouncers, denying access. That’s not true, we want to help. I like to think of publishers more like maître d’s, guiding clientèle to their table.”

Richard dismissed Jeff Lynn’s (Coadec) suggestion that publishers want to protect an exclusive monopoly over their back catalogue, saying that most publishers had a policy of licensing academic work, and many indeed licensed other uses.

He also noted a lack of demand, saying that only 10-15 requests (per year?) were made to license analysis of published catalogues. The University of Nottingham’s CIO, Stephen Pinfield, provided some background for this, explaining that licensing was both complex and restrictive, which discouraged many researchers from embarking on projects that relied upon data mining.

Jeff Lynn added that many small businesses or enthusiasts would never get the funding or meet the criteria to access the data, yet notable digital advances have come from small businesses or individuals. Open access would allow an army of smaller developers to develop new search and cataloguing techniques or dig for interesting trends.

Some interesting legal points were raised, including a discussion around whether intellectual property (IP) rights were analogous in law to physical property rights. Essentially, yes, said Professor Bently. Intellectual property was protected as a property right under Article 1 of the First Protocol to the European Convention on Human Rights (and other treaties), but that didn’t necessarily mean there couldn’t be exemptions or compulsory (statutory) licensing conditions applied.

In fact the absence of a compulsory licensing model could eventually work against the interests of publishers, as a statutory defence to copyright infringement exists when there is no lawful method of licensing content, although this is a complex area that would need to be established in court.

Peter Stretton also made an argument on this theme, noting that automated processes that currently crawl the web and other data sources searching for infringing material could themselves be infringing other people’s copyright. Essentially, tools useful to publishers in detecting infringement could themselves be unlawful under the current copyright regime.

During questions it was pointed out that the solution to many of the issues could lie in statutory licensing: fixed-rate licence fees for access, set by a tribunal, in much the same way as for performing a cover version of a song.

There are strong arguments that companies wishing to benefit from others’ IP, such as GSK (as with all other corporations profiting from science), should pay something. Statutory licensing would solve many of the problems highlighted by the University of Nottingham’s CIO.

But Stephen Pinfield countered that the University already pays around £5m a year to license journals and other content. Why should it pay again just to perform a computerised analysis of the content it has already licensed?

Other representatives of the Publishers Association argued that a change in the law might bring unintended consequences (I read: detrimental to the publishing industry) and that a solution could be found through “a collaborative collective approach.” By “collective” it can be assumed they meant royalty collection societies, playing a role similar to that of the Performing Right Society in music royalty collection.

But there was a mood in the audience, both during the Q&A and in private conversations afterwards, that publishers have historically been reluctant to act in this area and would not go far enough of their own volition.


For comments or corrections please email editorial@slightlyrightofcentre.com or call 01252 560 426














