Tag Archives: Open Data Pilot Project

Opening up Open Access: European Union, UK, Argentina, US, and Vancouver (Canada)

There is a furor growing internationally and it’s all about open access. It ranges from a petition in the US, to a comprehensive ‘open access’ project from the European Union, to a decision in the Argentinian Legislature, to a speech from David Willetts, UK Minister of State for Universities and Science, to an upcoming meeting in June 2012 being held in Vancouver (Canada).

As this goes forward, I’ll try to be clear as to which kind of open access I’m discussing: open access publication (access to published research papers), open access data (access to research data), or both.

The European Commission has adopted a comprehensive approach to giving easy, open access to research funded through the European Union under the auspices of the current 7th Framework Programme and the upcoming Horizon 2020 (or what would have been called the 8th Framework Programme under the old system), according to the May 9, 2012 news item on Nanowerk,

To make it easier for EU-funded projects to make their findings public and more readily accessible, the Commission is funding, through FP7, the project ‘Open access infrastructure for research in Europe’ (OpenAIRE). This ambitious project will provide a single access point to all the open access publications produced by FP7 projects during the course of the Seventh Framework Programme.

OpenAIRE is a repository network and is based on a technology developed in an earlier project called Driver. The Driver engine trawled through existing open access repositories of universities, research institutions and a growing number of open access publishers. It would index all these publications and provide a single point of entry for individuals, businesses or other scientists to search a comprehensive collection of open access resources. Today Driver boasts an impressive catalogue of almost six million taken from 327 open access repositories from across Europe and beyond.

OpenAIRE uses the same underlying technology to index FP7 publications and results. FP7 project participants are encouraged to publish their papers, reports and conference presentations to their institutional open access repositories. The OpenAIRE engine constantly trawls these repositories to identify and index any publications related to FP7-funded projects. Working closely with the European Commission’s own databases, OpenAIRE matches publications to their respective FP7 grants and projects providing a seamless link between these previously separate data sets.

OpenAIRE is also linked to CERN’s open access repository for ‘orphan’ publications. Any FP7 participants that do not have access to an own institutional repository can still submit open access publications by placing them in the CERN repository.

Here’s why I described this project as comprehensive, from the May 9, 2012 news item,

‘OpenAIRE is not just about developing new technologies,’ notes Ms Manola [Natalia Manola, the project’s manager], ‘because a significant part of the project focuses on promoting open access in the FP7 community. We are committed to promotional and policy-related activities, advocating open access publishing so projects can fully contribute to Europe’s knowledge infrastructure.’

The project is collecting usage statistics of the portal and the volume of open access publications. It will provide this information to the Commission and use this data to inform European policy in this domain.

OpenAIRE is working closely to integrate its information with the CORDA database, the master database of all EU-funded research projects. Soon it should be possible to click on a project in CORDIS (the EU’s portal for research funding), for example, and access all the open access papers published by that project. Project websites will also be able to provide links to the project’s peer reviewed publications and make dissemination of papers virtually effortless.

The project participants are also working with EU Members to develop a European-wide ‘open access helpdesk’ which will answer researchers’ questions about open access publishing and coordinate the open access initiatives currently taking place in different countries. The helpdesk will build up relationships and identify additional open access repositories to add to the OpenAIRE network.

Meanwhile, there’s been a discussion on the UK’s Guardian newspaper website about one ‘open access’ issue, money, in a May 9, 2012 posting by John Bynner,

The present academic publishing system obstructs the free communication of research findings. By erecting paywalls, commercial publishers prevent scientists from downloading research papers unless they pay substantial fees. Libraries similarly pay huge amounts (up to £1m or more per annum) to give their readers access to online journals.

There is general agreement that free and open access to scientific knowledge is desirable. The way this might be achieved has come to the fore in recent debates about the future of scientific and scholarly journals.

Our concern lies with the major proposed alternative to the current system. Under this arrangement, authors are expected to pay when they submit papers for publication in online journals: the so called “article processing cost” (APC). The fee can amount to anything between £1,000 and £2,000 per article, depending on the reputation of the journal. Although the fees may sometimes be waived, eligibility for exemption is decided by the publisher and such concessions have no permanent status and can always be withdrawn or modified.

A major problem with the APC model is that it effectively shifts the costs of academic publishing from the reader to the author and therefore discriminates against those without access to the funds needed to meet these costs. [emphasis mine] Among those excluded are academics in, for example, the humanities and the social sciences whose research funding typically does not include publication charges, and independent researchers whose only means of paying the APC is from their own pockets. Academics in developing countries in particular face discrimination under APC because of their often very limited access to research funds.

There is another approach that could be implemented for a fraction of the cost of commercial publishers’ current journal subscriptions. “Access for all” (AFA) journals, which charge neither author nor reader, are committed to meeting publishing costs in other ways.

Bynner offers a practical solution: get the libraries to pay their subscription fees to an AFA journal, thereby funding ‘access for all’.

The open access discussion in the UK hasn’t stopped with a few posts in the Guardian; there’s also support from the government. David Willetts, in a May 2, 2012 speech to the UK Publishers Association Annual General Meeting, had this to say, from the UK’s Dept. for Business Innovation and Skills website,

I realise this move to open access presents a challenge and opportunity for your industry, as you have historically received funding by charging for access to a publication. Nevertheless that funding model is surely going to have to change even beyond the positive transition to open access and hybrid journals that’s already underway. To try to preserve the old model is the wrong battle to fight. Look at how the music industry lost out by trying to criminalise a generation of young people for file sharing. [emphasis mine] It was companies outside the music business such as Spotify and Apple, with iTunes, that worked out a viable business model for access to music over the web. None of us want to see that fate overtake the publishing industry.

Wider access is the way forward. I understand the publishing industry is currently considering offering free public access to scholarly journals at all UK public libraries. This is a very useful way of extending access: it would be good for our libraries too, and I welcome it.

It would be deeply irresponsible to get rid of one business model and not put anything in its place. That is why I hosted a roundtable at BIS in March last year when all the key players discussed these issues. There was a genuine willingness to work together. As a result I commissioned Dame Janet Finch to chair an independent group of experts to investigate the issues and report back. We are grateful to the Publishers Association for playing a constructive role in her exercise, and we look forward to receiving her report in the next few weeks. No decisions will be taken until we have had the opportunity to consider it. But perhaps today I can share with you some provisional thoughts about where we are heading.

The crucial options are, as you know, called green and gold. Green means publishers are required to make research openly accessible within an agreed embargo period. This prompts a simple question: if an author’s manuscript is publicly available immediately, why should any library pay for a subscription to the version of record of any publisher’s journal? If you do not believe there is any added value in academic publishing you may view this with equanimity. But I believe that academic publishing does add value. So, in determining the embargo period, it’s necessary to strike a suitable balance between enabling revenue generation for publishers via subscriptions and providing public access to publicly funded information. In contrast, gold means that research funding includes the costs of immediate open publication, thereby allowing for full and immediate open access while still providing revenue to publishers.

In a May 22, 2012 posting at the Guardian website, Mike Taylor offers some astonishing figures (I had no idea academic publishing has been quite so lucrative) and notes that the funders have been a driving force in this ‘open access’ movement (Note: I have removed links from the excerpt),

The situation again, in short: governments and charities fund research; academics do the work, write and illustrate the papers, peer-review and edit each others’ manuscripts; then they sign copyright over to profiteering corporations who put it behind paywalls and sell research back to the public who funded it and the researchers who created it. In doing so, these corporations make grotesque profits of 32%-42% of revenue – far more than, say, Apple’s 24% or Penguin Books’ 10%. [emphasis mine]

… But what makes this story different from hundreds of other cases of commercial exploitation is that it seems to be headed for a happy ending. That’s taken some of us by surprise, because we thought the publishers held all the cards. Academics tend to be conservative, and often favour publishing their work in established paywalled journals rather than newer open access venues.

The missing factor in this equation is the funders. Governments and charitable trusts that pay academics to carry out research naturally want the results to have the greatest possible effect. That means publishing those results openly, free for anyone to use.

Taylor also goes on to mention the ongoing ‘open access’ petition in the US,

There is a feeling that the [US] administration fully understands the value of open access, and that a strong demonstration of public concern could be all it takes now to goad it into action before the November election. To that end a Whitehouse.gov petition has been set up urging Obama to “act now to implement open access policies for all federal agencies that fund scientific research”. Such policies would bring the US in line with the UK and Europe.

The people behind the US campaign have produced a video,

Anyone wondering about the reference to Elsevier may want to check out Thomas Lin’s Feb. 13, 2012 article for the New York Times,

More than 5,700 researchers have joined a boycott of Elsevier, a leading publisher of science journals, in a growing furor over open access to the fruits of scientific research.

You can find out more about the boycott and the White House petition at the Cost of Knowledge website.

Meanwhile, Canadians are being encouraged to sign the petition (by June 19, 2012), according to the folks over at ScienceOnline Vancouver in a description of their June 12, 2012 event, Naked Science; Excuse me: your science is showing (a cheap, cheesy, and attention-getting title; why didn’t I think of it first?),

Exposed. Transparent. Nude. All adjectives that should describe access to scientific journal articles, but currently, that’s not the case. The research paid by our Canadian taxpayer dollars is locked behind doors. The only way to access these articles is money, and lots of it!

Right now research articles costs more than a book! About $30. Only people with university affiliations have access and only journals their libraries subscribe to. Moms, dads, sisters, brothers, journalists, students, scientists, all pay for research, yet they can’t read the articles about their research without paying for it again. Now that doesn’t make sense.

….

There is also petition going around that states that research paid for by US taxpayer dollars should be available for free to US taxpayers (and others!) on the internet. Don’t worry if you are Canadian citizen, by signing this petition, Canadians would get access to the US research too and it would help convince the Canadian government to adopt similar rules. [emphasis mine]

Here’s where you can go to sign the petition. As for the notion that this will encourage the Canadian government to adopt an open access philosophy, I do not know. On the one hand, the government has opened up access to data, notably Statistics Canada data, mentioned by Frances Woolley in her March 22, 2012 posting about that and other open access data initiatives by the Canadian government on the Globe and Mail blog,

The federal government is taking steps to build the country’s data infrastructure. Last year saw the launch of the open data pilot project, data.gc.ca. Earlier this year the paywall in front of Statistics Canada’s enormous CANSIM database was taken down. The National Research Council, together with University of Guelph and Carleton University, has a new data registration service, DataCite, which allows Canadian researches to give their data permanent names in the form of digital object identifiers. In the long run, these projects should, as the press releases claim, “support innovation”, “add value-for-money for Canadians,” and promote “the reuse of existing data in commercial applications.”

That seems promising but there is a countervailing force. The Canadian government has also begun to charge subscription fees for journals that were formerly free. From the March 8, 2011 posting by Emily Chung on the CBC’s (Canadian Broadcasting Corporation) Quirks and Quarks blog,

The public has lost free online access to more than a dozen Canadian science journals as a result of the privatization of the National Research Council’s government-owned publishing arm.

Scientists, businesses, consultants, political aides and other people who want to read about new scientific discoveries in the 17 journals published by National Research Council Research Press now either have to pay $10 per article or get access through an institution that has an annual subscription.

It caused no great concern at the time,

Victoria Arbour, a University of Alberta graduate student, published her research in the Canadian Journal of Earth Sciences, one of the Canadian Science Publishing journals, both before and after it was privatized. She said it “definitely is too bad” that her new articles won’t be available to Canadians free online.

“It would have been really nice,” she said. But she said most journals aren’t open access, and the quality of the journal is a bigger concern than open access when choosing where to publish.

Then, there’s this from the new publisher, Canadian Science Publishing,

Cameron Macdonald, executive director of Canadian Science Publishing, said the impact of the change in access is “very little” on the average scientist across Canada because subscriptions have been purchased by many universities, federal science departments and scientific societies.

“I think the vast majority of researchers weren’t all that concerned,” he said. “So long as the journals continued with the same mission and mandate, they were fine with that.”

Macdonald said the journals were never strictly open access, as online access was free only inside Canadian borders and only since 2002.

So, journals that offered open access to research funded by Canadian taxpayers (to Canadians only) are now behind paywalls. Chung’s posting notes the problem already mentioned in the UK Guardian postings, money,

“It’s pretty prohibitively expensive to make things open access, I find,” she [Victoria Arbour] said.

Weir [Leslie Weir, chief librarian at the University of Ottawa] said more and more open-access journals need to impose author fees to stay afloat nowadays.

Meanwhile, the cost of electronic subscriptions to research journals has been ballooning as library budgets remain frozen, she said.

So far, no one has come up with a solution to the problem. [emphasis mine]

It seems they have designed a solution in the UK, as noted in John Bynner’s posting; perhaps we could try it out here.

Before I finish up, I should get to the situation in Argentina, from the May 27, 2012 posting on the Pasco Phronesis (David Bruggeman) blog (Note: I have removed a link in the following),

The lower house of the Argentinian legislature has approved a bill (en Español) that would require research results funded by the government be placed in institutional repositories once published.  There would be exceptions for studies involving confidential information and the law is not intended to undercut intellectual property or patent rights connected to research.  Additionally, primary research data must be published within 5 years of their collection.  This last point would, as far as I can tell, would be new ground for national open access policies, depending on how quickly the U.S. and U.K. may act on this issue.

Argentina steals a march on everyone by offering open access publication and open access data, within certain, reasonable constraints.

Getting back to David’s May 27, 2012 posting, he also offers some information on the European Union situation and some thoughts on science policy in Egypt.

I have long been interested in open access publication as I feel it’s infuriating to be denied access to research that one has paid for in tax dollars. I have written on the topic before in my Beethoven inspires Open Research (Nov. 18, 2011 posting) and Princeton goes Open Access; arXiv is 10 years old (Sept. 30, 2011 posting) and elsewhere.

ETA May 28, 2012: I found this NRC Research Press website for the NRC journals and it states,

We are pleased to announce that Canadians can enjoy free access to over 100 000 back files of NRC Research Press journals, dating back to 1951. Access to material in these journals published after December 31, 2010, is available to Canadians through subscribing universities across Canada as well as the major federal science departments.

Concerned readers and authors whose institutes have not subscribed for the 2012 volume year can speak to their university librarians or can contact us to subscribe directly.

It’s good to see Canadians still have some access, although personally, I do prefer to read recent research.

ETA May 29, 2012: Yikes, I think this is one of the longest posts ever and I’m going to add this info. about libre redistribution and data mining as they relate to open access in this attempt to cover the topic as fully as possible in one posting.

First, here’s an excerpt from Ross Mounce’s May 28, 2012 posting on the Palaeophylophenomics blog about ‘Libre redistribution’ (Note: I have removed a link),

I predict that the rights to electronically redistribute, and machine-read research will be vital for 21st century research – yet currently we academics often wittingly or otherwise relinquish these rights to publishers. This has got to stop. The world is networked, thus scholarly literature should move with the times and be openly networked too.

To better understand the notion of ‘libre redistribution’ you’ll want to read more of Mounce’s comments, but you might also want to check out Cameron Neylon’s comments in his March 6, 2012 posting on the Science in the Open blog,

Centralised control, failure to appreciate scale, and failure to understand the necessity of distribution and distributed systems. I have with me a device capable of holding the text of perhaps 100,000 papers It also has the processor power to mine that text. It is my phone. In 2-3 years our phones, hell our watches, will have the capacity to not only hold the world’s literature but also to mine it, in context for what I want right now. Is Bob Campbell ready for every researcher, indeed every interested person in the world, to come into his office and discuss an agreement for text mining? Because the mining I want to do and the mining that Peter Murray-Rust wants to do will be different, and what I will want to do tomorrow is different to what I want to do today. This kind of personalised mining is going to be the accepted norm of handling information online very soon and will be at the very centre of how we discover the information we need.

This moves the discussion past access (taxpayers not seeing the research they’ve funded, researchers who don’t have subscriptions, libraries not having subscriptions, etc.) to what happens when you can get access freely. It opens up new ways of doing research by means of text mining, data mining, and the redistribution of both.

Public access to publicly funded research; a consultation in the US

There are two requests from the US White House’s Office of Science and Technology Policy (OSTP) for information about public access to publicly funded research. From the Nov. 4, 2011 posting by David Bruggeman on his Pasco Phronesis blog,

In today’s Federal Register there are two requests for comment on the topic of public access to federally funded research.  They come from the Office of Science and Technology Policy (OSTP).  One focuses on the digital data produced by that research, the other concerns the publications that result from this research.  … part of the reauthorization of the America COMPETES Act.  The report is focused on determining standards and policies to help ensure long-term preservation and access to digital data and research publications produced from federally funded research.

So one request for information (RFI) is about open access to scientific data and the other is about open access to published research. The RFI for open access to scientific data is more detailed. Some 13 questions are asked, although responders may choose to address their own open data access issues rather than answering the questions. The questions are split into two categories: (1) Preservation, Discoverability, and Access and (2) Standards for Interoperability, Re-Use and Re-Purposing. The deadline for responses on this request is January 12, 2012.

The RFI for public access to peer-reviewed, publicly funded research in scholarly publications is less detailed with eight questions being asked.  There’s this one for example,

(1) Are there steps that agencies could take to grow existing and new markets related to the access and analysis of peer-reviewed publications that result from federally funded scientific research? How can policies for archiving publications and making them publically accessible be used to grow the economy and improve the productivity of the scientific enterprise? What are the relative costs and benefits of such policies? What type of access to these publications is required to maximize U.S. economic growth and improve the productivity of the American scientific enterprise?

For this RFI, respondents need to meet a January 2, 2012 deadline.

Both of the RFIs ask questions about how open access can grow the economy. Although I didn’t see any reference to the economy when I was checking out a Canadian government pilot project (Open Data Pilot Project), I expect we are just as interested in possible economic benefits as our US neighbour. (I mentioned the Canadian project in my March 13, 2011 posting.)

Canada’s Open Data Pilot Project

I’m a little confused (some claim that I’m perpetually so). Yesterday (March 17, 2011), I read an announcement (David Eaves blog) about the Canadian federal government’s launch of its Open Data Pilot Project and on checking out the website discovered a backgrounder with this,

Government of Canada Open Data Portal

The Government of Canada produces and acquires data in areas such as health, environment, agriculture, and natural resources. The goal of the GC Open Data Portal is to create socio-economic opportunities and promote informed participation by the public by expanding access to federal government data. [emphasis mine]

The GC Open Data Portal is a collaborative effort amongst Government of Canada departments and agencies to provide access to data managed by the government that can be leveraged by citizens, businesses, and communities for their own purposes. The government will work towards making public data that is not sensitive in nature (i.e. data which is NOT personal, secret, or confidential) broadly available in reusable formats.

Government of Canada’s Open Data Pilot Project

The GC Open Data pilot project will enhance access to Government datasets by providing a “single-window” to data already published by individual departments and agencies on their public Websites.

If I understand it rightly, the pilot project represents a one-stop shop for a limited number of datasets whereas the portal links you to a larger number of government datasets where each must be approached separately.

I noted the reference to creating “socio-economic opportunities,” a topic which forms the subtext for a lot of the discourse about ‘innovation’ (I’ve written about innovation any number of times, most recently in the context of a submission to a public consultation, March 15, 2011 posting).

I like the idea of open data since I believe that the public should have access to the research paid for through taxes. One quibble: I’m not so sure about the claims being made about “socio-economic opportunities.” For example, David Eaves, a public policy entrepreneur, open government activist and negotiation expert, mentions this in his June 22, 2009 posting,

Look no further than the City of Washington DC. It created a publicly available database of city collected and created data and asked local individuals and companies to use it. The result? A $50,000 dollar investment in changing processes and offering prize money has so far yielded $2.3M in value. That’s a 46 times return on investment in one year. [emphasis mine]

I started* looking for the research/data supporting the claim of a 46-fold return on investment in one year. I followed the link Eaves provided and ended up at the Apps for Democracy About webpage,

In the fall of 2008, DC’s Office of the Chief Technology Officer asked iStrategyLabs how it could make DC.gov’s revolutionary Data Catalog useful for the citizens, visitors, businesses and government agencies of Washington, DC. The Data Catalog contains all manner of open public data featuring real-time crime feeds, school test scores, and poverty indicators, and is the most comprehensive of its kind in the world.

Our solution was to create Apps for Democracy – a contest that cost Washington, DC $50,000 and returned 47 iPhone, Facebook and web applications with an estimated value in excess of $2,600,000 to the city. [emphasis mine]
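As a quick aside, the arithmetic implied by the two sets of quoted figures can be checked with a short Python sketch. The function name `return_multiple` is my own and purely hypothetical; the dollar amounts are the ones quoted above. This only divides the claimed value by the cost. It says nothing about how either value estimate was derived.

```python
def return_multiple(value_dollars: float, cost_dollars: float) -> float:
    """Claimed value divided by cost: a simple multiple, not a net return."""
    if cost_dollars <= 0:
        raise ValueError("cost must be positive")
    return value_dollars / cost_dollars


COST = 50_000  # contest cost quoted by both sources

# Figure quoted in Eaves' June 22, 2009 posting: $2.3M in value
eaves_multiple = return_multiple(2_300_000, COST)

# Figure quoted on the Apps for Democracy page: "in excess of $2,600,000"
apps_multiple = return_multiple(2_600_000, COST)

print(f"Eaves figure: {eaves_multiple:.0f} times the cost")
print(f"Apps for Democracy figure: at least {apps_multiple:.0f} times the cost")
```

So the "46 times" claim is consistent with the $2.3M figure, while the contest's own page implies a somewhat higher multiple. Neither source shows how the underlying value was estimated.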

It sounds exciting but no links were provided to data that would support the claim or give me information on how the numbers were derived. Shockingly, I did not stop here. Next was an Oct. 8, 2010 article by Eaves in BC Business online where he discusses the importance of open data and cites some examples from the City of Vancouver’s initiative,

Businesses have been analyzing government data for decades, to help refine property valuations or decide where to open a new office or store. Open data reduces the transaction costs of getting this and other types of information. No more letters, phone calls or special requests; just visit the website and download what you need. Bing Thom Architects, for example, recently used public data about Vancouver’s shorelines to examine the impact of a rising sea level on development in the city. No permission or requests were ever sought; they just took what they needed.

Open data also means new business opportunities by adding value to government data. In Vancouver, for example, two local web developers, Luke Closs and Kevin Jones, launched Vantrash, a website that digitizes the garbage schedule and sends users an email the day before their garbage day. It’s a useful service in a city where garbage day shifts every month.

In Eaves’ first example, Bing Thom’s company saves money by eliminating a bureaucratic process and, presumably, the government employee is freed to do other work. Precisely how does ‘moi’, the taxpayer, benefit? Did the money saved by Thom’s company get used to generate a new job? Did the government employee start doing something that improves the situation in the city? In short, where is the data?

Where Vantrash (Eaves’ second example) is concerned, they accept donations for their website. The website does not offer any data about this ‘socio-economic opportunity’ to suggest that it generates enough revenue to offer its developers a wage. Note: If garbage collection is an issue for you and you live in Vancouver, check them out.

As I stated earlier, I like the open data principle. Taxpayers should have access to the data they have funded.

The ‘open data’ discussion bears a lot of similarity to the ‘innovation’ discussion. Both of these concepts are intended to drive Canadians to explore and generate ‘socio-economic opportunities’. Both concepts generate a lot of excitement. And, I’d like to understand both concepts a little better.

* ‘start’ changed to ‘started’ for better grammar on Sept. 12, 2014.