Frequently Asked Questions

The following lines provide answers to some frequently asked questions about different aspects of the reproducibility checks and our data and code availability policy.

Scope of Reproducibility Checks at EJ

  • What is the exact nature of the reproducibility checks carried out at the Economic Journal?
    Answer The purpose of the reproducibility checks carried out at the Economic Journal is to verify three aspects of the replication package: (i) it is complete, in the sense of producing each table, figure, and in-text number in the paper and its appendices, including those online; (ii) it is self-contained, in the sense of not requiring a subprogram or module not included in the package; and (iii) the data and code are adequately documented for other researchers to be able to use them to replicate the results in the paper. When the data are accessible (included in the package or, in case of exemptions, via temporary access by the reproducibility team), the checks ensure that the code exactly reproduces the results in the paper and its appendices. In the case of a data exemption, authors may provide simulated or synthetic data to check that the code runs and produces all output, but the exact results cannot be checked.
    Reproducibility checks (not replication checks) are conducted. This means that our checks do not screen for coding errors, discrepancies between what the paper claims the code does and what it actually does, econometric errors, or whether the empirical approach followed in the paper can be reproduced in other environments or other datasets.
  • Are the reproducibility checks implemented on online appendices?
    Answer Yes, the replication package should produce each table, figure, and in-text number in the paper and its appendices, including those online. All these codes are checked for their ability to produce the results in the paper and appendices.
  • Why is the Economic Journal running reproducibility checks? Why not replication checks?
    Answer We firmly believe that reproducibility and replicability are the main pillars of science. The nature of replication checks requires time, effort, and resources that journals typically do not have: the publication process should be speedy for science to advance at the right pace. Our reproducibility checks provide a necessary first step: to ensure that authors publish all available data and the codes that generate the results they present in the papers we publish, and, importantly, to check that these codes and data run and produce the published results. The certification that we provide enhances transparency, since it assures that other researchers can reproduce the published research and test it against other datasets, assumptions, methods, etc. It also provides an additional service to the authors, as we often detect small errors that are better amended before publication than in an erratum afterwards.

Data and Code Availability Policy and Exemptions

  • My paper uses publicly available data. Is it enough to indicate how to get them or should I provide my datasets as part of the replication package?
    Answer Even publicly available data should be included in the replication package to ensure they remain available in the future for anyone who wants to replicate your results. The only exception is when your exact extract is published in a "trusted" repository (see the following list for guidance) with a permanent DOI. This is important, because datasets are often updated (or removed) by the provider, and your version of the data may no longer be available to researchers in the future.
  • My paper uses publicly available data. Does it imply that I certainly have the right to re-publish my dataset along with the replication package? If not, how can I obtain permission to publish the data?
    Answer Each provider offers a different policy regarding re-distribution of original and transformed datasets. Some providers, for example, allow re-distribution as long as your extract is deposited in a specific repository. You should check which restrictions apply to publishing your data before the first submission. You should also seek permission from the original owner of the data to publish them, and cite the original source accordingly.
  • Can I request an exemption to publish my data?
    Answer Yes, you can request an exemption on the grounds that the data are restricted-access. The request should be made at the time of initial submission, in a cover letter addressed to the Editor. The Editor in charge of your submission will determine whether your request is justified before submitting the paper to referees. If the Editor decides against the exemption, the manuscript will not be sent to referees, and you will be asked to accept the data and code availability policy; otherwise, the paper will be rejected. Submission fees will not be returned in that case. When an exemption is needed for a dataset that is incorporated into the analysis during the editorial process, the exemption should be requested at the first iteration in which the new data are incorporated.
  • Can I request an exemption that affects only a part of my data?
    Answer Yes, provided that the request is made at the time of initial submission.
  • If my main dataset is available to publish, but there is a small portion of my data that I am not allowed to share, should I request a data exemption?
    Answer Yes. If you do not request a data exemption at the time of your first submission, you will be required to publish all the data used in your paper.
  • If the only data I am not allowed to share is only used in the online appendix, should I request a data exemption?
    Answer Yes. The data to produce all results in the paper and appendices, including those online, should be shared unless an exemption is requested and granted at the time of first submission.
  • If the data I use are publicly available to everyone, but I do not have permission to re-publish it, should I request a data exemption?
    Answer Yes. Unless you are granted an exemption at the time of first submission, you will be required to publish in the replication package all data to produce all results in the paper and appendices, including those online.
  • Can I request an exemption later than the initial submission?
    Answer In general, no. Later exemptions can only be requested for new data that are incorporated into the analysis during the editorial process. If your data cannot be published and you did not request the exemption at the time of initial submission, your paper may be rejected for publication at the Economic Journal.
  • If my data are free of charge and available to any researcher who requests it from the data provider, but I don’t have the right to publish it with the replication package, should I request an exemption?
    Answer Yes. Whenever the data used for the analysis in the paper cannot be published with the replication package (or in an open-access "trusted" repository, see the following list for guidance on what constitutes a "trusted" repository) an exemption needs to be requested at the time of first submission. An exemption does not need to be requested only if the exact extract that was used in the study is published in the repository and is readily available in the exact format that is called by the code.
  • Some data providers only allow authors to distribute the data in specific open repositories (for example, the Panel Study of Income Dynamics only allows the data to be distributed through the OpenICPSR Repository). Do I need to request an exemption in such cases?
    Answer No. Data archived in "trusted" open repositories (see the following list for guidance) is acceptable in the replication package provided what is published is the extract that was used in the study and it is readily available in the exact format that is called by your code. The Data Editor will evaluate the suitability of the repository.
  • If I published my data in an open repository, do I need to include it in the replication package?
    Answer Data archived in "trusted" open repositories (see the following list for guidance) is acceptable in the replication package provided what is published is the extract that was used in the study and it is readily available in the exact format that is called by the code. The Data Editor will evaluate the suitability of the repository and whether a copy also needs to be published with the package on the journal’s repository.
  • If I publish my replication package on my website (or similar), do I need to submit a replication package?
    Answer Yes. Personal websites are not considered "trusted" open repositories, because there is no guarantee that the package will be systematically archived. See the following list for guidance on what constitutes a "trusted" repository.
  • Can I request an exemption to publish my data because I collected these data and I want to keep exclusivity rights for future research?
    Answer No. The goal of our data and code availability policy is to ensure transparency and reproducibility of research, and this requires publishing the data you collected. If others can use your data, your research will gain visibility.
  • Can I apply for data exemption if my data come from a commercial data provider (Datastream, Orbis, …)?
    Answer Yes. The use of restricted-access data is generally discouraged, but when your research relies largely on a specific dataset and cannot be conducted on an open alternative, those data are eligible for an exemption. However, you may be asked to provide a certification from the provider indicating that the data will be archived and made available to other users who follow the same procedure to request access.
  • Can I request an exemption for the experimental data I collected?
    Answer In general, no. Data should be anonymized to ensure that subjects cannot be identified. Only when the nature of the study impedes such anonymization can the authors request a data exemption, which will cover only the minimum required to ensure the anonymity of the experimental subjects.
  • Can I request an exemption to publish my code?
    Answer No.
  • Can I use any software, proprietary and open source?
    Answer Yes. Open source software is encouraged, but licensed software is allowed. If the authors use software that is rather uncommon and requires special licenses, we ask for their cooperation to find a solution (which, in extreme cases, might entail providing our replicators with remote access to the authors’ machine).
  • Do I need to publish packages and libraries that are used by my code but not part of the standard distribution of the software used?
    Answer Whenever possible, yes. If these packages or libraries are available in open repositories (e.g. most Stata packages), a clear indication of how to download and use them is sufficient. If the libraries cannot be included in the package and are not publicly available, the Data Editor will be in contact with the authors to coordinate on a feasible way to implement the checks.

Procedures when Exemptions Are Granted

  • If I was granted a data exemption, how should I proceed with the replication package?
    Answer If you were granted a data exemption, your paper will still need to go through reproducibility checks before final acceptance. In order to do so, you can either (i) grant temporary (remote or physical) access to the data to the reproducibility team for the sole purpose of the checks (the data will be destroyed or access terminated after the checks), or (ii) supply simulated or synthetic dataset(s) instead of the one(s) used in the analysis.
  • What is the difference between simulated data and synthetic data?
    Answer A simulated dataset is generated by a model (ideally, your model). A synthetic dataset is a scrambling or perturbation of the actual dataset to ensure anonymity.
  • Is it better to provide temporary access to the restricted data or to provide a simulated/synthetic dataset?
    Answer Whenever feasible, we strongly recommend providing temporary access to restricted data. There are numerous advantages of this approach: (i) it saves the effort of producing synthetic or simulated datasets; (ii) the certification provided by the journal is stronger in the sense that we certify that we have been able to reproduce the results published in the paper as opposed to only checking that the code is complete, runs, and produces output for all tables, figures, and in-text numbers published in the article and its printed and online appendices; (iii) we can detect if the results cannot be reproduced, which gives the authors a chance to fix any errors before publication.
  • What is the procedure followed by the Economic Journal when I supply restricted datasets for the sole purpose of the reproducibility checks?
    Answer The reproducibility team will treat the data with the highest ethical standards, preventing any violations of confidentiality, and using them exclusively to run the reproducibility checks. The restricted datasets will be destroyed as soon as the checks are performed and, therefore, they will not be published.
  • What shall I do if I am not allowed to provide temporary access to the confidential data, but the data provider can run the code to implement the reproducibility checks?
    Answer Even if you cannot provide direct access to the reproducibility team, this option is preferred to the simulated/synthetic dataset alternative as long as the checks can be executed in a reasonable amount of time. In this case, you need to supply the replication package to the journal together with the contact details of the data provider. The reproducibility team will send the code to the provider and the provider will send the output back to the team, who will check the results.
  • What can I do if I am not allowed to provide temporary access to the confidential data, but a certification agency (e.g. cascad) can run the code in the original data source?
    Answer This option is still generally preferred to the simulated/synthetic dataset alternative. However, you should seek approval from the Data Editor before making any commitments with the certification agency. Please note that the Economic Journal will NOT be able to cover the cost of certification.
  • If my restricted-access data provider has a public use testing sample (smaller sample, or perturbed dataset), can I provide this sample instead of a simulated/ synthetic dataset?
    Answer If this option is available, it is generally preferred to the simulated/synthetic dataset (but less preferred than providing temporary access to the original data) as long as the testing sample can be published with your package. Otherwise, a simulated/synthetic dataset that can be published with the package is preferred.
  • What is the procedure followed by the Economic Journal if I supply simulated/synthetic datasets?
    Answer The simulated/synthetic dataset will be published with the replication package. Even if these are not the real data, their structure, which by design will largely mimic the actual dataset, will give readers a better sense of your data. Please make sure the manipulations used to produce the synthetic/simulated datasets are described in the ReadMe file.
  • Why am I requested to supply a simulated/synthetic dataset?
    Answer Our view is that, when reproducibility checks cannot be performed on real data, there is still an advantage of running them on such simulated/synthetic datasets: they are still useful to make sure the code is complete and self-contained, and that it runs without errors.
  • My article estimates a non-linear model. The algorithm does not converge with randomly generated data. What shall I do?
    Answer In this case, we strongly recommend simulating data using your model as the data generating process. If that is not feasible, please contact the Data Editor explaining in detail why this is the case. The Data Editor will assist you in the process and, if needed, will make a proposal to your original Editor about how to handle the situation.
  • How do I decide whether to produce a simulated or a synthetic dataset?
    Answer In order to generate a dataset that mimics the same characteristics as the original one, the synthetic option may be easier. There are many open source routines that do it for you. However, there are also two main disadvantages: (i) you need to make sure that your scrambling/perturbation algorithm ensures correct anonymization of the data; and (ii) non-linear estimation routines may not converge on synthetic data, whereas they are more likely to converge in an artificial dataset generated by the model that you are estimating.
  • How should I produce a synthetic dataset?
    Answer There are multiple ways to generate it. You can find some useful links with helpful resources, mostly in R, here, here, here, and here
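
As a rough illustration of the distinction between simulated and synthetic data drawn in the answers above, the following Python sketch builds one of each. Everything in it is an assumption made for illustration: the linear model, the function names, the parameter values, and the noise rule are hypothetical and are not prescribed by the journal. In particular, shuffling rows and adding noise as done here does NOT by itself guarantee anonymization; authors remain responsible for checking that their own perturbation method does.

```python
import random
import statistics


def simulate_from_model(n, beta0, beta1, sigma, seed=0):
    """Simulated data: draw (x, y) from an assumed model y = beta0 + beta1*x + e."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        rows.append({"x": x, "y": y})
    return rows


def synthesize_by_perturbation(rows, noise_share=0.1, seed=0):
    """Synthetic data: shuffle the row order and add noise to every value.

    The noise standard deviation is noise_share times each column's own
    standard deviation. This is an illustrative rule only and is not a
    guarantee of anonymity.
    """
    rng = random.Random(seed)
    columns = list(rows[0])
    noise_sd = {
        c: noise_share * statistics.pstdev([r[c] for r in rows]) for c in columns
    }
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [
        {c: r[c] + rng.gauss(0.0, noise_sd[c]) for c in columns} for r in shuffled
    ]


# Stand-in for a confidential dataset (here it is itself simulated).
original = simulate_from_model(n=200, beta0=1.0, beta1=2.0, sigma=0.5, seed=1)
synthetic = synthesize_by_perturbation(original, noise_share=0.1, seed=2)
```

Whichever route is chosen, the ReadMe file should describe the manipulations used, including any fixed seed, so that readers can see how the published dataset was produced.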

Implementation of the Reproducibility Checks

  • How long do the reproducibility checks take?
    Answer We usually provide the outcome of our reproducibility checks in less than two weeks. If the package is not complete or the code does not run, more than one iteration may be required, in which case the processing time might be increased. Articles that require a relatively long running time may take longer. The processing time also depends on how responsive the authors are to our requests.
  • How do the reproducibility checks work?
    Answer The reproducibility checks are handled by our Data Editor and our reproducibility team: a team of advanced Ph.D. students who have been hired to carry out the checks under the supervision of the Data Editor. Once an article is conditionally accepted for publication at the Economic Journal, the authors are requested to submit the replication package along with other production files. Upon submission, the Data Editor assigns the package to one or several members of the reproducibility team. The reproducibility team provides the Data Editor with a report summarizing the outcome of the checks. After reviewing it, the Data Editor contacts the authors to inform them of the outcome of the reproducibility checks and, if needed, requests them to amend the package. Once the reproducibility checks are completed, the article is transferred back to the original Editor, who is in charge of final acceptance. If results in the paper need to be modified as a result of the checks, the original Editor in charge will be responsible for approving these changes before acceptance. If these changes imply a modification of the message of the paper, the original Editor can decide to reject the paper. Final acceptance is conditional on full reproducibility.
  • Will the Economic Journal run my code?
    Answer Yes. Upon submission, the Data Editor assigns the package to one or several members of the reproducibility team, who will run your code and check the output generated. The reproducibility team provides the Data Editor with a report summarizing the outcome of the checks. In some instances, the code is too demanding to be run in a reasonable amount of time. In such cases, the Data Editor will be in contact with you with a recommendation for supplying a simplified version of the code that allows testing the essential parts of the code.
  • What happens if my code is highly demanding computationally?
    Answer If the code is too demanding to be run in a reasonable amount of time, the Data Editor will be in contact with you with a recommendation for supplying a simplified version of the code that allows testing the essential parts of the code. For example, this can entail a reduced number of replications of a simulation exercise, the code that solves a structural model for a given set of parameters, a simplified function to test an optimization routine, etc. Such a simplified "testing" version will be published along with the original code in your replication package. This is so because we believe that these testing versions are extremely useful for other researchers that want to understand and use your code for replication or their related research, enhancing transparency and increasing the visibility of your research.
  • What happens if the results fail to reproduce?
    Answer If the data and code that you provided fail to reproduce the results in the paper, the Data Editor will be in contact with you to identify the source of the discrepancy. Once the reproducibility checks are completed, if the discrepancy implies a change in the results presented in the paper or online appendices, even if minor, the Data Editor will notify the original Editor in charge. The Editor in charge will be responsible for approving these changes before acceptance. If these changes imply a modification of the message of the paper, the original Editor can decide to reject the paper. Final acceptance is conditional on full reproducibility.
  • What happens if the replication package I provided is not complete?
    Answer The Data Editor will be in contact with you indicating the amendments and additions that need to be done to the replication package to pass the reproducibility checks. Once amended, the revised package will go through the checks again.
  • What happens if the file I provided is not complete?
    Answer The Data Editor will be in contact with you indicating the amendments and additions that need to be done to the replication package to pass the reproducibility checks. Once amended, the revised package will go through the checks again.
  • Why do I need to resubmit the entire package (instead of only the revised part of it) when I incorporate the feedback received from the Data Editor and the reproducibility team?
    Answer We need you to submit the entire package again because updating the replication package ourselves increases the potential risk that the files you intend to submit for possible publication may be mishandled.
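
For computationally demanding code, the simplified "testing" version mentioned above (for example, a reduced number of replications of a simulation exercise) can often live in the same script as the full version. The Python sketch below shows one possible layout under stated assumptions: the constant names, the replication counts, and the toy Monte Carlo exercise are all hypothetical, and the appropriate simplification for a given paper is the one agreed with the Data Editor.

```python
import random
import statistics

FULL_REPLICATIONS = 10_000  # hypothetical setting behind the published results
TEST_REPLICATIONS = 50      # hypothetical reduced setting for a quick check


def run_monte_carlo(n_replications, sample_size=100, seed=12345):
    """Average, across replications, of the sample mean of N(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so that every run gives the same output
    estimates = []
    for _ in range(n_replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimates.append(statistics.fmean(sample))
    return statistics.fmean(estimates)


def main(test_mode):
    """Run the same code path with either the full or the reduced setting."""
    n_replications = TEST_REPLICATIONS if test_mode else FULL_REPLICATIONS
    return run_monte_carlo(n_replications)


quick_result = main(test_mode=True)
```

Keeping both settings in one script means the testing version exercises exactly the code that produces the published numbers, and only the amount of computation differs.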

Content of the Replication Package

  • What should be included in the replication package? Please see here.
  • How do I provide the reproducibility team with physical access to my restricted-access data when I have been granted a data exemption?
    Answer Whenever possible, the easiest way is to provide a physical copy of your data by including it in a separate folder labeled "4 Confidential data not for publication" outside of the replication package. All replicators and the Data Editor have signed confidentiality agreements that prevent them from using the data for any purpose other than the reproducibility checks. When that option is not feasible, we recommend that you contact our Data Editor to arrange the best way to provide access to the reproducibility team.
  • Why do I need to submit a signed checklist?
    Answer To ensure that you do not forget any element of the replication package. This avoids repeated iterations and speeds up the process.
  • Should I respect the folder structure dictated by the checklist, or is it only for orientation?
    Answer Yes, you should, and it is very important to do so. When submitted to production, your package is handled by different people at the Economic Journal and at the publisher, not all of whom are familiar with data and code. Respecting the folder structure ensures that your package is published correctly.
  • What information should be included in the ReadMe file? Please see here.
  • Should I submit the raw data files and the code that generates my final dataset from them?
    Answer Yes, this is requested by our Data and Code Availability Policy.
  • Why do I need to supply all text documents (ReadMe, IRB, etc.) in PDF format?
    Answer The PDF format is portable, which means that it can be transferred without having to worry about dependencies, fonts, etc. This ensures readability across platforms and users.
  • Why do I need to include a copy of all datasets in non-proprietary format (ASCII, csv, etc.)?
    Answer Some users of your replication package may not have access to the specific proprietary software that you used for your study. Including a copy in a non-proprietary format ensures that they can access your data without problems. It also minimizes compatibility issues (e.g., old versions of Stata cannot open files saved by newer versions).
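
Before submitting, authors may want to confirm that every dataset saved in a proprietary format has a non-proprietary companion. The Python sketch below is one way to do that and rests on several assumptions: the lists of file extensions are illustrative rather than official, the function name is hypothetical, and a companion is taken to be a file with the same name (apart from the extension) in the same folder. It is not the procedure used by the reproducibility team.

```python
from pathlib import Path

# Illustrative, non-exhaustive extension lists (assumptions, not journal rules).
PROPRIETARY_SUFFIXES = {".dta", ".sav", ".sas7bdat", ".mat"}
OPEN_SUFFIXES = {".csv", ".txt", ".tsv"}


def missing_open_copies(data_dir):
    """List proprietary-format files with no same-named open-format companion."""
    data_dir = Path(data_dir)
    files = [p for p in data_dir.rglob("*") if p.is_file()]
    # A companion must sit in the same folder and share the file stem.
    open_copies = {
        (p.parent, p.stem) for p in files if p.suffix.lower() in OPEN_SUFFIXES
    }
    return sorted(
        p.relative_to(data_dir).as_posix()
        for p in files
        if p.suffix.lower() in PROPRIETARY_SUFFIXES
        and (p.parent, p.stem) not in open_copies
    )
```

Calling `missing_open_copies` on the data folder of the package returns the relative paths of the files that still need a copy in a non-proprietary format; an empty list means none were found.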

Data Citations

  • What data should I cite?
    Answer All datasets used in the paper (with no exceptions) should be cited both in the paper and in a dedicated section of the ReadMe file.
  • If I mention my datasets in the Online Appendix or in the ReadMe file, should I cite them?
    Answer Yes, all datasets used in the paper (with no exceptions) should be listed in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file.
  • How should I cite my data?
    Answer You should cite all datasets used in the paper (with no exceptions) in the references section of the paper in the same way that we cite other papers, and a copy of these citations should appear in a dedicated section of the ReadMe file. You can find some examples on page 7 of this document. More specific guidance on data citations is available here.
  • Why should I cite my data?
    Answer Data citations are as fundamental as citations to other papers, if not more so. Giving proper credit to data providers is in line with all scientific ethical standards. Moreover, giving proper credit to data providers ensures that they can keep receiving external funding to make their datasets publicly available for research.