AfriData

Data Standards

Guidelines for high-quality, interoperable datasets

Supported File Formats

AfriData accepts the file formats listed below. Each format has a support level, a typical use case, and a maximum upload size:

Format        Extension             Support Level      Use Case                                Max Size
CSV           .csv                  Recommended        Structured tabular data                 100 MB
JSON          .json                 Recommended        Structured data, APIs                   50 MB
Excel         .xlsx, .xls           Supported          Spreadsheet data with multiple sheets   25 MB
XML           .xml                  Supported          Hierarchical structured data            25 MB
Parquet       .parquet              Beta               Large-scale analytics                   500 MB
GeoJSON       .geojson              Recommended        Geographic data                         50 MB
Shapefile     .shp + .shx + .dbf    Supported          GIS vector data                         100 MB
Proprietary   .sav, .dta, .mat      Convert Required   Statistical software formats            N/A
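
The limits in the table can be checked locally before an upload. The following is a minimal Python sketch, not AfriData's own validator. The names check_upload and FORMAT_LIMITS are hypothetical, sizes are assumed to be decimal megabytes, and the Shapefile limit is assumed to apply to each component file.

```python
from pathlib import Path

MB = 1_000_000  # assumption: decimal megabytes; the table does not say MB vs MiB

# Support level and size limit per extension, transcribed from the table.
# None means no size limit applies because the format must be converted first.
FORMAT_LIMITS = {
    ".csv": ("Recommended", 100 * MB),
    ".json": ("Recommended", 50 * MB),
    ".xlsx": ("Supported", 25 * MB),
    ".xls": ("Supported", 25 * MB),
    ".xml": ("Supported", 25 * MB),
    ".parquet": ("Beta", 500 * MB),
    ".geojson": ("Recommended", 50 * MB),
    ".shp": ("Supported", 100 * MB),
    ".shx": ("Supported", 100 * MB),
    ".dbf": ("Supported", 100 * MB),
    ".sav": ("Convert Required", None),
    ".dta": ("Convert Required", None),
    ".mat": ("Convert Required", None),
}


def check_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (accepted, reason) for a file name and its size in bytes."""
    ext = Path(filename).suffix.lower()
    if ext not in FORMAT_LIMITS:
        return False, f"{ext or 'no extension'}: format not listed"
    level, limit = FORMAT_LIMITS[ext]
    if limit is None:
        return False, f"{ext}: convert to an open format such as CSV first"
    if size_bytes > limit:
        return False, f"{ext}: {size_bytes} bytes exceeds the {limit // MB} MB limit"
    return True, f"{ext}: {level}"
```

For example, check_upload("survey.csv", 10 * MB) is accepted, while check_upload("survey.dta", 1) is rejected with a hint to convert the file.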

Best Practice

For maximum compatibility and longevity, we recommend using open, non-proprietary formats like CSV, JSON, and GeoJSON. These formats are widely supported and accessible across different platforms and tools.

Metadata Standards

Metadata is crucial for finding and using datasets. Our metadata schema follows international standards and adds fields for African contexts, such as administrative boundaries, urban/rural classification, and the language(s) used.

Basic Information

  • Dataset title and description
  • Creator/author information
  • Creation and modification dates
  • Version information
  • License and usage rights

Geographic Context

  • Country/region coverage
  • Coordinate system (if applicable)
  • Administrative boundaries
  • Urban/rural classification
  • Language(s) used

Temporal Coverage

  • Data collection period
  • Temporal resolution
  • Update frequency
  • Historical context
  • Seasonal considerations

Methodology

  • Data collection methods
  • Sampling techniques
  • Quality control measures
  • Processing steps
  • Limitations and biases

Example Metadata Schema (JSON)

{
  "title": "Kenya Agricultural Survey 2024",
  "description": "Comprehensive survey of agricultural practices...",
  "creator": {
    "name": "Dr. Jane Doe",
    "affiliation": "University of Nairobi",
    "email": "jane.doe@uonbi.ac.ke"
  },
  "geographic_coverage": {
    "country": "Kenya",
    "regions": ["Central", "Eastern", "Western"],
    "coordinate_system": "WGS84"
  },
  "temporal_coverage": {
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "frequency": "Annual"
  },
  "methodology": {
    "collection_method": "Survey",
    "sample_size": 5000,
    "sampling_method": "Stratified random sampling"
  },
  "license": "CC BY 4.0",
  "version": "1.0",
  "created_date": "2024-03-15",
  "file_format": "CSV",
  "file_size": "45.7 MB"
}
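
A record like the one above can be checked mechanically before submission. The sketch below is illustrative only: the list of required fields is an assumption drawn from the example record, since no official list of required fields is given here, and validate_metadata is a hypothetical helper name.

```python
import json
from datetime import date

# Assumption: treated as required because they appear in the example record.
REQUIRED_FIELDS = ["title", "description", "creator", "geographic_coverage",
                   "temporal_coverage", "license", "version"]


def validate_metadata(record: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in record]

    coverage = record.get("temporal_coverage", {})
    try:
        start = date.fromisoformat(coverage["start_date"])
        end = date.fromisoformat(coverage["end_date"])
    except KeyError as exc:
        problems.append(f"temporal_coverage lacks {exc.args[0]}")
    except (ValueError, TypeError) as exc:
        problems.append(f"bad date in temporal_coverage: {exc}")
    else:
        # Dates parsed cleanly, so their order can be compared.
        if end < start:
            problems.append("end_date is earlier than start_date")
    return problems


# A shortened copy of the example record, used only to exercise the check.
record = json.loads("""
{
  "title": "Kenya Agricultural Survey 2024",
  "description": "Comprehensive survey of agricultural practices...",
  "creator": {"name": "Dr. Jane Doe"},
  "geographic_coverage": {"country": "Kenya", "coordinate_system": "WGS84"},
  "temporal_coverage": {"start_date": "2024-01-01", "end_date": "2024-12-31"},
  "license": "CC BY 4.0",
  "version": "1.0"
}
""")
print(validate_metadata(record))  # prints []
```

Returning a list of problems rather than raising on the first one lets a submitter fix every issue in a single pass.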

Data Quality Standards

High-quality data is essential for reliable research and analysis. Our quality standards ensure datasets meet international best practices.

Completeness

Ensure your dataset is as complete as possible:

  • Missing values < 5% per column
  • Clear indication of null/missing data
  • Explanation for missing values
  • Complete geographic coverage
Target: 95%+
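
The 5% per-column threshold can be estimated with the standard library alone. This is a rough sketch under stated assumptions: the set of missing-value markers is hypothetical and should be replaced by the markers documented for the dataset, and missing_share and columns_over_threshold are illustrative names, not part of any AfriData tool.

```python
import csv
import io

# Assumption: strings that count as "missing". Use the dataset's documented markers.
MISSING_MARKERS = {"", "NA", "N/A", "null", "-999"}


def missing_share(csv_text: str) -> dict[str, float]:
    """Return the share of missing values per column, between 0.0 and 1.0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    shares = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for absent cells, which also counts as missing.
        missing = sum(1 for row in rows
                      if (row.get(column) or "").strip() in MISSING_MARKERS)
        shares[column] = missing / len(rows)
    return shares


def columns_over_threshold(shares: dict[str, float], threshold: float = 0.05) -> list[str]:
    """List columns whose missing share is at or above the threshold."""
    return [name for name, share in shares.items() if share >= threshold]


sample = "region,income\nCentral,1200\nEastern,\nWestern,-999\nCentral,900\n"
shares = missing_share(sample)
print(columns_over_threshold(shares))  # prints ['income']
```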

Accuracy

Verify data accuracy through validation:

  • Cross-validation with external sources
  • Outlier detection and handling
  • Unit consistency checks
  • Temporal consistency validation
Target: 98%+
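
One common way to approach outlier detection is Tukey's interquartile-range rule. The sketch below is an example of that rule, not a statement of which method AfriData's checks use; the function name iqr_outliers and the multiplier k=1.5 are conventional choices assumed here.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).

    Needs at least two values, because statistics.quantiles requires them.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 10, 500]
print(iqr_outliers(readings))  # prints [500]
```

Flagged values still need a human decision: an outlier may be a data-entry error or a genuine extreme, and the handling should be recorded in the methodology document.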

Timeliness

Ensure data is current and relevant:

  • Regular update schedule
  • Clear versioning system
  • Timestamp for data collection
  • Deprecation notices for old data
Target: 90%+
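
A regular update schedule can be checked by comparing the last update date with the declared frequency. The following sketch assumes a mapping from frequency labels to a maximum allowed gap; only "Annual" appears in the metadata example on this page, and the other labels and all day counts are assumptions.

```python
from datetime import date, timedelta

# Assumption: longest acceptable gap between updates for each frequency label.
MAX_AGE = {
    "Monthly": timedelta(days=31),
    "Quarterly": timedelta(days=92),
    "Annual": timedelta(days=366),
}


def is_overdue(last_updated: date, frequency: str, today: date) -> bool:
    """Return True when the dataset has gone longer than its declared cycle without an update."""
    return today - last_updated > MAX_AGE[frequency]


print(is_overdue(date(2024, 3, 15), "Annual", date(2025, 6, 1)))  # prints True
```

Passing today explicitly, rather than calling date.today() inside the function, keeps the check reproducible.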

Consistency

Maintain consistent data formats:

  • Standardized naming conventions
  • Uniform data types
  • Consistent units of measurement
  • Harmonized categorical values
Target: 95%+
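
Naming conventions and categorical values can be normalized with small helper functions. The sketch below is one possible approach: snake_case is assumed as the naming convention (the data dictionary example on this page uses household_income), and the CANONICAL_REGION mapping is a hypothetical example.

```python
import re

# Hypothetical mapping from observed spellings to one canonical label.
CANONICAL_REGION = {
    "central": "Central",
    "central province": "Central",
    "eastern": "Eastern",
    "western": "Western",
}


def to_snake_case(name: str) -> str:
    """Convert a header such as 'Household Income (KES)' to snake_case.

    Splits on non-alphanumeric characters only; camelCase is not split.
    """
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


def harmonize(value: str, mapping: dict[str, str]) -> str:
    """Map a raw category to its canonical label, ignoring case and extra spaces."""
    key = " ".join(value.split()).lower()
    if key not in mapping:
        # Fail loudly so unknown categories are reviewed instead of passed through.
        raise ValueError(f"unmapped category: {value!r}")
    return mapping[key]


print(to_snake_case("Household Income (KES)"))           # prints household_income_kes
print(harmonize("  CENTRAL  province ", CANONICAL_REGION))  # prints Central
```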

Quality Assurance

All datasets undergo automated quality checks upon submission. Datasets not meeting minimum quality thresholds will be flagged for review before publication.

Documentation Standards

Comprehensive documentation ensures your dataset can be understood and used effectively by other researchers and practitioners.

Data Dictionary

  • Column/variable descriptions
  • Data types and formats
  • Valid ranges and constraints
  • Relationships between variables
  • Coding schemes for categorical data

Methodology Document

  • Research objectives and questions
  • Sampling methodology
  • Data collection procedures
  • Quality control measures
  • Known limitations and biases

Technical Documentation

  • Processing scripts and code
  • Software versions and dependencies
  • Hardware specifications
  • Computational environment details
  • Reproducibility instructions

Usage Guidelines

  • Intended use cases
  • Appropriate analysis methods
  • Citation requirements
  • Contact information for questions
  • Update and maintenance schedule

Example Data Dictionary Entry

{
  "name": "household_income",
  "description": "Monthly household income in Kenyan Shillings",
  "type": "numeric",
  "format": "integer",
  "unit": "KES",
  "range": {
    "min": 0,
    "max": 500000
  },
  "missing_values": "-999",
  "notes": "Self-reported income; may include informal sources"
}
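
A dictionary entry like this one can drive value-level validation. The sketch below assumes an integer variable with a range and a single missing-value sentinel, as in the entry above; check_value is a hypothetical helper, not an AfriData API.

```python
def check_value(raw: str, entry: dict) -> str:
    """Classify one raw cell as 'missing', 'ok', or 'invalid' using a dictionary entry."""
    # Compare as text so the sentinel works whether it is stored as "-999" or -999.
    if raw.strip() == str(entry["missing_values"]):
        return "missing"
    try:
        number = int(raw)
    except ValueError:
        return "invalid"
    bounds = entry["range"]
    return "ok" if bounds["min"] <= number <= bounds["max"] else "invalid"


# The relevant fields of the example entry.
entry = {
    "name": "household_income",
    "type": "numeric",
    "format": "integer",
    "range": {"min": 0, "max": 500000},
    "missing_values": "-999",
}

for cell in ["45000", "-999", "750000", "abc"]:
    print(cell, check_value(cell, entry))
```

Checking the sentinel before the range matters here: -999 lies below the minimum of 0 and would otherwise be reported as invalid rather than missing.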

Ethics and Privacy Standards

Ethical data sharing is fundamental to AfriData. All datasets must comply with ethical guidelines and privacy regulations.

Privacy Protection

Personal and sensitive data must be properly anonymized or aggregated. Direct identifiers should be removed, and indirect identifiers should be assessed for re-identification risks.

Informed Consent

Data subjects must have provided informed consent for data collection and sharing. If consent was not explicitly obtained for public sharing, data must be sufficiently anonymized.

Fairness and Non-discrimination

Datasets should not perpetuate or amplify existing biases. Consider the potential for discriminatory use and provide appropriate warnings or safeguards.

Cultural Sensitivity

Respect cultural contexts and sensitivities. Engage with local communities and stakeholders when appropriate, especially for data about indigenous or marginalized populations.

Anonymization Requirements

  • Remove direct identifiers (names, IDs, addresses)
  • Assess quasi-identifiers (age, location, profession)
  • Apply k-anonymity (k≥5) for sensitive data
  • Use differential privacy for high-risk datasets
  • Document anonymization methods
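
The first three requirements can be illustrated with a short check. This is a minimal sketch, assuming rows are held as dictionaries; the column names and the helper functions are hypothetical. It only tests whether a table is k-anonymous; it does not generalize or suppress values to make it so.

```python
from collections import Counter

# Hypothetical column names for direct identifiers.
DIRECT_IDENTIFIERS = {"name", "national_id", "address"}


def drop_direct_identifiers(rows: list[dict], identifiers=DIRECT_IDENTIFIERS) -> list[dict]:
    """Return copies of the rows without the direct-identifier columns."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int = 5) -> bool:
    """True when every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


rows = [{"name": f"person{i}", "age_band": "30-39", "county": "Nairobi"} for i in range(5)]
rows.append({"name": "person5", "age_band": "40-49", "county": "Kisumu"})

cleaned = drop_direct_identifiers(rows)
print(is_k_anonymous(cleaned[:5], ["age_band", "county"]))  # prints True
print(is_k_anonymous(cleaned, ["age_band", "county"]))      # prints False
```

The second call fails because the single Kisumu row forms a group of size one, which is the kind of record that would need generalization or suppression before publication.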

Ethical Review

  • Institutional Review Board (IRB) approval
  • Ethics committee review documentation
  • Data sharing agreements
  • Consent forms and protocols
  • Risk assessment documentation

Ethics Review Required

All datasets containing human subjects data must undergo ethics review before publication. Contact our ethics committee if you're unsure about requirements.

Submission Process

Follow these steps to ensure your dataset meets our standards and is successfully published on AfriData.

1. Prepare Your Dataset

Ensure your data is in a supported format, properly structured, and cleaned. Remove any sensitive information and create comprehensive documentation.

2. Complete Metadata

Fill out all required metadata fields using our online form. Provide detailed descriptions, geographic coverage, and methodology information.

3. Upload Files

Upload your dataset files, documentation, and any supplementary materials. Ensure file sizes are within limits and formats are supported.

4. Quality Check

Our automated system will run quality checks on your dataset. Review any flagged issues and make necessary corrections.

5. Peer Review

Your dataset will undergo peer review by domain experts. This typically takes 2 to 4 weeks, depending on the dataset's complexity and reviewer availability.

6. Publication

Once approved, your dataset will be published with a DOI and made available to the research community. You'll receive a notification with the publication details.

Need Help?

Our curation team is available to assist with the submission process. Contact us at info.jhub@jkuat.ac.ke for guidance.

Additional Resources

Explore these resources to learn more about data standards and best practices for African research contexts.

Templates and Tools

  • Metadata Template (coming soon): JSON schema for dataset metadata
  • Data Dictionary Template (coming soon): Excel template for variable definitions
  • Data Quality Checker (coming soon): Automated validation tool
  • Anonymization Guide (coming soon): Best practices for privacy protection

Training Materials

  • Data Management (coming soon): Comprehensive course
  • Research Ethics (coming soon): Ethics workshop materials
  • Metadata Best Practices (coming soon): Guidelines for tagging and description
  • Open Science (coming soon): Open science principles

External Standards

Support

  • Community Forum: Connect with other researchers
  • FAQ: Frequently Asked Questions
  • Technical Support (coming soon): Get help with technical issues
  • Monthly Webinars (coming soon): Live training sessions