Supported File Formats
AfriData accepts a range of file formats so that datasets stay accessible and interoperable. The table below lists each supported format with its support level, typical use case, and maximum file size:
| Format | Extension | Support Level | Use Case | Max Size |
|---|---|---|---|---|
| CSV | .csv | Recommended | Structured tabular data | 100 MB |
| JSON | .json | Recommended | Structured data, APIs | 50 MB |
| Excel | .xlsx, .xls | Supported | Spreadsheet data with multiple sheets | 25 MB |
| XML | .xml | Supported | Hierarchical structured data | 25 MB |
| Parquet | .parquet | Beta | Large-scale analytics | 500 MB |
| GeoJSON | .geojson | Recommended | Geographic data | 50 MB |
| Shapefile | .shp + .shx + .dbf | Supported | GIS vector data | 100 MB |
| Proprietary | .sav, .dta, .mat | Conversion Required | Statistical software formats (SPSS, Stata, MATLAB) | N/A |
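The limits in the table above can be checked locally before uploading. The following is a minimal sketch, not an AfriData tool: the function name `check_upload`, the status strings, and the choice of 1 MB = 1,000,000 bytes are assumptions made for illustration, and the Shapefile limit is assumed to apply to each component file.

```python
from pathlib import Path

# Size limits in MB, copied from the format table. The dictionary layout is
# illustrative; it is not taken from any AfriData API.
MAX_SIZE_MB = {
    ".csv": 100, ".json": 50, ".xlsx": 25, ".xls": 25, ".xml": 25,
    ".parquet": 500, ".geojson": 50,
    # Assumption: the 100 MB Shapefile limit applies to each component file.
    ".shp": 100, ".shx": 100, ".dbf": 100,
}
CONVERT_REQUIRED = {".sav", ".dta", ".mat"}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return a short status string for a candidate upload."""
    ext = Path(filename).suffix.lower()
    if ext in CONVERT_REQUIRED:
        return "convert required"
    if ext not in MAX_SIZE_MB:
        return "unsupported format"
    # Assumption: 1 MB = 1,000,000 bytes; the table does not say which
    # definition the platform uses.
    if size_bytes > MAX_SIZE_MB[ext] * 1_000_000:
        return "too large"
    return "ok"
```

For example, `check_upload("survey.csv", 45_700_000)` returns `"ok"`, while a `.dta` file of any size returns `"convert required"`.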
Best Practice
For maximum compatibility and longevity, we recommend using open, non-proprietary formats like CSV, JSON, and GeoJSON. These formats are widely supported and accessible across different platforms and tools.
Metadata Standards
Metadata is crucial for finding and using datasets. Our metadata schema follows international standards while accommodating Africa-specific context.
Basic Information
- Dataset title and description
- Creator/author information
- Creation and modification dates
- Version information
- License and usage rights
Geographic Context
- Country/region coverage
- Coordinate system (if applicable)
- Administrative boundaries
- Urban/rural classification
- Language(s) used
Temporal Coverage
- Data collection period
- Temporal resolution
- Update frequency
- Historical context
- Seasonal considerations
Methodology
- Data collection methods
- Sampling techniques
- Quality control measures
- Processing steps
- Limitations and biases
Example Metadata Schema (JSON)
// AfriData Metadata Schema
{
  "title": "Kenya Agricultural Survey 2024",
  "description": "Comprehensive survey of agricultural practices...",
  "creator": {
    "name": "Dr. Jane Doe",
    "affiliation": "University of Nairobi",
    "email": "jane.doe@uonbi.ac.ke"
  },
  "geographic_coverage": {
    "country": "Kenya",
    "regions": ["Central", "Eastern", "Western"],
    "coordinate_system": "WGS84"
  },
  "temporal_coverage": {
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "frequency": "Annual"
  },
  "methodology": {
    "collection_method": "Survey",
    "sample_size": 5000,
    "sampling_method": "Stratified random sampling"
  },
  "license": "CC BY 4.0",
  "version": "1.0",
  "created_date": "2024-03-15",
  "file_format": "CSV",
  "file_size": "45.7 MB"
}
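A record shaped like the example above can be sanity-checked before submission. This is a rough sketch under stated assumptions: the list of required keys is inferred from the example rather than from a published AfriData schema, and `validate_metadata` is a hypothetical helper name.

```python
from datetime import date

# Assumption: these top-level keys are treated as required because they
# appear in the example record; the real schema may differ.
REQUIRED_KEYS = ["title", "description", "creator", "geographic_coverage",
                 "temporal_coverage", "license", "version"]


def validate_metadata(record: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS
                if key not in record]
    coverage = record.get("temporal_coverage", {})
    try:
        # Dates are expected in ISO 8601 form (YYYY-MM-DD), as in the example.
        start = date.fromisoformat(coverage["start_date"])
        end = date.fromisoformat(coverage["end_date"])
        if start > end:
            problems.append("start_date is after end_date")
    except (KeyError, TypeError, ValueError):
        problems.append("temporal_coverage needs ISO start_date and end_date")
    return problems
```

Returning a list of messages rather than raising on the first error lets a submitter fix every problem in one pass.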
Data Quality Standards
High-quality data is essential for reliable research and analysis. Our quality standards ensure datasets meet international best practices.
Completeness
Ensure your dataset is as complete as possible:
- Missing values < 5% per column
- Clear indication of null/missing data
- Explanation for missing values
- Complete geographic coverage
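The 5% missing-value guideline above can be checked with a short script. The sketch below uses only the Python standard library; the set of tokens treated as missing is an assumption and should be replaced with the codes declared in your own data dictionary.

```python
import csv
import io

# Assumption: empty cells and these tokens count as missing values.
MISSING_TOKENS = {"", "NA", "N/A", "null", "-999"}


def missing_rates(csv_text: str) -> dict[str, float]:
    """Return the fraction of missing values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return {}
    rates = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for absent cells, which also counts as missing.
        missing = sum(1 for row in rows
                      if (row[column] or "").strip() in MISSING_TOKENS)
        rates[column] = missing / len(rows)
    return rates


def columns_over_threshold(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """List columns whose missing rate is not below the threshold."""
    return sorted(name for name, rate in rates.items() if rate >= threshold)
```

Calling `columns_over_threshold(missing_rates(text))` on the contents of a CSV file gives the columns that need attention or an explanation in the documentation.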
Accuracy
Verify data accuracy through validation:
- Cross-validation with external sources
- Outlier detection and handling
- Unit consistency checks
- Temporal consistency validation
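As one illustration of the outlier detection step listed above, the sketch below applies Tukey's interquartile-range rule. The source does not prescribe a method, so the rule, the multiplier `k=1.5`, and the function name `iqr_outliers` are all assumptions; flagged values still need a human decision on whether to correct, keep, or document them.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

For the values `[10, 12, 11, 13, 12, 11, 10, 500]`, only `500` is flagged.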
Timeliness
Ensure data is current and relevant:
- Regular update schedule
- Clear versioning system
- Timestamp for data collection
- Deprecation notices for old data
Consistency
Maintain consistent data formats:
- Standardized naming conventions
- Uniform data types
- Consistent units of measurement
- Harmonized categorical values
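Harmonizing categorical values usually means mapping every observed spelling to one canonical label. The following sketch shows the idea for an urban/rural field; the mapping and the function name `harmonize` are hypothetical and only serve as an example.

```python
# Hypothetical mapping from observed spellings to one canonical label.
CANONICAL = {
    "urban": "urban", "u": "urban", "city": "urban",
    "rural": "rural", "r": "rural",
}


def harmonize(values: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Map values to canonical labels and report any that are unmapped."""
    cleaned, unknown = [], set()
    for value in values:
        key = value.strip().lower()
        if key in mapping:
            cleaned.append(mapping[key])
        else:
            # Leave unknown values untouched so nothing is silently changed.
            cleaned.append(value)
            unknown.add(value)
    return cleaned, sorted(unknown)
```

Unmapped values are returned separately so they can be reviewed and either added to the mapping or documented as a distinct category.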
Quality Assurance
All datasets undergo automated quality checks upon submission. Datasets not meeting minimum quality thresholds will be flagged for review before publication.
Documentation Standards
Comprehensive documentation ensures your dataset can be understood and used effectively by other researchers and practitioners.
Data Dictionary
- Column/variable descriptions
- Data types and formats
- Valid ranges and constraints
- Relationships between variables
- Coding schemes for categorical data
Methodology Document
- Research objectives and questions
- Sampling methodology
- Data collection procedures
- Quality control measures
- Known limitations and biases
Technical Documentation
- Processing scripts and code
- Software versions and dependencies
- Hardware specifications
- Computational environment details
- Reproducibility instructions
Usage Guidelines
- Intended use cases
- Appropriate analysis methods
- Citation requirements
- Contact information for questions
- Update and maintenance schedule
Example Data Dictionary Entry
// Variable: household_income
{
  "name": "household_income",
  "description": "Monthly household income in Kenyan Shillings",
  "type": "numeric",
  "format": "integer",
  "unit": "KES",
  "range": {
    "min": 0,
    "max": 500000
  },
  "missing_values": "-999",
  "notes": "Self-reported income; may include informal sources"
}
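A data dictionary entry like the one above can drive simple value checks. The sketch below classifies raw cell values against the entry's range and missing-value code; the function `check_value` and its three return labels are illustrative assumptions, and it handles only integer variables.

```python
# The relevant fields of the example entry, reproduced as a Python dict.
ENTRY = {
    "name": "household_income",
    "type": "numeric",
    "format": "integer",
    "range": {"min": 0, "max": 500000},
    "missing_values": "-999",
}


def check_value(raw: str, entry: dict) -> str:
    """Classify one raw cell as 'missing', 'valid', or 'invalid'."""
    text = raw.strip()
    # Check the missing-value code first, since -999 is outside the range.
    if text == entry["missing_values"]:
        return "missing"
    try:
        number = int(text)
    except ValueError:
        return "invalid"
    bounds = entry["range"]
    return "valid" if bounds["min"] <= number <= bounds["max"] else "invalid"
```

The missing-value code is tested before the range so that `-999` is reported as missing rather than as an out-of-range income.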
Ethics and Privacy Standards
Ethical data sharing is fundamental to AfriData. All datasets must comply with ethical guidelines and privacy regulations.
Privacy Protection
Personal and sensitive data must be properly anonymized or aggregated. Direct identifiers should be removed, and indirect identifiers should be assessed for re-identification risks.
Informed Consent
Data subjects must have provided informed consent for data collection and sharing. If consent was not explicitly obtained for public sharing, data must be sufficiently anonymized.
Fairness and Non-discrimination
Datasets should not perpetuate or amplify existing biases. Consider the potential for discriminatory use and provide appropriate warnings or safeguards.
Cultural Sensitivity
Respect cultural contexts and sensitivities. Engage with local communities and stakeholders when appropriate, especially for data about indigenous or marginalized populations.
Anonymization Requirements
- Remove direct identifiers (names, IDs, addresses)
- Assess quasi-identifiers (age, location, profession)
- Apply k-anonymity (k≥5) for sensitive data
- Use differential privacy for high-risk datasets
- Document anonymization methods
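The k-anonymity requirement above (k ≥ 5) means every combination of quasi-identifier values must be shared by at least five records. A minimal check, assuming records are plain dictionaries and using hypothetical helper names, might look like this:

```python
from collections import Counter


def small_groups(records: list[dict], quasi_identifiers: list[str], k: int = 5) -> dict:
    """Return quasi-identifier combinations shared by fewer than k records."""
    groups = Counter(tuple(record[q] for q in quasi_identifiers)
                     for record in records)
    return {combo: count for combo, count in groups.items() if count < k}


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int = 5) -> bool:
    """True when no combination of quasi-identifiers is rarer than k."""
    return not small_groups(records, quasi_identifiers, k)
```

The combinations returned by `small_groups` are the ones to generalize (for example, wider age bands) or suppress. Passing this check does not by itself show that a dataset is safe to publish; it covers only the k-anonymity item in the list.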
Ethical Review
- Institutional Review Board (IRB) approval
- Ethics committee review documentation
- Data sharing agreements
- Consent forms and protocols
- Risk assessment documentation
Ethics Review Required
All datasets containing human subjects data must undergo ethics review before publication. Contact our ethics committee if you're unsure about requirements.
Submission Process
Follow these steps to ensure your dataset meets our standards and is successfully published on AfriData.
1. Prepare Your Dataset
Ensure your data is in a supported format, properly structured, and cleaned. Remove any sensitive information and create comprehensive documentation.
2. Complete Metadata
Fill out all required metadata fields using our online form. Provide detailed descriptions, geographic coverage, and methodology information.
3. Upload Files
Upload your dataset files, documentation, and any supplementary materials. Ensure file sizes are within limits and formats are supported.
4. Quality Check
Our automated system will run quality checks on your dataset. Review any flagged issues and make necessary corrections.
5. Peer Review
Your dataset will undergo peer review by domain experts. This typically takes 2 to 4 weeks, depending on complexity and reviewer availability.
6. Publication
Once approved, your dataset will be published with a DOI and made available to the research community. You'll receive a notification with the publication details.
Need Help?
Our curation team is available to assist with the submission process. Contact us at info.jhub@jkuat.ac.ke for guidance.
Additional Resources
Explore these resources to learn more about data standards and best practices for African research contexts.
Templates and Tools
- Metadata Template: JSON schema for dataset metadata
- Data Dictionary Template: Excel template for variable definitions
- Data Quality Checker: Automated validation tool
- Anonymization Guide: Best practices for privacy protection
Training Materials
- Data Management: Comprehensive course
- Research Ethics: Ethics workshop materials
- Metadata Best Practices: Guidelines for tagging and description
- Open Science: Open science principles
External Standards
- FAIR Data Principles: Findable, Accessible, Interoperable, Reusable
- Dublin Core Metadata: International metadata standard
- ISO 19115: Geographic information metadata
- CESSDA Guide: Data management guide
Support
- Community Forum: Connect with other researchers
- FAQ: Frequently asked questions
- Technical Support: Get help with technical issues
- Monthly Webinars: Live training sessions