Node.js is awesome because of its ecosystem. It’s even more awesome when combined with popular libraries such as D3.js and Lodash.
The goal
I’ve had to pre-process a bunch of CSV files in order to work with a single dataset file for convenience. Node.js is the platform of choice for this type of task now, and I couldn’t be more satisfied with it.
The process
Here’s a walkthrough of this little script that saves me a lot of time.
Import our weapons:
const d3 = require('d3')
const fs = require('fs')
const _ = require('lodash')
Read the folder to get the file list, then call a function for each file:
var files = fs.readdirSync(`${__dirname}/data`)
_.each(files, filename => process(filename))
Read the CSV content and parse it with D3.js:
var process = name => {
  var raw = fs.readFileSync(`${__dirname}/data/${name}`, 'utf8')
  var csv = d3.csvParse(raw)
}
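d3.csvParse turns the raw text into an array of plain objects, one per row, keyed by the header line. To illustrate the shape of the result, here is a naive stdlib-only parser (it ignores quoted fields, so use D3 for real data; the Category column in the sample is made up):

```javascript
// Naive illustration of the array-of-objects shape d3.csvParse produces.
// No support for quoted fields — a sketch, not a real CSV parser.
const naiveCsvParse = text => {
  const [header, ...rows] = text.trim().split('\n').map(l => l.split(','))
  return rows.map(r => Object.fromEntries(header.map((h, i) => [h, r[i]])))
}

naiveCsvParse('Dates,Category\n05/13/15,ARSON')
// → [{ Dates: '05/13/15', Category: 'ARSON' }]
```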
Wrangle some values before committing them to the final array:
var process = name => {
  ...
  var parse = d3.timeParse('%m/%d/%y')
  csv.forEach(d => {
    d.timestamp = parse(d.Dates)
  })
}
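For reference, here is roughly what d3.timeParse('%m/%d/%y') does for this specific format, sketched with the Node standard library only. The parseDate helper is hypothetical, and the two-digit-year pivot (69–99 mapped to the 1900s, 00–68 to the 2000s) follows D3’s convention:

```javascript
// Plain-Node sketch of d3.timeParse('%m/%d/%y') for this one format.
// Returns a Date on success, null when the string doesn't match —
// the same contract D3's parser has.
const parseDate = s => {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{2})$/.exec(s)
  if (!m) return null
  let [, month, day, year] = m.map(Number)
  year += year >= 69 ? 1900 : 2000 // D3's two-digit-year pivot
  return new Date(year, month - 1, day)
}
```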
Create a unique array with all the CSV files merged together (thanks, Lodash):
var db = []
var process = name => {
  ...
  var newdb = _.unionBy(db, csv, 'Dates')
  db = newdb
}
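_.unionBy concatenates the two arrays and keeps only the first row seen for each distinct 'Dates' value, so rows already in db win over duplicates from later files. A plain-JS sketch of that behavior (the unionBy below is an illustration, not Lodash’s actual implementation):

```javascript
// Illustration of _.unionBy(a, b, key): concatenate, then keep only the
// first row per distinct key value — earlier rows win over later ones.
const unionBy = (a, b, key) => {
  const seen = new Set()
  return [...a, ...b].filter(row => {
    if (seen.has(row[key])) return false
    seen.add(row[key])
    return true
  })
}
```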
Save the final dataset as JSON file:
var process = name => {
  ...
  fs.writeFileSync('db.json', JSON.stringify(db))
}
The whole script generates a JSON file with all the entries: a perfect starting point for an exploratory session with D3.js.
Feel good.
Spotted a typo or (likely) a grammar error? Send a pull request.