Node.js is awesome because of its ecosystem. It’s even more awesome when combined with popular libraries such as D3.js and Lodash.
The goal
I’ve had to pre-process a bunch of CSV files in order to work with a single dataset file for convenience. Node.js is the platform of choice for this type of task now, and I couldn’t be more satisfied with it.
The process
Here’s a walkthrough of this little script that saves me a lot of time.
Import our weapons:
const d3 = require('d3')
const fs = require('fs')
const _ = require('lodash')
Read the folder to get the file list, then call a function for each file:
var files = fs.readdirSync(`${__dirname}/data`)
_.each(files, filename => process(filename))
Read the CSV content and parse it with D3.js:
var process = name => {
  var raw = fs.readFileSync(`${__dirname}/data/${name}`, 'utf8')
  var csv = d3.csvParse(raw)
}
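d3.csvParse turns the raw text into an array of plain objects, one per row, keyed by the header line. To illustrate the shape of the result, here is a naive stdlib-only parser (it ignores quoted fields, so use D3 for real data; the Category column in the sample is made up):

```javascript
// Naive illustration of the array-of-objects shape d3.csvParse produces.
// No support for quoted fields — a sketch, not a real CSV parser.
const naiveCsvParse = text => {
  const [header, ...rows] = text.trim().split('\n').map(l => l.split(','))
  return rows.map(r => Object.fromEntries(header.map((h, i) => [h, r[i]])))
}

naiveCsvParse('Dates,Category\n05/13/15,ARSON')
// → [{ Dates: '05/13/15', Category: 'ARSON' }]
```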
Wrangle some values before committing them to the final array:
var process = name => {
  ...
  var parse = d3.timeParse('%m/%d/%y')
  csv.forEach(d => {
    d.timestamp = parse(d.Dates)
  })
}
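For reference, here is roughly what d3.timeParse('%m/%d/%y') does for this specific format, sketched with the Node standard library only. The parseDate helper is hypothetical, and the two-digit-year pivot (69–99 mapped to the 1900s, 00–68 to the 2000s) follows D3’s convention:

```javascript
// Plain-Node sketch of d3.timeParse('%m/%d/%y') for this one format.
// Returns a Date on success, null when the string doesn't match —
// the same contract D3's parser has.
const parseDate = s => {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{2})$/.exec(s)
  if (!m) return null
  let [, month, day, year] = m.map(Number)
  year += year >= 69 ? 1900 : 2000 // D3's two-digit-year pivot
  return new Date(year, month - 1, day)
}
```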
Create a unique array with all the CSV files merged together (thanks, Lodash):
var db = []
var process = name => {
  ...
  var newdb = _.unionBy(db, csv, 'Dates')
  db = newdb
}
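_.unionBy concatenates the two arrays and keeps only the first row seen for each distinct 'Dates' value, so rows already in db win over duplicates from later files. A plain-JS sketch of that behavior (the unionBy below is an illustration, not Lodash’s actual implementation):

```javascript
// Illustration of _.unionBy(a, b, key): concatenate, then keep only the
// first row per distinct key value — earlier rows win over later ones.
const unionBy = (a, b, key) => {
  const seen = new Set()
  return [...a, ...b].filter(row => {
    if (seen.has(row[key])) return false
    seen.add(row[key])
    return true
  })
}
```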
Save the final dataset as JSON file:
var process = name => {
  ...
  fs.writeFileSync('db.json', JSON.stringify(db))
}
The whole script generates a JSON file with all the entries: a perfect starting point for an exploratory session with D3.js.
Feel good.
Spotted a typo or (likely) a grammar error? Send a pull request.