December 9, 2019
How to Script the Web For Free
September 10, 2018
How to Sort Java-generated JSON
- Spring has an application.properties/application.yml setting -- http.mappers.json-sort-keys -- which does not work with Jackson.
- Jackson's ObjectMapper may be configured to sort keys using the SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS feature. This worked, but there has got to be an easier way to do this, which is...
- Java itself has a TreeMap, which keeps its keys in their natural order (lexicographic for String keys), meaning that you get this functionality out of the box. This is the approach I ended up using.
November 24, 2015
How to get IMAP email as JSON
I've added IMAP support (both with and without SSL) to the units endpoint, /imap. If your email login is example, with password foo, and your non-SSL IMAP server is example.com, the URL to retrieve your email as JSON is http://units.d8u.us/imap/example/example.com?password=foo&ssl=false. Output is always compressed, because IMAP inboxes tend to be rather large. It is also for this reason that I haven't added any testing (yet). If you get timeouts, please leave a comment, as I'm trying to determine the optimum timeout for this -- it takes about 45 seconds for my secondary Gmail box; I would be interested in more data, though. Output is, in keeping with the other features, pretty-printed. Frequently asked questions, followed by the source code:
- My server only supports POP, what do I do? Well, POP is meant to be stored client-side, so you already have the mail on your system, presumably. This is for those of us who want to back up our server-side mail.
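As a rough illustration of the idea (not the endpoint's actual source), here is a Python 3 sketch that reads an inbox with the standard imaplib module and turns each message into a JSON-friendly dictionary. The function names, the choice of headers, and the plain-text-only body handling are my assumptions.

```python
import email
import imaplib
import json
from email import policy


def _header(msg, name):
    """Return a header as a plain string, or None when it is absent."""
    value = msg[name]
    return str(value) if value is not None else None


def message_to_dict(raw_bytes):
    """Turn one raw RFC 822 message into a JSON-friendly dict."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": _header(msg, "From"),
        "to": _header(msg, "To"),
        "subject": _header(msg, "Subject"),
        "date": _header(msg, "Date"),
        "body": body.get_content() if body is not None else "",
    }


def inbox_as_json(host, user, password, use_ssl=True):
    """Fetch every message in INBOX and return pretty-printed JSON."""
    imap_cls = imaplib.IMAP4_SSL if use_ssl else imaplib.IMAP4
    conn = imap_cls(host)
    try:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, "ALL")
        messages = []
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            # parts[0] is a (metadata, raw message bytes) pair.
            messages.append(message_to_dict(parts[0][1]))
    finally:
        conn.logout()
    return json.dumps(messages, indent=2)
```

Calling inbox_as_json('example.com', 'example', 'foo', use_ssl=False) would mirror the URL above; large inboxes will take a while, for the same reason the endpoint does.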
November 22, 2015
How to Search Github Sanely
I have a lot of open-source projects checked out on my machine. However, I am soon going to remove all of them that are hosted on GitHub. No, dear reader, I have not lost my mind. Rather, I've devised a way to search all of GitHub for code and return results in JSON. How? Read on:
And the results look like this:
November 5, 2015
How to Do Unit Conversion
September 12, 2015
How to Log Activity from Your Apps
The code above implements the Logging server. I'm putting it up here such that others may use it and make suggestions on improvements. So, go ahead, rip it apart.
June 29, 2015
March 11, 2015
How to get Post Titles in a Subreddit
I'm fairly up-front about the fact that I use reddit. This afternoon, though, I found a post asking for a way to extract subtitles to Excel. Well, kind user, your solution is below:
Sample run:
% python reddit2csv.py --subreddit openbsd
Using OpenBGPD to distribute pf table updates to your servers [xpost from /r/sysadmin]
"Make httpd TLSv1.2-only by default. Some older browsers, like IE 10, will be incompatible with this change."
OpenBSD newbie // Strange behaviour with irssi
802.11a USB options
Keyboard doesn't work after resume on HP Elitebook 2560p with OpenBSD 5.6
LibreSSL 2.1.4
Summer of Code 2015 Project Ideas Announced
Errata for X Server Infoleak
The Security of OpenBSD: Milk or Wine? (2006)
Improving browser security
Network Shell?
Episode 078: From the Foundation (Part 2)
OpenBSD Foundation 2014/2015 News & Fundraising
BSDNow Episode 076: Time for a Change
s2k15 Hackathon Report: mpi@ on network stack SMP
OpenBSD Just Works
Jazz concert with OpenBSD synths
s2k15: Authenticated TLS 'constraints' in ntpd(8)
Build a workstation from parts, minimize dependence on binary blobs?
iwm driver, for Intel 7260 wifi cards, now in tree
Setting console font on raster display
Missing dependencies on -current?
Troubleshooting Libvirt/virtualmachinemanager (post install)
Is the official OpenBSD website open-source?
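For readers who want something to start from, here is a minimal Python 3 sketch of how such a script might work; it is not reddit2csv.py itself. It assumes reddit's public JSON listing at /r/NAME/.json, where each post sits under data.children[].data.title, and the function names and User-Agent string are made up for the example.

```python
import csv
import io
import json
import urllib.request


def titles_from_listing(listing):
    """Pull post titles out of a decoded reddit listing document."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


def fetch_listing(subreddit):
    """Download the JSON listing for a subreddit (needs network access)."""
    url = "https://www.reddit.com/r/{}/.json".format(subreddit)
    # reddit tends to reject the default urllib User-Agent, so send our own.
    req = urllib.request.Request(
        url, headers={"User-Agent": "reddit2csv-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def titles_to_csv(titles):
    """Write one title per row; Excel opens the result directly."""
    out = io.StringIO()
    writer = csv.writer(out)
    for title in titles:
        writer.writerow([title])
    return out.getvalue()
```

Something like print(titles_to_csv(titles_from_listing(fetch_listing('openbsd')))) would then produce one title per line, quoted where the csv module thinks it necessary.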
January 24, 2015
How to Bulk-Import JSON to Postgres
The script above imports JSON into PostgreSQL using the Python library pg8000. It scales: I just used it to import 589 MB of JSON on a 1 GB NetBSD virtual machine.
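A sketch of the batching idea behind an import like this, in Python 3. This is not the original script: the table name docs, its single body column, the one-document-per-line input, and the batch size are all assumptions. The cursor is any DB-API cursor that uses %s placeholders, which I believe is what pg8000's DB-API interface expects.

```python
import json


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def import_json_lines(lines, cursor, table="docs", batch_size=1000):
    """Insert one JSON document per non-blank line; return the row count."""
    # `table` is a hypothetical name; it is interpolated, so never take it
    # from untrusted input.
    sql = "INSERT INTO {} (body) VALUES (%s)".format(table)
    total = 0
    for batch in batches(lines, batch_size):
        # Parse and re-serialise so malformed lines fail before the insert.
        params = [(json.dumps(json.loads(line)),)
                  for line in batch if line.strip()]
        cursor.executemany(sql, params)
        total += len(params)
    return total
```

Because the input is consumed lazily in fixed-size batches, memory use stays flat however large the file is, which is what makes a small virtual machine workable.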
November 4, 2014
How to Keep Track of Tasks #2
I did add multiple user support and updated the application serverside. The next few steps are to add full JSON support and allow for Google and Facebook login. The source code has been updated and checked in. The tl;dr of the changes is that devise was used for multiuser support.
November 2, 2014
How to keep Track of Tasks
I just wrote and open-sourced a task management system. You can download and set it up on your system by following these steps:
- Install the latest ruby 2.0, sqlite, and git.
- gem install bundler --no-ri --no-rdoc
- git clone http://bitbucket.org/hd1/todos.git
- gem install rails --no-ri --no-rdoc
- in the todos directory, run bundle and it will install all necessary dependencies.
- Now try rails server
- Point your browser here and you'll see it ask you for a username/password (the username is "hd1" and password is "December" by default).
- Oh, and, if you want to process the data programmatically as JSON, you can grab a read-only view of it here, following authentication.
These instructions are tested on Mac, Windows 7 and Unix. If you should have any problems, do leave a comment.
September 24, 2014
How to Script the Web
A tall order, but this is the first step... Converting HTML to JSON, using python, naturally:
#!/usr/bin/python
from HTMLParser import HTMLParser
import logging
import sys
import urllib2


class HTMLtoJSONParser(HTMLParser):
    """Build a nested dict that mirrors the tag structure of a page."""

    def __init__(self, raise_exception=True):
        HTMLParser.__init__(self)
        self.doc = {}
        self.path = []
        self.cur = self.doc
        self.line = 0
        self.raise_exception = raise_exception

    @property
    def json(self):
        return self.doc

    @staticmethod
    def to_json(content, raise_exception=False):
        parser = HTMLtoJSONParser(raise_exception=raise_exception)
        parser.feed(content)
        return parser.json

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)
        attrs = {k: v for k, v in attrs}
        if tag in self.cur:
            # Repeated sibling tags are collected into a list.
            if isinstance(self.cur[tag], list):
                self.cur[tag].append({"__parent__": self.cur})
                self.cur = self.cur[tag][-1]
            else:
                self.cur[tag] = [self.cur[tag]]
                self.cur[tag].append({"__parent__": self.cur})
                self.cur = self.cur[tag][-1]
        else:
            self.cur[tag] = {"__parent__": self.cur}
            self.cur = self.cur[tag]
        # Attributes are stored under "#name"; text goes under the "" key.
        for a, v in attrs.items():
            self.cur["#" + a] = v
        self.cur[""] = ""

    def handle_endtag(self, tag):
        if tag != self.path[-1]:
            print('{0} not closed, on line {1}'.format(tag, self.line))
            if self.raise_exception:
                raise Exception("html is malformed around line: {0} (it might be because of a tag <br>, <hr>, <img .. > not closed)".format(self.line))
        del self.path[-1]
        memo = self.cur
        self.cur = self.cur["__parent__"]
        self.clean(memo)

    def handle_data(self, data):
        self.line += data.count("\n")
        if "" in self.cur:
            self.cur[""] += data

    def clean(self, values):
        # Strip whitespace from text values, drop empty ones, and remove
        # the back-reference to the parent node.
        keys = list(values.keys())
        for k in keys:
            v = values[k]
            if isinstance(v, str):
                logging.debug("clean: {}, {}".format(k, [v]))
                c = v.strip(" \n\r\t")
                if c != v:
                    if len(c) > 0:
                        values[k] = c
                    else:
                        del values[k]
        del values["__parent__"]


if __name__ == '__main__':
    logging.basicConfig(level=logging.FATAL)
    logging.debug(sys.argv[0])
    url = urllib2.urlopen(sys.argv[1])
    print(HTMLtoJSONParser.to_json(url.read()))
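For comparison, here is a compact Python 3 sketch of the same idea using html.parser from the standard library. It is a variation rather than a port: the node layout (tag, attrs, text, children), the TreeBuilder and html_to_json names, and the VOID_TAGS set are my own choices, made so that unclosed tags such as <br> and <img> do not derail the tree.

```python
import json
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Collect tags into nested dicts: tag, attrs, text, children."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "text": "",
                     "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "text": "",
                "children": []}
        self.stack[-1]["children"].append(node)
        # Void elements have no body, so they never become the current node.
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["text"] += data.strip()


def html_to_json(html):
    """Parse an HTML string and return the tree as a JSON string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root)
```

Keeping children in a list avoids the parent back-references used above, so the result can be handed straight to json.dumps.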
September 1, 2014
How to Record Data
I know it's Labor Day and whatnot, and I'm supposed to be celebrating the end of Burning Man at a barbecue, but I'm not. Instead, I'm tweaking things, which brings me to what I just accomplished -- a Flask-based REST API to data, in Python, naturally:
from backports import lzma
import csv
import datetime
import json
import os

from flask import Flask, request, Response

# TODO force SSL for post -- http://flask.pocoo.org/snippets/111/
DATA_FILE = 'sanguine.csv.xz'
FIELDNAMES = ['Timestamp', 'User', 'Latitude', 'Longitude']
app = Flask(__name__)


@app.route('/', methods=['GET'])
def index():
    # Return every stored row as a JSON list of objects.
    with lzma.LZMAFile(DATA_FILE, 'r') as data:
        reader = csv.DictReader(data, fieldnames=FIELDNAMES, quoting=csv.QUOTE_MINIMAL, lineterminator='\r\n')
        reader.next()  # skip header line
        return Response(json.dumps(list(reader)), mimetype='application/json')


@app.route('/', methods=['POST'])
def newdatapiece():
    # Append one row built from the posted form fields.
    with lzma.LZMAFile(DATA_FILE, 'a') as data:
        writer = csv.DictWriter(data, fieldnames=FIELDNAMES, quoting=csv.QUOTE_MINIMAL, lineterminator='\r\n')
        row = {}
        row['Timestamp'] = datetime.datetime.now().strftime('%s')
        row['User'] = request.form['user_id']
        row['Latitude'] = request.form['lat']
        row['Longitude'] = request.form['lon']
        writer.writerow(row)
    return '', 201


@app.route('/analyze', methods=['GET'])
def analysis():
    # Stream the raw CSV back, header included.
    lines = []
    with lzma.LZMAFile(DATA_FILE, 'r') as data:
        lines = data.readlines()
    return Response(lines, mimetype='application/csv')


if __name__ == '__main__':
    # Write the header only when the data file does not exist yet, so that
    # a restart does not wipe previously recorded rows.
    if not os.path.exists(DATA_FILE):
        with lzma.LZMAFile(DATA_FILE, 'w') as data:
            writer = csv.DictWriter(data, fieldnames=FIELDNAMES, quoting=csv.QUOTE_MINIMAL, lineterminator='\r\n')
            writer.writeheader()
    app.run(host='0.0.0.0', port=8080, debug=True)
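The storage trick here, appending CSV rows to an xz-compressed file, can be tried on its own without Flask. Below is a small Python 3 sketch using only the standard library (lzma ships with Python 3, so the backport is not needed). The helper names append_row and read_rows are mine, and the column names are copied from the code above.

```python
import csv
import lzma
import os
import time

FIELDNAMES = ["Timestamp", "User", "Latitude", "Longitude"]


def append_row(path, user, lat, lon):
    """Append one reading, writing the header first if the file is new."""
    is_new = not os.path.exists(path)
    # Text-mode append; each call adds a new xz stream to the file.
    with lzma.open(path, "at", newline="") as data:
        writer = csv.DictWriter(data, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow({"Timestamp": int(time.time()), "User": user,
                         "Latitude": lat, "Longitude": lon})


def read_rows(path):
    """Read every row back as a list of dicts keyed by the header."""
    with lzma.open(path, "rt", newline="") as data:
        return list(csv.DictReader(data))
```

Each append starts a fresh compressed stream, so a file built from many tiny writes compresses worse than one written in a single pass; for a location log that seemed an acceptable trade.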
August 26, 2014
How to Watch webpages for Changes
Today, I encountered a webpage for which there is no rss feed, nor atom feed. I whipped up something to watch the page myself and report on any changes in python. Hey, Guido, if you integrate requests, I won't have any non-stdlib requirements to the script. What do you say? please? Not that the BDFL reads my blog but anyway, here's the code:
#!/Users/hdiwan/.virtualenvs/globetrekker/bin/python
import argparse
import hashlib
import json
import logging
import requests
import smtplib

STATE_FILE = '/var/tmp/.globetrekker.txt'


def send_mail(msg, user, password):
    # Mail the notification to yourself through Gmail's SMTP server.
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.login(user, password)
    server.sendmail(user, user, msg)
    server.quit()


if __name__ == '__main__':
    argparser = argparse.ArgumentParser(description='Check a website for changes')
    argparser.add_argument('-n', '--url', type=str, default=None, help='Add URL to watcher', action='store')
    argparser.add_argument('-l', '--list', action='store_true')
    argparser.add_argument('-u', '--user', type=str, default='[email protected]', help='Your username', action='store')
    argparser.add_argument('-p', '--password', type=str, help='Your password', action='store')
    argparser.add_argument('-v', '--verbose', action='store_true')
    parsed = argparser.parse_args()
    if parsed.verbose:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.FATAL)

    if parsed.url:
        # Load the existing watch list (if any), add the URL with a
        # placeholder hash of 0, and write the whole list back.
        try:
            with open(STATE_FILE, 'r') as fin:
                data = json.load(fin)
        except (IOError, ValueError):
            data = {}
        data[parsed.url] = 0
        logging.debug(data)
        with open(STATE_FILE, 'w') as fout:
            json.dump(data, fout)
        exit()

    with open(STATE_FILE, 'r') as fin:
        stored_hash = json.load(fin)
    logging.debug(stored_hash)

    if parsed.list:
        for k in stored_hash:
            print(k)
        exit()

    for url in stored_hash:
        logging.debug('{} is our URL'.format(url))
        browser = requests.get(url)
        logging.debug('page retrieved -- {}'.format(url))
        # Hash the raw response bytes; no re-encoding is needed.
        new_hash = hashlib.sha1(browser.content).hexdigest()
        logging.debug('Calculated hash code: {}'.format(new_hash))
        logging.debug('Stored hash: {}'.format(stored_hash[url]))
        if new_hash != stored_hash[url]:
            logging.debug('{} changed'.format(url))
            # A stored value of 0 means the page has never been fetched.
            if stored_hash[url] != 0:
                send_mail(u'Subject: {} Change detected\r\n\r\n--H'.format(url), parsed.user, parsed.password)
            stored_hash[url] = new_hash

    with open(STATE_FILE, 'w') as fout:
        json.dump(stored_hash, fout)
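Stripped of the mail and argument handling, the core of the watcher is a hash comparison. Here is that step as a small Python 3 function, a sketch with the fetching injected so it can be exercised without a network. The check_for_changes name and its return shape are my own invention.

```python
import hashlib


def check_for_changes(stored, fetch):
    """Return (changed_urls, updated_hashes).

    `stored` maps URL -> last seen SHA-1 hex digest (or 0 if the page has
    never been fetched); `fetch` is any callable that takes a URL and
    returns the page body as bytes.
    """
    changed = []
    updated = dict(stored)
    for url, old_hash in stored.items():
        new_hash = hashlib.sha1(fetch(url)).hexdigest()
        if new_hash != old_hash:
            # A stored value of 0 means "first visit": record, don't report.
            if old_hash != 0:
                changed.append(url)
            updated[url] = new_hash
    return changed, updated
```

With requests installed, fetch could be as simple as lambda url: requests.get(url).content. Pages that embed timestamps or rotating ads will of course report a change on every run.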
August 17, 2014
How to Produce JSON Properly from Spring
Not that the default spring JSON is that bad. It looks like this:
[{ 2.62779739789553,1556.68506945,'El Pollo Loco'},{4.087178144481979,1632.109670148,'Paper or Plastik Cafe'}
I don't like this and want something more like:
[{'azimuth': 1.3775424158235956, 'distance': 625.924396521, 'name': 'Starbucks'}, {'azimuth': 1.628478725514169, 'distance': 646.038250929, 'name': 'Asian Cuisine'}]
And I figured it out:
for (results.next(); results.isAfterLast() == false; results.next()) {
    Spot spot = new Spot();
    spot.setAzimuth(results.getDouble("bearing"));
    spot.setDistance(results.getDouble("distance"));
    spot.setName(results.getString("name"));
    LOGGER.debug(spot.toString());
    spots.add(spot);
}
Yes, by making it a list of beans I wrote, instead of retrieving the rows directly into a collection, it seems I can force Spring to output each entry as a hash with named keys.
July 13, 2014
How to Reformat Logback Output
Spring defaults to using logback for logging. It spits the logs out on standard output, where they are not persisted. So, we must first send the log output to a file. This is done by leveraging the FileAppender class, as follows:
<appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/home/hdiwan/around.log</file>
    <encoder>
        <pattern>"%date" "%level" "[%thread]" "%logger" "%file : %line" "%msg"%n</pattern>
    </encoder>
</appender>
Now the logs go to the file indicated. Make sure LOGFILE_PATH at the top of the script matches the configuration:
import cgi
import csv
import cStringIO as StringIO
import json
import logging

from lxml import etree

if __name__ == '__main__':
    LOGFILE_PATH = '/home/hdiwan/around.log'
    logging.basicConfig(level=logging.FATAL)
    web = cgi.FieldStorage()
    format_ = web.getfirst('format', default='csv')
    # The logback pattern writes space-separated, double-quoted fields:
    # date, level, thread, logger, "file : line", message.
    csv.register_dialect('arounddialect', delimiter=' ', quotechar='"')
    logging.debug(csv.list_dialects())
    if format_ == 'csv':
        print('Content-Type: application/csv\n')
    elif format_ == 'xml':
        print('Content-Type: text/xml\n')
    elif format_ == 'json':
        print('Content-Type: application/json\n')
    with open(LOGFILE_PATH, 'rb') as fin:
        reader = csv.reader(fin, dialect='arounddialect')
        out = StringIO.StringIO()
        if format_ == 'csv':
            writer = csv.writer(out)
            writer.writerows(list(reader))
        elif format_ == 'xml':
            document = etree.Element('log')
            for r in list(reader):
                logging.debug(len(r))
                node = etree.SubElement(document, 'entry')
                timestamp = etree.SubElement(node, 'timestamp')
                timestamp.text = etree.CDATA(r[0])
                level = etree.SubElement(node, 'level')
                level.text = etree.CDATA(r[1])
                thread = etree.SubElement(node, 'thread')
                try:
                    thread.text = etree.CDATA(r[2])
                except IndexError:
                    thread.text = etree.CDATA('')
                class_ = etree.SubElement(node, 'class')
                try:
                    class_.text = etree.CDATA(r[3])
                except IndexError:
                    class_.text = etree.CDATA('')
                msg = etree.SubElement(node, 'message')
                try:
                    # r[4] is the "file : line" field; the message is r[5].
                    msg.text = etree.CDATA(r[5])
                except IndexError:
                    msg.text = etree.CDATA('')
            out.write(etree.tostring(document, encoding='utf-8', xml_declaration=True, pretty_print=True))
        elif format_ == 'json':
            out.write(json.dumps(list(reader)))
    print(out.getvalue())
The other novel part here is the use of lxml to generate the XML, which removes the need for cgi.escape and friends to get the XML properly escaped, and pretty-prints it automatically.
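To see what the csv module makes of a line in that pattern, here is a short Python 3 sketch. The field names in FIELDS are my own labels for the six quoted columns of the logback pattern above, and the sample line in the usage note is invented.

```python
import csv
import io

# One label per quoted column in the logback pattern, in order.
FIELDS = ["timestamp", "level", "thread", "logger", "location", "message"]


def parse_log(text):
    """Split space-separated, double-quoted log lines into dicts."""
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar='"')
    return [dict(zip(FIELDS, row)) for row in reader if row]
```

Feeding it a line such as "2014-07-13 10:00:00,123" "INFO" "[main]" "us.d8u.Around" "Around.java : 42" "Started up" gives one dictionary with the message under the message key. A log message that itself contains a double quote would confuse the reader, so this is only as robust as the log content is tame.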
June 6, 2014
How to Persist your Location
I had 3 different tries at doing this. Earlier this week, I got very big on spring-boot -- I still am, incidentally. Then I tried it, briefly, in PHP, before returning home, to Python, which is resembling Java more and more with every release.
Without further ado:
#!~/.virtualenvs/around-web/bin/python
import binascii
import cStringIO as StringIO
import csv
import json
import logging
import time

from flask import Flask, request, session, redirect, url_for, abort, render_template, flash
from sqlalchemy import create_engine, event, Column, Integer, Sequence, String, DateTime, Float, BIGINT, Table, MetaData, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import select, expression, text
from sqlalchemy.types import UserDefinedType, _Binary, TypeDecorator

Base = declarative_base()


# Python datatypes
class GisElement(object):
    """Represents a geometry value."""

    def __str__(self):
        return self.desc


class BinaryGisElement(GisElement, expression.FunctionElement):
    """Represents a Geometry value expressed as binary."""

    def __init__(self, data):
        self.data = data
        self.name = 'ST_GeomFromEWKB'
        self.is_literal = True
        expression.FunctionElement.__init__(self, "ST_GeomFromEWKB", data,
                                            type_=Geometry(coerce_="binary"))

    @property
    def desc(self):
        return self.as_hex

    @property
    def as_hex(self):
        return binascii.hexlify(self.data)


class TextualGisElement(GisElement, expression.FunctionElement):
    """Represents a Geometry value expressed as text."""
    name = 'ST_GeomFromText'
    table = None

    def __init__(self, desc, srid=-1):
        self.is_literal = True
        logging.debug('TextualGISElement constructor')
        # Keep the text form so GisElement.__str__ has something to return.
        self.desc = desc
        expression.FunctionElement.__init__(self, func.ST_GeomFromText, desc, srid,
                                            type_=Geometry)
# SQL datatypes.
class Geometry(UserDefinedType):
    """Base PostGIS Geometry column type."""
    name = "GEOMETRY"

    def __init__(self, dimension=None, srid=-1,
                 coerce_="text"):
        self.dimension = dimension
        self.srid = srid
        self.coerce = coerce_

    class comparator_factory(UserDefinedType.Comparator):
        """Define custom operations for geometry types."""

        # override the __eq__() operator
        def __eq__(self, other):
            return self.op('~=')(other)

        # add a custom operator
        def intersects(self, other):
            return self.op('&&')(other)

        # any number of GIS operators can be overridden/added here
        # using the techniques above.

    def _coerce_compared_value(self, op, value):
        return self

    def get_col_spec(self):
        return 'Geometry'

    def bind_expression(self, bindvalue):
        if self.coerce == "text":
            return TextualGisElement(bindvalue)
        elif self.coerce == "binary":
            return BinaryGisElement(bindvalue)
        else:
            assert False

    def column_expression(self, col):
        if self.coerce == "text":
            return func.ST_AsText(col, type_=self)
        elif self.coerce == "binary":
            return func.ST_AsBinary(col, type_=self)
        else:
            assert False

    def bind_processor(self, dialect):
        def process(value):
            if isinstance(value, GisElement):
                return value.desc
            else:
                return value
        return process

    def result_processor(self, dialect, coltype):
        if self.coerce == "text":
            fac = TextualGisElement
        elif self.coerce == "binary":
            fac = BinaryGisElement
        else:
            assert False

        def process(value):
            if value is not None:
                return fac(value)
            else:
                return value
        return process

    def adapt(self, impltype):
        return impltype(dimension=self.dimension,
                        srid=self.srid, coerce_=self.coerce)
# SQL datatypes (this redefinition replaces any earlier Geometry class).
class Geometry(UserDefinedType):
    """Base PostGIS Geometry column type."""

    def __init__(self, dimension=None, srid=-1,
                 coerce_="text"):
        self.dimension = dimension
        self.srid = srid
        self.name = 'GEOMETRY'
        self.coerce = coerce_

    class comparator_factory(UserDefinedType.Comparator):
        """Define custom operations for geometry types."""

        # override the __eq__() operator
        def __eq__(self, other):
            return self.op('~=')(other)

        # add a custom operator
        def intersects(self, other):
            return self.op('&&')(other)

        # any number of GIS operators can be overridden/added here
        # using the techniques above.

    def _coerce_compared_value(self, op, value):
        return self

    def get_col_spec(self):
        return self.coerce

    def bind_expression(self, bindvalue):
        if self.coerce == "text":
            return TextualGisElement(bindvalue)
        elif self.coerce == "binary":
            return BinaryGisElement(bindvalue)
        else:
            assert False

    def bind_processor(self, dialect):
        def process(value):
            if isinstance(value, GisElement):
                return value.desc
            else:
                return value
        return process

    def result_processor(self, dialect, coltype):
        if self.coerce == "text":
            fac = TextualGisElement
        elif self.coerce == "binary":
            fac = BinaryGisElement
        else:
            assert False

        def process(value):
            if value is not None:
                return fac(value)
            else:
                return value
        return process

    def adapt(self, impltype):
        return impltype(dimension=self.dimension,
                        srid=self.srid, coerce_=self.coerce)
# other datatypes can be added as needed.
class Point(Geometry):
    name = 'POINT'


# DDL integration
# Postgis historically has required AddGeometryColumn/DropGeometryColumn
# and other management methods in order to create Postgis columns. Newer
# versions don't appear to require these special steps anymore. However,
# here we illustrate how to set up these features in any case.
def setup_ddl_events():
    @event.listens_for(Table, "before_create")
    def before_create(target, connection, **kw):
        dispatch("before-create", target, connection)

    @event.listens_for(Table, "after_create")
    def after_create(target, connection, **kw):
        dispatch("after-create", target, connection)

    @event.listens_for(Table, "before_drop")
    def before_drop(target, connection, **kw):
        dispatch("before-drop", target, connection)

    @event.listens_for(Table, "after_drop")
    def after_drop(target, connection, **kw):
        dispatch("after-drop", target, connection)


def dispatch(event, table, bind):
    if event in ('before-create', 'before-drop'):
        regular_cols = [c for c in table.c if not
                        isinstance(c.type, Geometry)]
        gis_cols = set(table.c).difference(regular_cols)
        table.info["_saved_columns"] = table.c

        # temporarily patch a set of columns not including the
        # Geometry columns
        table.columns = expression.ColumnCollection(*regular_cols)

        if event == 'before-drop':
            for c in gis_cols:
                bind.execute(
                    select([
                        func.DropGeometryColumn(
                            'public', table.name, c.name)],
                        autocommit=True)
                )
    elif event == 'after-create':
        table.columns = table.info.pop('_saved_columns')
        for c in table.c:
            if isinstance(c.type, Geometry):
                bind.execute(
                    select([
                        func.AddGeometryColumn(
                            table.name, c.name,
                            c.type.srid,
                            c.type.name,
                            c.type.dimension)],
                        autocommit=True)
                )
    elif event == 'after-drop':
        table.columns = table.info.pop('_saved_columns')
metadata = MetaData()


# Table('history', metadata, Column('history_id', Integer, primary_key = True), Column('device_id', String), Column('device_timestamp', DateTime), Column('location', Geometry))
class Location_history(Base):
    __tablename__ = 'history'
    history_id = Column(Integer, primary_key=True)
    device_id = Column(String)
    device_timestamp = Column(DateTime)
    latitude = Column(Float)
    longitude = Column(Float)

    def location(self):
        return ' ({}, {})'.format(self.latitude, self.longitude)

    def __str__(self):
        return '{} @ {} at timestamp {}'.format(self.device_id, self.location(), self.device_timestamp)

    def keys(self):
        return ['device_timestamp', 'device_id', 'location']


app = Flask(__name__)


def results():
    # Fetch every stored location row.
    Session = sessionmaker()
    engine = create_engine(u'postgres://pgsql@localhost/Around')
    Session.configure(bind=engine)
    session = Session()
    rows = session.query(Location_history).all()
    return rows


@app.route('/json', methods=['GET'])
def json_out():
    directory = results()
    results_ = []
    for d in directory:
        result = {}
        result['Location'] = '({},{})'.format(d.latitude, d.longitude)
        result['Time'] = d.device_timestamp.strftime('%c')
        result['Device'] = d.device_id
        results_.append(result)
    logging.debug(results_)
    return json.dumps(results_)


@app.route('/xml', methods=['GET'])
def xml_out():
    directory = results()
    out = StringIO.StringIO()
    out.write('''''')
    out.write('')
    for entry in directory:
        out.write('\t\n')
        out.write('\t\t{} \n'.format(entry.device_timestamp))
        out.write('\t\t{} \n'.format(entry.location()))
        out.write('\t\t{} \n'.format(entry.device_id))
        out.write('\t \n')
    out.write(' \n')
    return out.getvalue()


@app.route('/csv', methods=['GET'])
def csv_out():
    directory = results()
    out = StringIO.StringIO()
    try:
        writer = csv.DictWriter(out, fieldnames=directory[0].keys())
        for entry in directory:
            # ['device_timestamp',' device_id ','location']
            writer.writerow({'device_timestamp': entry.device_timestamp, 'device_id': entry.device_id, 'location': entry.location()})
        return out.getvalue()
    except IndexError:
        return "No values found"


@app.route('/new/', methods=['POST'])
def new():
    # Parameters arrive in the query string; device and time are optional.
    device_id = request.args.get('device')
    if device_id is None:
        device_id = 'test'
    latitude = request.args.get('latitude')
    longitude = request.args.get('longitude')
    timestamp = request.args.get('time')
    if timestamp is None:
        timestamp = time.time()
    timestamp = long(timestamp)
    latitude = float(latitude)
    longitude = float(longitude)
    engine = create_engine(u'postgres://pgsql@localhost/Around')
    connection = engine.connect()
    cmd = 'INSERT INTO history (device_timestamp, device_id, latitude, longitude) VALUES (to_timestamp(:timestamp), :device_id, :latitude, :longitude)'
    connection.execute(text(cmd), device_id=device_id, timestamp=timestamp, latitude=latitude, longitude=longitude)
    return 'Success!'


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    app.run()
Ideally, I'd like to make the device_id part of the URL and remove the unnecessary code. I would also like to use PostGIS instead of using a view, but this was a quick and dirty implementation, and the functionality will probably be heavier on the client side.
May 7, 2014
How to Convert CSV to JSON
Following from my post on format conversion from python dictionary to XML, today brings a one-liner to convert comma-delimited values to JSON. Getting data into an appropriate format is the first step to making analysis happen. Without further ado:
#!/usr/bin/env python2.6
import csv
import json
import sys

if __name__ == '__main__':
    # DictReader uses the first row as keys, so each row becomes an object.
    print(json.dumps([row for row in csv.DictReader(open(sys.argv[1]))]))
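The Python json module has 4 main functions, namely:
- dump, dumps: export Python data as JSON, either to a file-like object (dump) or to a string (dumps).
- load, loads: import JSON from a file-like object (load) or from a string (loads) into the matching Python objects; for the output of the script above, that is a list of dictionaries.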
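A quick Python 3 illustration of the four functions side by side; the sample row is invented, and io.StringIO stands in for a real file.

```python
import io
import json

rows = [{"name": "Starbucks", "distance": 625.9}]

# dumps / loads work with strings.
as_text = json.dumps(rows)
from_text = json.loads(as_text)

# dump / load work with file-like objects.
buf = io.StringIO()
json.dump(rows, buf)
buf.seek(0)
from_file = json.load(buf)
```

Both from_text and from_file end up equal to rows, which is the round trip you want when checking a conversion.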