A data scientist’s toolkit

I’ve had my current laptop for a while, so I sometimes take for granted (or forget) all the tools I have available locally. I recently went through the drill of cataloging what I have installed and will need to reinstall when I upgrade. (I also have the current setup imaged, for use in a virtual environment or in case of a complete hard drive failure.)

I’ll use this post as a running inventory and update it as needed.

Databases and associated tools

Oracle 11 – Available for free with some restrictions

SQL Developer – Oracle’s own client tool. It’s not open source, but this one is free!

TOAD – Great client for Oracle.

MySQL – Open source database.

HeidiSQL – MySQL client. Great for the basics, and it’s open source

Postgres – Open source database

PostGIS – Add-on for Postgres that facilitates database-backed GIS analysis

Statistical

R – Open source statistical programming language

RStudio – Open source IDE for R

SAS – SAS Student Edition is now freely available. It runs under VMware. It’s great to have this locally for small development work and for training or refreshing when a full-fledged version is not available. It’s not full featured, but the basics are here LINK

S-PLUS – I’ve owned a copy of this since way back in graduate school. S was R before R was cool (though I just use R now)

RatStats – I used this free tool when doing some claim analysis projects. It’s fairly limited in functionality, but what it does, it does well. It was created by the Office of the Inspector General, and is widely used for drawing random samples. There’s nothing here you can’t get in R or SAS, but it is generally accepted for claims review and compliance audits: http://oig.hhs.gov/compliance/rat-stats/
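To give a feel for the kind of sampling involved, here is a minimal Python sketch of a seeded, reproducible random draw from a list of claims. It is only an illustration of the general idea and not how RatStats itself works; the function name `draw_audit_sample`, the claim IDs and the seed are all made up for the example.

```python
import random


def draw_audit_sample(claim_ids, sample_size, seed):
    """Draw a simple random sample without replacement, reproducibly."""
    if sample_size > len(claim_ids):
        raise ValueError("sample_size exceeds the number of claims")
    # A dedicated generator, so the seed alone determines the draw
    rng = random.Random(seed)
    # Sort the frame first so the result does not depend on input order
    frame = sorted(claim_ids)
    return sorted(rng.sample(frame, sample_size))


# Hypothetical sampling frame: 500 claim IDs, CLM0001 through CLM0500
claims = [f"CLM{n:04d}" for n in range(1, 501)]
sample = draw_audit_sample(claims, sample_size=30, seed=20240101)
print(len(sample), sample[:3])
```

Recording the seed alongside the sample is what lets someone else re-run the draw and get the same claims, which is the part that matters in an audit setting.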

 

Scripting and general programming

Python – I do most of my general programming and scripting in Python

IPython – For some reason, I used to work just in Notepad++ and IDLE as my development environment. Never again! I’d highly recommend that anyone learning Python go with IPython and IPython notebooks from the beginning.

BeautifulSoup – Python package for parsing HTML and XML, which makes it a natural fit for web scraping. Very powerful, and it has yet to fail me.

Perl – Powerful scripting language, though I tend to use Python more often now, and picking Perl back up can be bumpy

 

Web Development

Django – Python-based web framework. I’ve used this to develop nontrivial websites LINK

WordPress – I’ve mostly used it as a pure CMS (this site is built in it). Quick and easy

JavaScript

jQuery

 

Other Tools

Git – Distributed open source version control

Cygwin – For when you need UNIX tools under Windows. I have a separate Linux box, but I love this for being able to use grep, awk, sed and other utilities

VMware – Virtualization software

PuTTY – Open source telnet/SSH client for Windows

FileZilla – Open source FTP tool

XAMPP stack – The tools are available separately, but this is a great packaged Apache, MySQL, PHP and Perl stack