A data scientist’s toolkit
I’ve had my current laptop for a while, so I sometimes take for granted (or forget) all the tools I have available locally. I recently went through the drill of cataloging what I have installed and will need to reinstall when I upgrade. (I also have the current setup imaged for use in a virtual environment or in case of a complete hard drive failure.)
I’ll use this post as a running inventory and update it as needed.
Databases and associated tools
Oracle 11 – Available for free with some restrictions
SQL Developer – Oracle’s own client tool. It is not open source, but this one is free!
TOAD – Great client for Oracle.
MySQL – Open source database.
HeidiSQL – MySQL client. Great for the basics, and it’s open source
Postgres – Open source database
PostGIS – Postgres add-on for database-backed GIS analysis
Statistical
R – Open source statistical programming language
RStudio – Open source IDE for R
SAS – SAS Student Edition is now freely available. It runs under VMware. It’s great to have this locally for small development work and for training or refreshing when a full-fledged version is not available. It’s not full featured, but the basics are here LINK
S-PLUS – I’ve owned a copy of this since way back in graduate school. S was R before R was cool (though I just use R now)
RAT-STATS – I used this free tool when doing some claim analysis projects. It’s fairly limited in functionality, but what it does, it does well. It was created by the Office of the Inspector General, and is used a lot for drawing random samples. There is nothing here you can’t get in R or SAS, but it is generally accepted for claims review and compliance audits: http://oig.hhs.gov/compliance/rat-stats/
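As a rough illustration of the kind of random sampling mentioned above, here is a minimal Python sketch of a reproducible simple random sample drawn without replacement. It is not how RAT-STATS itself works internally; the claim numbers, the sample size, the seed, and the draw_sample helper are all made up for the example.

```python
import random


def draw_sample(claim_ids, sample_size, seed):
    """Draw a simple random sample without replacement.

    Using a dedicated, seeded generator means the same seed always
    returns the same sample, so the draw can be documented and rerun.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(claim_ids, sample_size))


# Hypothetical sampling frame: claims numbered 1 through 500.
claim_ids = list(range(1, 501))

# Hypothetical audit parameters: 30 claims, with the seed recorded.
sample = draw_sample(claim_ids, sample_size=30, seed=20240101)
print(sample)
```

Recording the seed alongside the sampling frame is what makes the draw repeatable if someone asks how the claims were picked.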
Scripting and general programming
Python – I do most of my general programming and scripting in Python
IPython – For some reason, I used to work just in Notepad++ and IDLE as my development environment. Never again! I’d highly recommend that anyone learning Python go with IPython and IPython notebooks from the beginning.
BeautifulSoup – Python package for web scraping. Very powerful scraper that has yet to fail me.
Perl – Powerful scripting language, though I tend to use Python more often now, and picking Perl back up can be bumpy
Web Development
Django – Python-based web framework. I’ve used this to develop nontrivial websites LINK
WordPress – I have mostly used it as a pure CMS (this site is built with it). Quick and easy
JavaScript
jQuery
Other Tools
Git – Distributed open source version control
Cygwin – For when you need UNIX tools under Windows. I have a separate Linux box, but love this for being able to run grep, awk, sed and other utilities
VMware – Virtualization software
PuTTY – Open source Telnet/SSH client
FileZilla – Open source FTP tool
XAMPP stack – Tools are available separately, but this is a great packaged implementation of an Apache/MySQL/PHP and Perl stack