The aftermath of Storm Callum was not enough to deter attendees from our London Hub event on “Careers and Training in Health Data Science”, which was hosted in October at the somewhat luxurious KPMG offices in Mayfair (ooooooh!).
So what was this all about? Well, with investment in AI in healthcare growing daily, combined with a seemingly unlimited pool of opportunities to radically improve healthcare using Big Data, there is no better time to be a data scientist in this field. Almost all of us have heard the “fact” that it’s the sexiest job of the year. However, whilst the demand for world-class data scientists continues to grow, the number of people entering and training in data science is not keeping pace, and the gender gap is widening. Indeed, roughly 50% of data science jobs will be unfilled by 2020.
But where do you even start if you’re trying to carve out a career in data science? What resources are available? And come to think of it, what even is “data science”? These are all questions we planned to address in this month’s London Hub event.
We started with a welcome from KPMG’s Head of Disruption, Shamus Rae, who unfortunately couldn’t stay for long as he had to “be disruptive somewhere else”, but gave a good lay of the land and background to KPMG’s latest report on winning the AI race. Dr. Rebecca Pope, Lead Data Scientist for Health & Life Sciences at KPMG UK, kicked us off, adorned with her academic gown from yesteryear, to remind us about the trials and tribulations of moving from PhD to industry, via some post-docs, a stint in the public sector, a dabble at IBM, and now working on meaty problems at KPMG. Given the many PhD students in the room, this really struck a chord!
Next up we had Dr Amy Nelson (@amypknelson), both a junior doctor and researcher at UCL, working on everything from predicting “Did Not Attend” rates for MRI scans, to trying to find ways to computationally decipher doctors’ handwriting (which we all agreed was probably going to be impossible). Amy told us both valuable and… well, practical stories, from learning skills in data science by investing time and money into a General Assembly course, to finding technical partners who doubled up as dates on OK Cupid!
We stayed in UCL but shifted to a completely different approach as described by medical student Ivan Beckley (@ivanbeckley). Ivan’s resourcefulness was quite staggering: from his success in obtaining internships at both Outcomes Based Healthcare and Ada Health, to raising the funds for his MSc in Health Data Science from DeepMind Health, culminating in a summer internship at DeepMind that included a trip to Silicon Valley.
Last but not least, Dr Tempest van Schaik (@Dr_Tempest), who is a software engineer at Microsoft, shared her journey and some b-e-a-utiful nuggets, including a pie chart of what makes her up as a data scientist. It’s not all PhDs, folks!
Tempest also gave us great words of advice if we were feeling a bit overwhelmed on starting our data science journeys. Her tips?
- Break it down and find your niche! Start with types of data science (software engineering? data viz? business intelligence?) and types of data (text, images, audio, time series, etc.)
- Recognise re-branding of “data science” and the transferable scientific skills most of us have
- Fill in the knowledge gaps
- Lifelong learning (Tempest has vowed to have a lifetime of MOOCs)
With a whistle-stop tour of our speakers’ backgrounds over, our ebullient chair Dr. Kirstie Whitaker (@kirstie_j), who is a Research Fellow at the Alan Turing Institute, broke us up into small discussion groups where we all really got under the bonnet of our challenges in navigating these data-y inflexion points, before bringing us together for a final wrap up. We compared being a data scientist in a big versus small company, working with restrictive health or non-health datasets, and what signals to send on our GitHub (hint: show you collaborate, contribute and take reproducibility seriously). But also, be aware of the privilege of those who have the time to contribute to Open Source, Kirstie reminded us. We wrapped up with the usual drinks and chatter, and by the end of it, hugs were being flung around like there was no tomorrow. It was a great, intimate, honest and inquisitive community who came, and we can’t wait to see everyone soon.
Our speakers were kind enough to lend us their “cheat sheets” of tip-top resources. Check them out here:
- https://cognitiveclass.ai/ FREE, no installation needed, videos, learning paths, and certificates from IBM on completion
- https://www.datacamp.com/ online UI, audit the course for FREE
- https://eu.udacity.com/ expensive, but very good learning materials and course options
- https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/ low cost, great notebooks and videos
- https://www.kaggle.com/learn/overview great courses and online community and FREE
- One HealthTech (of course!)
- PyData London Meetup (hard to get a place)
- Royal Institution talks on AI
- Talking Machines podcast
- General Assembly Data Science
- Andrew Ng’s Coursera series
- Codecademy (Python)
- Pandas Tutorials
- Khan Academy – linear algebra and multivariate calculus
- 3Blue1Brown youtube channel
- Sentdex youtube channel
- Learn Python the Hard Way (free pdf)
- Introduction to Statistical Learning (free pdf)
- Pattern Recognition and Machine Learning
Online courses and resources
** = recommended
Getting to grips with some comp sci
- **Harvard CS50 - https://www.edx.org/course/introduction-computer-science-harvardx-cs50x
- **Stanford CS101 - http://online.stanford.edu/course/computer-science-101-self-paced
- Principles of computing - http://online.stanford.edu/course/principles-computing
- MIT Computer Science w/ Python - https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/
Intro to R:
- Resources: https://swirlstats.com/faq.html
- Courses: Exploratory Data Analysis Using R & **Free Introduction to R Programming Online Course
- **DataCamp - https://www.datacamp.com/tracks/data-scientist-with-python (the number one resource; this alone is enough to get anyone started in DS)
- Coursera Data Science (Python focus) - https://www.coursera.org/specializations/data-science-python
- Another Coursera Data Science - https://www.coursera.org/specializations/data-science - PAY
- Udacity Data Analyst - https://www.udacity.com/course/data-analyst-nanodegree--nd002
Statistics and probability
- Potential book: Introduction to Probability (Chapman & Hall/CRC Texts in Statistical Science) - Joe Blitzstein
- Khan Academy - https://www.khanacademy.org/math/statistics-probability
- **Princeton - Patrick Conway's Statistics One (great intro) - https://www.youtube.com/watch?v=VJlpQs4a5LI&list=PLgIPpm6tJZoTlY4A-xikgjXmlscqduP5k
- Harvard - Statistics 110 - http://projects.iq.harvard.edu/stat110/youtube - Joe Blitzstein
- Harvard - Stat 111 - http://isites.harvard.edu/icb/icb.do?keyword=k101665&pageid=icb.page651024 - Joe Blitzstein
- CS109 Data Science - http://cs109.github.io/2015/pages/videos.html - Joe Blitzstein
- CS194-16 Introduction to Data Science Fall (Berkeley) https://bcourses.berkeley.edu/courses/1377158/pages/cs-194-16-introduction-to-data-science-fall-2015
- Mining datasets - http://online.stanford.edu/course/mining-massive-datasets-self-paced
- Stanford - No statement - http://online.stanford.edu/course/probability-and-statistics-self-paced
- Statistics in Medicine - http://online.stanford.edu/course/clone-statistics-medicine-self-paced-16
- MIT - Statistics for Applications - https://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-spring-2015/index.htm
- Coursera Statistics - https://www.coursera.org/specializations/statistics
- Statistical reasoning
- **Coursera ML - https://www.coursera.org/learn/machine-learning
- Coursera Neural Networks - https://www.coursera.org/learn/neural-networks
- Machine learning engineer - https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009
- Statistical Learning (Stanford) - https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
Databases & SQL
Resources on other Data Science courses –
- Writing in the sciences - http://online.stanford.edu/course/writing-sciences-self-paced-spring-2016
Tempest van Schaik’s:
- Andrew Ng’s Coursera courses on Machine Learning and Deep Learning: https://www.coursera.org/courses?query=andrew%20ng
- Datacamp’s cheat sheets for R (e.g. dplyr, ggplot) and Python (e.g. pandas, Keras): https://www.datacamp.com/community/data-science-cheatsheets
- Deep Learning with Python book, by Francois Chollet
- Stanford NLP lectures (Chris Manning/Richard Socher): https://www.youtube.com/watch?v=OQQ-W_63UgQ and CNN lectures: https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk
- London Meetups: AIClubForGenderMinorities, RLadies London, PyLadies London, London Data Science