Resources for Data People
Data & Software Engineer
Has your boss ever shared a tweet with you saying, “Hey, look! The problem we’ve been trying to solve ‘manually’ can be easily tackled with this ChatGPT prompt! Stop wasting your time and use ChatGPT!”?
Well, this is a guide for when it’s time to have that conversation. Considering you’re a data person in 2023, it likely won’t be long till that happens.
LLMs are wonderful creations, capable of so much more than we currently understand, especially in the conversational AI space. They perform well on common NLP problems such as information extraction, classification, question answering, and more. However, their best NLP use is possibly additive to the current state of data science & engineering, rather than completely replacing it.
Why might in-house solutions (data transformation, cleansing, multiple models) be better than just using LLM APIs?
“A man is only as good as his tools.” - Emmert Wolf
I don’t think anyone adopts tools and shortcuts like programmers do. For some reason we’re obsessed with optimizing our work every step of the way. It’s probably the analytical mindset.
In this post I’ll be recommending some of my favorite IDE plugins that have skyrocketed my productivity. As a bonus, I added a tool at the end that lets you share this setup across a software team.
These recommendations are written around my IDE of choice: VS Code, but I believe that these plugins are either available for other IDEs or have alternatives.
Here are some tools that will help you write better code, faster 🚀.
I believe the first plugin anyone downloads in a new IDE is the support for their language of choice. For Python, for example, VS Code offers a powerful language server called Pylance. In essence, Pylance offers rich type information. This information powers a multitude of features, like generating docstrings, suggesting parameters, auto-imports, error reporting, and many others that enhance the developer’s experience. Writing with Pylance often feels like writing with a pair programmer (one that points out errors, suggests params and types, etc.). Another powerful feature is semantic highlighting. That’s different from syntax highlighting: syntax highlighting colors keywords and other tokens, while semantic highlighting colors names based on what they are, such as their types. For example, an imported class and an imported function will have different highlight colors. Details like these offer rich information about a word at a glance, which saves the time and effort needed to understand the context of variables, libraries, and frameworks.
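To make the type information point concrete, here is a minimal sketch of the kind of annotated code a type-aware language server can work with. The `Order` class and `total_amount` function are hypothetical names invented for this illustration. With annotations like these, an editor can suggest the `order_id` and `amount` parameters, show the docstring on hover, and, assuming semantic highlighting is enabled, color the class and the function differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Order:
    """A hypothetical record, used only for this illustration."""

    order_id: int
    amount: float


def total_amount(orders: list[Order]) -> float:
    """Return the sum of the amounts of all given orders."""
    # The annotation on `orders` tells the editor that `order` is an Order,
    # so it can offer `.amount` and `.order_id` as completions here.
    return sum(order.amount for order in orders)


orders = [Order(order_id=1, amount=19.5), Order(order_id=2, amount=5.5)]
print(total_amount(orders))
```

If type checking is turned on in the editor, a call such as `total_amount("oops")` would likely be flagged before the code ever runs, since a string is not a list of `Order` objects.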
A company’s documents, also referred to as its knowledge base or Wiki (or “docs” for short), are the central, accessible, written information about the software the company builds, its business use-cases, future plans, historical or deprecated plans, post-mortems, environment setup, etc. Sometimes docs extend to a company’s business values, goals & mission and other fluffy stuff. You can think of docs as the internal internet for a company, or its internal Wikipedia. In short, docs for a company are the central source of information on what they do, how it’s done, and (maybe) why. No, Slack threads aren’t docs.
One type of docs is the publicly accessible information about a service, or Software Usage Docs, such as Streamlit.io’s API docs or Notion’s guides. This is what we engineers usually call docs. A common misconception is that such documents are all that “docs” are. However, they are only a subset of the general knowledge needed to build and maintain software. What we commonly refer to as docs are either user docs (how a user can use your service) or developer docs (more technical, but still covering how one type of user, the developer, can use a service).
The other types of docs can be understood in comparison to Usage Docs. In essence, just as users of your service need docs to understand and use it, your engineers need docs to understand how it’s built and to contribute to it.