In addition to events, we aim to provide resources for those interested in self-learning. We’ve identified several key tools you’re likely to encounter during an analytics career. For each one, we explain what it is, when to use it, and how to access it. Most importantly, we provide a starter guide outlining resources to use on your learning journey. The guides are broken into sections for beginners, intermediate users, and advanced users.
Beyond using the guides on this site, one of the best ways to get started is to explore with ChatGPT or another generative AI tool. These models are fairly good at basic analytics and at writing code in common languages like Python, Java, HTML/CSS, R, and SQL. Experiment to find which prompts get you the best results, and remember that it’s best to be as specific as possible. For example, you could ask ChatGPT:
“I have a dataset in a pandas DataFrame, df, with columns ‘Entity,’ ‘Code,’ ‘Year,’ and ‘Annual plastic production between 1950 and 2019.’ I want you to give me Python code that creates several different graphs based on this data showing the change in plastic production over time. Break it down by the explanatory variables.”
With the right prompt, it’ll give you a useful starting point like the code below:
# Assuming df is your pandas DataFrame with columns: Entity, Code, Year,
# and Annual plastic production between 1950 and 2019
import pandas as pd
import matplotlib.pyplot as plt
# Convert 'Year' column to datetime type if it's not already
df['Year'] = pd.to_datetime(df['Year'], format='%Y')
# Matplotlib's single-letter base colors, cycled when there are more groups than colors
base_colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']
# Create multiple plots based on different explanatory variables
# Plot 1: Plastic production over time for each Entity
plt.figure(figsize=(12, 6))
for i, (entity, data) in enumerate(df.groupby('Entity')):
    plt.plot(data['Year'], data['Annual plastic production between 1950 and 2019'],
             '-', label=entity, color=base_colors[i % len(base_colors)])
plt.title('Plastic Production Over Time for Each Entity')
plt.xlabel('Year')
plt.ylabel('Annual Plastic Production')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Plot 2: Plastic production over time for each Code
plt.figure(figsize=(12, 6))
for i, (code, data) in enumerate(df.groupby('Code')):
    plt.plot(data['Year'], data['Annual plastic production between 1950 and 2019'],
             '-', label=code, color=base_colors[i % len(base_colors)])
plt.title('Plastic Production Over Time for Each Code')
plt.xlabel('Year')
plt.ylabel('Annual Plastic Production')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
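The plotting code above assumes a DataFrame named df already exists. If you don’t have the dataset loaded yet, a minimal sketch like the one below can stand in for it while you experiment. The column names come from the prompt above; the entities, codes, and numbers are made-up placeholders, not real plastic production figures.

```python
import pandas as pd

# Column name copied from the prompt above
VALUE_COL = "Annual plastic production between 1950 and 2019"

# Placeholder rows: entities, codes, and values are invented for
# illustration only and are NOT real production figures.
sample_rows = [
    ("Country A", "AAA", 1950, 1.0),
    ("Country A", "AAA", 1951, 1.5),
    ("Country A", "AAA", 1952, 2.0),
    ("Country B", "BBB", 1950, 0.5),
    ("Country B", "BBB", 1951, 0.75),
    ("Country B", "BBB", 1952, 1.25),
]

df = pd.DataFrame(sample_rows, columns=["Entity", "Code", "Year", VALUE_COL])

# Quick sanity checks before handing df to the plotting code
print(df.dtypes)
print(df.groupby("Entity")[VALUE_COL].agg(["min", "max"]))
```

Once df looks right, you can run the plotting code on it and then swap in your real data.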