Kahan Data Solutions

Helping Data & Analytics Engineers build reliable modern architectures.

www.ModernDataCommunity.com

Kahan Data Solutions

There are a lot of things to consider as you're building a data model.

Performance, naming, scheduling, etc.

But one thing you cannot overlook is establishing relationships between your tables.

Otherwise, you don't really have a model - you just have a bunch of disconnected data points.

And a major part in this process is determining the unique identifier in each of your warehouse tables.

It can determine how useful your model becomes and how well it can scale.

So in this video, we're going to talk about a "key" topic known as the Surrogate Key.

We'll talk about:
- What they are
- Why they're important
- And why you might want to consider using them for your own modeling, if you haven't already
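To make the idea concrete, here is a minimal sketch of one common way to build a surrogate key: hashing the natural key columns into a single identifier. This is only an illustration, not necessarily the approach covered in the video. The `surrogate_key` helper, the `orders` rows, and the column names are all hypothetical.

```python
import hashlib


def surrogate_key(*natural_key_parts):
    """Build a deterministic key by hashing the natural key columns (hypothetical helper)."""
    # Cast each part to text, mark missing values explicitly, and join with a
    # delimiter so ("a", None) and ("a", "") do not collapse into the same key.
    normalized = "|".join(
        "<null>" if part is None else str(part) for part in natural_key_parts
    )
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two source systems reuse the same order_id, so order_id alone is not unique.
orders = [
    {"source_system": "shop_us", "order_id": 1001},
    {"source_system": "shop_eu", "order_id": 1001},
]

for row in orders:
    # One column that uniquely identifies the row in the warehouse table.
    row["order_key"] = surrogate_key(row["source_system"], row["order_id"])
```

The same inputs always produce the same key, so the table can be rebuilt without the identifiers shifting. A sequence-generated integer is another common choice with different trade-offs.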

Enjoy.

-----

Get Modern Data Essentials training (for free) & start building more reliable data architectures
www.ModernDataCommunity.com/

6 days ago | [YT] | 11

Kahan Data Solutions

In data modeling, there's no shortage of opinions.

In fact, you'll likely even have different opinions within the same team.

So I recently created a post in The Modern Data Community with a sample scenario and asked others how they would reply.

I figured the results would vary and it would make for great discussion.

And in this video, I want to share that same scenario with you along with some potential solutions.

Plus, share why databases for Applications should be viewed differently than databases for Analytics.

Enjoy.

-----

Get Modern Data Essentials training (for free) & start building more reliable data architectures
www.ModernDataCommunity.com/

1 week ago (edited) | [YT] | 12

Kahan Data Solutions

Most teams have good intentions when it comes to modernizing their architectures.

They spend months (or more) analyzing options.

And tons of money trying to get things just right.

Until it's finally off to the races to implement the new shiny architecture that'll last forever!

We all know this rarely goes perfectly - which is normal.

But at the same time, over the years I've noticed 3 common "Strategy Killers" teams wind up doing to themselves.

These can significantly impact the success of a data project long term despite all of the planning.

Or at the very least, end up with a lot of wasted time and effort.

So in this video, you'll learn about each of them and some tips to avoid them on your own team.

Enjoy.

-----

Get Modern Data Essentials training (for free) & start building more reliable data architectures
www.ModernDataCommunity.com/

2 weeks ago (edited) | [YT] | 9

Kahan Data Solutions

If you want to go fast, go alone.
If you want to go far, go together.
Although quotes like this may sound cheesy, there's also some truth to them.

Sometimes speed is most important.

There's no time for future thinking and you just need to get things done NOW.

But I'd guess most data teams (and individual engineers) would rather go "far" with their work.

Because at the end of the day, this isn't a race.

Unfortunately many people in the data world I talk to are surprisingly isolated (going at it alone).

Even if not by choice.

For example, I often hear from people who are the only ones on the data team.

And there’s more work than they can handle.

Or they feel a lot of pressure to build things "the right way" but are unsure how to actually do it.

But I'm also realizing many of us are working on very similar challenges - even if the data itself is different.

What if we could all get together and help each other rather than going solo all the time?

Well, all of that is a big reason why I created The Modern Data Community, and I’d like to invite you to join too.

(www.ModernDataCommunity.com/)

It’s completely free and I think you're going to love it.

There are already 600+ data professionals at all levels (students, juniors, leads, executives)... and growing daily.

Plus, I recently uploaded a new training course called Modern Data Essentials.

This is a 90-minute crash course on some of the most critical concepts in modern data engineering.

Overall, I see this as a place to help you:

- Gain insights on what other data teams are doing

- Learn faster with structured material

- Overall build more effectively and/or advance your career

Even if you never speak up in a discussion thread, there's still so much to learn simply by being there.

So if you're interested, click the link below to learn more.

www.ModernDataCommunity.com/

I hope to see you on the inside!

-Michael

PS - Next week I’ll finally be back with a new YouTube video! And I’ll be in the community responding directly to people's comments. So if you'd like to be involved with that, come join in now.

Have a great weekend

3 weeks ago (edited) | [YT] | 9

Kahan Data Solutions

Tell me if this sounds familiar...

Your day starts with checking the status of pipelines and hoping nothing is broken.

If something is - it's off to putting out fires.

Next, check emails to see if any stakeholders are feeling spicy.

If so, it's again off to putting out fires.

Then you have meetings (stand-ups, planning, etc.) and hopefully get some coffee in between.

THEN... maybe get a chance to dive into some code before lunch.

If you're lucky, the afternoon gives some time to actually make progress on development.

Sprinkle in some more coffee and meetings and soon the day is over.

Rinse and repeat tomorrow.

...Honestly, I feel exhausted just writing this out.

But we're just getting started.

Because after all of that, you're then expected to keep learning and improve your skills.

Or come up with ways to improve the existing company architecture.

So you go online and run into WAY too much information.

So many tools, strategies, opinions, etc.

Go to YouTube and get lost in tutorial rabbit holes for hours.

Or enroll in online courses with 10+ hours of material that you start but may never even finish.

And I say this as somebody who offers this content & training.

I do believe courses are the best way to consolidate information and learn in a straight-line fashion.

It's organized, provides bonus resources & goes deeper than random free online content ever could.

It's truly the best material I have to date.

But I find the problem still remains that it's difficult for people to get through them (due to the daily grind listed above).

And not to mention the fact that we aren't robots and actually have personal lives too.

So why do I say all of this?

As I start to look towards the end of this year, I want to explore new approaches.

I don't want to create another training course and hope people find the time to make it through.

One idea I've been thinking about is smaller, 1-hour programs/workshops (vs extended long courses).

Something you can watch or complete start-to-finish in one sitting.

It'd be less daunting than a 10-hour course.

But also not hands-on like a normal full project-based course.

The thing is - I'm not sure if this is better or worse.

Or if it's truly solving a problem you have.

So with that said, and before I do anything at all, I'd love to hear from you:

- Do 1-hour programs/workshops sound more or less appealing than a full course? (why/why not)
- To make these no-brainer offers for you to pay for, what would it have to include?
- On the other hand, what would prevent these from being worth your time?
- Are there any topics you'd love to see covered that don't require hands-on development?

I encourage you to be as direct & honest as possible. It's the best way for me to help you.

Hope to hear from you!

-Michael

2 months ago | [YT] | 19

Kahan Data Solutions

When I talk to data team leaders who are new to dbt, it usually starts with data transformation & SQL queries.

But then the conversation quickly turns to other key influences it can have on the entire architecture.

Things like version control, automation, environments, testing, etc.

It also can impact things outside the core modeling.

For example, how you choose to name source objects in your landing zone (before transformation).

Or how you organize, secure & name reporting models (after the main data modeling).

I've been trying to find a good word to describe this interconnectedness.

And the fact that a strong dbt implementation is often about more than just writing SQL queries.

So I'll ask you too.

Which word best describes that idea in your mind?

If you have another one, feel free to leave it as a comment!

3 months ago | [YT] | 4

Kahan Data Solutions

If there is such a thing as the "end" of a data pipeline, it's typically a report.

But if you've ever built one of these reports you know that it's never just a simple handoff.

There's almost always some sort of back and forth.

Or future requests for adjustments.

So what I want to talk about in this video is where you decide to actually make those logic changes.

And the secondary impacts of that decision.

The two common schools of thought here are:
1 - Making changes directly inside the report
2 - Keeping changes in the database (transformation layer code)

In this video, I want to make a case for why I personally think you're better off going with option 2.

But whether or not you agree with me...

Hopefully it'll encourage you to consider what's best for your team (& company) going forward.

Enjoy!

3 months ago | [YT] | 10

Kahan Data Solutions

For those of you on teams that have Type 2 Slowly Changing Dimensions (SCDs), how often do you find the historical (old) records are used for analytics purposes?

Note: Type 2 SCDs keep track of the history of each change.
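For anyone newer to the concept, here is a rough sketch of what "keeping track of the history" can look like. It assumes a simple `valid_from` / `valid_to` convention where the current row has `valid_to` set to `None`. The `apply_scd2` and `city_as_of` helpers and the customer data are hypothetical, and real implementations usually live in SQL or a transformation tool rather than Python.

```python
from datetime import date


def apply_scd2(history, customer_id, new_city, change_date):
    """Expire the current row if the value changed, then append a new version."""
    current = next(
        (r for r in history
         if r["customer_id"] == customer_id and r["valid_to"] is None),
        None,
    )
    if current is not None and current["city"] == new_city:
        return history  # nothing changed, so no new version is needed
    if current is not None:
        current["valid_to"] = change_date  # close out the old version
    history.append({
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,  # None marks the current version
    })
    return history


def city_as_of(history, customer_id, as_of):
    """Look up the value that was in effect on a given date."""
    for r in history:
        if (r["customer_id"] == customer_id
                and r["valid_from"] <= as_of
                and (r["valid_to"] is None or as_of < r["valid_to"])):
            return r["city"]
    return None


history = []
apply_scd2(history, 42, "Chicago", date(2022, 1, 1))
apply_scd2(history, 42, "Denver", date(2023, 6, 1))
```

The historical rows only pay off when someone actually runs a point-in-time lookup like `city_as_of`, which is what the question above is getting at.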

3 months ago | [YT] | 6

Kahan Data Solutions

All data teams (large & small) have at least one thing in common.

Source data.

But not everyone handles it the same way in their pipelines.

For some, they'll reference raw source tables directly in many queries.

For others, they'll create ad-hoc custom tables to address subtle formatting changes.

But without any real overarching strategy or consistent naming behind it.

While data modeling (e.g., Kimball, One Big Table, etc.) is a more popular topic, I believe an equally important area to consider is what you do BEFORE you start creating those core data models.

For many, this "before" layer doesn't exist at all.

In previous videos I've talked about a 3-Layered Data Model.

And today I want to focus solely on Layer 1, which addresses this concept.

It's called a "Staging" layer.

When done right, it can help you establish reliable pipelines from the very start.
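As a rough illustration, a staging step can be as small as renaming, trimming and casting the raw source columns once, so everything downstream reads from the cleaned version instead of the raw table. This is a minimal sketch under my own assumptions: the raw column names (`CUST_ID`, `CustName`, `created`), the date format, and the `stage_customer` helper are all hypothetical. In practice this would typically be a SQL model rather than Python.

```python
from datetime import datetime


def stage_customer(raw_row):
    """Light cleanup only: rename, trim, cast. No joins or business logic."""
    return {
        "customer_id": int(raw_row["CUST_ID"]),
        "customer_name": raw_row["CustName"].strip(),
        # Assumes the source sends dates as MM/DD/YYYY text.
        "created_at": datetime.strptime(raw_row["created"], "%m/%d/%Y").date(),
    }


# Hypothetical raw source rows, exactly as they landed.
raw_customers = [
    {"CUST_ID": "7", "CustName": "  Acme Corp ", "created": "03/15/2023"},
]

# Downstream models reference stg_customers, never raw_customers directly.
stg_customers = [stage_customer(row) for row in raw_customers]
```

If the source later changes a column name or format, the fix happens in one place instead of in every query that touched the raw table.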

Enjoy!

3 months ago | [YT] | 13

Kahan Data Solutions

Every data team (at some point) needs to make a decision on tooling for their data stack.

Nobody gets to skip this step.

This is naturally most important at the very beginning.

But can be a huge factor when considering a data migration as well.

Unfortunately, this decision has become increasingly difficult over the years as more tools have become available in the market.

In short, it's overwhelming.

But at the same time, you don't want to take this decision lightly or rush it.

It's going to directly impact your effectiveness as a team, your ability to scale and, depending on the decision you make, the bottom line.

So in this video I'll share some key considerations to keep in mind as you navigate this decision so you end up picking tools that are best for you.

Enjoy!

4 months ago | [YT] | 6
