Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
naninamu
Helper III
Helper III

Left outer Join - but only selected columns from both tables

Hi - I feel this should be easy but I can't work it out.

 

I have table A - it has let's say 50 columns, it's quite big.

I have table B - let's say it also has 50 columns.

 

I need to merge these with a Left Outer Join. However, currently it's taking all 50 columns from Table A and then as many columns as I select from Table B.

 

But say I only want to select 5 columns from Table A (as that's all I need and to help performance) - how do I actually do this??

 

Thanks in advance.

13 REPLIES 13
PwerQueryKees
Impactful Individual
Impactful Individual

Option b is definitely not advisable. From a performance perspective, but also from a code structure perspective.

Option a and c are similar in most respects I think. Although I would probably remove the unwanted columns before the merge, from a maintainability perspective. I don't think perfromance would be very different.

Option c would be my preference from a code structure an clarity perspective. 

Option d I don't know. I am not using PBI, only PW in excel sometimes combined with Power Pivot. 

naninamu
Helper III
Helper III

Hi @PwerQueryKees Honestly I don't really know. It was just something I read that a reference would still hit the table twice, but upon doing more reading I'm seeing contradictions.

 

Which of these approaches would you recommend from a performance point of view:

 

a) Merge Table A with Table B into a new table, remove unwanted columns

b) Duplicate Table A, merge with table B, remove unwanted columns

c) Reference Table A, merge with table B, remove unwanted columns

d) Create a calculated table in DAX and perform the merge in DAX in PBI

 

I have done Option a). I feel this is similar to Option c), as in due to the fact that I am merging Table A to create a new table, it is using all the steps Table A needed to get to that point.

 

I also tried Option d, which worked until I came to buld my chart, and it wouldn't accept anything in the Y axis field. No ideal why, obviously a problem with my data types.

 

Thanks in advance!

 

 

PwerQueryKees
Impactful Individual
Impactful Individual

Are you sure that creating a reference to your table and doing the join with the reference will hit the table twice? Normally query folding would just hit a table once, but there are conditions and the differ per connector and DBMS. Best way to see what happens is to give it a try... 

By the way: Making a reference is NOT copying. It does make any differene in performance if you have a query with 10 steps or 2 queries with 5 steps each (provided the steps remain the same off course...)

naninamu
Helper III
Helper III

Thanks Poojara. My issue is not the above - I know how to do that - but the fact that down the track I will need to use all the columns in Table A (the left table) so I cant delete them from the merge.

 

I could make a copy of the table, but that impacts performance as it hits the DB twice and has millions of rows.

Poojara_D12
Solution Sage
Solution Sage

Hi @naninamu 

To merge Table A and Table B with a Left Outer Join while selecting only specific columns:

  1. Remove Unnecessary Columns from Table A:

    • In Power Query, select Table A and remove columns you don't need (only keep the 5 columns you want).
  2. Merge the Tables:

    • Use Merge Queries to join Table A and Table B on the key column(s) with a Left Outer Join.
  3. Select Columns from Table B:

    • After merging, expand the new column from Table B and select only the columns you need.

This way, you limit the columns used in the merge, improving performance.

 

Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Poojara
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS 

Omid_Motamedise
Memorable Member
Memorable Member

Initially romve the columns you do not want and then apply the merge

But as I said in the above posts, I need some of the columns I'm removing down the track...

ThxAlot
Super User
Super User

If possible, finish joins operation at the RDB side (SQL Server, MySQL, Oracle ...); not only because those "traditional RDB" are more performant in joining, but what's more practical is that they support flexible Non-equi joins.

 

If all you can do is only PQ, try Tables.SelectColumns() to choose necessary columns; then join the resultant tables.



Expertise = List.Accumulate(


        {Days as from Today},


        {Skills and Knowledge},


        (Current, Everyday) => Current & Day.LeanAndPractise(Everyday)


)



Thanks. It's a good point. I do need to do more dev pre PQ. 😛

naninamu
Helper III
Helper III

Thank you - if performance isn't impacted my question is redundant then.

 

However, I was aware of only selecting the columns.

 

THe problem is , in Table A for instance,  what if I still need to refer to the full Table for certain visualisations, while the merged table doesn't need those columns? What I mean is, pratically, I'd go to Table A, pick the coumns I need, and then merge with Table B to create a new table. However, Table A now only has the reduced number of columns. What if one of the ones I removed to create the merge is one I actually need in Table A?

 

I see what you mean.

Make a new query as a reference to TableA, select your columns and do the merge.
This will keep your TableA as it is and will give you a new table with the merge result.

In real life,my Table A in  extremely large - 500m rows +. Even if I make a reference, it's still hitting the dB twice isn't it? I was trying to avoid doing this by only selecting the columns needed to at least save some performance - but if columns don't matter I guess that's a moot point.

 

What I did try was to copy the table in DAX and merge it from there. This was successful, although I ran into another issue as I then tried to use this table as part of a scatter chart, and it wouldn't let me drag anything into the Y axis field. Not sure if it's related to how my new table is built, or something completely unrelated... but in theory, is this methond a lot quicker than doing a duplicate/reference in PQ? Cheers, Andrew

PwerQueryKees
Impactful Individual
Impactful Individual

Performance of left outer join is not impacted by the number of columns.
BUT

In general, always start by selecting only the rows and the columns you need from every table you use.

Selecting the columns you need is easy:

  • Select the columns by ctrl-click on the ones you need
  • Right-Click on any of the selected columns and select "Remove other columns"
  • The new step that is created has a little cogwheel!
  • Click on the cogwheel if you change your mind and want to select more or fewer of the columns

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.