March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi - I feel this should be easy but I can't work it out.
I have table A - it has let's say 50 columns, it's quite big.
I have table B - let's say it also has 50 columns.
I need to merge these with a Left Outer Join. However, currently it's taking all 50 columns from Table A and then as many columns as I select from Table B.
But say I only want to select 5 columns from Table A (as that's all I need and to help performance) - how do I actually do this??
Thanks in advance.
Option b is definitely not advisable. From a performance perspective, but also from a code structure perspective.
Option a and c are similar in most respects I think. Although I would probably remove the unwanted columns before the merge, from a maintainability perspective. I don't think perfromance would be very different.
Option c would be my preference from a code structure an clarity perspective.
Option d I don't know. I am not using PBI, only PW in excel sometimes combined with Power Pivot.
Hi @PwerQueryKees Honestly I don't really know. It was just something I read that a reference would still hit the table twice, but upon doing more reading I'm seeing contradictions.
Which of these approaches would you recommend from a performance point of view:
a) Merge Table A with Table B into a new table, remove unwanted columns
b) Duplicate Table A, merge with table B, remove unwanted columns
c) Reference Table A, merge with table B, remove unwanted columns
d) Create a calculated table in DAX and perform the merge in DAX in PBI
I have done Option a). I feel this is similar to Option c), as in due to the fact that I am merging Table A to create a new table, it is using all the steps Table A needed to get to that point.
I also tried Option d, which worked until I came to buld my chart, and it wouldn't accept anything in the Y axis field. No ideal why, obviously a problem with my data types.
Thanks in advance!
Are you sure that creating a reference to your table and doing the join with the reference will hit the table twice? Normally query folding would just hit a table once, but there are conditions and the differ per connector and DBMS. Best way to see what happens is to give it a try...
By the way: Making a reference is NOT copying. It does make any differene in performance if you have a query with 10 steps or 2 queries with 5 steps each (provided the steps remain the same off course...)
Thanks Poojara. My issue is not the above - I know how to do that - but the fact that down the track I will need to use all the columns in Table A (the left table) so I cant delete them from the merge.
I could make a copy of the table, but that impacts performance as it hits the DB twice and has millions of rows.
Hi @naninamu
To merge Table A and Table B with a Left Outer Join while selecting only specific columns:
Remove Unnecessary Columns from Table A:
Merge the Tables:
Select Columns from Table B:
This way, you limit the columns used in the merge, improving performance.
Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂
Kind Regards,
Poojara
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS
Initially romve the columns you do not want and then apply the merge
But as I said in the above posts, I need some of the columns I'm removing down the track...
If possible, finish joins operation at the RDB side (SQL Server, MySQL, Oracle ...); not only because those "traditional RDB" are more performant in joining, but what's more practical is that they support flexible Non-equi joins.
If all you can do is only PQ, try Tables.SelectColumns() to choose necessary columns; then join the resultant tables.
Expertise = List.Accumulate( {Days as from Today}, {Skills and Knowledge}, (Current, Everyday) => Current & Day.LeanAndPractise(Everyday) ) |
Thanks. It's a good point. I do need to do more dev pre PQ. 😛
Thank you - if performance isn't impacted my question is redundant then.
However, I was aware of only selecting the columns.
THe problem is , in Table A for instance, what if I still need to refer to the full Table for certain visualisations, while the merged table doesn't need those columns? What I mean is, pratically, I'd go to Table A, pick the coumns I need, and then merge with Table B to create a new table. However, Table A now only has the reduced number of columns. What if one of the ones I removed to create the merge is one I actually need in Table A?
I see what you mean.
Make a new query as a reference to TableA, select your columns and do the merge.
This will keep your TableA as it is and will give you a new table with the merge result.
In real life,my Table A in extremely large - 500m rows +. Even if I make a reference, it's still hitting the dB twice isn't it? I was trying to avoid doing this by only selecting the columns needed to at least save some performance - but if columns don't matter I guess that's a moot point.
What I did try was to copy the table in DAX and merge it from there. This was successful, although I ran into another issue as I then tried to use this table as part of a scatter chart, and it wouldn't let me drag anything into the Y axis field. Not sure if it's related to how my new table is built, or something completely unrelated... but in theory, is this methond a lot quicker than doing a duplicate/reference in PQ? Cheers, Andrew
Performance of left outer join is not impacted by the number of columns.
BUT
In general, always start by selecting only the rows and the columns you need from every table you use.
Selecting the columns you need is easy:
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
20 | |
16 | |
13 | |
10 | |
9 |
User | Count |
---|---|
34 | |
32 | |
20 | |
19 | |
17 |