Data collection and standardization
The unique battery kinetics in different battery types are often high-dimensional and hard to characterize due to divergent operating cases, manufacturing variability, and historical usages52. To address this challenge, we collected and standardized 130 retired batteries with 5 cathode material types from 7 manufacturers to construct an out-of-distribution, equivalently heterogeneous dataset. Given different historical usages, the capacities of the collected batteries are below 90% of the nominal capacity. The battery cathode materials are lithium cobalt oxide (LCO), nickel manganese cobalt oxide (NMC), lithium iron phosphate (LFP), nickel cobalt aluminum oxide (NCA), and NMC/LCO blended types, which are further grouped into 9 classes based on the manufacturers (Supplementary Table 1). We intentionally include batteries with divergent historical usages, from laboratory testing to electric vehicle driving profiles, to train a generalized model for the battery recycler that is independent of historical usages and battery types.
For standardization, the only data required from the recycler are those of the currently probed (field-testing) cycle, i.e., one charging and one discharging test, which is easy to implement in practice. The as-probed data are first denoised by filling in missing values, replacing outliers, and performing median filtering. Human-induced and cathode-heterogeneity-induced noises are deliberately retained, though, to make the model robust to imperfect inputs. The data are then linearly interpolated for curve filling (Supplementary Fig. 1) and feature engineered for dimensionality reduction, with a shared set of standardization parameters (Supplementary Note 1). Features extracted from the standardization pipeline are readily interpretable, a property of significant commercial interest. To the best of our knowledge, this is the first time that heterogeneous battery data from multiple sources and historical usages have been utilized to assist in the strategy design of battery recycling.
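The denoising and curve-filling steps described above can be sketched as follows. This is a minimal illustration and not the pipeline of Supplementary Note 1: the function names, the window sizes, and the Hampel-style outlier rule (a point is replaced when it lies more than k scaled median absolute deviations from the local median) are all assumptions chosen for the example.

```python
from statistics import median


def fill_missing(y):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(y) if v is not None]
    if not known:
        raise ValueError("no valid samples")
    out = list(y)
    for i, v in enumerate(y):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = y[right]
        elif right is None:
            out[i] = y[left]
        else:
            t = (i - left) / (right - left)
            out[i] = y[left] + t * (y[right] - y[left])
    return out


def replace_outliers(y, window=5, k=3.0):
    """Replace points far from the local median (assumed Hampel-style rule) by that median."""
    half = window // 2
    out = list(y)
    for i in range(len(y)):
        neighbours = y[max(0, i - half): i + half + 1]
        med = median(neighbours)
        mad = median(abs(v - med) for v in neighbours)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        if mad > 0 and abs(y[i] - med) > k * 1.4826 * mad:
            out[i] = med
    return out


def median_filter(y, window=3):
    """Sliding-window median; the window shrinks at the two ends."""
    half = window // 2
    return [median(y[max(0, i - half): i + half + 1]) for i in range(len(y))]


def resample(x, y, n_points):
    """Linearly interpolate (x, y) onto n_points evenly spaced x values.

    x must be strictly increasing and n_points must be at least 2.
    """
    lo, hi = x[0], x[-1]
    grid = [lo + (hi - lo) * j / (n_points - 1) for j in range(n_points)]
    out, k = [], 0
    for g in grid:
        # Advance to the segment [x[k], x[k + 1]] that contains g.
        while k < len(x) - 2 and x[k + 1] < g:
            k += 1
        t = (g - x[k]) / (x[k + 1] - x[k])
        out.append(y[k] + t * (y[k + 1] - y[k]))
    return grid, out
```

Applied in the order given in the text, a raw voltage trace would pass through `fill_missing`, `replace_outliers`, and `median_filter` before `resample` maps it onto a grid shared by all batteries.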
Figure 2a, b demonstrate the feature engineering process. We focus on the charging and discharging curve of the retired batteries in the last cycle, i.e., one charging and one discharging cycle (Supplementary Figs. 2–5). In the charging cycle, 15 features are extracted from the voltage-capacity and dQ/dV curves, where V and Q refer to the voltage and capacity values, respectively. The same set of features are extracted for the discharging cycle. As a result, 30 features are extracted in total, as indicated from F1 to F30. Refer to Supplementary Table 2 and Supplementary Note 2 for a detailed explanation of the features. Figure 2c showcases the absolute and relative feature values of the selected batteries from each class. Most relative feature values in different classes overlap in the −1 to 0 region (with the light green color) and are indistinguishable, illustrating the difficulty in classifying battery type using one cycle of battery data. The difficulty is expected because the divergent historical operation conditions can influence the charging-discharging kinetics of the batteries so that the extracted features can be largely correlated despite the different battery types (Supplementary Fig. 6). Rather than directly interpreting the extracted features using expert knowledge, we employ an alternative data-driven approach that automatically leverages the latent patterns across various battery types.
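As a rough illustration of how features can be read off the voltage-capacity and dQ/dV curves, the sketch below differentiates capacity with respect to voltage by finite differences and collects a few summary values. The four quantities returned are hypothetical examples only; they are not the F1 to F30 features defined in Supplementary Table 2.

```python
def dq_dv(voltage, capacity):
    """Finite-difference dQ/dV, returned as (midpoint voltage, dQ/dV) pairs."""
    curve = []
    for i in range(1, len(voltage)):
        dv = voltage[i] - voltage[i - 1]
        if dv == 0:
            continue  # skip samples with no voltage change to avoid division by zero
        dq = capacity[i] - capacity[i - 1]
        curve.append(((voltage[i] + voltage[i - 1]) / 2, dq / dv))
    return curve


def example_features(voltage, capacity):
    """Hypothetical summary features from one charging (or discharging) curve."""
    curve = dq_dv(voltage, capacity)
    peak_voltage, peak_value = max(curve, key=lambda point: point[1])
    return {
        "capacity_end": capacity[-1],                 # capacity at the end of the curve
        "voltage_mean": sum(voltage) / len(voltage),  # mean voltage over the curve
        "dqdv_peak": peak_value,                      # height of the largest dQ/dV peak
        "dqdv_peak_voltage": peak_voltage,            # voltage at which that peak occurs
    }
```

Running the same function on the charging and on the discharging curve gives two parallel feature sets, mirroring the charge/discharge symmetry described in the text.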
Fig. 2: The feature engineering result.
a For the charging process, 15 features are extracted from the voltage-capacity (left) and dQ/dV (right) curves. b The same set of features is extracted for the discharging process, denoted F16 to F30. c Features are visualized by class, following the format CxBn, indicating the nth battery from class x. The size of a circle maps the absolute feature value. Source data are provided as a Source Data file.
Retired battery sorting with homogeneous data access
We first consider a setting where the battery data are homogeneously distributed across the collaborators (namely, the clients). The homogeneity means that each client offers to share the battery data across all 9 classes, even though the specific number of batteries is not restricted (Supplementary Table 3). We train our federated machine learning model without requiring information on the historical use of the retired batteries. In our work, the recycler and the clients only need to test the retired batteries at the current (field-testing) cycle, specifically, with a complete charging-discharging cycle for a standard feature engineering process initiated by the recycler. Local models are trained based on features extracted from their private battery data. The federated machine learning framework aggregates the local model parameters, rather than the private battery data, for the recycler to classify the retired batteries.
Figure 3 shows the sorting results when clients contribute homogeneous battery data. Figure 3a compares two federated machine learning methods, i.e., the majority voting (MV) and our proposed Wasserstein distance voting (WDV), with the independent learning (IL) paradigm. It should be noted that the accuracy for the IL is averaged over all clients in a non-federated manner. Compared with the IL, the MV does not sacrifice sorting performance, with an average accuracy of 95%, while being capable of protecting data privacy and mitigating computational burden. However, 3 classes are missorted using the MV. For instance, 3 batteries in NMC (SNL, class 8, 15 in total) are missorted into NCA (SNL, class 7), resulting in a sorting accuracy of 80%. The sorting accuracy for NCA (UL-PUR, class 9) is 81%, with 2 batteries missorted into NMC (MICH_Form, class 4) and 1 battery missorted into NMC/LCO blended type (HNEI, class 2), respectively. In contrast, the WDV outperforms the MV since it only missorted one battery, resulting in a sorting accuracy of 99%. We also evaluate the prediction probability of each class for the MV and WDV, respectively. It turns out that the WDV makes a more confident sorting than the MV since the prediction probabilities of the WDV are generally right-skewed to a higher probability value. Therefore, our proposed WDV produces higher sorting accuracies across all classes, and the sorting is of richer probability confidence margins.
Fig. 3: Sorting results when clients have homogeneous data access.
a The confusion matrix for the majority voting (MV) and Wasserstein distance voting (WDV) methods, respectively. We consider the prediction probability distribution for each class. The sorting of independent learning (IL) is annotated. b Sorting accuracy distribution and privacy budget (PB) of the IL, MV, and WDV in the presence of random noise. The PB value is referenced at a 90% accuracy level. c Average F1-score of sorting results and PBs in each class using the IL, MV, and WDV. The PB values are all referenced at a 0.9 F1-score level. Data are presented as mean values ± 1 standard deviation. d Feature importance, in descending order. The subplot shows the feature space spanned by the first two most salient features. Data are presented as mean values + 1 standard deviation. Source data are provided as a Source Data file.
We also evaluate the privacy budget (PB, “Methods” section), considering that client data might be vulnerable to reverse engineering by eavesdropping on private data53. In this regard, we add random Gaussian noise to the client data with different intensities. The intensity of the randomness is controlled by a noise-to-signal ratio (NSR), ranging from 1% to 10%. Figure 3b shows the accuracy and privacy budget comparison when using IL, MV, and WDV, respectively. The sorting accuracy of the MV decreases from 95% to 82%, similar to that of the IL when the noise intensity increases from 1% to 10%. In this noise range, the median sorting accuracy of the MV and IL is 92% and 86%, respectively. In contrast, the sorting accuracy of the WDV is still above 90% in the presence of 10% noise, which is a stringent noise level in practical cases. The WDV has a median sorting accuracy of over 95% in the same noise range. Taking an average sorting accuracy of 90% as an acceptable reference level, the PB values of IL, MV, and WDV are 4, 6, and 10, respectively. Therefore, applying federated machine learning produces a more privacy-secure sorting than IL, hence being capable of preventing data eavesdropping. Furthermore, our proposed WDV is more accurate and performs much better (with nearly doubled PB values) in the privacy-accuracy trade-off than the MV. In addition, the robustness to stringent noise when using the WDV also implies a good tolerance of the battery measurement requirement, reducing the expensive battery testing disbursement.
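The noise-injection experiment can be sketched as below. The formal definition of the privacy budget is given in the "Methods" section and is not reproduced here; this sketch assumes that the PB is the largest noise level (in percent) up to which the accuracy stays at or above the reference level, which is one reading consistent with the reported values of 4, 6, and 10 on a 1% to 10% noise range. Scaling the noise standard deviation by the root-mean-square of the signal is likewise an assumption.

```python
import random


def add_noise(values, nsr, rng):
    """Add zero-mean Gaussian noise with std = nsr * RMS(values) (assumed NSR definition)."""
    rms = (sum(v * v for v in values) / len(values)) ** 0.5
    return [v + rng.gauss(0.0, nsr * rms) for v in values]


def privacy_budget(accuracy_by_nsr, reference=0.90):
    """Largest noise level up to which accuracy never drops below the reference.

    accuracy_by_nsr maps a noise level in percent to the measured accuracy.
    Returns 0 if the accuracy is already below the reference at the lowest level.
    """
    budget = 0
    for nsr in sorted(accuracy_by_nsr):
        if accuracy_by_nsr[nsr] < reference:
            break
        budget = nsr
    return budget
```

For instance, a hypothetical accuracy curve that stays above 0.90 up to 4% noise and falls below it at 5% would yield a budget of 4 under this reading.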
Noticing that a high overall sorting accuracy does not necessarily imply acceptable sorting for a specific class, we also consider within-class sorting performance. Figure 3c shows the F1-score and privacy budget of the IL, MV, and WDV in each predicted class; note that the privacy setting is identical to that in Fig. 3b. The result shows that the IL has smaller F1-scores than the federated machine learning methods in all the classes, indicating poorer sorting. Regarding federated machine learning, the WDV outperforms the MV in each predicted class by producing higher average F1-scores. The deviation range of the F1-scores for the WDV is smaller than that of the MV, indicating that the WDV is more robust (Supplementary Fig. 7). Therefore, compared with the MV, our proposed WDV has a better sorting accuracy not only overall among all nine classes (Fig. 3b) but also within each class. Regarding the privacy budget, the PB value when using the non-federated IL, referenced at a 0.9 F1-score level, is significantly lower than that of the federated methods (Supplementary Table 4) across all classes. This indicates a more severe risk of data leakage for the IL compared with federated machine learning. When further applying our proposed WDV, the privacy budget can increase by 78% and 44% compared with the non-federated IL and the federated MV, respectively (Supplementary Table 4). The results demonstrate that the WDV successfully leverages the battery-chemistry-related insights hidden in clients while effectively preserving client data privacy.
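The per-class F1-score used above follows from the confusion matrix through the standard relation F1 = 2TP / (2TP + FP + FN). The sketch below is a generic implementation of that formula; the two-class matrix mentioned afterwards is a hypothetical toy case, not data from Fig. 3.

```python
def per_class_f1(confusion):
    """Per-class F1-scores from a square confusion matrix.

    confusion[i][j] is the number of class-i batteries sorted into class j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class-c batteries sorted elsewhere
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other batteries sorted into class c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def overall_accuracy(confusion):
    """Fraction of all batteries that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total
```

With a toy matrix `[[12, 3], [0, 10]]` (12 of 15 batteries of the first class sorted correctly and 3 sent to the second class), the first class loses on recall and the second on precision, so each class receives its own F1-score rather than a single pooled accuracy.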
We then interpret our federated machine learning model by evaluating the most salient features correlated with battery cathode chemistry. Figure 3d shows the importance of the features in descending order. The error bar indicates the importance deviation. Features F1 and F16 rank the top two features regarding out-of-bag importance (“Methods” section). Interestingly, these two features have a clear physical interpretation of the battery dynamics, which we will further discuss in later sections. Here, we rationalize these two features by plotting the grouped battery samples in the feature space spanned by features F1 and F16. The subplot of Fig. 3d shows that NMC/LCO blended type (HNEI, class 2), NMC (MICH_Expa, class 3), and LFP (SNL, class 6) (sharing the color with Fig. 3a) are clearly separable in the spanned feature space. For the remaining classes, the batteries are still separable (see the zoomed-in view), though in relatively more minor grains. On the contrary, the non-salient features have a relatively weaker sorting ability due to the non-separable feature space spanned (Supplementary Fig. 8). As a result, our federated machine learning framework successfully discovered useful mechanism insights to guarantee sorting accuracies. Such an insight could be further extended to simplify the model for light computation, hence less investment. Once the client models classify the batteries, the recycler can aggregate the client results to make a final decision on the battery cathode material types underpinned by the salient features.
Retired battery sorting with heterogeneous data access
We also consider a more extreme yet more realistic situation where the data are exclusively scattered among clients, i.e., the data distribution is heterogeneous. In this situation, the heterogeneity issue poses more challenges to battery type sorting since the clients are prone to train biased models that deteriorate global accuracy, which is still an open question in federated machine learning. In this section, we explore this more challenging situation in place of homogeneous data access for every client (Supplementary Note 3). We demonstrate that our federated machine learning framework can still classify retired batteries based on the standard feature engineering process at the current (field-testing) cycle without any knowledge of the previous operation conditions.
Figure 4 shows the sorting results when clients have heterogeneous data access. We consider the heterogeneity index, defined as the minimum number of battery classes for each client in each Monte Carlo simulation run. A higher heterogeneity index indicates a less heterogeneous battery data distribution. The heterogeneity index is no smaller than two such that one client can train a local model for a sorting task. Figure 4a shows the average sorting accuracy as the heterogeneity index varies. The average accuracies are plotted with solid lines, with the ±1 standard deviation range indicated by the shaded region. As the heterogeneity index decreases from 9 to 2, the performance of the MV and the IL rapidly deteriorates at a sublinear rate. The average sorting accuracy of the MV is 0.55, slightly better than the IL, equivalent to a random guess when the heterogeneity index is two. This observation shows that the MV can help little to aggregate the local models under heterogeneous data access. In contrast, the WDV outperforms its MV counterpart at all heterogeneity levels, successfully mitigating the heterogeneous data distribution issue. Moreover, the WDV shows an interesting asymptotic effect when the heterogeneity index increases. This indicates that the WDV can potentially support the optimal allocation/distribution of client battery data to reduce the collaboration cost in practical battery recycling situations.
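The heterogeneity index defined above reduces to a one-line computation over the classes each client holds. In the sketch below, `heterogeneity_index` follows that definition directly, whereas `sample_allocation` is a hypothetical sampler for one Monte Carlo run; the actual allocation procedure is described in Supplementary Note 3 and may differ.

```python
import random


def heterogeneity_index(client_classes):
    """Minimum number of distinct battery classes held by any client."""
    return min(len(set(classes)) for classes in client_classes.values())


def sample_allocation(n_clients, n_classes, index, rng):
    """Hypothetical sampler: one random class subset per client with a given index."""
    if not 2 <= index <= n_classes:
        raise ValueError("index must lie between 2 and n_classes")
    all_classes = range(1, n_classes + 1)
    allocation = {}
    for client in range(1, n_clients + 1):
        # The first client pins the minimum; the others hold at least `index` classes.
        size = index if client == 1 else rng.randint(index, n_classes)
        allocation[client] = set(rng.sample(all_classes, size))
    return allocation
```

For example, a client holding only two of the nine classes sets the index to two regardless of how many classes the other clients hold.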
Fig. 4: Sorting results when clients have heterogeneous data access.
a Sorting accuracy as a function of heterogeneity index. The results are averaged over 50 Monte Carlo runs (n = 50), with the one standard deviation region (±1σ) indicated by shaded color. b The data distribution when benchmarking the best majority voting (MV) performance. c Class-wise (upper part) and client-wise (lower part) sorting accuracy corresponding to our federated and independent machine learning (IL) methods. The Sankey chart (middle) presents the heterogeneous data distribution among clients. Source data are provided as a Source Data file.
We select the best model using the MV when the heterogeneity index equals two and compare it with the sorting result of our proposed WDV under the same setting. The selected best model has an average sorting accuracy of 71%, as shown in Fig. 4a.
The detailed battery data distribution setting of the best model using the MV is illustrated in Fig. 4b, which is heterogeneous (Supplementary Table 5). For instance, client 2 contributes to all battery classes except for NMC (MICH_Expa, class 3), while client 5 only contributes to NMC/LCO blended type (HNEI, class 2) and NMC (SNL, class 8). Under the heterogeneous data distribution setting in Supplementary Table 5, we further compare the class-wise and client-wise sorting performance of the MV and the WDV to the non-federated IL with two considerations: (1) the significance of our federated machine learning framework and (2) why our proposed WDV outperforms the MV. First, we evaluate the client-wise sorting accuracy, shown in the lower side of Fig. 4c. Client 5 achieves an average sorting accuracy of 25%, ranking last among all clients. Meanwhile, client 2 achieves an average sorting accuracy of 86%, ranking first among all clients. However, the average sorting accuracy across clients is only 55%, close to a random guess. Therefore, the client performance using the non-federated IL depends heavily on data access (Supplementary Fig. 9). In fact, without our federated machine learning framework, the battery recycler is equivalent to a single client and can only sort the battery types stored in its local database. This non-federated paradigm could not handle various types of retired batteries unless the recycler built a database covering all the battery types it would handle. With our federated machine learning framework, the recycler can collaborate with several clients, even under heterogeneous data situations.
We now turn to analyze how to collaborate with clients under heterogeneous data access settings. The upper part of Fig. 4c shows the class-wise accuracy of the MV and WDV. It is noticed that the average sorting accuracy after using the MV is better than the non-federated way, which is 79%, as indicated in the lower side of Fig. 4c. This demonstrates the success of applying the federated machine learning framework to address the heterogeneous data distribution issue in this case. However, the MV totally missorted LFP (SNL, class 6) and NMC (SNL, class 8) with zero accuracy. The failure of the MV in specific classes can be rationalized by its core idea of giving more weight to the clients who contribute more battery samples while not guaranteeing diversity in battery types. For instance, the contribution of client 7 will be strengthened by the MV due to its large number of batteries (specifically, 195 augmented batteries, ranking second among clients), despite only contributing four classes of batteries. As a result, the MV will bias the aggregated model towards large clients such as client 7 (Supplementary Table 4). This bias is evidenced by the zero sorting accuracy for LFP (SNL, class 6) described above, since large clients such as client 7 never contributed any batteries in class 6. Similarly, client 1, the largest client with 197 augmented batteries, failed to contribute helpful information to the recycler regarding classifying NMC (SNL, class 8), which is consistent with zero accuracy in class 8. In contrast to the MV, our proposed WDV focuses on the battery similarities between the recycler and each client by measuring the pairwise distance. We aim to assign lower weights to the clients with biased data distributions (equivalently, higher heterogeneity), whose batteries are of higher similarities with the recycler, such that the recycler can have generalized information from each client.
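The contrast between the two aggregation rules can be sketched with a generic weighted vote. In the sketch below, `weighted_vote` with sample-count weights mirrors the description of the MV given above, and `wasserstein_1d` uses the standard identity that the 1-Wasserstein distance between two equal-size one-dimensional samples is the mean absolute difference of their sorted values. How the WDV actually maps distances to client weights is defined in the "Methods" section and is not reproduced here; the inverse-distance mapping is a placeholder assumption, and all function names are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the class with the largest total weight.

    predictions maps client -> predicted class; weights maps client -> weight >= 0.
    """
    totals = defaultdict(float)
    for client, label in predictions.items():
        totals[label] += weights[client]
    return max(totals, key=totals.get)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def inverse_distance_weights(distances, eps=1e-9):
    """Placeholder mapping (an assumption): a smaller distance gives a larger weight."""
    return {client: 1.0 / (d + eps) for client, d in distances.items()}
```

With sample counts as weights, two large clients that agree with each other will outvote a small client irrespective of how relevant its data are, which is the failure mode described for classes 6 and 8; a distance-based weighting removes the dependence on sample count alone.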
The results show that our proposed WDV successfully leverages helpful information from heterogeneous data distribution among clients. The WDV achieves 100% and 89% sorting accuracy for the otherwise missorted batteries in LFP (SNL, class 6) and NMC (SNL, class 8), respectively. The overall sorting accuracy using the WDV is up to 97%, with only 5 batteries missorted out of 144 samples. In Supplementary Fig. 10, we also notice that the missorted batteries are of similar cathode materials. Specifically, 2 batteries with the NMC cathode material were missorted into the NMC/LCO blended type; while 1 battery with the NCA cathode material was correct in material type but missorted into another manufacturer. On the contrary, the missorted results produced by the MV can spread to either many irrelevant classes or manufacturers. Therefore, we conclude that the WDV can aggregate helpful client insights by distinguishing inherited differences in cathode material types. Inspired by this, the WDV also suggests that the clients are encouraged to contribute more battery data in diversity rather than more data in some specific classes. The recycler can optimize the benefit distribution based on helpful client information provided. Ultimately, our federated machine learning framework enables the recycler to know the battery cathode material type, even if without their own data access to various battery data, while preserving the data privacy of potential clients.
An economic evaluation of retired battery recycling
To help understand the relevance and necessity of battery sorting in actual recycling practice, and to verify the significance of our proposed WDV strategy, we perform an economic evaluation. The evaluation includes three recycling methods (pyrometallurgy, hydrometallurgy, and direct recycling), two battery cathode types (LFP-graphite and NMC-graphite), two recycling modes (individual and hybrid), and three sorting accuracy levels (97%, 71%, and 55%) induced by the federated and non-federated machine learning methods (WDV, MV, and IL). The notation ML-direct in Fig. 5a denotes direct recycling enabled by our federated machine learning framework. The individual mode denotes that batteries have been previously sorted in a human-aided manner (Fig. 5b–d), which is used to compare different recycling methods given a known cathode type. The hybrid mode denotes that batteries are collected with mixed cathode types (Fig. 5e–g), which is used to analyze the significance of battery sorting for recycling profits. The detailed calculation procedure and numerical results can be found in Supplementary Note 4 and Supplementary Tables 6–15, respectively.
Fig. 5: An economic evaluation of retired battery recycling.
a Comparison of the Pyro- (pyrometallurgical), Hydro- (hydrometallurgical), and ML-direct (machine learning aided direct) recycling methods. b Cost analysis of lithium iron phosphate (LFP) and nickel manganese cobalt oxide (NMC) batteries using different recycling methods in individual mode. c Cost analysis of LFP and NMC batteries using ML-direct recycling in individual mode. d Cost, revenue, and profit comparison of the individual battery type using different recycling methods in individual mode. e Cost, revenue, and profit comparison using Wasserstein distance voting (WDV), majority voting (MV), and independent learning (IL) methods in hybrid mode. The ratio is the amount of LFP battery to that of NMC battery. f Sensitivity analysis of the profit of the WDV, MV, and IL methods with respect to sorting accuracy in hybrid mode. The ratio is the amount of LFP battery to that of NMC battery. g Comprehensive comparison of different battery recycling technologies in hybrid mode. Source data are provided as a Source Data file. The graphics in panel a were created using icons from Flaticon.com.
Figure 5a shows a schematic diagram of three recycling methods, including pyrometallurgy, hydrometallurgy, and ML-direct recycling. The final product of pyrometallurgy is metal alloy, while the final products of hydrometallurgy are lithium salt and precursor, which should be further processed to assemble batteries, as indicated by red and blue arrows in Fig. 5a. Compared to the other two non-machine-learning-aided methods, ML-direct recycling has the shortest process flow since the product is standard battery materials, which brings about the largest possible convenience and the smallest possible environmental footprint. It should be stressed that such convenience is enabled by accurate sorting, a vital link in pretreatment for actual battery recycling practice, thanks to our federated machine learning framework.
The cost analysis of LFP and NMC batteries using different recycling methods is shown in Fig. 5b, including raw material, reagent, average labor, electricity & water, equipment depreciation, and sewage treatment. It can be observed that the raw material accounts for the largest proportion of the cost. As a result, the cost of NMC is always higher than that of LFP in any method owing to the large price difference between NMC and LFP. Besides, for the same type of batteries, the cost of ML-direct recycling is the largest while that of pyrometallurgy is the least, owing to the large expense of reagents. Considering that the reagents can be heavily cathode material specific, the profitability of ML-direct recycling largely depends on the sorting accuracy of the mixed retired batteries. Further analysis of the detailed cost structure in ML-direct recycling is summarized in Fig. 5c. The outer and inner annuluses stand for NMC and LFP batteries, respectively. Except for raw material and reagent, the sum of the other costs is the same in price (5620 ¥/t) but differs by more than a factor of two in percentage (NMC for 28%, LFP for 13%). The cost of raw materials for NMC (29900 ¥/t, accounting for 74%) is nearly three times that of LFP (9687.5 ¥/t, accounting for 54%), which again indicates that the profit of ML-direct recycling is sensitive to the sorting accuracy. Figure 5d lists the cost, revenue, and profit of LFP and NMC batteries using different recycling methods. The profit of the most profitable option, NMC batteries using ML-direct recycling (29944.25 ¥/t), is 2.25 times that of the second most profitable option (LFP batteries using ML-direct recycling, 13279.51 ¥/t). In summary, ML-direct recycling has the largest revenue and profit. Moreover, the profit of recycling NMC is always larger than that of LFP, highlighting the significance of efficiently sorting high-value recycling candidates from a bulk of mixed retired batteries.
In a practical scenario, collected retired batteries could be expensive and even impossible to sort by human-aided pretreatment, especially when the recycling is scaling up. On the contrary, ML-direct recycling has the unique advantage of efficiently sorting the retired batteries by leveraging existing data sources from multiple battery recycling collaborators. An economic analysis using different machine learning paradigms (independent learning, i.e., IL; and federated machine learning, i.e., MV and WDV) is carried out in Fig. 5e, f. Due to the high sorting accuracy of the WDV, the two types of batteries (LFP and NMC) can be completely sorted and the final product can be utilized to assemble new batteries directly. On the contrary, the MV and IL would produce significant errors in distinguishing cathode materials, thus leading to low-value products (impure materials) that cannot be directly utilized and require further refining. As a result, the profit decreases asymptotically for the MV and IL methods when the sorting accuracy falls below that of the WDV (97%). In hybrid mode, WDV-based ML-direct recycling has a high profit of 24389.33, 21611.88, and 18834.42 ¥/t for LFP/NMC ratios of 1:2, 1:1, and 2:1, respectively, which are higher than those of pyrometallurgy (4372.32, 3994.46, and 3616.61 ¥/t) and hydrometallurgy (9957.45, 10039.27, and 10121.09 ¥/t). The profits of pyrometallurgy and hydrometallurgy are not sensitive to sorting accuracy since these methods do not require stringent retired battery cathode material information. Such a high profit from ML-direct recycling stems not merely from the inherent advantage of direct recycling but is also enabled by our effective and accurate retired battery sorting. Finally, a qualitative comparison of different battery recycling technologies is illustrated in Fig. 5g. ML-direct recycling shows noticeable advantages in environmental protection7,54, operation simplicity, privacy, data sharing, and profit.
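The WDV-based ML-direct profits quoted for the three LFP/NMC ratios agree, to within rounding, with a mass-weighted average of the individual-mode ML-direct profits (13279.51 ¥/t for LFP and 29944.25 ¥/t for NMC). The sketch below reproduces that arithmetic; it is an observation about the reported figures under the assumption of complete sorting, not a reproduction of the calculation in Supplementary Note 4, and the function name is hypothetical.

```python
def mixed_profit(parts_lfp, parts_nmc, profit_lfp=13279.51, profit_nmc=29944.25):
    """Mass-weighted average profit (¥/t) for a fully sorted LFP/NMC mixture.

    parts_lfp and parts_nmc give the LFP:NMC mass ratio, e.g. 1 and 2 for 1:2.
    Default profits are the individual-mode ML-direct values quoted in the text.
    """
    total = parts_lfp + parts_nmc
    if total <= 0:
        raise ValueError("the mixture must contain some batteries")
    return (parts_lfp * profit_lfp + parts_nmc * profit_nmc) / total
```

Calling `mixed_profit(1, 1)` averages the two individual-mode profits, and the 1:2 and 2:1 ratios shift the result towards the NMC and LFP values, respectively.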
Our ML-direct recycling method has huge socioeconomic values and can quickly accelerate the development of the battery recycling industry, especially when next-generation batteries are even more complex in cathode material diversities.