Salesforce 数据清洗

新系统上线后,须要导入历史数据,可是旧数据格式,数据缺失,数据错误,奇异值,属性归类与新系统有很大的gap。所以咱们须要创建一套数据动态清洗规则给Salesforce系统,经过这些规则自动清洗导入数据,清洗规则可让function本身配置。而不须要IT负责html

 

下面将详细举一个例子如何在salesforce中作数据处理。数据清洗须要分红5个步骤数组

1,创建2个关联数据的Object的和 一个数据清洗后台设置的Object的
2,数据导入页面csv
3,定义每一个字段的范围、属性,若是是错误的则自动从新分配,或者修改为临近值
4,数据清洗合并。
5,导出错误数据到Excel
 
第一步,新创建两个关联的Recruit 和 Recruit Department, 而且创建一个清洗规则的Object,当导入数据后咱们能够读取设置的清洗规则,并对导入的数据进行清洗
第二步,对于清洗规则,咱们只能有一条规则被激活,所以咱们在插入新规则和更改旧规则的时候,咱们须要添加一个tirgger针对Data_Washing_Setting,保证规则的惟一性。
 
 1 trigger IsActiveChecking on Data_Washing_Setting__c (before insert,before update) {
 2 
 3     List<Data_Washing_Setting__c> ListOldData =[select Id from Data_Washing_Setting__c 
 4                                                where Active_this_Rule__c = true];
 5     List<Data_Washing_Setting__c> ListNewData =trigger.new;
 6     
 7     //system.debug('ListNewData:'+ListNewData.size());
 8     integer itemNum = 0;
 9     if(trigger.isInsert)
10     {
11         if(trigger.isBefore)
12         {
13             for(Data_Washing_Setting__c dws : trigger.new)
14             {
15                 if(dws.Active_this_Rule__c)
16                 {
17                     itemNum++;
18                 }
19             }
20            itemNum +=ListOldData.size();
21            
22            if(itemNum>1)
23            {
24                for(Data_Washing_Setting__c dws : trigger.new){   
25                     dws.adderror('only one record can be actived! pls check your history data and try again.');
26                 }
27            }
28         }
29     }
30     else if(trigger.isUpdate)
31     {   
32         if(trigger.isBefore)
33         {
34             // 去掉更新的数据
35             for(Data_Washing_Setting__c dws : trigger.new)
36             {
37                 for(integer i=0;i<ListOldData.size();i++){
38                     if(dws.Id== ListOldData[i].Id)
39                     {
40                         ListOldData.remove(i);
41                     }
42                 }
43                 if(dws.Active_this_Rule__c)
44                 {
45                     itemNum++;
46                 }
47             }
48            itemNum +=ListOldData.size();
49            if(itemNum>1)
50            {
51                for(Data_Washing_Setting__c dws : trigger.new){   
52                     dws.adderror('only one record can be actived! pls check your history data and try again.');
53                 }
54            }
55           
56         }
57     }
58 }

第三步,咱们须要创建导入页面,并添加相应的验证按钮app

VF的代码ui

 1 <apex:page controller="BatchInsertByCsvController">
 2     <apex:form >
 3     <apex:sectionHeader title="Upload Recruit Data"/>
 4    <apex:pageMessages />
 5    <apex:pageblock >
 6         <center>
 7             <apex:inputFile value="{!contentFile}" fileName="{!fileName}" />
 8             <apex:commandButton action="{!LoadData}" value="Batch Insert"/>
 9             <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>
10             <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/>
11             
12         </center>
13     </apex:pageblock>
14      <apex:pageBlock title="Import Data">
15          <apex:pageblocktable value="{!RecruitList}" var="ReList">
16               <apex:column value="{!ReList.Name}" />
17               <apex:column value="{!ReList.Position_Name__c}" />
18               <apex:column value="{!ReList.Recruit_Department__c}" />
19               <apex:column value="{!ReList.Recruit_Type__c}" />
20               <apex:column value="{!ReList.Recruit_Number__c}" />
21         </apex:pageblocktable>
22      </apex:pageBlock>
23      <apex:pageBlock title="Blank Data">
24          <apex:pageblocktable value="{!BlankList}" var="BList">
25               <apex:column value="{!BList.Name}" />
26               <apex:column value="{!BList.Position_Name__c}" />
27               <apex:column value="{!BList.Recruit_Department__c}" />
28               <apex:column value="{!BList.Recruit_Type__c}" />
29               <apex:column value="{!BList.Recruit_Number__c}" />
30         </apex:pageblocktable>
31      </apex:pageBlock>
32     </apex:form>
33 </apex:page>

后台APEX 导入代码this

  1 public class BatchInsertByCsvController {
  2     
  3     public string fileName{get;set;}
  4     //Blob:二进制对象类型。经过inputFile选中后的文件在后台获取的时候是一个Blob类型,
  5     public Blob contentFile{get;set;}
  6     public String[] filelines = new String[]{};
  7     public List<Recruit__c> RecruitList{get;set;}
  8     public List<Recruit__c> BlankList{get;set;}
  9     public List<Recruit__c> invaildList{get;set;}
 10     //初始化
 11     public PageReference LoadData()
 12     {
 13         try{
 14             filename = bitToString(contentFile,'ISO-8859-1');
 15             filelines = fileName.split('\n');
 16            // ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'import account:'+filelines.size());
 17            // ApexPages.addMessage(msgs);
 18             RecruitList = new List<Recruit__c>();
 19             string[] inputvalues;
 20             string SwpNumber;
 21             
 22             for(Integer i=1;i<filelines.size();i++)
 23             {
 24                 inputvalues = new string[]{};
 25                 inputvalues = filelines[i].split(',');
 26                 Recruit__c recruits = new Recruit__c();
 27                 recruits.Name = inputvalues[0];
 28                 recruits.Position_Name__c = inputvalues[1];
 29                 recruits.Recruit_Department__c = [SELECT Id 
 30                                 FROM Recruit_Department__c 
 31                                 WHERE Name =:inputvalues[2] LIMIT 1].Id;
 32                 recruits.Recruit_Type__c = inputvalues[3];
 33                 SwpNumber = inputvalues[4];
 34                 recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim());
 35                 RecruitList.add(recruits);
 36             }
 37         }
 38         catch(exception e){
 39             ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage());
 40             ApexPages.addMessage(errormsg);
 41         }
 42         try{
 43            // insert RecruitList;
 44           //   ApexPages.Message successMsg = new ApexPages.Message(ApexPages.severity.INFO,'import success');
 45             // ApexPages.addMessage(successMsg);
 46         }
 47         catch(Exception e)
 48         {
 49             //ApexPages.Message errormsg = new ApexPages.Message(ApexPages.severity.ERROR,'An error has occured inserting the records'+e.getMessage());
 50             //ApexPages.addMessage(errormsg);
 51         }
 52         return null;
 53     }
 54     //blob是二进制存储的,String是16进制存储的,因此使用此种方式加上编码解码等操做确定会更加适应,包括中文
 55     private String bitToString(Blob input, String inCharset){
 56          //转换成16进制
 57         String hex = EncodingUtil.convertToHex(input);
 58          //一个String类型两个字节 32位(bit),则一个String长度应该为两个16进制的长度,因此此处向右平移一个单位,即除以2
 59          //向右平移一个单位在正数状况下等同于除以2,负数状况下不等
 60          //eg 9  00001001  >>1 00000100   结果为4
 61          final Integer bytesCount = hex.length() >> 1;
 62          //声明String数组,长度为16进制转换成字符串的长度
 63          String[] bytes = new String[bytesCount];
 64          for(Integer i = 0; i < bytesCount; ++i) {
 65              //将相邻两位的16进制字符串放在一个String中
 66              bytes[i] =  hex.mid(i << 1, 2);
 67          }
 68          //解码成指定charset的字符串
 69          return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset);
 70      }
 71     //筛选空值
 72     public PageReference LoadBlankList()
 73     {
 74         try
 75         {
 76             BlankList=new list<Recruit__c>();
 77             DataWashingSetting dws=new DataWashingSetting();
 78             string[] flines = dws.AddQuestionsData(filelines);
 79             string[] inputvalues;
 80             string SwpNumber;
 81             
 82             for(Integer i=0;i<flines.size();i++)
 83             {
 84                     inputvalues = new string[]{};
 85                     inputvalues = flines[i].split(',');
 86                     Recruit__c recruits = new Recruit__c();
 87                     recruits.Name = inputvalues[0];
 88                     recruits.Position_Name__c = inputvalues[1];
 89                     recruits.Recruit_Department__c = [SELECT Id 
 90                                     FROM Recruit_Department__c 
 91                                     WHERE Name =:inputvalues[2] LIMIT 1].Id;
 92                     recruits.Recruit_Type__c = inputvalues[3];
 93                     SwpNumber = inputvalues[4];
 94                     recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber.trim());
 95                     BlankList.add(recruits);
 96             }
 97             ApexPages.Message msgs = new ApexPages.Message(ApexPages.Severity.INFO, 'blank num:'+BlankList.size());
 98             ApexPages.addMessage(msgs);
 99         }
100         catch(Exception e)
101         {
102             ApexPages.Message errormsg = new ApexPages.Message(ApexPages.Severity.ERROR,'An error has occured reading the CSV file: '+e.getMessage());
103             ApexPages.addMessage(errormsg);
104         }
105         return null;
106     }
107     public PageReference ExportBlankToCSV()
108     {
109          return new PageReference('/apex/ExportCSV');
110     }
111 }

后台调用的验证清洗代码,能够根据须要任意添加编码

 1 public class DataWashingSetting {
 2 
 3     //消除重复数据
 4     public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList)
 5     {        
 6         set<Recruit__c> myset= new set<Recruit__c>();
 7         List<Recruit__c> result = new List<Recruit__c>();
 8         
 9         myset.addAll(OriginalList);
10         result.addAll(myset);
11         
12         return result;
13     }
14     //筛选为空数据
15     public string[] AddQuestionsData(string[] filelines)
16     {
17         string[] result =new string[]{}; 
18         string[] inputvalues;
19         for(Integer i=1;i<filelines.size();i++)
20         {
21             inputvalues = new string[]{};
22             inputvalues = filelines[i].split(',');
23             if(inputvalues[0] == ''||inputvalues[1] == '' ||inputvalues[2] == '' 
24                 ||inputvalues[3] == '' ||inputvalues[4] == '')
25             {
26                 result.add(filelines[i]);   
27             }
28          }
29         return result;
30     }
31     //检测各个字段的合理性
32     public string[] CheckFiled(string[] filelines)
33     {
34         //读取规则
35         Data_Washing_Setting__c dws = [select Position_Name_Rule__c,
36                                        Recruit_End_Number__c,Recruit_Department_Rule__c,Recruit_Start_Number__c from Data_Washing_Setting__c where Active_this_Rule__c = true];
37         string PositionNameRule = dws.Position_Name_Rule__c; //部门规则是否容许重复
38         decimal startNumber= dws.Recruit_Start_Number__c; //招聘人数底线
39         decimal endNumber= dws.Recruit_End_Number__c; //招聘人数上线
40         string department = dws.Recruit_Department_Rule__c;//部门限制
41         
42         string[] result =new string[]{}; 
43         string[] inputvalues;
44         for(Integer i=1;i<filelines.size();i++)
45         {
46             inputvalues = new string[]{};
47             inputvalues = filelines[i].split(',');
48             //填写验证代码
49          }
50         return result; //返回不合格代码
51     }
52 }

出现问题数据直接导出问题数据到Excel,手动处理后再导入。url

 1 <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel# BlankList.xls" showHeader="false">
 2  <head>
 3       <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
 4  </head>
 5      <apex:pageBlock >
 6      <apex:pageblocktable value="{!BlankList}" var="BList">
 7               <apex:column value="{!BList.Name}" />
 8               <apex:column value="{!BList.Position_Name__c}" />
 9               <apex:column value="{!BList.Recruit_Department__c}" />
10               <apex:column value="{!BList.Recruit_Type__c}" />
11               <apex:column value="{!BList.Recruit_Number__c}" />
12         </apex:pageblocktable>
13     </apex:pageBlock>
14 </apex:page>

下面就是最终效果:spa

1,导入数据,自动筛选有缺失值的数据,并支持Excel导出debug

2,后台清洗的规则设置。excel