our challenge for bulkload reliability improvement
![Page 1: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/1.jpg)
Our challenge for Bulkload reliability improvement
Satoshi Akama
July 14, 2016
Treasure Data Tech Talk 201607 #tdtech
![Page 2: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/2.jpg)
Satoshi Akama
Embulk plugins: embulk-input-gcs, embulk-input-azure_blob_storage, embulk-output-azure_blob_storage, embulk-output-dynamodb, embulk-output-sftp
Software Engineer (Java/Scala/Ruby)
github.com/sakama/
@oreradio
Treasure Data, Inc.
![Page 3: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/3.jpg)
Topics
Embulk plugin development
  Retry! Retry!! Retry!!!
  Exception handling
  Battle with external service’s specs
  Write unit test
  Java or JRuby?
Use Embulk at Treasure Data
  Integration test
  Implement new API endpoint
  Infrastructure management
![Page 4: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/4.jpg)
We’re using Embulk as our bulkload tool
Pluggable bulkload tool
Released as OSS
We use the same version as the OSS release
![Page 5: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/5.jpg)
A GUI is available
Currently for the output side only :)
![Page 6: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/6.jpg)
Data Connector (Import) - CUI
guess/preview/import
$ td connector:guess seed.yml -o load.yml
$ td connector:preview load.yml
$ td connector:issue load.yml --database td_sample_db \
    --table td_sample_table
Scheduled execution
$ td connector:create \
    daily_import \
    "10 5 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --time-column created_at
A GUI will come in the near future
![Page 7: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/7.jpg)
Documents and magazines
Official website: http://www.embulk.org/
Qiita (JP only): http://qiita.com/search?q=embulk
Twitter: #embulk
![Page 8: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/8.jpg)
Plugin development
![Page 9: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/9.jpg)
Retry! Retry! Retry!!!
Without retry:
    …
    Storage.Objects.Get getObject = client.objects().get(bucket, key);
    InputStream stream = getObject.executeMediaAsInputStream();
Embulk (embulk-core) provides RetryExecutor
Most official SDKs contain retry logic, but it is not enough.
With RetryExecutor:
    try {
        return retryExecutor()
            .withRetryLimit(3)
            .withInitialRetryWait(500)
            .withMaxRetryWait(30 * 1000)
            .runInterruptible(new Retryable<InputStream>() {
                @Override
                public InputStream call() throws InterruptedIOException, IOException {
                    Storage.Objects.Get getObject = client.objects().get(bucket, key);
                    return getObject.executeMediaAsInputStream();
                }
                // the remaining Retryable callbacks (isRetryableException, onRetry, onGiveup) are omitted on the slide
            });
    } catch (RetryGiveupException ex) {
        …
    } catch (InterruptedException ex) {
        // do not swallow the interrupt silently
        throw new InterruptedIOException();
    }
Fail -> retry using exponential backoff
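The exponential backoff behind the retry settings above can be sketched in plain Java without Embulk. This is an illustration under assumptions, not Embulk's implementation: `BackoffSketch`, `waitMillis`, `retry`, and the "double the wait each time, up to a cap" schedule are hypothetical choices made for this example. The parameters mirror the retry limit, initial wait, and maximum wait shown in the slide.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class BackoffSketch {
    // Wait before retry number `attempt` (0-based): initial * 2^attempt, capped at max.
    static long waitMillis(int attempt, long initialWaitMillis, long maxWaitMillis) {
        long wait = initialWaitMillis;
        for (int i = 0; i < attempt && wait < maxWaitMillis; i++) {
            wait *= 2;
        }
        return Math.min(wait, maxWaitMillis);
    }

    // Runs `task`, retrying up to `retryLimit` times when it throws IOException.
    static <T> T retry(Callable<T> task, int retryLimit,
                       long initialWaitMillis, long maxWaitMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (IOException ex) {
                if (attempt >= retryLimit) {
                    throw ex; // give up after the last allowed retry
                }
                Thread.sleep(waitMillis(attempt, initialWaitMillis, maxWaitMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated remote call that fails twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("temporary failure " + calls[0]);
            }
            return "ok";
        }, 3, 10, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With an initial wait of 500 ms and a cap of 30 seconds, this schedule waits 500 ms, 1 s, 2 s, and so on, and never longer than 30 s between attempts.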
![Page 10: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/10.jpg)
Java or JRuby?
Embulk supports both Java-based and JRuby-based plugins
Java-based plugin
  High performance
  Filter / Parser / Formatter / Encoder / Decoder plugins: these plugins need high performance
  Some enterprise services/software provide a Java SDK
  Write with Java 7 (MapReduce Executor needs Java 7)
JRuby-based plugin
  Easy to write
  Suitable when the network is the bottleneck (e.g. cloud services)
![Page 11: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/11.jpg)
Exception handling to avoid infinite retry
ConfigException
  The transaction method should validate all config values
  It should throw ConfigException or its subclass when validation fails
    public ConfigDiff transaction(ConfigSource config, FileInputPlugin.Control control) {
        …
        if (task.getFiles().isEmpty()) {
            throw new ConfigException("File is empty");
        }
    }
DataException
  The plugin should throw DataException or its subclass when it finds an invalid record
    …
    } catch (CsvTokenizer.InvalidFormatException | CsvTokenizer.InvalidValueException … e) {
        if (stopOnInvalidRecord) {
            // throw an exception if stopOnInvalidRecord is true
            throw new DataException("Invalid record");
        }
        // only show a warning if stopOnInvalidRecord is false
        log.warn("Invalid record");
    }
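Why the exception type matters for avoiding infinite retry can be sketched as follows. This is a minimal illustration, not how Treasure Data's workers are implemented: `ConfigError` and `DataError` are hypothetical stand-ins for Embulk's `ConfigException` and `DataException`, and `isRetryable` is an assumed decision function on the job-runner side.

```java
import java.io.IOException;

final class RetryPolicySketch {
    // Hypothetical stand-ins for Embulk's ConfigException and DataException.
    static class ConfigError extends RuntimeException {
        ConfigError(String message) { super(message); }
    }

    static class DataError extends RuntimeException {
        DataError(String message) { super(message); }
    }

    // A job runner can use the exception type to decide whether a retry makes sense.
    static boolean isRetryable(Throwable error) {
        if (error instanceof ConfigError || error instanceof DataError) {
            // The user has to fix the config or the data; retrying cannot succeed.
            return false;
        }
        // Assumption for this sketch: I/O problems are treated as temporary.
        return error instanceof IOException;
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new ConfigError("File is empty")));
        System.out.println(isRetryable(new DataError("Invalid record")));
        System.out.println(isRetryable(new IOException("connection reset")));
    }
}
```

If a plugin wrapped every failure in a generic exception, the runner could not tell a bad config from a temporary outage and would keep retrying a job that can never succeed.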
![Page 12: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/12.jpg)
Battle with external service’s specs
Examples of building the next token for each object storage service.
Next token: the starting point of the next page when listing the files stored in a bucket or container.
Azure Blob Storage
    String path = "/path/to/file";
    String str = String.format("%06d", path.length()) + "!" + path + "!" + "000028" + "!" + "9999-12-31T23:59:59.9999999Z" + "!";
    // BaseEncoding.encode() takes a byte array, not a String
    String encodedString = BaseEncoding.base64().encode(str.getBytes(Charsets.UTF_8));
    String nextToken = "2" + "!" + encodedString.length() + "!" + encodedString;
AWS S3
    String path = "/path/to/file"; // use the path string as the next token
Google Cloud Storage
    String path = "/path/to/file";
    byte[] utf8 = path.getBytes(Charsets.UTF_8);
    byte[] encoding = new byte[utf8.length + 2];
    encoding[0] = 0x0a;
    encoding[1] = (byte) utf8.length; // length in bytes; a single byte only covers short paths
    System.arraycopy(utf8, 0, encoding, 2, utf8.length);
    String nextToken = BaseEncoding.base64().encode(encoding);
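How a next token drives a paged file listing can be sketched with an in-memory list instead of a real storage service. This is an illustration under assumptions: `PagingSketch`, `listPage`, and `listAll` are hypothetical names, and the token is simply the last key of the previous page (the "path string as next token" style), not the encoded formats shown for the other services.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagingSketch {
    // Returns up to pageSize keys that sort strictly after `marker`.
    // A null marker means "start from the beginning".
    static List<String> listPage(List<String> sortedKeys, String marker, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String key : sortedKeys) {
            if (marker != null && key.compareTo(marker) <= 0) {
                continue; // already returned by an earlier page
            }
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Lists everything page by page, using the last key of each page as the next token.
    static List<String> listAll(List<String> sortedKeys, int pageSize) {
        List<String> all = new ArrayList<>();
        String nextToken = null;
        while (true) {
            List<String> page = listPage(sortedKeys, nextToken, pageSize);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            nextToken = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        System.out.println(listPage(keys, "b.csv", 2));
        System.out.println(listAll(keys, 2));
    }
}
```

Being able to build the token from a known path means a listing can start right after a given file rather than from the beginning of the bucket, which is presumably why the plugins need to construct it themselves.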
![Page 13: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/13.jpg)
Write unit test
We need 80% coverage before a plugin can be used on our platform, but it is difficult to write tests for Embulk plugins 😞
Unit tests without a remote connection that I have seen so far
  SFTP: create a Java-based virtual SFTP server on the local machine.
  DynamoDB: AWS provides a downloadable version of DynamoDB.
  Filter / Parser / Formatter / Encoder / Decoder plugins
80% coverage is difficult without connecting to the service
  Set confidential values in environment variables.
  Use “Encryption keys” and “Encrypting files” on Travis CI.
  Connect to the remote service on each test run.
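Reading credentials from environment variables, and skipping remote tests when they are missing, can be sketched like this. It is a minimal illustration only: `RemoteTestGuard`, `secret`, and the variable name `EXAMPLE_SERVICE_SECRET_KEY` are hypothetical and are not taken from any of the plugins mentioned in the talk.

```java
import java.util.Map;
import java.util.Optional;

final class RemoteTestGuard {
    // Returns the secret only if the variable is set and non-empty.
    static Optional<String> secret(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value);
    }

    public static void main(String[] args) {
        // On CI the variable would be injected from an encrypted setting.
        Optional<String> key = secret(System.getenv(), "EXAMPLE_SERVICE_SECRET_KEY");
        if (!key.isPresent()) {
            System.out.println("EXAMPLE_SERVICE_SECRET_KEY is not set; skipping remote tests");
            return;
        }
        System.out.println("credentials found; remote tests can run");
    }
}
```

Keeping the lookup separate from `System.getenv()` lets the guard itself be tested with a plain map, without touching the real environment.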
![Page 14: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/14.jpg)
Use embulk at Treasure Data
![Page 15: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/15.jpg)
Architecture of Treasure Data
[Diagram]
Components: Web Console, td commands, Load Balancer, TD API (API Servers), Bulkload API (API Servers), Perfect Queue, MySQL, TD worker (worker process)
Flow labels: Request / Response, guess/preview, enqueue, dequeue, Submit Job (retry if needed), Execute with MR / Local Executor
![Page 16: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/16.jpg)
TD API / Bulkload API
guess/preview is processed on different API servers.
guess/preview needs a quick response
[Diagram]
Components: Load Balancer, TD API (API Servers), Bulkload API (API Servers), Perfect Queue
guess/preview: HTTP Request/Response
data import: queuing (enqueue to Perfect Queue)
![Page 17: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/17.jpg)
Huge data comes in
Embulk config with thousands of columns
  Need sufficient validation in the transaction method
  Return clear error or warning messages from the plugin
Huge data
  The retry logic of the plugin is important
  Retry if a retryable exception happens
  Use MapReduce Executor
  Reduce the difference in usage between instances.
![Page 18: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/18.jpg)
Write integration test
Write integration tests for each connector (and result output) with RSpec
  Does td connector:guess (embulk guess) work?
  Does td connector:preview (embulk preview) work?
  Does td connector:issue (embulk run) work as expected?
  Does it work with LocalExecutor?
  Does it work with MapReduce Executor?
  Does it work with a filter plugin?
  Does scheduled execution work as expected?
many test cases × for each service
![Page 19: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/19.jpg)
Want to improve…
CI failure
  Target service times out 😞
  Target service returns a 50x error 😞
  API limit exceeded 😞
Long execution time
  many test cases × for each service
![Page 20: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/20.jpg)
Want to implement…
The API endpoints are not enough
Endpoints used by the GUI console and the CUI: guess, preview, issue (run)
Are the username and password valid?
This is unclear until the user runs a job (or guess or preview) and the plugin returns a result or a ConfigException.
![Page 21: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/21.jpg)
Want to implement…
Validate before executing jobs
GUI console -> input (host, port, username, password) -> new endpoint -> valid?
  Improves the user experience
  Reduces jobs on our platform
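What such a validation step could return before any job is queued can be sketched as follows. This is a hedged illustration, not the endpoint Treasure Data implemented: `ConnectionSettingsValidator` and `validate` are hypothetical, and the checks only cover the shape of the input. A real endpoint would presumably also try to connect and authenticate against the target service.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionSettingsValidator {
    // Returns a list of human-readable problems; an empty list means the input looks usable.
    static List<String> validate(String host, int port, String username, String password) {
        List<String> errors = new ArrayList<>();
        if (host == null || host.trim().isEmpty()) {
            errors.add("host is required");
        }
        if (port < 1 || port > 65535) {
            errors.add("port must be between 1 and 65535");
        }
        if (username == null || username.isEmpty()) {
            errors.add("username is required");
        }
        if (password == null || password.isEmpty()) {
            errors.add("password is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("sftp.example.com", 22, "user", "secret"));
        System.out.println(validate("", 70000, "user", ""));
    }
}
```

Returning all problems at once, instead of failing on the first one, gives the GUI enough information to mark every invalid field in a single round trip.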
![Page 22: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/22.jpg)
Infrastructure Management
Server configuration: Chef
Monitoring: Datadog
Incident resolution: PagerDuty
More reliability with MapReduce Executor
Building a server with Chef takes a considerable amount of time.
![Page 23: Our challenge for Bulkload reliability improvement](https://reader031.vdocuments.us/reader031/viewer/2022020119/587fe13d1a28ab46228b4657/html5/thumbnails/23.jpg)
Thank you!