2019 International Data Science Competition


We cordially invite junior Data Scientists to take part in the First International Data Science Competition, held from March 15 to June 1, 2019. The competition is organized by the DataAI@SG and Global Vietnamese Data Scientists Facebook Groups, faculty members from universities in the USA and Asia, industry Data Scientists, and the International Society of Data Scientists/ScriptedIn Corporation. It gives young Data Scientists a playground to build skills, knowledge, experience, and collaboration for their career path.

 

Competition Committee Composition


Dr. Van Vu, Percey F. Smith Professor of Mathematics, Yale University, CT, USA, and Director of the VinGroup Big Data Institute

Dr. Christopher Do, President of the International Society of Data Scientists, CT, USA

Dr. Nguyet Nguyen, Assistant Professor, Youngstown State University, OH, USA

Dr. Linh Nguyen, Associate Professor, University of Idaho, ID, USA


Dr. Nam Nguyen, Data Analytics Scientist, Schlumberger, TX, USA

Mr. Tinh H. Nguyen, Faculty of Information Technology, Industrial University of Ho Chi Minh City, Vietnam

Dr. Hung Ta, Head of Mathematics and Computer Science, Hanoi University, Vietnam

Mr. Cory Wang, Chief Operating Officer, ScriptedIn Inc., USA

Dr. Vinh Nguyen, Senior Manager of Operations Research, Marriott International


Dr. Tu-Anh Vu-Thanh, Dean of Fulbright School of Public Policy and Management, Fulbright University Vietnam, Vietnam

Dr. Hien Nguyen, Assistant Professor, University of Houston, TX, USA

 

Dataset Committee

Ms. Phuong N. Y. Le, Data Scientist, Vietnam (Dataset Team)

Dr. Vinh Dang, Data Scientist, Vietnam (Dataset Team)

Dr. Phu Vu, Data Scientist, Vietnam (Dataset Team)

Mr. Yuan Xie, Data Scientist, DHL, China (Dataset Team)

Mr. Vy Bui, Data Scientist, Vietnam (Dataset Team)


Mr. Khanh Tran, Data Scientist, FPT Telecom, Vietnam (Dataset Team)

 

University Representatives


Dr. Bao The Pham, Associate Professor, Saigon University (SGU)

Dr. Bay Dinh Vo, Associate Professor, HUTECH University of Technology

Dr. Loan T. T. Nguyen, International University, VNU-HCM

Mr. Hien D. Nguyen, University of Information Technology, VNU-HCM

 

Industry Committee

Mr. Thai Nguyen, Project Director at FPT USA Corp, WA, USA

(*)

 

Prizes and Sponsorship

Prizes include:

First Place 

Second Place

Third Place

Winning teams receive winner certificates bearing the names of all committee members and organizers.

Sponsorship is provided by the following sponsors:

Vingroup VinID sponsors internship positions for winners (Ms. Linh Pham, Lead Technical Recruiter, 0972702047)

Fulbright School of Public Policy and Management sponsors internships and RA positions in AI and Big Data projects (Dr. Tu-Anh Vu-Thanh, Dean of Fulbright School of Public Policy and Management)

FPT internships (Mr. Thai Nguyen, FPT USA)

(*) The committee members, organizers, and ISODS/Scriptedin are not responsible for any award promised by any sponsor.

 

 Problem & Datasets

The data was collected from the Forex market by Yuan Xie, a DHL Data Scientist. Your task is to predict the future. The target is the most liquid (most traded) currency pair in the world, EUR/USD. Fortunately, you only need to predict whether this pair moves up or down. We provide data covering 2008-01-01 to 2018-03-19, sourced from the www.dukascopy.com Historical Data Feed.

Given historical currency performance, a large set of pricing features, and basic knowledge of market hours, can you predict the up or down movement of that day without being deceived by the noise? The Forex market differs from the stock market because it is a single global market. The price may stay flat for several minutes or even hours without a single trade, and then move dramatically as people start to trade more frequently.

In our dataset, we collected the 5-minute Bid price of EUR/USD from 2008-01-01 to 2018-03-19. Each 5-minute record comes with over 200 features covering pricing, volatility, and volume information of different kinds. To help you understand the dataset and try out some experiments, we also provide a much smaller subset of the data.

The list of fields is below:

Gmt time: timestamp, marked as the starting time of that 5 min period;
Open: Open price of that 5 min period;
High: High price of that 5 min period;
Low: Low price of that 5 min period;
Close: Close price of that 5 min period;
Volume: trading volume of that 5 min period, in millions;
Body: the length of body of candlestick plot;
Upper_tail: the upper tail of candlestick plot;
Lower_tail: the lower tail of candlestick plot;
SMA_50: simple moving average of last 50 time periods;
SMA_20: simple moving average of last 20 time periods;
ATR: technical indicator ATR of last 50 time periods, a measure of volatility;
CCI: technical indicator CCI of last 20 time periods;
SAR: technical indicator SAR, a measure of trend;
Hour: which hour the data was collected;
min: which minute the data was collected;
Dayofweek: the day of week the data was collected;
JPY: if JPY started active trading;
AUD: if AUD started active trading;
EUR: if EUR started active trading;
GBP: if GBP started active trading;
USD: if USD started active trading;
return_1, return_2, ..., return_96: currency return during last 5 mins, 10 mins, 15 mins, ..., 8 hours;
lag_return_1, lag_return_2, ..., lag_return_96: currency return before 5 mins, 10 mins, 15 mins, ..., 8 hours.
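
The competition page does not give formulas for these fields, so the following Python sketch only illustrates one common way such features are derived from Open/High/Low/Close values. The function names, the unsigned body length, and the simple (non-log) return are all assumptions made for illustration and may differ from how the dataset was actually built.

```python
from typing import Dict, List, Optional


def candle_features(open_: float, high: float, low: float, close: float) -> Dict[str, float]:
    """Candlestick geometry for one period (one common convention, assumed here)."""
    top = max(open_, close)
    bottom = min(open_, close)
    return {
        "Body": top - bottom,        # unsigned body length
        "Upper_tail": high - top,    # wick above the body
        "Lower_tail": bottom - low,  # wick below the body
    }


def simple_moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Mean of the last `window` closes; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[i + 1 - window:i + 1]) / window)
    return out


def k_period_return(closes: List[float], k: int) -> List[Optional[float]]:
    """Simple return over the last k periods: close[i] / close[i - k] - 1."""
    return [
        None if i < k else closes[i] / closes[i - k] - 1.0
        for i in range(len(closes))
    ]
```

With 5-minute periods, `k_period_return(closes, 96)` would span 96 x 5 = 480 minutes, which matches the 8-hour horizon listed for return_96.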

 

Evaluation

Once your model can score a test dataset, submit the result set to the contest by selecting the Add a Submission button and uploading the result.

The result will be scored and ranked by the accuracy of the test result, which is the number of correctly classified instances divided by the total number of instances.
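
As a minimal sketch of that metric, the Python function below computes accuracy for two equal-length label lists. The function name and the error handling are illustrative assumptions; the platform's own scoring code is not published here.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot score an empty result set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For example, four correct predictions out of five instances give an accuracy of 0.8.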

Participants should predict labels using the public test dataset. Results can be submitted and scored an unlimited number of times. The current ranking of submitters can be seen by clicking the Leaderboard button. Previous submission results can be seen by clicking the View Submissions button.

The result CSV should have 2 columns. The number of instances must be the same as in the test dataset. The first column is the row number (1, 2, 3, etc.). The second column is the predicted label (such as 1 or 0). The first row is always the column headers (such as num and label). See the example below:

num,label
1,0
2,0
3,1
4,1
5,0
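
A file in this layout can be produced with Python's standard csv module. The sketch below is only an illustration: the function name `write_submission` is hypothetical, and the header names num and label are taken from the example above rather than from a stated platform requirement.

```python
import csv
import io
from typing import Iterable, TextIO


def write_submission(labels: Iterable[int], out: TextIO) -> None:
    """Write a header row, then one 'row number, label' line per prediction."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["num", "label"])
    # Row numbers start at 1, as in the example above.
    for num, label in enumerate(labels, start=1):
        writer.writerow([num, label])


# Usage: write to an in-memory buffer; pass an open file object to save to disk.
buffer = io.StringIO()
write_submission([0, 0, 1, 1, 0], buffer)
```

To write a real file, open it with `open(path, "w", newline="")` and pass the file object as `out`.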

 

Rules

The competition runs from March 15 to June 1, 2019. Each team may have up to 3 members. Multiple teams may come from the same university. Team members must either be current university students at the time of joining the competition or have graduated within the last 5 years. Each team may have at most 1 member from a different university. Compliance with the rules is checked only if a team wins.

If you are the team lead, create a team using the Create Team button and invite the other members using the Invite Team button. Update your university information using the Update Info button.

Each team may choose to invite a coach, who may be a data scientist, a university professor, etc.

Each university may have a representative. However, a team can participate without a university representative.

By the deadline, each team must (1) submit a CSV file of predicted labels and get scored and ranked, (2) share a detailed report/article, with code included, using the Write Article button, and (3) optionally share a notebook using the Share Notebook button. Currently only Jupyter Python notebooks are supported on the platform.

The committee determines the winners by running the final version of the code each team submitted before the deadline on a private test set outside the platform. Results will be announced once the test is completed. Winning teams are encouraged to submit a demo, which should include proof of compliance with the competition rules. The demo will be recorded as a video and posted on YouTube. Winning teams should record their videos within 1 week after the competition finishes and post them on the ISODS Q&A forum. The demo videos will be included in a summary of the competition.

The permitted languages are Python and R. Participants must use only data up to time t-1 to predict the up/down movement at time t. Code must be included and will be checked when the competition finishes.
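
One way to respect the t-1 rule is to build every training example so that its features come strictly from earlier periods. The Python sketch below is illustrative only: the function name is hypothetical, and defining the label as 1 when the close at time t is above the close at time t-1 is an assumption, since the dataset's own label definition is not stated on this page.

```python
from typing import List, Tuple


def make_lagged_pairs(closes: List[float], n_lags: int) -> List[Tuple[List[float], int]]:
    """Pair each time t with features built only from times before t."""
    pairs: List[Tuple[List[float], int]] = []
    for t in range(n_lags + 1, len(closes)):
        # One-period returns ending at t-1, t-2, ..., t-n_lags.
        # closes[t] itself is never read here, so nothing leaks from time t.
        features = [
            (closes[t - j] - closes[t - j - 1]) / closes[t - j - 1]
            for j in range(1, n_lags + 1)
        ]
        # Assumed label: 1 if the price rose from t-1 to t, else 0.
        label = 1 if closes[t] > closes[t - 1] else 0
        pairs.append((features, label))
    return pairs
```

For the same reason, train and validation sets are usually split chronologically rather than shuffled, so that a model is never evaluated on periods earlier than the ones it was fitted on.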

Please log in to the competition platform to download the datasets:

https://www.scriptedin.com/contests/view/25

 

Disclaimers

All competition participants are required to read the disclaimers:

https://www.isods.org/?id=30

 

Questions and Answers

Please join the Q&A to post questions and get them answered, or to help answer others' questions.

Social activities take place via the DataAI@SG and Global Vietnamese Data Scientists Facebook Groups and the International Data Science Competition Facebook Page.

See our Privacy Policy

https://www.isods.org/?id=29

 

Getting Involved

How to get involved? Please join the Get Involved section in the Questions and Answers menu. This section is for people interested in being team advisors and for companies that may contribute insights, advisors, and sponsorships.

(*) All the committee positions are voluntary.

 

 
